[Figure: Cost Explorer before/after view showing the spend reduction.]

Last updated: April 2026 - Covers Graviton4, Aurora I/O-Optimized, gp3 volumes, S3 Intelligent-Tiering, and current Savings Plan pricing.

Where the Money Actually Goes

Before optimizing anything, you need to know where you're bleeding. Here's the typical cost breakdown for a SaaS startup running on AWS at the $5K-$15K/month range:

| Service | % of Bill | Typical Monthly (on a ~$10K bill) | Optimization Potential |
| --- | --- | --- | --- |
| EC2 / ECS / EKS | 30-40% | $3,000-$4,000 | 40-60% |
| RDS / Aurora | 15-25% | $1,500-$2,500 | 30-50% |
| NAT Gateway | 10-15% | $1,000-$1,500 | 70-90% |
| S3 | 5-10% | $500-$1,000 | 30-50% |
| CloudWatch / Logs | 5-10% | $500-$1,000 | 50-80% |
| Data Transfer | 5-8% | $500-$800 | 30-50% |
| ElastiCache / OpenSearch | 5-8% | $500-$800 | 30-50% |
| Lambda / API Gateway | 2-5% | $200-$500 | 20-40% |

Compute: The Biggest Lever

Graviton - Free 20% Savings

Graviton3 and Graviton4 (ARM) instances are about 20% cheaper than equivalent x86 instances and deliver 25-40% better price-performance. Most workloads (Node.js, Python, Go, Java 17+, .NET 6+) run on Graviton with zero code changes. Native C/C++ extensions compiled for x86 need recompilation, and container images need an arm64 build.

| Instance | x86 Price | Graviton Price | Savings |
| --- | --- | --- | --- |
| m7i.xlarge vs m7g.xlarge | $0.2016/hr | $0.1632/hr | 19% |
| c7i.2xlarge vs c7g.2xlarge | $0.357/hr | $0.2894/hr | 19% |
| r7i.xlarge vs r7g.xlarge | $0.2520/hr | $0.2040/hr | 19% |
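
The Savings column can be reproduced from the hourly prices. A minimal sketch, assuming the on-demand rates quoted in the table above (they vary by region); `savings_pct` is a hypothetical helper name, not an AWS tool:

```shell
# Percent saved when moving from an x86 hourly price to a Graviton hourly price.
# Arguments: x86 price, Graviton price (both in $/hr, as quoted in the table).
savings_pct() {
  awk -v x86="$1" -v arm="$2" 'BEGIN { printf "%.1f\n", (x86 - arm) / x86 * 100 }'
}

savings_pct 0.2016 0.1632   # m7i.xlarge vs m7g.xlarge
savings_pct 0.357 0.2894    # c7i.2xlarge vs c7g.2xlarge
```

Plugging in your own region's prices is a quick way to check whether the discount holds for the instance family you actually run.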

Savings Plans - Commit and Save 30-40%

Compute Savings Plans are the best deal in AWS. Commit to a $/hour spend for 1 or 3 years. Applies automatically to EC2, Fargate, and Lambda across all regions and instance families. No lock-in to specific instance types.

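Start with 1-year No Upfront. It gives you 30% savings with no upfront payment; the only risk is that your usage drops below the committed hourly spend, which you pay for either way. Only move to 3-year All Upfront (60% savings) once you have 6+ months of stable baseline usage data.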
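
Sizing the commitment is simple arithmetic: take the steady on-demand baseline, apply the plan discount, and convert it to an hourly figure. A rough sketch, assuming 730 hours per month and a flat discount across the baseline (real plan rates differ per instance type); `sp_hourly_commitment` is a hypothetical helper:

```shell
# Hourly Savings Plan commitment that covers a steady on-demand baseline.
# Arguments: on-demand spend in $/month, plan discount in percent.
# Assumes 730 hours per month.
sp_hourly_commitment() {
  awk -v monthly="$1" -v discount_pct="$2" \
    'BEGIN { printf "%.3f\n", monthly * (1 - discount_pct / 100) / 730 }'
}

# $1,920/month of on-demand compute at a 30% discount
sp_hourly_commitment 1920 30
```

Committing slightly below the computed figure leaves headroom for usage dips; anything above the commitment is simply billed at on-demand rates.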

Spot Instances - 60-90% Off for Fault-Tolerant Work

Spot instances are spare EC2 capacity at 60-90% discount. Use them for: batch processing, CI/CD runners, dev/staging environments, stateless workers behind a queue. Never use Spot for databases, single-instance apps, or anything that can't handle a 2-minute interruption notice.
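
Handling the 2-minute notice means polling the instance metadata service from the worker. A minimal sketch, assuming IMDSv2 is enabled and `curl` is available on the instance; `spot_interruption_pending` and `drain_and_exit` are hypothetical names:

```shell
# Returns 0 (true) once this instance has received a Spot interruption notice.
# Uses IMDSv2: fetch a session token, then query spot/instance-action,
# which answers 404 until a notice has been issued.
spot_interruption_pending() {
  local token status
  token=$(curl -s -m 2 -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
  status=$(curl -s -m 2 -o /dev/null -w '%{http_code}' \
    -H "X-aws-ec2-metadata-token: ${token}" \
    "http://169.254.169.254/latest/meta-data/spot/instance-action")
  [ "$status" = "200" ]
}

# Example worker loop (drain_and_exit is a hypothetical shutdown hook):
# while sleep 5; do spot_interruption_pending && drain_and_exit; done
```

A worker that stops taking new jobs and lets in-flight work finish within the notice window is what makes Spot safe for queue consumers and CI runners.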

Database: Stop Overpaying for RDS

Aurora I/O-Optimized

If your Aurora I/O costs exceed 25% of your total Aurora bill, switch to I/O-Optimized. It eliminates per-I/O charges in exchange for a 30% higher instance price and a higher per-GB storage rate. For write-heavy workloads, savings range from 30-60%. It's a one-click change with no downtime, though you can only switch into I/O-Optimized once every 30 days.
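
The trade-off can be estimated before switching. A back-of-the-envelope sketch, assuming the 30% instance premium quoted above and ignoring storage pricing; `io_optimized_saving` is a hypothetical helper, and the inputs are illustrative rather than measured:

```shell
# Estimated monthly saving from switching a cluster to I/O-Optimized.
# Arguments: instance cost in $/month, I/O charges in $/month.
# Assumes a 30% instance premium and ignores storage pricing.
io_optimized_saving() {
  awk -v instance="$1" -v io="$2" \
    'BEGIN { printf "%.2f\n", io - 0.30 * instance }'
}

# $900/month of instances with $600/month of I/O charges
io_optimized_saving 900 600
```

A positive result favors switching; a negative one means the instance premium outweighs the I/O charges you would stop paying.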

RDS Reserved Instances

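RDS Reserved Instances save 30-60% over on-demand. Unlike Compute Savings Plans, RDS RIs are locked to a specific database engine, instance family, and region; most engines allow size flexibility within the family, but the reservation can't move to a different family. Buy them only for a production database whose instance family you know won't change for 12 months.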

Right-size first, reserve second. Check your RDS CloudWatch metrics. If CPU averages <20% and memory usage is <50%, you're over-provisioned. Downsize, then buy the RI for the smaller instance.
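
The CPU check can be scripted rather than eyeballed in the console. A sketch assuming the AWS CLI is configured; `rds_avg_cpu` is a hypothetical helper and `my-prod-db` a placeholder identifier. It covers CPU only, since RDS reports memory as FreeableMemory in bytes rather than as a percentage:

```shell
# Average of the daily CPUUtilization averages for one RDS instance.
# Arguments: DB instance identifier, start time, end time (ISO 8601, UTC).
rds_avg_cpu() {
  aws cloudwatch get-metric-statistics \
    --namespace AWS/RDS --metric-name CPUUtilization \
    --dimensions "Name=DBInstanceIdentifier,Value=$1" \
    --start-time "$2" --end-time "$3" \
    --period 86400 --statistics Average \
    --query 'Datapoints[].Average' --output text |
    tr '\t' '\n' |
    awk 'NF { sum += $1; n++ } END { if (n) printf "%.1f\n", sum / n; else print "no data" }'
}

# Example (my-prod-db is a placeholder identifier):
# rds_avg_cpu my-prod-db 2026-04-01T00:00:00Z 2026-04-15T00:00:00Z
```

Swapping `Average` for `Maximum` is worth a second pass before downsizing, since a low average can hide short daily peaks.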

Storage: Death by a Thousand GBs

S3 Intelligent-Tiering

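S3 Intelligent-Tiering automatically moves objects between access tiers based on usage patterns. No retrieval fees, no lifecycle policy management. Monitoring costs $0.0025 per 1,000 objects per month, which is worth it for any bucket over 1GB with mixed access patterns. Objects smaller than 128 KB are not monitored and always stay in the Frequent Access tier, so buckets full of tiny objects see little benefit. Savings: 30-70% on storage costs.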

gp3 Volumes - Free Upgrade from gp2

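gp3 is 20% cheaper per GB than gp2 and delivers a baseline of 3,000 IOPS and 125 MB/s regardless of size (gp2 scales at 3 IOPS per GB). Nearly every gp2 volume should be converted to gp3; for volumes over 1TB, provision extra gp3 IOPS or throughput to match what gp2 was delivering. It's a live migration with no downtime. For a 500GB volume: gp2 = $50/month, gp3 = $40/month.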
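
The conversion is easy to script across a region. A sketch assuming the AWS CLI is configured for the target region; `convert_gp2_volumes` and the `DRY_RUN` switch are hypothetical names, and the default only prints what would change:

```shell
# Convert every gp2 volume in the current region to gp3.
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to apply the change.
convert_gp2_volumes() {
  aws ec2 describe-volumes \
    --filters Name=volume-type,Values=gp2 \
    --query 'Volumes[].VolumeId' --output text |
    tr '\t' '\n' |
    while read -r vol; do
      [ -n "$vol" ] || continue
      if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would convert ${vol} to gp3"
      else
        aws ec2 modify-volume --volume-id "$vol" --volume-type gp3
      fi
    done
}

# Example: convert_gp2_volumes             (dry run)
#          DRY_RUN=0 convert_gp2_volumes   (apply)
```

Running the dry run first gives you a list to review for large volumes before anything is modified.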

The NAT Gateway Tax

NAT Gateway charges $0.045/GB for data processing plus $0.045/hr ($32.40/month) per gateway. A moderately busy app pulling Docker images, calling external APIs, and syncing data can easily spend $500-$1,500/month on NAT alone. Multi-AZ deployments multiply the hourly charge, since you need one NAT Gateway per AZ.
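
To see how quickly the per-GB charge dominates, here is a small calculator using the rates quoted above. `nat_monthly_cost` is a hypothetical helper, and the 10 TB figure is only an illustration:

```shell
# Monthly NAT Gateway cost: per-GB processing plus the hourly charge per gateway.
# Arguments: GB processed per month, number of gateways.
# Uses $0.045/GB and $0.045/hr; 720 hours matches the $32.40/month figure.
nat_monthly_cost() {
  awk -v gb="$1" -v gateways="$2" \
    'BEGIN { printf "%.2f\n", gb * 0.045 + gateways * 0.045 * 720 }'
}

# 10 TB/month through two gateways (one per AZ)
nat_monthly_cost 10000 2
```

At that volume the hourly charges are a rounding error next to data processing, which is why reducing the traffic matters more than reducing the gateway count.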

Diagnosing NAT Spend

The hardest part is figuring out what is sending traffic through NAT. Use VPC Flow Logs filtered to the NAT Gateway's ENI to identify the top talkers:

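# Enable VPC Flow Logs to CloudWatch (if not already)
# Delivery to CloudWatch Logs needs an IAM role the flow logs service can assume;
# the account ID and role name below are placeholders.
aws ec2 create-flow-logs \
  --resource-type VPC --resource-ids vpc-abc123 \
  --traffic-type ALL --log-destination-type cloud-watch-logs \
  --log-group-name /vpc/flow-logs \
  --deliver-logs-permission-arn arn:aws:iam::123456789012:role/flow-logs-role

# Query top destinations through NAT (CloudWatch Logs Insights)
# filter @logStream like "eni-NAT_GATEWAY_ENI"
# | stats sum(bytes) as totalBytes by dstAddr
# | sort totalBytes desc | limit 20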

How to Slash NAT Costs

  • VPC Endpoints - S3 and DynamoDB Gateway Endpoints are free. Interface Endpoints for ECR, CloudWatch, STS, etc. cost about $7.20/month each per AZ plus $0.01/GB processed, which replaces the $0.045/GB NAT charge for those services.
  • ECR pull-through cache - Cache images from upstream registries such as Docker Hub in your private ECR, so repeated pulls stay off the NAT path when combined with the ECR endpoints.
  • Move to public subnets where possible - services that need internet access can use an Internet Gateway (no hourly or per-GB processing charge) instead of NAT. Each public IPv4 address does carry a small hourly fee.
  • NAT Instance - A t4g.nano ($3/month) can replace a NAT Gateway for dev/staging environments.
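
Whether an Interface Endpoint pays for itself depends on how much traffic it takes off the NAT path. A rough break-even sketch using the $7.20/month and $0.045/GB figures from this section; the $0.01/GB endpoint processing rate is an assumption to check against current pricing, and `endpoint_break_even_gb` is a hypothetical helper:

```shell
# GB/month at which an interface endpoint costs the same as sending that traffic via NAT.
# Arguments: endpoint fixed charge ($/month), NAT rate ($/GB), endpoint rate ($/GB).
endpoint_break_even_gb() {
  awk -v fixed="$1" -v nat_gb="$2" -v ep_gb="$3" \
    'BEGIN { printf "%.0f\n", fixed / (nat_gb - ep_gb) }'
}

# $7.20/month endpoint, $0.045/GB NAT, assumed $0.01/GB endpoint processing
endpoint_break_even_gb 7.20 0.045 0.01
```

Services that move less than the break-even volume each month are usually cheaper left on NAT, so start with ECR and CloudWatch Logs, which tend to be the heavy hitters.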

EBS Snapshots - The Hidden Hoarder

Old EBS snapshots accumulate silently. Each snapshot is incremental, but abandoned snapshots from deleted volumes still cost $0.05/GB/month. A team that creates daily snapshots and never cleans up can easily accumulate $200-$500/month in orphaned snapshots.

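# Find snapshots older than 90 days (cutoff computed with GNU date)
cutoff=$(date -u -d '90 days ago' +%Y-%m-%d)
aws ec2 describe-snapshots --owner-ids self \
  --query "Snapshots[?StartTime<='${cutoff}'].{ID:SnapshotId,Size:VolumeSize,Date:StartTime}" \
  --output table

# List orphaned snapshots. A snapshot keeps its source VolumeId after the volume
# is deleted, so compare against the volumes that still exist in this region.
aws ec2 describe-volumes --query "Volumes[].VolumeId" --output text \
  | tr '\t' '\n' | sort > live-volumes.txt
aws ec2 describe-snapshots --owner-ids self \
  --query "Snapshots[].[VolumeId,SnapshotId]" --output text \
  | sort -k1,1 > snapshots.txt
join -v 2 live-volumes.txt snapshots.txt | awk '{print $2}' > orphaned-snapshots.txt

# Review orphaned-snapshots.txt, then delete (snapshots backing an AMI are refused)
xargs -I {} aws ec2 delete-snapshot --snapshot-id {} < orphaned-snapshots.txt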

CloudWatch Logs - The $800/Month Surprise

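CloudWatch Logs ingestion costs $0.50/GB and storage costs $0.03/GB-month, so ingestion dominates the bill. A verbose Node.js app logging every request at INFO level can generate 50GB/month, which is $25 in ingestion; an $800 bill implies roughly 1.5TB/month of ingestion across all services, plus storage that grows every month if you never set retention. Fix: reduce log verbosity in production (the only way to cut ingestion), set retention to 7 days, and ship to S3 for long-term storage ($0.023/GB).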
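
Retention can be applied in bulk instead of one log group at a time. A sketch assuming the AWS CLI is configured; `set_default_retention` is a hypothetical helper that only touches log groups with no retention policy set:

```shell
# Apply a retention period to every log group that currently never expires.
# Argument: retention in days (defaults to 7).
set_default_retention() {
  local days="${1:-7}"
  aws logs describe-log-groups \
    --query 'logGroups[?!retentionInDays].logGroupName' --output text |
    tr '\t' '\n' |
    while read -r group; do
      [ -n "$group" ] || continue
      echo "setting ${days}-day retention on ${group}"
      aws logs put-retention-policy \
        --log-group-name "$group" --retention-in-days "$days"
    done
}

# Example: set_default_retention 7
```

Log groups that already have a policy are left alone, so a longer retention chosen deliberately for audit logs is not overwritten.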

Serverless Economics

Serverless (Lambda + API Gateway) is cheaper at low traffic and more expensive at high traffic. The crossover point:

| Requests/Month | Lambda + APIGW | Fargate (0.25 vCPU) | Winner |
| --- | --- | --- | --- |
| 100K | $1.20 | $9.50 | Lambda |
| 1M | $12 | $9.50 | Fargate |
| 10M | $120 | $9.50 | Fargate |
| 100M | $1,200 | $38 (1 vCPU) | Fargate |

On these numbers, Lambda wins below roughly 800K requests/month, the point where $12 per million requests meets the $9.50 Fargate task. Above that, containers on Fargate or EC2 are dramatically cheaper. The exception: bursty, unpredictable workloads where you'd need to over-provision containers.
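
The break-even can be computed directly from the table's figures. A minimal sketch, assuming Lambda + API Gateway cost scales linearly with requests and the Fargate task is a flat monthly charge; `crossover_requests` is a hypothetical helper:

```shell
# Requests/month at which a flat-priced Fargate task matches Lambda + API Gateway.
# Arguments: Fargate cost ($/month), Lambda + APIGW cost ($ per million requests).
crossover_requests() {
  awk -v fargate_monthly="$1" -v lambda_per_million="$2" \
    'BEGIN { printf "%.0f\n", fargate_monthly / lambda_per_million * 1000000 }'
}

# $9.50/month Fargate task vs. $12 per million requests (both from the table above)
crossover_requests 9.50 12
```

Rerun it with your own per-million cost, since memory size and average duration move the Lambda side of the comparison considerably.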

API Gateway - The Other Hidden Tax

API Gateway REST APIs cost $3.50/million requests. At 10M requests/month, that's $35, while HTTP APIs cost $1.00/million ($10). If you don't need REST-only features (request validation, WAF integration, usage plans), switch to HTTP APIs for a 71% cost reduction. For internal service-to-service calls, skip API Gateway entirely and invoke Lambda directly or use ALB ($0.008/LCU-hour plus a fixed hourly charge).

Lambda ARM (Graviton): Lambda functions on ARM are 20% cheaper per GB-second, and AWS cites up to 34% better price-performance. Add Architectures: [arm64] to your SAM/CloudFormation template. Most runtimes (Node.js, Python, Java, .NET) work without changes; native dependencies and layers need arm64 builds.

Tools for Cost Visibility

  • AWS Cost Explorer - Free, built-in. Group by service, tag, linked account. Set up daily/weekly email reports. The "Rightsizing Recommendations" tab alone is worth checking monthly.
  • AWS Budgets - Free alerts when spend exceeds thresholds. Set one for 80% and 100% of your target. Add a forecast budget too - it alerts when projected spend will exceed your target.
  • AWS Compute Optimizer - Free. Analyzes your EC2, EBS, Lambda, and ECS usage and recommends right-sized alternatives. Check it monthly.
  • Vantage - Third-party tool with better visualization, per-resource cost attribution, and Kubernetes cost allocation. Free tier covers most startups. The "Cost Reports" feature is what Cost Explorer should be.
  • Infracost - Cost estimates in your Terraform PRs. See the cost impact before you deploy. Integrates with GitHub Actions in 5 minutes.
  • AWS Trusted Advisor - Flags idle resources, underutilized instances, and missing reservations. Business/Enterprise support unlocks all checks; free tier gets basic checks.
  • Cost Anomaly Detection - Free ML-based service that alerts on unusual spend patterns. Set it up once and forget it - it catches runaway resources before they become $1K surprises.

The $10K → $2K Case Study

Real numbers from a B2B SaaS app: ~50K MAU, PostgreSQL database, Node.js API on ECS, React frontend on S3/CloudFront, background workers for email and data processing.

| # | Action | Before | After | Savings |
| --- | --- | --- | --- | --- |
| 1 | Switch ECS to Graviton (m7g) | $2,400 | $1,920 | $480 |
| 2 | 1-yr Compute Savings Plan | $1,920 | $1,344 | $576 |
| 3 | Right-size RDS (db.r6g.xl → db.r7g.large) | $1,800 | $900 | $900 |
| 4 | RDS 1-yr Reserved Instance | $900 | $585 | $315 |
| 5 | Switch Aurora to I/O-Optimized | $600 (I/O fees) | $0 | $600 |
| 6 | VPC Endpoints (S3, ECR, CloudWatch) | $1,200 (NAT) | $180 | $1,020 |
| 7 | S3 Intelligent-Tiering | $450 | $180 | $270 |
| 8 | gp2 → gp3 all volumes | $320 | $256 | $64 |
| 9 | CloudWatch log retention 30d → 7d + S3 archive | $800 | $120 | $680 |
| 10 | Spot for CI/CD + staging | $1,100 | $220 | $880 |
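
| Category | Before | After |
| --- | --- | --- |
| Compute | $2,400 | $1,344 |
| Database | $2,400 | $585 |
| Networking | $1,200 | $180 |
| Storage | $770 | $436 |
| Logging / Monitoring | $800 | $120 |
| CI/CD + Staging | $1,100 | $220 |
| Total | $8,670 | $2,885 |

Each category counts sequential actions once: Compute is the ECS line before action 1 and after action 2, and Database is RDS before action 3 and after action 4, plus the Aurora I/O fees removed by action 5. The ten actions save $5,785/month in total.

Timeline: Actions 1, 7, 8 took one afternoon. Actions 3, 6, 9 took a week of testing. Actions 2, 4 required finance approval. Action 10 required refactoring CI pipelines. Total: ~3 weeks of part-time work.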

The Optimization Checklist

🔥 Quick Wins (Day 1 - no risk)

  1. Convert all gp2 → gp3 - zero downtime, instant 20% savings on EBS
  2. Enable S3 Intelligent-Tiering - set as default for all new buckets
  3. Set CloudWatch log retention - 7 days for dev, 30 days for prod
  4. Delete orphaned EBS snapshots - run the audit script above
  5. Delete unattached EBS volumes - Trusted Advisor flags these
  6. Set up AWS Budgets alerts - 80% and 100% of target spend

📈 Medium Effort (Week 1 - test in staging first)

  1. Switch to Graviton instances - test your app on ARM, then migrate
  2. Add VPC Endpoints - S3 Gateway (free), ECR, CloudWatch, STS
  3. Right-size RDS - check CPU/memory CloudWatch metrics for 2 weeks
  4. Right-size EC2/ECS - use Compute Optimizer recommendations
  5. Enable Aurora I/O-Optimized - if I/O > 25% of Aurora bill

🎯 Strategic (Month 1 - requires planning)

  1. Buy Compute Savings Plan - 1-year No Upfront for baseline
  2. Buy RDS Reserved Instance - for production database only
  3. Move CI/CD to Spot - GitHub Actions self-hosted runners on Spot
  4. Evaluate serverless vs. containers - based on your traffic patterns
  5. Implement cost tagging - tag by team, environment, feature for attribution

Pro tip: Run aws ce get-cost-and-usage grouped by service for the last 3 months. Export to a spreadsheet. Sort by cost descending. The top 5 line items are where 80% of your savings will come from.
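
The query and the sort can be done in one pass. A sketch assuming the AWS CLI is configured with Cost Explorer access; `top_services` is a hypothetical helper, and the dates in the example are placeholders (the End date is exclusive):

```shell
# Top 5 services by unblended cost over a date range, summed across months.
# Arguments: start date, end date (YYYY-MM-DD; the end date is exclusive).
top_services() {
  local start="$1" end="$2"
  aws ce get-cost-and-usage \
    --time-period "Start=${start},End=${end}" \
    --granularity MONTHLY \
    --metrics UnblendedCost \
    --group-by Type=DIMENSION,Key=SERVICE \
    --query 'ResultsByTime[].Groups[].[Keys[0], Metrics.UnblendedCost.Amount]' \
    --output text |
    awk -F'\t' '{ total[$1] += $2 }
                END { for (s in total) printf "%.2f\t%s\n", total[s], s }' |
    sort -rn | head -n 5
}

# Example: top_services 2026-01-01 2026-04-01
```

The output is tab-separated (cost first, then service name), so it pastes straight into a spreadsheet if you want to track it month over month.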

The Bottom Line

Most AWS bills are 40-70% waste. The biggest levers are: right-sizing (stop paying for CPU you don't use), commitments (Savings Plans and RIs), NAT elimination (VPC Endpoints), and storage tiering. Start with Cost Explorer, identify your top 3 cost drivers, and attack them in order. You don't need a FinOps team - you need one engineer with a Cost Explorer tab open for a week.