AWS Cost Optimization: $10K/Month to $2K/Month
A real case study with 10 specific actions, before/after numbers, and the exact playbook to cut your AWS bill by 80%.
The before/after Cost Explorer view that made our CFO smile
Where the Money Actually Goes
Before optimizing anything, you need to know where you're bleeding. Here's the typical cost breakdown for a SaaS startup spending $5K-$15K/month on AWS (the dollar figures assume a bill of about $10K/month):
| Service | % of Bill | Typical Monthly | Optimization Potential |
|---|---|---|---|
| EC2 / ECS / EKS | 30-40% | $3,000-$4,000 | 40-60% |
| RDS / Aurora | 15-25% | $1,500-$2,500 | 30-50% |
| NAT Gateway | 10-15% | $1,000-$1,500 | 70-90% |
| S3 | 5-10% | $500-$1,000 | 30-50% |
| CloudWatch / Logs | 5-10% | $500-$1,000 | 50-80% |
| Data Transfer | 5-8% | $500-$800 | 30-50% |
| ElastiCache / OpenSearch | 5-8% | $500-$800 | 30-50% |
| Lambda / API Gateway | 2-5% | $200-$500 | 20-40% |
Compute: The Biggest Lever
Graviton - Free 20% Savings
Graviton3/4 (ARM) instances are about 20% cheaper than equivalent x86 instances and deliver 25-40% better price-performance. Most workloads (Node.js, Python, Go, Java 17+, .NET 6+) run on Graviton with zero code changes. Only native C/C++ extensions compiled for x86 need recompilation, and container images must be built for arm64 (or published as multi-arch images).
| Instance (x86 vs Graviton) | x86 Price (on-demand, us-east-1) | Graviton Price (on-demand, us-east-1) | Savings |
|---|---|---|---|
| m7i.xlarge vs m7g.xlarge | $0.2016/hr | $0.1632/hr | 19% |
| c7i.2xlarge vs c7g.2xlarge | $0.3570/hr | $0.2900/hr | 19% |
| r7i.xlarge vs r7g.xlarge | $0.2646/hr | $0.2142/hr | 19% |
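The savings column can be reproduced with a few lines of arithmetic. This is a minimal sketch using the m7i/m7g prices from the table; the function name and the 720-hour month are assumptions made for illustration, and actual prices vary by region and over time.

```python
def graviton_savings(x86_hourly: float, graviton_hourly: float, hours: float = 720.0):
    """Return (percent saved, dollars saved per month) for one instance.

    hours=720 assumes a 30-day month; adjust for your billing period.
    """
    saved_per_hour = x86_hourly - graviton_hourly
    percent = 100.0 * saved_per_hour / x86_hourly
    return percent, saved_per_hour * hours


# Prices taken from the table (m7i.xlarge vs m7g.xlarge).
percent, monthly = graviton_savings(0.2016, 0.1632)
print(f"m7i.xlarge -> m7g.xlarge: {percent:.1f}% cheaper, ${monthly:.2f}/month per instance")
```

Multiply the monthly figure by your instance count to see whether the migration effort is worth it for a given fleet.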
Savings Plans - Commit and Save 30-40%
Compute Savings Plans are the best deal in AWS. Commit to a $/hour spend for 1 or 3 years. Applies automatically to EC2, Fargate, and Lambda across all regions and instance families. No lock-in to specific instance types.
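To see why the commitment should match your baseline rather than your peak, here is a simplified model of one hour of billing. It assumes a single flat discount (30%, the low end of the range above) applied to all usage; real Savings Plans rates differ per instance type and region, and `hourly_bill` is a hypothetical helper, not an AWS API.

```python
def hourly_bill(on_demand_usage: float, commitment: float, discount: float) -> float:
    """Approximate cost of one hour under a Compute Savings Plan.

    on_demand_usage: what the hour's usage would cost at on-demand rates ($/hr)
    commitment:      committed spend ($/hr), paid whether it is used or not
    discount:        assumed flat discount as a fraction (0.30 = 30%)
    """
    # On-demand-equivalent usage that the commitment pays for.
    covered = commitment / (1.0 - discount)
    # Anything beyond the covered amount is billed at on-demand rates.
    overage = max(0.0, on_demand_usage - covered)
    return commitment + overage


for usage in (1.00, 2.00, 3.00):
    bill = hourly_bill(usage, commitment=1.40, discount=0.30)
    print(f"on-demand usage ${usage:.2f}/hr -> bill ${bill:.2f}/hr")
```

In this model a $1.40/hr commitment covers $2.00/hr of on-demand usage. Hours that fall below that level still cost $1.40, which is why over-committing erases the savings.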
Spot Instances - 60-90% Off for Fault-Tolerant Work
Spot instances are spare EC2 capacity at 60-90% discount. Use them for: batch processing, CI/CD runners, dev/staging environments, stateless workers behind a queue. Never use Spot for databases, single-instance apps, or anything that can't handle a 2-minute interruption notice.
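Handling the 2-minute notice usually means polling the instance metadata path `/latest/meta-data/spot/instance-action`, which returns 404 until an interruption is scheduled and then a small JSON document. The sketch below only parses that document and computes the time remaining; the function name is hypothetical, and fetching the metadata (including the IMDSv2 token) is left out.

```python
import json
from datetime import datetime, timezone


def seconds_until_interruption(instance_action_json: str, now: datetime) -> float:
    """Seconds left before a Spot interruption, given the instance-action body.

    Expects the documented shape: {"action": "terminate", "time": "2017-09-18T08:22:00Z"}
    """
    notice = json.loads(instance_action_json)
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()


sample = '{"action": "terminate", "time": "2017-09-18T08:22:00Z"}'
now = datetime(2017, 9, 18, 8, 20, 0, tzinfo=timezone.utc)
print(seconds_until_interruption(sample, now))
```

A worker would use the remaining time to stop taking new jobs, finish or re-queue the current one, and exit cleanly.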
Database: Stop Overpaying for RDS
Aurora I/O-Optimized
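If your Aurora I/O costs exceed 25% of your total Aurora bill, switch to I/O-Optimized. It eliminates per-I/O charges in exchange for a roughly 30% higher instance price and a higher per-GB storage price. For I/O-heavy workloads the savings can be substantial (AWS cites up to 40%). It's a configuration change with no downtime, but a cluster can only be switched to I/O-Optimized once every 30 days, so run the numbers first.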
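The 25% rule of thumb can be checked against your own bill. This sketch compares the two configurations from three monthly line items. The 30% instance uplift comes from the text above; the 2.25x storage multiplier is an assumption based on us-east-1 list prices, and the function name is illustrative only.

```python
def aurora_monthly_costs(instance_cost: float, storage_cost: float, io_cost: float,
                         instance_uplift: float = 0.30,
                         storage_multiplier: float = 2.25):
    """Return (standard_total, io_optimized_total) in dollars per month.

    instance_uplift:    I/O-Optimized instance premium (30%, per the text above)
    storage_multiplier: assumed ratio of I/O-Optimized to Standard storage price
    """
    standard = instance_cost + storage_cost + io_cost
    # I/O-Optimized has no per-I/O charge, but instances and storage cost more.
    io_optimized = instance_cost * (1.0 + instance_uplift) + storage_cost * storage_multiplier
    return standard, io_optimized


standard, optimized = aurora_monthly_costs(instance_cost=900, storage_cost=100, io_cost=600)
print(f"Standard: ${standard:.0f}/month, I/O-Optimized: ${optimized:.0f}/month")
```

Plug in the instance, storage, and I/O lines from Cost Explorer; switch only if the second number is lower.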
RDS Reserved Instances
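RDS Reserved Instances save 30-60% over on-demand. Unlike Compute Savings Plans, RDS RIs are tied to a specific database engine, instance family, and region. For PostgreSQL, MySQL, MariaDB, and Aurora they are size-flexible within the family (a db.r7g.xlarge reservation also covers two db.r7g.large instances), but they do not follow you to a different family. Buy them only for a production database whose engine and instance family you know won't change for 12 months, and right-size before you commit.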
Storage: Death by a Thousand GBs
S3 Intelligent-Tiering
S3 Intelligent-Tiering automatically moves objects between access tiers based on usage patterns. No retrieval fees, no lifecycle policy management. It costs $0.0025 per 1,000 objects per month for monitoring - worth it for buckets with mixed or unknown access patterns whose objects are mostly larger than 128KB (smaller objects are never auto-tiered and always bill at the frequent-access rate). Savings: 30-70% on storage costs.
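Whether the monitoring fee pays for itself depends on object size. As a rough sketch, the function below finds the object size at which one month in the infrequent-access tier saves exactly the monitoring fee. The $0.0025 per 1,000 objects comes from the text; the $0.023 and $0.0125 per-GB storage rates are assumed us-east-1 list prices, and the function name is hypothetical.

```python
def breakeven_object_kb(monitoring_per_1k: float = 0.0025,
                        frequent_per_gb: float = 0.023,
                        infrequent_per_gb: float = 0.0125) -> float:
    """Object size (KB) where the monthly tiering saving equals the monitoring fee."""
    monitoring_per_object = monitoring_per_1k / 1000.0
    saving_per_gb = frequent_per_gb - infrequent_per_gb
    breakeven_gb = monitoring_per_object / saving_per_gb
    return breakeven_gb * 1024 * 1024  # GB -> KB


print(f"Break-even object size: about {breakeven_object_kb():.0f} KB")
```

Buckets dominated by objects well above this size benefit the most; buckets full of tiny objects mostly pay for nothing.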
gp3 Volumes - Free Upgrade from gp2
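gp3 is 20% cheaper than gp2 and delivers a 3,000 IOPS, 125 MB/s baseline regardless of size (vs. gp2's size-dependent 3 IOPS/GB). Nearly every gp2 volume should be converted to gp3. For gp2 volumes larger than 1,000GB, which already get more than 3,000 IOPS, provision extra gp3 IOPS (and throughput, if you rely on it) to match before switching. It's a live migration with no downtime. For a 500GB volume: gp2 = $50/month, gp3 = $40/month.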
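The comparison is easy to script per volume. This sketch uses the per-GB rates implied by the 500GB example ($0.10 for gp2, $0.08 for gp3); the $0.005 per extra provisioned IOPS is an assumed list price, throughput charges are ignored, and the helper names are illustrative.

```python
GP2_PER_GB = 0.10          # implied by the 500GB = $50 example above
GP3_PER_GB = 0.08          # implied by the 500GB = $40 example above
GP3_EXTRA_IOPS = 0.005     # assumed price per provisioned IOPS above 3,000


def gp2_baseline_iops(size_gb: int) -> int:
    """gp2 baseline: 3 IOPS per GB, minimum 100, maximum 16,000."""
    return min(16000, max(100, 3 * size_gb))


def gp2_cost(size_gb: int) -> float:
    return size_gb * GP2_PER_GB


def gp3_cost(size_gb: int, iops: int = 3000) -> float:
    """gp3 cost with optional IOPS provisioned above the 3,000 baseline."""
    return size_gb * GP3_PER_GB + max(0, iops - 3000) * GP3_EXTRA_IOPS


for size in (500, 2000):
    matched_iops = max(3000, gp2_baseline_iops(size))
    print(size, "GB:", f"gp2 ${gp2_cost(size):.2f}", f"gp3 ${gp3_cost(size, matched_iops):.2f}")
```

Even when a large volume needs extra IOPS to match its gp2 baseline, gp3 typically stays cheaper under these assumed rates.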
The NAT Gateway Tax
NAT Gateway charges $0.045/GB for data processing plus $0.045/hr ($32.40/month) per gateway. A moderately busy app pulling Docker images, calling external APIs, and syncing data can easily spend $500-$1,500/month on NAT alone. Multi-AZ deployments multiply the hourly charge - the recommended setup is one NAT Gateway per AZ - while the per-GB charge scales with traffic no matter how many gateways you run.
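The two charges add up as follows. This is a minimal sketch using the rates quoted above and a 720-hour month (which is how $0.045/hr becomes $32.40); the function name is made up for illustration.

```python
def nat_monthly_cost(gateways: int, gb_processed: float,
                     hourly: float = 0.045, per_gb: float = 0.045,
                     hours: float = 720.0) -> float:
    """Monthly NAT Gateway cost: fixed hourly charge per gateway plus per-GB processing."""
    return gateways * hourly * hours + gb_processed * per_gb


print(f"1 gateway, idle:        ${nat_monthly_cost(1, 0):.2f}")
print(f"2 gateways, 20 TB/month: ${nat_monthly_cost(2, 20_000):.2f}")
```

At any real traffic volume the per-GB term dominates, which is why the fixes below focus on keeping bytes off the gateway rather than removing gateways.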
Diagnosing NAT Spend
The hardest part is figuring out what is sending traffic through NAT. Use VPC Flow Logs filtered to the NAT Gateway's ENI to identify the top talkers:
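# Enable VPC Flow Logs to CloudWatch (if not already).
# The role ARN is a placeholder: delivery to CloudWatch Logs requires an IAM role
# that lets vpc-flow-logs.amazonaws.com write to the log group.
aws ec2 create-flow-logs \
  --resource-type VPC --resource-ids vpc-abc123 \
  --traffic-type ALL --log-destination-type cloud-watch-logs \
  --log-group-name /vpc/flow-logs \
  --deliver-logs-permission-arn arn:aws:iam::123456789012:role/flow-logs-role
# Query top destinations through NAT (CloudWatch Logs Insights, log group /vpc/flow-logs).
# Replace eni-NAT_GATEWAY_ENI with the NAT Gateway's network interface ID:
# filter @logStream like "eni-NAT_GATEWAY_ENI"
# | stats sum(bytes) as totalBytes by dstAddr
# | sort totalBytes desc | limit 20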
How to Slash NAT Costs
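- VPC Endpoints - S3 and DynamoDB Gateway Endpoints are free. Interface Endpoints for ECR, CloudWatch, STS, etc. cost about $7.20/month each per AZ plus $0.01/GB processed, but eliminate the $0.045/GB NAT charge for traffic to those services.
- ECR pull-through cache - Cache upstream public images (Docker Hub, Quay, etc.) in your private ECR registry so repeated pulls stay inside AWS instead of going out through NAT.
- Move to public subnets where possible - services that need internet access can use an Internet Gateway (free) instead of NAT. Each public IPv4 address costs $0.005/hr (about $3.60/month), and the resources become internet-reachable, so lock down security groups.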
- NAT Instance - A t4g.nano ($3/month) can replace a NAT Gateway for dev/staging environments.
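An Interface Endpoint only pays off once enough traffic moves off the NAT Gateway. The sketch below estimates that break-even volume. The $7.20/month and $0.045/GB figures are from this article; the $0.01/GB endpoint processing charge and the per-AZ billing are assumptions based on list pricing, and the function name is hypothetical.

```python
def interface_endpoint_breakeven_gb(azs: int = 1,
                                    endpoint_monthly: float = 7.20,
                                    nat_per_gb: float = 0.045,
                                    endpoint_per_gb: float = 0.01) -> float:
    """GB/month that must shift from NAT to the endpoint before it saves money."""
    fixed_cost = endpoint_monthly * azs
    saving_per_gb = nat_per_gb - endpoint_per_gb
    return fixed_cost / saving_per_gb


for azs in (1, 2, 3):
    print(f"{azs} AZ(s): break-even at {interface_endpoint_breakeven_gb(azs):.0f} GB/month")
```

Services that move less than this each month (STS is a common example) may not justify their own endpoint; heavy ones like ECR and CloudWatch Logs usually do.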
EBS Snapshots - The Hidden Hoarder
Old EBS snapshots accumulate silently. Each snapshot is incremental, but abandoned snapshots from deleted volumes still cost $0.05/GB/month. A team that creates daily snapshots and never cleans up can easily accumulate $200-$500/month in orphaned snapshots.
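# Find snapshots older than 90 days (GNU date; on macOS use: date -u -v-90d +%Y-%m-%d)
CUTOFF=$(date -u -d '90 days ago' +%Y-%m-%d)
aws ec2 describe-snapshots --owner-ids self \
  --query "Snapshots[?StartTime<='${CUTOFF}'].{ID:SnapshotId,Size:VolumeSize,Date:StartTime}" \
  --output table
# List snapshots whose source volume no longer exists. A snapshot keeps its
# original VolumeId after the volume is deleted, so compare against live volumes.
aws ec2 describe-volumes --query "Volumes[].VolumeId" --output text | tr '\t' '\n' > live-volumes.txt
aws ec2 describe-snapshots --owner-ids self \
  --query "Snapshots[].[SnapshotId,VolumeId]" --output text | \
  awk 'NR==FNR {live[$1]; next} !($2 in live) {print $1}' live-volumes.txt - > orphaned-snapshots.txt
# Review orphaned-snapshots.txt first, then delete (snapshots backing a registered AMI will fail to delete)
xargs -n 1 aws ec2 delete-snapshot --snapshot-id < orphaned-snapshots.txt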
CloudWatch Logs - The $800/Month Surprise
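CloudWatch Logs ingestion costs $0.50/GB, and storage costs $0.03/GB-month with a default retention of never-expire. A verbose Node.js app logging every request at INFO level can generate 50GB/month, which is only $25 of ingestion. The same verbosity at 50GB/day (about 1.5TB/month) is roughly $750/month of ingestion plus a storage bill that grows every month if you never set retention. Fix: reduce log verbosity in production (ingestion is the larger charge), set retention to 7 days, and ship to S3 for long-term storage ($0.023/GB).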
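A small model makes the ingestion-versus-storage split explicit. It uses the $0.50/GB ingestion rate from the text; the $0.03/GB-month storage rate is an assumed list price, stored volume is approximated by ingested volume (CloudWatch actually bills on compressed size, so this is an upper bound), and the function name is illustrative.

```python
from typing import Optional


def cloudwatch_logs_monthly_cost(gb_per_month: float, months_elapsed: float,
                                 retention_months: Optional[float] = None,
                                 ingest_per_gb: float = 0.50,
                                 storage_per_gb: float = 0.03) -> float:
    """Approximate bill for one month, after `months_elapsed` months of logging.

    retention_months=None models the never-expire default.
    """
    ingestion = gb_per_month * ingest_per_gb
    if retention_months is None:
        months_kept = months_elapsed
    else:
        months_kept = min(months_elapsed, retention_months)
    stored_gb = gb_per_month * months_kept  # upper bound: ignores compression
    return ingestion + stored_gb * storage_per_gb


print(cloudwatch_logs_monthly_cost(1500, months_elapsed=12))                         # no retention
print(cloudwatch_logs_monthly_cost(1500, months_elapsed=12, retention_months=0.25))  # ~7 days
```

Retention caps the storage term, but only lower verbosity or sampling shrinks the ingestion term.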
Serverless Economics
Serverless (Lambda + API Gateway) is cheaper at low traffic and more expensive at high traffic. The crossover point:
| Requests/Month | Lambda + APIGW | Fargate (0.25 vCPU) | Winner |
|---|---|---|---|
| 100K | $1.20 | $9.50 | Lambda |
| 1M | $12 | $9.50 | Fargate |
| 10M | $120 | $9.50 | Fargate |
| 100M | $1,200 | $38 (1 vCPU) | Fargate |
By the table's numbers, Lambda wins below roughly 800K requests/month (where $12 per million requests crosses Fargate's flat $9.50). Above that, containers on Fargate or EC2 are dramatically cheaper. The exception: bursty, unpredictable workloads where you'd need to over-provision containers.
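Taking the table's figures at face value ($12 per million requests for Lambda + API Gateway, a flat $9.50/month for the smallest Fargate task), the linear break-even can be computed directly. The function name is hypothetical, and real workloads shift the answer with memory size, duration, and how many tasks you need for availability.

```python
def breakeven_requests(serverless_per_million: float, container_monthly: float) -> float:
    """Requests/month at which per-request pricing equals a flat container cost."""
    return container_monthly / serverless_per_million * 1_000_000


crossover = breakeven_requests(serverless_per_million=12.0, container_monthly=9.50)
print(f"Break-even: about {crossover:,.0f} requests/month")
```

Re-run it with your own measured cost per million requests; the table's $12 figure bundles assumptions about memory and duration that may not match your functions.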
API Gateway - The Other Hidden Tax
API Gateway REST APIs cost $3.50/million requests. At 10M requests/month, that's $35 - but HTTP APIs cost $1.00/million. If you don't need the REST-only features (request validation, WAF integration, usage plans), switch to HTTP APIs for a 71% cost reduction. For internal service-to-service calls, skip API Gateway entirely and invoke Lambda directly or use ALB ($0.008/LCU-hour plus a fixed hourly charge).
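The 71% figure follows from the two per-million rates. A minimal sketch, assuming first-tier pricing only (volume discounts at very high request counts are ignored) and using a made-up helper name:

```python
def api_gateway_monthly(requests: int, price_per_million: float) -> float:
    """Request charges only, at a single flat per-million rate."""
    return requests / 1_000_000 * price_per_million


requests = 10_000_000
rest = api_gateway_monthly(requests, 3.50)   # REST API rate from the text
http = api_gateway_monthly(requests, 1.00)   # HTTP API rate from the text
reduction = 100.0 * (rest - http) / rest
print(f"REST ${rest:.2f} vs HTTP ${http:.2f}: {reduction:.0f}% lower")
```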
Lambda also runs on Graviton at a lower per-GB-second price: add Architectures: [arm64] to your SAM/CloudFormation template. Most runtimes (Node.js, Python, Java, .NET) work without changes.
Tools for Cost Visibility
- AWS Cost Explorer - Free, built-in. Group by service, tag, linked account. Set up daily/weekly email reports. The "Rightsizing Recommendations" tab alone is worth checking monthly.
- AWS Budgets - Free alerts when spend exceeds thresholds. Set one for 80% and 100% of your target. Add a forecast budget too - it alerts when projected spend will exceed your target.
- AWS Compute Optimizer - Free. Analyzes your EC2, EBS, Lambda, and ECS usage and recommends right-sized alternatives. Check it monthly.
- Vantage - Third-party tool with better visualization, per-resource cost attribution, and Kubernetes cost allocation. Free tier covers most startups. The "Cost Reports" feature is what Cost Explorer should be.
- Infracost - Cost estimates in your Terraform PRs. See the cost impact before you deploy. Integrates with GitHub Actions in 5 minutes.
- AWS Trusted Advisor - Flags idle resources, underutilized instances, and missing reservations. Business/Enterprise support unlocks all checks; free tier gets basic checks.
- Cost Anomaly Detection - Free ML-based service that alerts on unusual spend patterns. Set it up once and forget it - it catches runaway resources before they become $1K surprises.
The $10K → $2K Case Study
Real numbers from a B2B SaaS app: ~50K MAU, PostgreSQL database, Node.js API on ECS, React frontend on S3/CloudFront, background workers for email and data processing.
| # | Action | Before | After | Savings |
|---|---|---|---|---|
| 1 | Switch ECS to Graviton (m7g) | $2,400 | $1,920 | $480 |
| 2 | 1-yr Compute Savings Plan | $1,920 | $1,344 | $576 |
| 3 | Right-size RDS (db.r6g.xlarge → db.r7g.large) | $1,800 | $900 | $900 |
| 4 | RDS 1-yr Reserved Instance | $900 | $585 | $315 |
| 5 | Switch Aurora to I/O-Optimized | $600 (I/O fees) | $0 | $600 |
| 6 | VPC Endpoints (S3, ECR, CloudWatch) | $1,200 (NAT) | $180 | $1,020 |
| 7 | S3 Intelligent-Tiering | $450 | $180 | $270 |
| 8 | gp2 → gp3 all volumes | $320 | $256 | $64 |
| 9 | CloudWatch log retention 30d → 7d + S3 archive | $800 | $120 | $680 |
| 10 | Spot for CI/CD + staging | $1,100 | $220 | $880 |
| Category | Before | After |
|---|---|---|
| Compute | $3,500 | $1,344 |
| Database | $3,300 | $1,485 |
| Networking | $1,200 | $180 |
| Storage | $770 | $436 |
| Logging / Monitoring | $800 | $120 |
| CI/CD + Staging | $1,100 | $220 |
| Total | $10,670 | $3,785 |
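As a sanity check, the totals and per-category reductions can be recomputed from the category table. The dictionary below simply transcribes the table; nothing else is assumed.

```python
# (before, after) in dollars per month, transcribed from the category table.
categories = {
    "Compute": (3500, 1344),
    "Database": (3300, 1485),
    "Networking": (1200, 180),
    "Storage": (770, 436),
    "Logging / Monitoring": (800, 120),
    "CI/CD + Staging": (1100, 220),
}

total_before = sum(before for before, _ in categories.values())
total_after = sum(after for _, after in categories.values())

for name, (before, after) in categories.items():
    print(f"{name:22s} {100 * (before - after) / before:5.1f}% lower")
print(f"{'Total':22s} ${total_before:,} -> ${total_after:,}")
```

Keeping the breakdown in a script like this makes it easy to rerun each month with fresh Cost Explorer numbers and see which category has drifted.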
The Optimization Checklist
🔥 Quick Wins (Day 1 - no risk)
- Convert all gp2 → gp3 - zero downtime, instant 20% savings on EBS
- Enable S3 Intelligent-Tiering - set as default for all new buckets
- Set CloudWatch log retention - 7 days for dev, 30 days for prod
- Delete orphaned EBS snapshots - run the audit script above
- Delete unattached EBS volumes - Trusted Advisor flags these
- Set up AWS Budgets alerts - 80% and 100% of target spend
📈 Medium Effort (Week 1 - test in staging first)
- Switch to Graviton instances - test your app on ARM, then migrate
- Add VPC Endpoints - S3 Gateway (free), ECR, CloudWatch, STS
- Right-size RDS - check CPU/memory CloudWatch metrics for 2 weeks
- Right-size EC2/ECS - use Compute Optimizer recommendations
- Enable Aurora I/O-Optimized - if I/O > 25% of Aurora bill
🎯 Strategic (Month 1 - requires planning)
- Buy Compute Savings Plan - 1-year No Upfront for baseline
- Buy RDS Reserved Instance - for production database only
- Move CI/CD to Spot - GitHub Actions self-hosted runners on Spot
- Evaluate serverless vs. containers - based on your traffic patterns
- Implement cost tagging - tag by team, environment, feature for attribution
Run aws ce get-cost-and-usage grouped by service for the last 3 months. Export to a spreadsheet. Sort by cost descending. The top 5 line items are where 80% of your savings will come from.
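If you would rather skip the spreadsheet, the ranking can be done on the command's JSON output. This sketch assumes the response was produced with `--granularity MONTHLY --metrics UnblendedCost --group-by Type=DIMENSION,Key=SERVICE` and saved to a file; `top_services` and the file name are hypothetical.

```python
import json
from collections import defaultdict


def top_services(response: dict, n: int = 5):
    """Sum UnblendedCost per service across all periods and return the top n."""
    totals = defaultdict(float)
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            service = group["Keys"][0]
            totals[service] += float(group["Metrics"]["UnblendedCost"]["Amount"])
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Usage, assuming the CLI output was saved as costs.json (hypothetical file name):
# with open("costs.json") as f:
#     for service, cost in top_services(json.load(f)):
#         print(f"{service}: ${cost:,.2f}")
```

The same function works on the dictionary returned by an SDK call to the Cost Explorer API, since the response shape is the same.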
The Bottom Line
Most AWS bills are 40-70% waste. The biggest levers are: right-sizing (stop paying for CPU you don't use), commitments (Savings Plans and RIs), NAT elimination (VPC Endpoints), and storage tiering. Start with Cost Explorer, identify your top 3 cost drivers, and attack them in order. You don't need a FinOps team - you need one engineer with a Cost Explorer tab open for a week.