AWS Bedrock for SaaS - Monetize AI, Pass Costs Downstream & Profit
The complete playbook for building profitable AI products on Bedrock. Pricing tables, cost passthrough models, margin math, metering architecture, and code.
Bedrock gives you 50+ models from $0.035/MTok - the margin opportunity is in how you package and sell access
Bedrock Pricing - Your Cost of Goods Sold
Bedrock's on-demand pricing bills per token consumed - there is no separate per-request fee. Your COGS is the raw token cost. Here are the models that matter for SaaS:
Production-Ready Models (Best Quality/Cost Ratio)
| Model | Input $/MTok | Output $/MTok | Context | Best For |
|---|---|---|---|---|
| Nova Micro | $0.035 | $0.14 | 128K | Classification, extraction, simple tasks |
| Nova Lite | $0.06 | $0.24 | 300K | Summarization, Q&A, multimodal |
| Nova Pro | $0.80 | $3.20 | 300K | Complex reasoning, code, analysis |
| Claude Haiku 3 | $0.25 | $1.25 | 200K | Fast, cheap, good quality |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K | Premium tier - best overall quality |
| Llama 4 Scout | $0.27 | $0.36 | 10M | Long context, open-weight |
| Llama 3.1 8B | $0.30 | $0.60 | 128K | High-volume simple tasks |
| Mistral Large 3 | $0.50 | $1.50 | 128K | Balanced quality/cost |
Cost Reduction Levers
| Feature | Savings | How |
|---|---|---|
| Prompt Caching | Up to 90% | Cache reads cost 10% of input price. 25% premium on writes. |
| Batch Inference | 50% | Results within 24 hours. Great for async processing. |
| Flex Tier | 50% | Trades immediate processing for cost efficiency. |
| Intelligent Routing | Up to 30% | $1/1K requests. Auto-routes between models in same family. |
| Model Distillation | Up to 75% | Train smaller model on larger model's outputs. |
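To see how the caching lever compounds, here's a minimal sketch of effective input cost with prompt caching. It assumes Anthropic-style cache pricing on Bedrock (1.25× on cache writes, 0.1× on cache reads), a fixed cacheable prefix, and that every request after the first hits the cache - real hit rates depend on traffic patterns and cache TTL:

```python
def cached_input_cost(base_price_per_mtok: float, cached_fraction: float,
                      requests: int, tokens_per_request: int) -> float:
    """Effective input cost in dollars with prompt caching: the cached
    prefix is written once at a 25% premium, then read at 10% of the
    base input price on every subsequent request."""
    cached_tokens = tokens_per_request * cached_fraction
    fresh_tokens = tokens_per_request * (1 - cached_fraction)
    write = cached_tokens * base_price_per_mtok * 1.25 / 1e6              # first request
    reads = cached_tokens * base_price_per_mtok * 0.10 / 1e6 * (requests - 1)
    fresh = fresh_tokens * base_price_per_mtok / 1e6 * requests           # uncached tail
    return write + reads + fresh

# 10K requests/mo, 4K-token prompt, 75% of it a cacheable system prompt,
# Nova Pro input at $0.80/MTok
without = 10_000 * 4_000 * 0.80 / 1e6
with_cache = cached_input_cost(0.80, 0.75, 10_000, 4_000)
print(f"${without:,.2f}/mo -> ${with_cache:,.2f}/mo input cost")
```

With these assumptions the input bill drops by roughly two thirds; the savings scale directly with the cacheable fraction of the prompt.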
Monetization Models
1. Tiered Subscription + Usage Caps
The most common model. Predictable revenue with built-in upsell:
| Tier | Price | Included | Model Access | Your COGS |
|---|---|---|---|---|
| Free | $0 | 50 requests/day | Nova Micro only | ~$0.05/user/mo |
| Pro | $29/mo | 1,000 requests/day | Nova Pro + Haiku | ~$3-5/user/mo |
| Business | $99/mo | 5,000 requests/day | All models incl. Sonnet | ~$12-20/user/mo |
| Enterprise | Custom | Unlimited + SLA | All models + fine-tuned | Variable |
2. Credit System
Sell credit packs that map to underlying token consumption. This is how Jasper, Copy.ai, and Writesonic work:
Credit Pack Pricing:
- 100 credits = $10 ($0.10/credit)
- 500 credits = $39 ($0.078/credit)
- 2,000 credits = $99 ($0.050/credit)

Credit Consumption (hidden from user):
- Simple task (Nova Micro, ~500 tokens) = 1 credit → costs you ~$0.00003
- Standard task (Nova Pro, ~1K tokens) = 5 credits → costs you ~$0.004
- Premium task (Sonnet, ~1K tokens) = 20 credits → costs you ~$0.018
At $0.10/credit, a "Premium task" earns $2.00 on $0.018 COGS = 99.1% margin
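A quick sanity check of the credit math, with the credit counts and COGS estimates taken straight from the consumption list above:

```python
CREDIT_PRICE = 0.10  # $/credit at the smallest pack

# (credits charged, estimated Bedrock COGS in $) per task type,
# from the consumption table above
TASKS = {
    "simple": (1, 0.00003),
    "standard": (5, 0.004),
    "premium": (20, 0.018),
}

for name, (credits, cogs) in TASKS.items():
    revenue = credits * CREDIT_PRICE
    margin = (revenue - cogs) / revenue
    print(f"{name}: ${revenue:.2f} revenue on ${cogs} COGS = {margin:.1%} margin")
```

Every task type lands above 99% gross margin at these credit prices - which is why credit packs dominate in consumer AI writing tools.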
3. Usage-Based (Per-Unit Pricing)
Charge per document, per conversation, per analysis - abstract away tokens entirely:
| What You Charge | Example Price | Your Bedrock COGS | Gross Margin |
|---|---|---|---|
| Per document analyzed | $0.10 | ~$0.01-0.03 | 70-90% |
| Per conversation | $0.05 | ~$0.005-0.02 | 60-90% |
| Per image generated | $0.25 | ~$0.04-0.08 | 68-84% |
| Per 1K API calls | $5.00 | ~$0.50-1.50 | 70-90% |
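The margin ranges in this table fall out of a one-line calculation; for example, for documents:

```python
def gross_margin_range(price: float, cogs_low: float, cogs_high: float) -> tuple:
    """Gross margin bounds for one billable unit at a given price."""
    return (price - cogs_high) / price, (price - cogs_low) / price

low, high = gross_margin_range(0.10, 0.01, 0.03)  # per document analyzed
print(f"{low:.0%}-{high:.0%} gross margin")
```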
4. Markup Model (API Resale)
Simplest model - buy tokens wholesale from Bedrock, sell at 3-10× markup through your API:
Your cost: Claude Sonnet 4.6 = $3.00 input / $15.00 output per MTok
You charge: $15.00 input / $60.00 output per MTok (4-5× markup)
Margin: 75-80%
With routing (70% to Haiku at $0.25/$1.25):
Blended cost: ~$1.08/$5.38 per MTok
You charge: $10.00/$40.00 per MTok
Margin: ~87-89%
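The blended numbers above can be reproduced with a small helper; the traffic mix and per-MTok prices are the ones quoted:

```python
def blended_price(mix: dict) -> tuple:
    """Traffic-weighted average of per-MTok prices.
    mix: {model: (traffic share, input $/MTok, output $/MTok)}"""
    blended_in = sum(share * p_in for share, p_in, _ in mix.values())
    blended_out = sum(share * p_out for share, _, p_out in mix.values())
    return blended_in, blended_out

mix = {
    "haiku": (0.70, 0.25, 1.25),
    "sonnet": (0.30, 3.00, 15.00),
}
p_in, p_out = blended_price(mix)
print(f"blended cost: ${p_in:.3f} in / ${p_out:.3f} out per MTok")
print(f"margins at $10/$40 resale: {1 - p_in / 10:.0%} in, {1 - p_out / 40:.0%} out")
```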
Margin Math - Real Numbers
Scenario: AI Document Processing SaaS
Assumptions:
- 10,000 paying users at $49/mo (Pro tier)
- Each user processes ~200 documents/month
- Each document: ~2K input tokens + ~500 output tokens
- Model: Nova Pro ($0.80/$3.20 per MTok) for 70%, Sonnet ($3/$15) for 30%
Monthly Revenue:
10,000 × $49 = $490,000
Monthly Bedrock COGS:
Total requests: 10,000 × 200 = 2,000,000 documents
Nova Pro (70% = 1,400,000 docs):
Input: 1.4M × 2K tokens = 2.8B tokens × $0.80/1M = $2,240
Output: 1.4M × 500 tokens = 700M tokens × $3.20/1M = $2,240
Subtotal: $4,480
Sonnet (30% = 600,000 docs):
Input: 600K × 2K tokens = 1.2B tokens × $3.00/1M = $3,600
Output: 600K × 500 tokens = 300M tokens × $15.00/1M = $4,500
Subtotal: $8,100
Total Bedrock: $12,580/month
Other COGS:
AWS infra (API GW, Lambda, DynamoDB, S3): ~$2,000
Stripe fees (2.9%): ~$14,210
Support/ops: ~$5,000
Total COGS: ~$33,790
Gross Profit: $490,000 - $33,790 = $456,210
Gross Margin: 93.1%
Bedrock as % of revenue: 2.6%
Margin Sensitivity by Model Choice
| Strategy | Bedrock COGS/mo | % of Revenue | Gross Margin |
|---|---|---|---|
| All Sonnet (no routing) | $27,000 | 5.5% | 90% |
| 70/30 routing (Nova Pro + Sonnet) | $12,580 | 2.6% | 93% |
| 90/10 routing (Nova Micro + Sonnet) | $2,950 | 0.6% | 95% |
| All Nova Micro | $280 | 0.06% | 96% |
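These figures are straightforward token arithmetic on the scenario's volumes (2M documents, ~2K input + ~500 output tokens each); a sketch that reproduces them:

```python
def monthly_bedrock_cogs(docs: int, mix: list,
                         in_tok: int = 2_000, out_tok: int = 500) -> float:
    """Bedrock COGS in dollars for the document-processing scenario.
    mix: list of (traffic share, input $/MTok, output $/MTok)."""
    return sum(
        docs * share * (in_tok * p_in + out_tok * p_out) / 1e6
        for share, p_in, p_out in mix
    )

DOCS = 10_000 * 200  # 10K users x 200 docs/month

routed = monthly_bedrock_cogs(DOCS, [(0.7, 0.80, 3.20), (0.3, 3.00, 15.00)])
all_sonnet = monthly_bedrock_cogs(DOCS, [(1.0, 3.00, 15.00)])
print(f"70/30 routing: ${routed:,.0f}/mo   all-Sonnet: ${all_sonnet:,.0f}/mo")
```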
Metering Architecture
The Stack: API Gateway → Lambda → Bedrock + Metering

```
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│    Client    │─────▶│ API Gateway  │─────▶│    Lambda    │
│  (Your App)  │      │  + API Key   │      │   (Router)   │
└──────────────┘      └──────────────┘      └──────┬───────┘
                                                   │
              ┌────────────────────────────────────┼─────────────┐
              │                                    │             │
       ┌──────▼──────┐                   ┌─────────▼────────┐ ┌──▼────┐
       │   Bedrock   │                   │     DynamoDB     │ │  SQS  │
       │  (Invoke)   │                   │  (Usage Meter)   │ │(Async)│
       └─────────────┘                   └──────────────────┘ └───────┘
```
Token Metering Middleware (Python)
```python
import time
from decimal import Decimal

import boto3

bedrock = boto3.client("bedrock-runtime")
dynamodb = boto3.resource("dynamodb")
usage_table = dynamodb.Table("usage-meters")

# Model cost lookup (USD per 1M tokens). IDs abbreviated here -
# use the exact versioned model IDs from the Bedrock console.
MODEL_COSTS = {
    "amazon.nova-micro-v1:0": {"input": 0.035, "output": 0.14},
    "amazon.nova-pro-v1:0": {"input": 0.80, "output": 3.20},
    "anthropic.claude-3-haiku": {"input": 0.25, "output": 1.25},
    "anthropic.claude-sonnet-4": {"input": 3.00, "output": 15.00},
}


def invoke_and_meter(customer_id: str, model_id: str,
                     messages: list, system: str = "") -> dict:
    """Call Bedrock, count tokens, record usage.

    Uses the Converse API, which gives one request/response shape
    across model families - the same metering code works for Nova,
    Claude, Llama, and Mistral.
    """
    kwargs = {
        "modelId": model_id,
        # [{"role": "user", "content": [{"text": "..."}]}]
        "messages": messages,
        "inferenceConfig": {"maxTokens": 1024},
    }
    if system:
        kwargs["system"] = [{"text": system}]
    resp = bedrock.converse(**kwargs)

    # Token counts come back in the response's usage block
    usage = resp["usage"]
    input_tokens = usage["inputTokens"]
    output_tokens = usage["outputTokens"]

    # Cost in dollars = tokens x per-MTok price
    costs = MODEL_COSTS[model_id]
    cost = (input_tokens * costs["input"] / 1_000_000 +
            output_tokens * costs["output"] / 1_000_000)

    # Atomically accumulate this month's usage in DynamoDB
    month_key = time.strftime("%Y-%m")
    usage_table.update_item(
        Key={"customer_id": customer_id, "month": month_key},
        UpdateExpression="""
            ADD input_tokens :inp, output_tokens :out,
                total_cost :cost, request_count :one
        """,
        ExpressionAttributeValues={
            ":inp": input_tokens,
            ":out": output_tokens,
            ":cost": Decimal(str(round(cost, 6))),
            ":one": 1,
        },
    )
    return resp
```
Smart Model Router
```python
def route_request(messages: list, tier: str = "pro") -> str:
    """Route to the cheapest model that can handle the task."""

    def _chars(message: dict) -> int:
        # Content may be a plain string or a list of content blocks
        content = message["content"]
        if isinstance(content, str):
            return len(content)
        return sum(len(block.get("text", "")) for block in content)

    # Crude complexity proxy: total prompt length in characters
    total_chars = sum(_chars(m) for m in messages)

    if tier == "free":
        return "amazon.nova-micro-v1:0"
    if total_chars < 500:                   # Simple task
        return "amazon.nova-micro-v1:0"     # $0.035/$0.14
    elif total_chars < 2000:                # Medium task
        return "anthropic.claude-3-haiku"   # $0.25/$1.25
    elif tier == "business":                # Complex + premium tier
        return "anthropic.claude-sonnet-4"  # $3/$15
    else:
        return "amazon.nova-pro-v1:0"       # $0.80/$3.20
```
Cutting Your COGS
- Model routing - Route 70-90% of requests to Nova Micro/Haiku. Only escalate to Sonnet/Opus for complex tasks. Saves 50-90%.
- Prompt caching - Cache system prompts and few-shot examples. Reads cost 10% of input price. Saves 50-90% on repeated context.
- Batch inference - For async features (reports, bulk analysis), use batch API for 50% discount.
- Flex tier - For latency-tolerant features, Flex tier is 50% off standard pricing.
- Intelligent Prompt Routing - $1/1K requests. Bedrock auto-routes between models in the same family. Up to 30% savings.
- S3 Vectors for RAG - Up to 90% cheaper than OpenSearch Serverless for Knowledge Bases.
- Output token limits - Set `max_tokens` appropriately. Don't let Sonnet generate 4K tokens when 500 suffice.
AWS Marketplace Distribution
Listing on AWS Marketplace gives you access to enterprise budgets that are otherwise locked behind procurement:
| Benefit | Why It Matters |
|---|---|
| Consolidated billing | Customers pay through their existing AWS bill. No new vendor approval. |
| EDP burn-down | Enterprises with committed AWS spend can use it on your product. "It's already budgeted." |
| AWS co-sell | AWS sales reps get credit for Marketplace transactions - they'll recommend you. |
| 310K+ customers | Built-in distribution channel. |
Marketplace Fees
| Annual Revenue | AWS Fee |
|---|---|
| First $1M | 5% |
| $1M - $10M | 4% |
| $10M+ | 3% |
Best listing model for AI SaaS: SaaS Subscription with Usage - base subscription + metered overage. Report usage via the Metering API hourly.
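A sketch of that hourly reporting loop: accumulate per-customer counts (e.g. from the DynamoDB usage meter), shape them into records, and submit them through the Marketplace Metering Service's BatchMeterUsage call, which accepts up to 25 records per request. The product code and dimension name here are placeholders - they must match what is registered in your actual listing:

```python
import datetime

PRODUCT_CODE = "your-marketplace-product-code"  # placeholder - from your listing
DIMENSION = "requests"  # placeholder - a dimension registered in your listing

def build_usage_records(hourly_counts: dict) -> list:
    """Shape per-customer hourly counts into BatchMeterUsage records.

    hourly_counts: {aws_customer_identifier: request_count}
    """
    ts = datetime.datetime.now(datetime.timezone.utc)
    return [
        {
            "Timestamp": ts,
            "CustomerIdentifier": customer,
            "Dimension": DIMENSION,
            "Quantity": count,
        }
        for customer, count in hourly_counts.items()
    ]

def report_hourly_usage(hourly_counts: dict) -> None:
    """Submit this hour's usage, 25 records per API call."""
    import boto3
    client = boto3.client("meteringmarketplace")
    records = build_usage_records(hourly_counts)
    for i in range(0, len(records), 25):
        client.batch_meter_usage(
            UsageRecords=records[i:i + 25],
            ProductCode=PRODUCT_CODE,
        )
```

Run this on an hourly schedule (EventBridge + Lambda is the usual pattern); unreported usage is unbilled revenue.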
The Complete Playbook
| Stage | Model | Why |
|---|---|---|
| Pre-PMF (0-100 users) | Simple markup or free tier | Validate demand. Don't optimize yet. |
| Early (100-1K) | Tiered subscription + usage caps | Predictable revenue. Protect margins. |
| Growth (1K-10K) | Subscription + credits/overage | Capture expansion revenue. |
| Scale (10K+) | Full usage-based + Marketplace | Maximize per-customer revenue. Enterprise access. |
The Bottom Line
Bedrock inference is cheap - Nova Micro at $0.035/MTok means a typical request costs fractions of a penny. The margin opportunity is enormous: well-architected SaaS products achieve 90%+ gross margins on AI features. The key is model routing (don't use Sonnet for everything), prompt caching (don't re-send the same context), and abstracting tokens into units your customers understand (documents, conversations, credits - not tokens). List on AWS Marketplace for enterprise distribution, run Stripe for self-serve, and target 70-80% blended gross margin.