AWS Bedrock for SaaS - Monetize AI, Pass Costs Downstream & Profit
The complete playbook for building profitable AI products on Bedrock. Pricing tables, cost passthrough models, margin math, metering architecture, and code.
Bedrock gives you 50+ models from $0.035/MTok - the margin opportunity is in how you package and sell access
Bedrock Pricing - Your Cost of Goods Sold
Bedrock's on-demand pricing bills per token consumed - there is no separate per-request fee. Your COGS is the raw token cost. Here are the models that matter for SaaS:
Production-Ready Models (Best Quality/Cost Ratio)
| Model | Input $/MTok | Output $/MTok | Context | Best For |
|---|---|---|---|---|
| Nova Micro | $0.035 | $0.14 | 128K | Classification, extraction, simple tasks |
| Nova Lite | $0.06 | $0.24 | 300K | Summarization, Q&A, multimodal |
| Nova Pro | $0.80 | $3.20 | 300K | Complex reasoning, code, analysis |
| Claude Haiku 3 | $0.25 | $1.25 | 200K | Fast, cheap, good quality |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K | Premium tier - best overall quality |
| Llama 4 Scout | $0.27 | $0.36 | 10M | Long context, open-weight |
| Llama 3.1 8B | $0.30 | $0.60 | 128K | High-volume simple tasks |
| Mistral Large 3 | $0.50 | $1.50 | 128K | Balanced quality/cost |
Cost Reduction Levers
| Feature | Savings | How |
|---|---|---|
| Prompt Caching | Up to 90% | Cache reads cost 10% of input price. 25% premium on writes. |
| Batch Inference | 50% | Results within 24 hours. Great for async processing. |
| Flex Tier | 50% | Trades immediate processing for cost efficiency. |
| Intelligent Routing | Up to 30% | $1/1K requests. Auto-routes between models in same family. |
| Model Distillation | Up to 75% | Train smaller model on larger model's outputs. |
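To see how the caching lever compounds, here's a minimal sketch of effective input cost with prompt caching. It assumes Anthropic-style cache pricing on Bedrock (1.25× on cache writes, 0.1× on cache reads), a fixed cacheable prefix, and that every request after the first hits the cache - real hit rates depend on traffic patterns and cache TTL:

```python
def cached_input_cost(base_price_per_mtok: float, cached_fraction: float,
                      requests: int, tokens_per_request: int) -> float:
    """Effective input cost in dollars with prompt caching: the cached
    prefix is written once at a 25% premium, then read at 10% of the
    base input price on every subsequent request."""
    cached_tokens = tokens_per_request * cached_fraction
    fresh_tokens = tokens_per_request * (1 - cached_fraction)
    write = cached_tokens * base_price_per_mtok * 1.25 / 1e6              # first request
    reads = cached_tokens * base_price_per_mtok * 0.10 / 1e6 * (requests - 1)
    fresh = fresh_tokens * base_price_per_mtok / 1e6 * requests           # uncached tail
    return write + reads + fresh

# 10K requests/mo, 4K-token prompt, 75% of it a cacheable system prompt,
# Nova Pro input at $0.80/MTok
without = 10_000 * 4_000 * 0.80 / 1e6
with_cache = cached_input_cost(0.80, 0.75, 10_000, 4_000)
print(f"${without:,.2f}/mo -> ${with_cache:,.2f}/mo input cost")
```

With these assumptions the input bill drops by roughly two thirds; the savings scale directly with the cacheable fraction of the prompt.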
Monetization Models
1. Tiered Subscription + Usage Caps
The most common model. Predictable revenue with built-in upsell:
| Tier | Price | Included | Model Access | Your COGS |
|---|---|---|---|---|
| Free | $0 | 50 requests/day | Nova Micro only | ~$0.05/user/mo |
| Pro | $29/mo | 1,000 requests/day | Nova Pro + Haiku | ~$3-5/user/mo |
| Business | $99/mo | 5,000 requests/day | All models incl. Sonnet | ~$12-20/user/mo |
| Enterprise | Custom | Unlimited + SLA | All models + fine-tuned | Variable |
2. Credit System
Sell credit packs that map to underlying token consumption. This is how Jasper, Copy.ai, and Writesonic work:
Credit Pack Pricing:
- 100 credits = $10 ($0.10/credit)
- 500 credits = $39 ($0.078/credit)
- 2,000 credits = $99 ($0.050/credit)

Credit Consumption (hidden from user):
- Simple task (Nova Micro, ~500 tokens) = 1 credit → costs you ~$0.00003
- Standard task (Nova Pro, ~1K tokens) = 5 credits → costs you ~$0.004
- Premium task (Sonnet, ~1K tokens) = 20 credits → costs you ~$0.018
At $0.10/credit, a "Premium task" earns $2.00 on $0.018 COGS = 99.1% margin
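A quick sanity check of the credit math, with the credit counts and COGS estimates taken straight from the consumption list above:

```python
CREDIT_PRICE = 0.10  # $/credit at the smallest pack

# (credits charged, estimated Bedrock COGS in $) per task type,
# from the consumption table above
TASKS = {
    "simple": (1, 0.00003),
    "standard": (5, 0.004),
    "premium": (20, 0.018),
}

for name, (credits, cogs) in TASKS.items():
    revenue = credits * CREDIT_PRICE
    margin = (revenue - cogs) / revenue
    print(f"{name}: ${revenue:.2f} revenue on ${cogs} COGS = {margin:.1%} margin")
```

Every task type lands above 99% gross margin at these credit prices - which is why credit packs dominate in consumer AI writing tools.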
3. Usage-Based (Per-Unit Pricing)
Charge per document, per conversation, per analysis - abstract away tokens entirely:
| What You Charge | Example Price | Your Bedrock COGS | Gross Margin |
|---|---|---|---|
| Per document analyzed | $0.10 | ~$0.01-0.03 | 70-90% |
| Per conversation | $0.05 | ~$0.005-0.02 | 60-90% |
| Per image generated | $0.25 | ~$0.04-0.08 | 68-84% |
| Per 1K API calls | $5.00 | ~$0.50-1.50 | 70-90% |
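The margin ranges in this table fall out of a one-line calculation; for example, for documents:

```python
def gross_margin_range(price: float, cogs_low: float, cogs_high: float) -> tuple:
    """Gross margin bounds for one billable unit at a given price."""
    return (price - cogs_high) / price, (price - cogs_low) / price

low, high = gross_margin_range(0.10, 0.01, 0.03)  # per document analyzed
print(f"{low:.0%}-{high:.0%} gross margin")
```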
4. Markup Model (API Resale)
Simplest model - buy tokens wholesale from Bedrock, sell at 3-10× markup through your API:
Your cost: Claude Sonnet 4.6 = $3.00 input / $15.00 output per MTok
You charge: $15.00 input / $60.00 output per MTok (4-5× markup)
Margin: 75-80%
With routing (70% to Haiku at $0.25/$1.25):
Blended cost: ~$1.08/$5.38 per MTok
You charge: $10.00/$40.00 per MTok
Margin: ~87-89%
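The blended numbers above can be reproduced with a small helper; the traffic mix and per-MTok prices are the ones quoted:

```python
def blended_price(mix: dict) -> tuple:
    """Traffic-weighted average of per-MTok prices.
    mix: {model: (traffic share, input $/MTok, output $/MTok)}"""
    blended_in = sum(share * p_in for share, p_in, _ in mix.values())
    blended_out = sum(share * p_out for share, _, p_out in mix.values())
    return blended_in, blended_out

mix = {
    "haiku": (0.70, 0.25, 1.25),
    "sonnet": (0.30, 3.00, 15.00),
}
p_in, p_out = blended_price(mix)
print(f"blended cost: ${p_in:.3f} in / ${p_out:.3f} out per MTok")
print(f"margins at $10/$40 resale: {1 - p_in / 10:.0%} in, {1 - p_out / 40:.0%} out")
```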
Margin Math - Real Numbers
Scenario: AI Document Processing SaaS
Assumptions:
- 10,000 paying users at $49/mo (Pro tier)
- Each user processes ~200 documents/month
- Each document: ~2K input tokens + ~500 output tokens
- Model: Nova Pro ($0.80/$3.20 per MTok) for 70%, Sonnet ($3/$15) for 30%
Monthly Revenue:
10,000 × $49 = $490,000
Monthly Bedrock COGS:
Total requests: 10,000 × 200 = 2,000,000 documents
Nova Pro (70% = 1,400,000 docs):
Input: 1.4M × 2K tokens = 2.8B tokens × $0.80/1M = $2,240
Output: 1.4M × 500 tokens = 700M tokens × $3.20/1M = $2,240
Subtotal: $4,480
Sonnet (30% = 600,000 docs):
Input: 600K × 2K tokens = 1.2B tokens × $3.00/1M = $3,600
Output: 600K × 500 tokens = 300M tokens × $15.00/1M = $4,500
Subtotal: $8,100
Total Bedrock: $12,580/month
Other COGS:
AWS infra (API GW, Lambda, DynamoDB, S3): ~$2,000
Stripe fees (2.9%): ~$14,210
Support/ops: ~$5,000
Total COGS: ~$33,790
Gross Profit: $490,000 - $33,790 = $456,210
Gross Margin: 93.1%
Bedrock as % of revenue: 2.6%
Margin Sensitivity by Model Choice
| Strategy | Bedrock COGS/mo | % of Revenue | Gross Margin |
|---|---|---|---|
| All Sonnet (no routing) | $27,000 | 5.5% | 90% |
| 70/30 routing (Nova Pro + Sonnet) | $12,580 | 2.6% | 93% |
| 90/10 routing (Nova Micro + Sonnet) | $2,950 | 0.6% | 95% |
| All Nova Micro | $280 | 0.06% | 96% |
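These figures are straightforward token arithmetic on the scenario's volumes (2M documents, ~2K input + ~500 output tokens each); a sketch that reproduces them:

```python
def monthly_bedrock_cogs(docs: int, mix: list,
                         in_tok: int = 2_000, out_tok: int = 500) -> float:
    """Bedrock COGS in dollars for the document-processing scenario.
    mix: list of (traffic share, input $/MTok, output $/MTok)."""
    return sum(
        docs * share * (in_tok * p_in + out_tok * p_out) / 1e6
        for share, p_in, p_out in mix
    )

DOCS = 10_000 * 200  # 10K users x 200 docs/month

routed = monthly_bedrock_cogs(DOCS, [(0.7, 0.80, 3.20), (0.3, 3.00, 15.00)])
all_sonnet = monthly_bedrock_cogs(DOCS, [(1.0, 3.00, 15.00)])
print(f"70/30 routing: ${routed:,.0f}/mo   all-Sonnet: ${all_sonnet:,.0f}/mo")
```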
Metering Architecture
The Stack: API Gateway → Lambda → Bedrock + Metering

```
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│    Client    │─────▶│ API Gateway  │─────▶│    Lambda    │
│  (Your App)  │      │  + API Key   │      │   (Router)   │
└──────────────┘      └──────────────┘      └──────┬───────┘
                                                   │
              ┌────────────────────────────────────┼─────────────┐
              │                                    │             │
       ┌──────▼──────┐                   ┌─────────▼────────┐ ┌──▼────┐
       │   Bedrock   │                   │     DynamoDB     │ │  SQS  │
       │  (Invoke)   │                   │  (Usage Meter)   │ │(Async)│
       └─────────────┘                   └──────────────────┘ └───────┘
```
Token Metering Middleware (Python)
```python
import time
from decimal import Decimal

import boto3

bedrock = boto3.client("bedrock-runtime")
dynamodb = boto3.resource("dynamodb")
usage_table = dynamodb.Table("usage-meters")

# Model cost lookup (USD per 1M tokens). IDs abbreviated here -
# use the exact versioned model IDs from the Bedrock console.
MODEL_COSTS = {
    "amazon.nova-micro-v1:0": {"input": 0.035, "output": 0.14},
    "amazon.nova-pro-v1:0": {"input": 0.80, "output": 3.20},
    "anthropic.claude-3-haiku": {"input": 0.25, "output": 1.25},
    "anthropic.claude-sonnet-4": {"input": 3.00, "output": 15.00},
}


def invoke_and_meter(customer_id: str, model_id: str,
                     messages: list, system: str = "") -> dict:
    """Call Bedrock, count tokens, record usage.

    Uses the Converse API, which gives one request/response shape
    across model families - the same metering code works for Nova,
    Claude, Llama, and Mistral.
    """
    kwargs = {
        "modelId": model_id,
        # [{"role": "user", "content": [{"text": "..."}]}]
        "messages": messages,
        "inferenceConfig": {"maxTokens": 1024},
    }
    if system:
        kwargs["system"] = [{"text": system}]
    resp = bedrock.converse(**kwargs)

    # Token counts come back in the response's usage block
    usage = resp["usage"]
    input_tokens = usage["inputTokens"]
    output_tokens = usage["outputTokens"]

    # Cost in dollars = tokens x per-MTok price
    costs = MODEL_COSTS[model_id]
    cost = (input_tokens * costs["input"] / 1_000_000 +
            output_tokens * costs["output"] / 1_000_000)

    # Atomically accumulate this month's usage in DynamoDB
    month_key = time.strftime("%Y-%m")
    usage_table.update_item(
        Key={"customer_id": customer_id, "month": month_key},
        UpdateExpression="""
            ADD input_tokens :inp, output_tokens :out,
                total_cost :cost, request_count :one
        """,
        ExpressionAttributeValues={
            ":inp": input_tokens,
            ":out": output_tokens,
            ":cost": Decimal(str(round(cost, 6))),
            ":one": 1,
        },
    )
    return resp
```
Smart Model Router
```python
def route_request(messages: list, tier: str = "pro") -> str:
    """Route to the cheapest model that can handle the task."""

    def _chars(message: dict) -> int:
        # Content may be a plain string or a list of content blocks
        content = message["content"]
        if isinstance(content, str):
            return len(content)
        return sum(len(block.get("text", "")) for block in content)

    # Crude complexity proxy: total prompt length in characters
    total_chars = sum(_chars(m) for m in messages)

    if tier == "free":
        return "amazon.nova-micro-v1:0"
    if total_chars < 500:                   # Simple task
        return "amazon.nova-micro-v1:0"     # $0.035/$0.14
    elif total_chars < 2000:                # Medium task
        return "anthropic.claude-3-haiku"   # $0.25/$1.25
    elif tier == "business":                # Complex + premium tier
        return "anthropic.claude-sonnet-4"  # $3/$15
    else:
        return "amazon.nova-pro-v1:0"       # $0.80/$3.20
```
Cutting Your COGS
- Model routing - Route 70-90% of requests to Nova Micro/Haiku. Only escalate to Sonnet/Opus for complex tasks. Saves 50-90%.
- Prompt caching - Cache system prompts and few-shot examples. Reads cost 10% of input price. Saves 50-90% on repeated context.
- Batch inference - For async features (reports, bulk analysis), use batch API for 50% discount.
- Flex tier - For latency-tolerant features, Flex tier is 50% off standard pricing.
- Intelligent Prompt Routing - $1/1K requests. Bedrock auto-routes between models in the same family. Up to 30% savings.
- S3 Vectors for RAG - Up to 90% cheaper than OpenSearch Serverless for Knowledge Bases.
- Output token limits - Set `max_tokens` appropriately. Don't let Sonnet generate 4K tokens when 500 suffice.
AWS Marketplace Distribution
Listing on AWS Marketplace gives you access to enterprise budgets that are otherwise locked behind procurement:
| Benefit | Why It Matters |
|---|---|
| Consolidated billing | Customers pay through their existing AWS bill. No new vendor approval. |
| EDP burn-down | Enterprises with committed AWS spend can use it on your product. "It's already budgeted." |
| AWS co-sell | AWS sales reps get credit for Marketplace transactions - they'll recommend you. |
| 310K+ customers | Built-in distribution channel. |
Marketplace Fees
| Annual Revenue | AWS Fee |
|---|---|
| First $1M | 5% |
| $1M - $10M | 4% |
| $10M+ | 3% |
Best listing model for AI SaaS: SaaS Subscription with Usage - base subscription + metered overage. Report usage via the Metering API hourly.
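A sketch of that hourly reporting loop: accumulate per-customer counts (e.g. from the DynamoDB usage meter), shape them into records, and submit them through the Marketplace Metering Service's BatchMeterUsage call, which accepts up to 25 records per request. The product code and dimension name here are placeholders - they must match what is registered in your actual listing:

```python
import datetime

PRODUCT_CODE = "your-marketplace-product-code"  # placeholder - from your listing
DIMENSION = "requests"  # placeholder - a dimension registered in your listing

def build_usage_records(hourly_counts: dict) -> list:
    """Shape per-customer hourly counts into BatchMeterUsage records.

    hourly_counts: {aws_customer_identifier: request_count}
    """
    ts = datetime.datetime.now(datetime.timezone.utc)
    return [
        {
            "Timestamp": ts,
            "CustomerIdentifier": customer,
            "Dimension": DIMENSION,
            "Quantity": count,
        }
        for customer, count in hourly_counts.items()
    ]

def report_hourly_usage(hourly_counts: dict) -> None:
    """Submit this hour's usage, 25 records per API call."""
    import boto3
    client = boto3.client("meteringmarketplace")
    records = build_usage_records(hourly_counts)
    for i in range(0, len(records), 25):
        client.batch_meter_usage(
            UsageRecords=records[i:i + 25],
            ProductCode=PRODUCT_CODE,
        )
```

Run this on an hourly schedule (EventBridge + Lambda is the usual pattern); unreported usage is unbilled revenue.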
The Complete Playbook
| Stage | Model | Why |
|---|---|---|
| Pre-PMF (0-100 users) | Simple markup or free tier | Validate demand. Don't optimize yet. |
| Early (100-1K) | Tiered subscription + usage caps | Predictable revenue. Protect margins. |
| Growth (1K-10K) | Subscription + credits/overage | Capture expansion revenue. |
| Scale (10K+) | Full usage-based + Marketplace | Maximize per-customer revenue. Enterprise access. |
The Bottom Line
Bedrock inference is cheap - Nova Micro at $0.035/MTok means a typical request costs fractions of a penny. The margin opportunity is enormous: well-architected SaaS products achieve 90%+ gross margins on AI features. The key is model routing (don't use Sonnet for everything), prompt caching (don't re-send the same context), and abstracting tokens into units your customers understand (documents, conversations, credits - not tokens). List on AWS Marketplace for enterprise distribution, run Stripe for self-serve, and target 70-80% blended gross margin.