
Bedrock gives you 50+ models from $0.035/MTok - the margin opportunity is in how you package and sell access

Last updated: April 2026 - Covers Bedrock Intelligent Prompt Routing, prompt caching, Flex tier (50% off), Nova models, Claude 4.x, Llama 4, and AWS Marketplace SaaS integration.

Bedrock Pricing - Your Cost of Goods Sold

Bedrock charges per token consumed - API calls themselves are free. Your COGS is the raw token cost. Here are the models that matter for SaaS:

Production-Ready Models (Best Quality/Cost Ratio)

| Model | Input $/MTok | Output $/MTok | Context | Best For |
|---|---|---|---|---|
| Nova Micro | $0.035 | $0.14 | 128K | Classification, extraction, simple tasks |
| Nova Lite | $0.06 | $0.24 | 300K | Summarization, Q&A, multimodal |
| Nova Pro | $0.80 | $3.20 | 300K | Complex reasoning, code, analysis |
| Claude Haiku 3 | $0.25 | $1.25 | 200K | Fast, cheap, good quality |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K | Premium tier - best overall quality |
| Llama 4 Scout | $0.27 | $0.36 | 10M | Long context, open-weight |
| Llama 3.1 8B | $0.30 | $0.60 | 128K | High-volume simple tasks |
| Mistral Large 3 | $0.50 | $1.50 | 128K | Balanced quality/cost |

The spread is roughly 430×: Nova Micro at $0.035/MTok input versus Claude 3 Opus (not listed in the table above) at $15/MTok input. Your margin lives in choosing the right model for each task - not in using the most expensive one for everything.
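
To make the per-token prices concrete, here is a minimal sketch that converts them into a per-request cost. The prices are copied from the table above; the request shape (1,000 input tokens, 300 output tokens) and the function name `request_cost` are illustrative assumptions, not measurements.

```python
# Prices copied from the table above (USD per 1M tokens): (input, output).
PRICES = {
    "Nova Micro": (0.035, 0.14),
    "Nova Pro": (0.80, 3.20),
    "Claude Haiku 3": (0.25, 1.25),
    "Claude Sonnet 4.6": (3.00, 15.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the Bedrock token cost in USD for a single request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical request shape: 1,000 input tokens, 300 output tokens.
for model in PRICES:
    print(f"{model:18s} ${request_cost(model, 1000, 300):.6f}")
```

Running the same request shape through each model shows how quickly the per-request cost diverges, which is the number you actually price against.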

Cost Reduction Levers

| Feature | Savings | How |
|---|---|---|
| Prompt Caching | Up to 90% | Cache reads cost 10% of input price. 25% premium on writes. |
| Batch Inference | 50% | Results within 24 hours. Great for async processing. |
| Flex Tier | 50% | Trades immediate processing for cost efficiency. |
| Intelligent Routing | Up to 30% | $1/1K requests. Auto-routes between models in same family. |
| Model Distillation | Up to 75% | Train smaller model on larger model's outputs. |
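
The prompt caching row is the one most worth modelling before you rely on it. The sketch below applies the two multipliers stated above (reads at 10% of input price, writes at a 25% premium) to a shared prompt prefix. It assumes the first request writes the cache and every later request hits it, with no expiry in between, which is an optimistic simplification; the function names and the example token counts are hypothetical.

```python
def uncached_input_cost(input_price: float, prefix_tokens: int,
                        fresh_tokens: int, requests: int) -> float:
    """Input cost in USD when the full prompt is re-sent every time."""
    return requests * (prefix_tokens + fresh_tokens) * input_price / 1_000_000


def cached_input_cost(input_price: float, prefix_tokens: int,
                      fresh_tokens: int, requests: int,
                      read_mult: float = 0.10, write_mult: float = 1.25) -> float:
    """Input cost in USD when the shared prefix is cached.

    Assumption: one cache write on the first request, then a cache read
    on each of the remaining requests.
    """
    if requests < 1:
        raise ValueError("requests must be at least 1")
    write = prefix_tokens * input_price * write_mult
    reads = (requests - 1) * prefix_tokens * input_price * read_mult
    fresh = requests * fresh_tokens * input_price
    return (write + reads + fresh) / 1_000_000


# Example: $3.00/MTok input, 4,000-token system prompt, 200 fresh tokens, 100 requests.
before = uncached_input_cost(3.00, 4000, 200, 100)
after = cached_input_cost(3.00, 4000, 200, 100)
print(f"uncached ${before:.4f}  cached ${after:.4f}  saving {1 - after / before:.0%}")
```

The saving depends heavily on the ratio of cached prefix to fresh tokens and on how often the cache actually hits, so treat "up to 90%" as a ceiling rather than an expectation.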

Monetization Models

1. Tiered Subscription + Usage Caps

The most common model. Predictable revenue with built-in upsell:

| Tier | Price | Included | Model Access | Your COGS |
|---|---|---|---|---|
| Free | $0 | 50 requests/day | Nova Micro only | ~$0.05/user/mo |
| Pro | $29/mo | 1,000 requests/day | Nova Pro + Haiku | ~$3-5/user/mo |
| Business | $99/mo | 5,000 requests/day | All models incl. Sonnet | ~$12-20/user/mo |
| Enterprise | Custom | Unlimited + SLA | All models + fine-tuned | Variable |
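
Enforcing these caps comes down to one check before each request. The following is a rough sketch of that gate using the tier values from the table above; the short model labels, the `Tier` class, and `check_request` are hypothetical names for illustration, and the Enterprise tier is left out because its terms are custom. In production the `requests_today` counter would come from your metering store.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Tier:
    price_usd: float
    daily_requests: Optional[int]  # None means no daily cap
    models: Tuple[str, ...]


# Values copied from the tier table; model labels are placeholders,
# not Bedrock model IDs.
TIERS = {
    "free": Tier(0, 50, ("nova-micro",)),
    "pro": Tier(29, 1_000, ("nova-pro", "haiku")),
    "business": Tier(99, 5_000, ("nova-micro", "nova-pro", "haiku", "sonnet")),
}


def check_request(tier_name: str, requests_today: int, model: str) -> Tuple[bool, str]:
    """Return (allowed, reason) for a request about to be made."""
    tier = TIERS[tier_name]
    if model not in tier.models:
        return False, "model_not_in_tier"
    if tier.daily_requests is not None and requests_today >= tier.daily_requests:
        return False, "daily_cap_reached"
    return True, "ok"


print(check_request("free", 50, "nova-micro"))
print(check_request("pro", 10, "sonnet"))
```

Returning a reason string rather than a bare boolean lets the app turn a denial into the matching upsell prompt.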

2. Credit System

Sell credit packs that map to underlying token consumption. This is how Jasper, Copy.ai, and Writesonic work:

Credit Pack Pricing:
  100 credits  = $10   ($0.10/credit)
  500 credits  = $39   ($0.078/credit)
  2,000 credits = $99  ($0.050/credit)

Credit Consumption (hidden from user):
  Simple task (Nova Micro, ~500 tokens)  = 1 credit   → costs you $0.00003
  Standard task (Nova Pro, ~1K tokens)   = 5 credits   → costs you $0.004
  Premium task (Sonnet, ~1K tokens)      = 20 credits  → costs you $0.018

At $0.10/credit, a "Premium task" earns $2.00 on $0.018 COGS = 99.1% margin
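
The margin shrinks as customers buy larger packs, because the per-credit price drops while the COGS does not. A small sketch, using the pack prices and per-task costs quoted above as-is (the names `PACKS`, `TASKS`, and `task_margin` are illustrative):

```python
# Credit packs from above: credits -> pack price in USD.
PACKS = {100: 10.00, 500: 39.00, 2000: 99.00}

# Tasks from above: name -> (credits charged, COGS in USD).
TASKS = {
    "simple": (1, 0.00003),
    "standard": (5, 0.004),
    "premium": (20, 0.018),
}


def task_margin(task: str, pack_credits: int) -> float:
    """Gross margin on one task when the credits came from the given pack."""
    credits, cogs = TASKS[task]
    price_per_credit = PACKS[pack_credits] / pack_credits
    revenue = credits * price_per_credit
    return (revenue - cogs) / revenue


for pack in PACKS:
    print(f"{pack:>5}-credit pack: premium task margin {task_margin('premium', pack):.1%}")
```

Even at the cheapest per-credit price the premium task keeps a margin above 98%, so the volume discount costs very little.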

3. Usage-Based (Per-Unit Pricing)

Charge per document, per conversation, per analysis - abstract away tokens entirely:

| What You Charge | Example Price | Your Bedrock COGS | Gross Margin |
|---|---|---|---|
| Per document analyzed | $0.10 | ~$0.01-0.03 | 70-90% |
| Per conversation | $0.05 | ~$0.005-0.02 | 60-90% |
| Per image generated | $0.25 | ~$0.04-0.08 | 68-84% |
| Per 1K API calls | $5.00 | ~$0.50-1.50 | 70-90% |
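
Working backwards from a target margin is often more useful than checking a price after the fact. This sketch shows both directions; `gross_margin` and `min_price` are hypothetical helper names, and the inputs are the worst-case COGS figures from the table above.

```python
def gross_margin(price: float, cogs: float) -> float:
    """Gross margin as a fraction of price."""
    return (price - cogs) / price


def min_price(cogs: float, target_margin: float) -> float:
    """Lowest unit price that still achieves the target gross margin."""
    if not 0 <= target_margin < 1:
        raise ValueError("target_margin must be in [0, 1)")
    return cogs / (1 - target_margin)


# Worst-case document COGS of $0.03 with a 70% margin target.
print(f"price floor: ${min_price(0.03, 0.70):.2f}")

# Margin range for the image row: $0.25 price, $0.04-0.08 COGS.
print(f"{gross_margin(0.25, 0.08):.0%} to {gross_margin(0.25, 0.04):.0%}")
```

Pricing off the worst-case COGS keeps the floor margin intact when a customer's documents run longer than average.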

4. Markup Model (API Resale)

Simplest model - buy tokens wholesale from Bedrock, sell at 3-10× markup through your API:

Your cost:  Claude Sonnet 4.6 = $3.00 input / $15.00 output per MTok
You charge: $15.00 input / $60.00 output per MTok (5× on input, 4× on output)
Margin:     80% on input, 75% on output

With routing (70% to Haiku at $0.25/$1.25):
Blended cost: ~$1.08/$5.38 per MTok
You charge:   $10.00/$40.00 per MTok
Margin:       ~89% on input, ~87% on output
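
The blended cost is a share-weighted average of each model's price, computed separately for input and output tokens. A minimal sketch of that arithmetic, assuming the 70/30 Haiku/Sonnet split applies equally to input and output traffic (the function names are illustrative):

```python
from typing import List, Tuple


def blended_price(mix: List[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Weighted-average (input, output) price per MTok.

    mix: list of (traffic share, input $/MTok, output $/MTok); shares sum to 1.
    """
    total_share = sum(share for share, _, _ in mix)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    blended_in = sum(share * in_price for share, in_price, _ in mix)
    blended_out = sum(share * out_price for share, _, out_price in mix)
    return blended_in, blended_out


def margin(cost: float, price: float) -> float:
    """Gross margin as a fraction of the resale price."""
    return (price - cost) / price


# 70% Haiku ($0.25/$1.25), 30% Sonnet ($3.00/$15.00), resold at $10/$40.
cost_in, cost_out = blended_price([(0.7, 0.25, 1.25), (0.3, 3.00, 15.00)])
print(f"blended cost: ${cost_in:.3f} in / ${cost_out:.3f} out per MTok")
print(f"margin: {margin(cost_in, 10.0):.1%} in / {margin(cost_out, 40.0):.1%} out")
```

Because output tokens carry a smaller markup in this price list, the output-side margin comes out a little lower than the input side; the overall figure depends on your input/output token ratio.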

Margin Math - Real Numbers

Scenario: AI Document Processing SaaS

Assumptions:
  - 10,000 paying users at $49/mo (Pro tier)
  - Each user processes ~200 documents/month
  - Each document: ~2K input tokens + ~500 output tokens
  - Model: Nova Pro ($0.80/$3.20 per MTok) for 70%, Sonnet ($3/$15) for 30%

Monthly Revenue:
  10,000 × $49 = $490,000

Monthly Bedrock COGS:
  Total requests: 10,000 × 200 = 2,000,000 documents
  
  Nova Pro (70% = 1,400,000 docs):
    Input:  1.4M × 2K tokens = 2.8B tokens × $0.80/1M  = $2,240
    Output: 1.4M × 500 tokens = 700M tokens × $3.20/1M  = $2,240
    Subtotal: $4,480
  
  Sonnet (30% = 600,000 docs):
    Input:  600K × 2K tokens = 1.2B tokens × $3.00/1M   = $3,600
    Output: 600K × 500 tokens = 300M tokens × $15.00/1M  = $4,500
    Subtotal: $8,100
  
  Total Bedrock: $12,580/month

Other COGS:
  AWS infra (API GW, Lambda, DynamoDB, S3): ~$2,000
  Stripe fees (2.9%): ~$14,210
  Support/ops: ~$5,000
  Total COGS: ~$33,790

Gross Profit: $490,000 - $33,790 = $456,210
Gross Margin: 93.1%
Bedrock as % of revenue: 2.6%
Key insight: At scale, Bedrock inference is typically only 2-5% of revenue for well-architected SaaS products. The real costs are payment processing, support, and infrastructure - not AI inference.
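
The scenario is easy to re-run with your own numbers. The sketch below reproduces the arithmetic above; all inputs are the stated assumptions, Stripe is modelled as a flat 2.9% of revenue as in the text (no per-transaction fee), and the variable and function names are illustrative.

```python
# Scenario inputs copied from the assumptions above.
USERS = 10_000
PRICE_PER_USER = 49.0      # USD per month
DOCS_PER_USER = 200        # documents per month
INPUT_TOKENS = 2_000       # per document
OUTPUT_TOKENS = 500        # per document

# model -> (share of documents, input $/MTok, output $/MTok)
MODEL_MIX = {
    "Nova Pro": (0.70, 0.80, 3.20),
    "Sonnet": (0.30, 3.00, 15.00),
}


def bedrock_cogs(total_docs: int, mix: dict) -> float:
    """Monthly Bedrock token cost in USD for a given routing mix."""
    total = 0.0
    for share, in_price, out_price in mix.values():
        docs = total_docs * share
        per_doc = (INPUT_TOKENS * in_price + OUTPUT_TOKENS * out_price) / 1_000_000
        total += docs * per_doc
    return total


revenue = USERS * PRICE_PER_USER
bedrock = bedrock_cogs(USERS * DOCS_PER_USER, MODEL_MIX)
other_cogs = 2_000 + 0.029 * revenue + 5_000   # infra + Stripe 2.9% + support/ops
total_cogs = bedrock + other_cogs
gross_margin = (revenue - total_cogs) / revenue

print(f"Revenue:       ${revenue:,.0f}")
print(f"Bedrock COGS:  ${bedrock:,.0f} ({bedrock / revenue:.1%} of revenue)")
print(f"Total COGS:    ${total_cogs:,.0f}")
print(f"Gross margin:  {gross_margin:.1%}")
```

Changing `MODEL_MIX` or the token counts per document is the quickest way to see how sensitive the margin is to routing decisions.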

Margin Sensitivity by Model Choice

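Each row reprices the same 2,000,000 documents (2K input + 500 output tokens each) from the scenario above and holds the ~$21,210 of non-Bedrock COGS fixed.

| Strategy | Bedrock COGS/mo | % of Revenue | Gross Margin |
|---|---|---|---|
| All Sonnet (no routing) | $27,000 | 5.5% | 90% |
| 70/30 routing (Nova Pro + Sonnet) | $12,580 | 2.6% | 93% |
| 90/10 routing (Nova Micro + Sonnet) | ~$2,950 | 0.6% | 95% |
| All Nova Micro | $280 | 0.06% | 96% |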

Metering Architecture

The Stack: API Gateway → Lambda → Bedrock + Metering

┌──────────────┐     ┌──────────────┐     ┌─────────────┐
│    Client    │────▶│ API Gateway  │────▶│   Lambda    │
│  (Your App)  │     │  + API Key   │     │  (Router)   │
└──────────────┘     └──────────────┘     └──────┬──────┘
                                                 │
            ┌────────────────────┬───────────────┤
      ┌─────▼──────┐    ┌────────▼────────┐  ┌───▼───┐
      │  Bedrock   │    │    DynamoDB     │  │  SQS  │
      │  (Invoke)  │    │  (Usage Meter)  │  │(Async)│
      └────────────┘    └─────────────────┘  └───────┘

Token Metering Middleware (Python)

import time
from decimal import Decimal

import boto3

bedrock = boto3.client("bedrock-runtime")
dynamodb = boto3.resource("dynamodb")
usage_table = dynamodb.Table("usage-meters")

# Model cost lookup (USD per 1M tokens).
# The keys below are shortened for readability; replace them with the full
# model IDs or inference profile IDs enabled in your account.
MODEL_COSTS = {
    "amazon.nova-micro-v1:0":    {"input": 0.035, "output": 0.14},
    "amazon.nova-pro-v1:0":      {"input": 0.80,  "output": 3.20},
    "anthropic.claude-3-haiku":  {"input": 0.25,  "output": 1.25},
    "anthropic.claude-sonnet-4": {"input": 3.00,  "output": 15.00},
}

def invoke_and_meter(customer_id: str, model_id: str,
                     messages: list, system: str = "") -> dict:
    """Call Bedrock, count tokens, record usage.

    `messages` is a list of {"role": ..., "content": "<text>"} dicts.
    """
    # Fail fast on an unpriced model, before spending any tokens
    costs = MODEL_COSTS[model_id]

    # Use the Converse API: it accepts one request shape for every model
    # family, whereas the invoke_model body format is model-specific
    request = {
        "modelId": model_id,
        "messages": [
            {"role": m["role"], "content": [{"text": m["content"]}]}
            for m in messages
        ],
        "inferenceConfig": {"maxTokens": 1024},
    }
    if system:
        request["system"] = [{"text": system}]

    resp = bedrock.converse(**request)

    # Extract token counts from response metadata
    usage = resp.get("usage", {})
    input_tokens = usage.get("inputTokens", 0)
    output_tokens = usage.get("outputTokens", 0)

    # Calculate cost
    cost = (input_tokens * costs["input"] / 1_000_000 +
            output_tokens * costs["output"] / 1_000_000)

    # Record usage in DynamoDB (atomic ADD, monthly bucket keyed in UTC)
    month_key = time.strftime("%Y-%m", time.gmtime())
    usage_table.update_item(
        Key={"customer_id": customer_id, "month": month_key},
        UpdateExpression="""
            ADD input_tokens :inp, output_tokens :out,
                total_cost :cost, request_count :one
        """,
        ExpressionAttributeValues={
            ":inp": input_tokens,
            ":out": output_tokens,
            ":cost": Decimal(str(round(cost, 6))),
            ":one": 1,
        },
    )

    return resp

Smart Model Router

def route_request(messages: list, tier: str = "pro") -> str:
    """Route to cheapest model that can handle the task."""
    
    # Estimate complexity from message length
    total_chars = sum(len(m["content"]) for m in messages)
    
    if tier == "free":
        return "amazon.nova-micro-v1:0"
    
    if total_chars < 500:  # Simple task
        return "amazon.nova-micro-v1:0"      # $0.035/$0.14
    elif total_chars < 2000:  # Medium task
        return "anthropic.claude-3-haiku"     # $0.25/$1.25
    elif tier == "business":  # Complex + premium tier
        return "anthropic.claude-sonnet-4"    # $3/$15
    else:
        return "amazon.nova-pro-v1:0"         # $0.80/$3.20

Cutting Your COGS

  1. Model routing - Route 70-90% of requests to Nova Micro/Haiku. Only escalate to Sonnet/Opus for complex tasks. Saves 50-90%.
  2. Prompt caching - Cache system prompts and few-shot examples. Reads cost 10% of input price. Saves 50-90% on repeated context.
  3. Batch inference - For async features (reports, bulk analysis), use batch API for 50% discount.
  4. Flex tier - For latency-tolerant features, Flex tier is 50% off standard pricing.
  5. Intelligent Prompt Routing - $1/1K requests. Bedrock auto-routes between models in the same family. Up to 30% savings.
  6. S3 Vectors for RAG - Up to 90% cheaper than OpenSearch Serverless for Knowledge Bases.
  7. Output token limits - Set max_tokens appropriately. Don't let Sonnet generate 4K tokens when 500 suffice.
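
The last item is easy to quantify, because `max_tokens` sets the worst-case output bill for a request. A quick sketch using the Sonnet output price quoted in this article ($15/MTok); the function name and the one-million-request volume are illustrative assumptions.

```python
def worst_case_output_cost(output_price_per_mtok: float, max_tokens: int) -> float:
    """Upper bound on output cost in USD for one request at a given max_tokens."""
    return max_tokens * output_price_per_mtok / 1_000_000


SONNET_OUTPUT_PRICE = 15.00  # USD per 1M output tokens, from the pricing table

for limit in (500, 4096):
    per_request = worst_case_output_cost(SONNET_OUTPUT_PRICE, limit)
    print(f"max_tokens={limit:>4}: up to ${per_request:.5f}/request, "
          f"${per_request * 1_000_000:,.0f} per 1M requests")
```

This is only a ceiling: actual spend follows the tokens generated, but a tight limit bounds the damage from runaway or padded responses.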

AWS Marketplace Distribution

Listing on AWS Marketplace gives you access to enterprise budgets that are otherwise locked behind procurement:

| Benefit | Why It Matters |
|---|---|
| Consolidated billing | Customers pay through their existing AWS bill. No new vendor approval. |
| EDP burn-down | Enterprises with committed AWS spend can use it on your product. "It's already budgeted." |
| AWS co-sell | AWS sales reps get credit for Marketplace transactions - they'll recommend you. |
| 310K+ customers | Built-in distribution channel. |

Marketplace Fees

| Annual Revenue | AWS Fee |
|---|---|
| First $1M | 5% |
| $1M - $10M | 4% |
| $10M+ | 3% |

Best listing model for AI SaaS: SaaS Subscription with Usage - base subscription + metered overage. Report usage via the Metering API hourly.
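
Hourly reporting can be sketched as a small job that turns per-customer usage into records for the AWS Marketplace Metering Service `BatchMeterUsage` call. In the sketch below the helper names, the `"requests"` dimension, and the product code are placeholders; the 25-records-per-call batching reflects the documented `BatchMeterUsage` limit, and the client is passed in so the record-building logic can be exercised without AWS credentials. Check the field names against the current API reference before relying on it.

```python
from datetime import datetime, timezone
from typing import Dict, Iterator, List, Optional

BATCH_LIMIT = 25  # BatchMeterUsage accepts at most 25 usage records per call


def build_usage_records(usage_by_customer: Dict[str, int], dimension: str,
                        now: Optional[datetime] = None) -> List[dict]:
    """Convert {customer_identifier: quantity} into usage records, skipping zeros."""
    timestamp = now or datetime.now(timezone.utc)
    return [
        {
            "Timestamp": timestamp,
            "CustomerIdentifier": customer_id,
            "Dimension": dimension,
            "Quantity": int(quantity),
        }
        for customer_id, quantity in usage_by_customer.items()
        if quantity > 0
    ]


def chunked(records: List[dict], size: int = BATCH_LIMIT) -> Iterator[List[dict]]:
    """Yield successive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def report_usage(client, product_code: str, records: List[dict]) -> List[dict]:
    """Send records in batches; return any records the service did not process."""
    unprocessed: List[dict] = []
    for batch in chunked(records):
        resp = client.batch_meter_usage(UsageRecords=batch, ProductCode=product_code)
        unprocessed.extend(resp.get("UnprocessedRecords", []))
    return unprocessed


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("meteringmarketplace")
#   records = build_usage_records({"customer-abc": 1200}, "requests")
#   leftovers = report_usage(client, "your-product-code", records)
```

Anything returned in `leftovers` should be retried on the next run, otherwise that usage is never billed.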

The Complete Playbook

| Stage | Model | Why |
|---|---|---|
| Pre-PMF (0-100 users) | Simple markup or free tier | Validate demand. Don't optimize yet. |
| Early (100-1K) | Tiered subscription + usage caps | Predictable revenue. Protect margins. |
| Growth (1K-10K) | Subscription + credits/overage | Capture expansion revenue. |
| Scale (10K+) | Full usage-based + Marketplace | Maximize per-customer revenue. Enterprise access. |

The Bottom Line

Bedrock inference is cheap - Nova Micro at $0.035/MTok means a typical request costs fractions of a penny. The margin opportunity is enormous: well-architected SaaS products achieve 90%+ gross margins on AI features. The key is model routing (don't use Sonnet for everything), prompt caching (don't re-send the same context), and abstracting tokens into units your customers understand (documents, conversations, credits - not tokens). List on AWS Marketplace for enterprise distribution, run Stripe for self-serve, and target 70-80% blended gross margin.