
The Real Cost of Cloudflare Workers vs AWS Lambda for an AI App

Tony Adams 10 min read

For the AI workloads most founders are actually shipping in 2026 — a request hits your endpoint, your code waits 1–3 seconds on an LLM or model API, you stream the response back — Cloudflare Workers runs about $5–$10/month at 5 million requests, while AWS Lambda plus the supporting AWS services it actually needs (API Gateway, CloudWatch, NAT Gateway, S3 egress) runs $350–$460/month for the same job. That’s not a 20% delta. It’s roughly 40–50x, and the entire spread comes from one structural decision: Workers bills CPU time, Lambda bills wall-clock × memory. If your app spends most of its time waiting on an inference call, Workers charges near zero for the wait and Lambda charges for every millisecond of it.

This is the dollars-and-cents piece. The architecture side — cold starts, V8 isolates vs. Firecracker, runtime limits, how each platform actually runs inference — lives in the Workers vs Lambda architecture comparison, and the reserved-vs-serverless GPU economics live in the GPU cost math. What follows is just the bill.

~48x
Workers vs Lambda monthly bill delta for the same IO-bound AI workload: $7.40 vs $354 without Provisioned Concurrency (~63x at $463 with it). Worked example, May 2026 primary-source pricing

The one line that drives everything

  • Cloudflare Workers bills CPU time. Time spent waiting on a fetch(), a KV read, or a model API call doesn’t count. The average Worker uses ~2.2 ms of CPU per request. Standard plan: $5/month base, including 10M requests + 30M CPU-ms, then $0.30 per additional million requests and $0.02 per million CPU-ms — with no charge for duration.
  • AWS Lambda bills wall-clock duration × memory. $0.20 per million requests + $0.0000166667 per GB-second (x86, us-east-1). A function allocated 1 GB that runs 2 seconds pays 2 GB-seconds — even if 1.95 of those seconds are just await fetch(modelEndpoint).

That difference is the whole article. Everything below is what it does to a real bill.
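
To make the two billing formulas concrete, here is a minimal sketch that encodes only the rates quoted in the bullets above. The function and constant names are illustrative, and the Lambda free tier is deliberately left out so the two formulas stay easy to compare.

```typescript
// Rates as quoted above (Workers Standard plan; Lambda x86, us-east-1).
const WORKERS = {
  baseUsd: 5,
  includedRequests: 10e6,
  includedCpuMs: 30e6,
  usdPerMillionRequests: 0.3,
  usdPerMillionCpuMs: 0.02,
};

const LAMBDA = {
  usdPerMillionRequests: 0.2,
  usdPerGbSecond: 0.0000166667,
};

// Workers: CPU milliseconds are the only duration input. Time spent awaiting
// a fetch() or a model API does not appear in the formula at all.
function workersComputeUsd(requests: number, cpuMsPerRequest: number): number {
  const extraRequests = Math.max(0, requests - WORKERS.includedRequests);
  const extraCpuMs = Math.max(0, requests * cpuMsPerRequest - WORKERS.includedCpuMs);
  return (
    WORKERS.baseUsd +
    (extraRequests / 1e6) * WORKERS.usdPerMillionRequests +
    (extraCpuMs / 1e6) * WORKERS.usdPerMillionCpuMs
  );
}

// Lambda: wall-clock seconds x allocated memory. How much of that time was
// real CPU work is not an input. Free tier ignored in this sketch.
function lambdaComputeUsd(requests: number, wallSeconds: number, memoryGb: number): number {
  return (
    (requests / 1e6) * LAMBDA.usdPerMillionRequests +
    requests * wallSeconds * memoryGb * LAMBDA.usdPerGbSecond
  );
}
```

With the bullets' own numbers, a single 1 GB invocation that runs 2 seconds costs Lambda 2 GB-seconds (about $0.000033 of duration) however little CPU it used, while a request at the ~2.2 ms average adds roughly $0.00000004 of CPU charge on Workers once the included allowance is exhausted.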

Two caveats frame the comparison, one against each platform. First, the August 1, 2025 INIT-phase change made Lambda strictly more expensive: cold-start initialization used to be free on the most common Lambda configuration (on-demand ZIP functions on managed runtimes); as of that date AWS bills it. For LLM-adjacent functions importing big SDKs (LangChain, boto3, the OpenAI or Anthropic client), INIT can run 500 ms–2 s, now a recurring dollar cost on top of the latency cost. Workers has no analogous charge; V8 isolates have no billable INIT phase.

Second, the 128 MB / 5-minute reality on the Workers side: you can’t load a 4 GB model into a 128 MB isolate, and you can’t run more than 5 minutes of in-process CPU. If your design needs the model resident in the function, Workers is off the table at any price, so go price Lambda, Fargate, or a GPU service. But most production AI apps don’t load the model in-process. It lives at OpenAI, Anthropic, Bedrock, Workers AI, or a hosted endpoint, and your function is a thin, IO-bound orchestrator that fits comfortably in 128 MB. That’s the workload Workers is built for, and the one the numbers below describe.
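
A rough way to size the INIT charge is sketched below. It assumes billed INIT time is charged at the same GB-second rate as ordinary duration, and the cold-start rate is a hypothetical input you would measure from your own logs; neither the function name nor the 1% figure in the usage note comes from the article.

```typescript
// Lambda duration rate quoted in the article (x86, us-east-1).
const LAMBDA_USD_PER_GB_SECOND = 0.0000166667;

// Assumption: INIT seconds are billed like normal duration (GB-seconds).
// coldStartRate is the fraction of invocations that incur a cold start.
function monthlyInitUsd(
  requests: number,
  coldStartRate: number,
  initSeconds: number,
  memoryGb: number,
): number {
  if (coldStartRate < 0 || coldStartRate > 1) {
    throw new RangeError("coldStartRate must be between 0 and 1");
  }
  const billedInitGbSeconds = requests * coldStartRate * initSeconds * memoryGb;
  return billedInitGbSeconds * LAMBDA_USD_PER_GB_SECOND;
}
```

For example, `monthlyInitUsd(5e6, 0.01, 1.5, 1)` models 5M requests with a hypothetical 1% cold-start rate and a 1.5 s INIT at 1 GB. The result scales linearly with each input, so a spikier traffic shape or a heavier SDK import raises it proportionally.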

The worked monthly bill

Price a concrete app: 5 million requests/month, each calling an external LLM that takes ~1.5 s wall-clock, with ~15 ms of actual CPU per request (parse, build prompt, format, log); the other 1,485 ms is network wait. Lambda is sized at 1 GB. The app serves 2 TB/month of model outputs and assets from object storage and stores 100 GB. It needs low p95 latency, so we price Lambda with Provisioned Concurrency to keep the user-experience comparison fair (the architecture piece explains why Workers needs no such thing). We exclude the model token bill itself: that’s a pass-through on both platforms and the subject of the GPU cost math piece. This is the platform bill.

// Workers stack: monthly bill, 5M IO-bound requests

| Line item | Calculation | Monthly |
|---|---|---|
| Workers Paid base | $5 flat | $5.00 |
| Requests | 5M − 10M included | $0.00 |
| CPU time | 5M × 15 ms = 75M − 30M included = 45M × $0.02/M | $0.90 |
| R2 storage | 100 GB × $0.015 | $1.50 |
| R2 egress (2 TB) | zero, always | $0.00 |
| AI Gateway (caching, logs, observability) | free on Paid | $0.00 |
| Total | | ~$7.40 |

// Lambda stack: monthly bill, same workload

| Line item | Calculation | Monthly |
|---|---|---|
| Requests | (5M − 1M free) × $0.20/M | $0.80 |
| Duration | 5M × 1.5 s × 1 GB = 7.5M − 0.4M free = 7.1M GB-s × $0.0000166667 | $118.33 |
| Provisioned Concurrency | 10 × 1 GB × 730 hr | $109.50 |
| API Gateway (HTTP API) | 5M × $1.00/M | $5.00 |
| CloudWatch Logs (~30 GB) | 30 × $0.50/GB | $15.00 |
| NAT Gateway (VPC) | $0.045/hr × 730 + processing | $35.10 |
| S3 storage | 100 GB × $0.023 | $2.30 |
| S3 egress | (2,048 − 100 free) × $0.09 | $175.32 |
| S3 requests | ~5M GET | $2.00 |
| Total | | ~$463 |

Drop Provisioned Concurrency and live with cold-start latency (and the new INIT billing) and the Lambda stack falls to ~$354 — same order of magnitude, still ~40–50x the Workers bill.
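
The totals above can be re-derived mechanically. The sketch below is one way to do it: each line item is recomputed from the rates in the tables, except Provisioned Concurrency and NAT Gateway, where the article's monthly figure is used as-is because the table does not spell out every unit rate. The `supporting` flag and all identifiers are illustrative.

```typescript
type LineItem = { name: string; usd: number; supporting?: boolean };

// Zero-dollar rows (included requests, R2 egress, AI Gateway) are omitted.
const workersStack: LineItem[] = [
  { name: "Workers Paid base", usd: 5.0 },
  { name: "CPU time", usd: ((5e6 * 15 - 30e6) / 1e6) * 0.02 },
  { name: "R2 storage", usd: 100 * 0.015 },
];

const lambdaStack: LineItem[] = [
  { name: "Requests", usd: ((5e6 - 1e6) / 1e6) * 0.2 },
  { name: "Duration", usd: (5e6 * 1.5 * 1 - 400000) * 0.0000166667 },
  { name: "Provisioned Concurrency", usd: 109.5 }, // monthly figure from the table
  { name: "API Gateway (HTTP API)", usd: (5e6 / 1e6) * 1.0, supporting: true },
  { name: "CloudWatch Logs", usd: 30 * 0.5, supporting: true },
  { name: "NAT Gateway", usd: 35.1, supporting: true }, // monthly figure from the table
  { name: "S3 storage", usd: 100 * 0.023 },
  { name: "S3 egress", usd: (2048 - 100) * 0.09, supporting: true },
  { name: "S3 requests", usd: 2.0 },
];

const total = (items: LineItem[]): number =>
  items.reduce((sum, item) => sum + item.usd, 0);

const workersTotal = total(workersStack);
const lambdaTotal = total(lambdaStack);
const lambdaWithoutPc = total(
  lambdaStack.filter((item) => item.name !== "Provisioned Concurrency"),
);
const supportingTax = total(lambdaStack.filter((item) => item.supporting === true));

console.log("Workers:", workersTotal.toFixed(2));
console.log("Lambda:", lambdaTotal.toFixed(2), "without PC:", lambdaWithoutPc.toFixed(2));
console.log("Supporting services:", supportingTax.toFixed(2));
console.log(
  "Ratio with PC:", (lambdaTotal / workersTotal).toFixed(1),
  "without PC:", (lambdaWithoutPc / workersTotal).toFixed(1),
);
```

Run against these inputs, the ratio lands near 48x without Provisioned Concurrency and near 63x with it. Swapping in your own request count, wall-clock time, and egress volume is the quickest way to see which line item dominates your bill.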

Two things matter more than the totals. Lambda duration ($118) is the thesis in one line — we’re billing 1.5 s × 1 GB on every one of 5 million requests, and 1.485 of those seconds is pure network wait; Workers’ equivalent is $0.90 because it only bills the 15 ms of real CPU. And the supporting-service tax is structural: API Gateway, CloudWatch, NAT Gateway, and S3 egress add ~$230/month here — more than Lambda itself. As LeanOps’ 2026 analysis put it, “80% of your ‘Lambda’ bill was never Lambda.” Cloudflare’s equivalent line items are mostly $0.00 because egress is free, AI Gateway is free, and the platform doesn’t need a NAT Gateway to reach the internet.

Independent validation cuts both ways. Cloudflare’s own Baselime post-mortem reported that network I/O accounted for over 70% of Lambda wall time — billed as compute — and that moving the data-receptor layer to Workers would cut a $790/day AWS bill to roughly $25/day, a >95% reduction, “primarily driven by the Workers pricing model, since Workers charge for CPU time.” Flag the bias — that’s a Cloudflare blog about a Cloudflare acquisition — but the mechanic is structural and our worked example reproduces a 40–50x delta from first principles. Vendor-neutral cost tooling firm Vantage found the same shape and its inverse: Workers ~10–200% cheaper for IO-bound workloads depending on scale, but Lambda ~20–25% cheaper for a genuinely CPU-bound image-recognition workload (2.5 s of real CPU per task). CPU-time billing is a disadvantage when you’re actually using the CPU the whole time.

The egress swing factor

You saw the line item — $175 vs. $0 for 2 TB. The underlying 2026 math:

// R2 vs S3 — storage, egress, and per-operation pricing (May 2026)
| Provider | Storage | Egress | Class A / B ops |
|---|---|---|---|
| Cloudflare R2 | $0.015/GB-mo | $0.00/GB | $4.50/M (A) / $0.36/M (B) |
| AWS S3 Standard | $0.023/GB-mo | $0.09/GB (first 10 TB, after 100 GB free) | ~$5/M PUT / ~$0.40/M GET |

Egress matters more for an AI app than a generic SaaS: every generated image, audio file, PDF, or large RAG-context blob you serve to a user is paid egress on S3 and free on R2, and any time you pull S3 data to non-AWS compute (your edge, a partner SaaS, an analytics provider) you pay $0.09/GB on the way out. Cloudflare frames egress fees as a “tax on architectural choice” — that’s marketing language, but the structural critique holds and the dollars are real. The architectural lock-in angle lives at the pillar; here it’s just a line on the bill.
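
As a sketch of how the egress line scales with traffic, the functions below apply only the storage and egress rates from the table above. Per-operation charges are left out, and the S3 function refuses volumes beyond the first 10 TB tier because the article quotes no rate for higher tiers. Function names are illustrative.

```typescript
// R2: storage only; egress is $0.00/GB at any volume.
function r2MonthlyUsd(storedGb: number): number {
  return storedGb * 0.015;
}

// S3 Standard: storage plus internet egress after the 100 GB free allowance.
// Only the first-10-TB egress tier ($0.09/GB) is modeled.
function s3MonthlyUsd(storedGb: number, egressGb: number): number {
  const firstTierLimitGb = 10 * 1024;
  if (egressGb > firstTierLimitGb) {
    throw new RangeError("egress beyond the first 10 TB tier is not modeled");
  }
  const billableEgressGb = Math.max(0, egressGb - 100);
  return storedGb * 0.023 + billableEgressGb * 0.09;
}
```

For the worked example, `s3MonthlyUsd(100, 2048)` covers the S3 storage and egress rows together, while `r2MonthlyUsd(100)` is the whole R2 charge regardless of how much is served.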

The AWS hidden-cost tax

None of these show up when you punch numbers into the Lambda pricing calculator, and together they’re where AWS bills go off the rails:

  • NAT Gateway — if your Lambda is in a VPC (to reach RDS, OpenSearch, anything private) and also calls the internet (Stripe, Twilio, an LLM provider), every byte traverses NAT at $0.045/hour plus $0.045/GB. AWS best practice is one per Availability Zone for HA, so a 3-AZ deployment is ~$98/month baseline before a byte flows.
  • CloudWatch Logs — every invocation writes START/END/REPORT lines plus your own logging; 50–100 GB/month is realistic on a chatty function, billed from $0.50/GB in the first tier, with storage continuing forever unless you set retention (the default is “Never Expire”).
  • API Gateway — $1.00/M (HTTP API) or $3.50/M (REST) in the first tier; Function URLs remove it entirely for simple endpoints, which many teams don’t realize.
  • Inter-AZ transfer — $0.01/GB each direction whenever you cross AZs (Lambda in one AZ hitting RDS in another).
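
The rates in the list above are simple enough to encode directly. This is a minimal sketch using first-tier prices only and the 730-hour month from the worked example; the function names are illustrative, and log storage and inter-AZ transfer are left out.

```typescript
const HOURS_PER_MONTH = 730;

// NAT Gateway: an hourly charge per gateway (one per AZ) plus a per-GB processing charge.
function natGatewayUsd(azCount: number, processedGb: number): number {
  return azCount * 0.045 * HOURS_PER_MONTH + processedGb * 0.045;
}

// CloudWatch Logs: first-tier ingestion only.
function cloudWatchIngestUsd(ingestedGb: number): number {
  return ingestedGb * 0.5;
}

// API Gateway: first-tier price per million requests for each API type.
function apiGatewayUsd(requests: number, kind: "http" | "rest"): number {
  const usdPerMillion = kind === "http" ? 1.0 : 3.5;
  return (requests / 1e6) * usdPerMillion;
}
```

`natGatewayUsd(3, 0)` gives the 3-AZ baseline before any traffic flows, and `natGatewayUsd(1, 50)` matches the $35.10 NAT line in the worked bill if the "processing" portion is read as roughly 50 GB, which is an inference from the table rather than a stated figure.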

The comprehensibility gap is the real point. A production Cloudflare bill for this app has ~4 line items; the AWS equivalent has Lambda, API Gateway, CloudWatch (several sub-lines), NAT Gateway (hourly + per-GB), S3 (storage + requests + egress), and data transfer. AWS built Cost Explorer and Cost Anomaly Detection because the bill is unintuitive enough to need them.

Where each pricing model wins

Workers wins when the workload is IO-bound (waiting on an LLM API, which describes essentially every app calling a model someone else hosts; CPU-time billing makes this 10–50x cheaper), egress is meaningful (zero R2 egress eliminates the S3 egress line), you need low latency without paying for Provisioned Concurrency, traffic is bursty (no idle cost), or you want global distribution without re-architecting (Cloudflare runs in 335+ cities, within 50 ms of 95% of the world’s Internet-connected population; Lambda needs CloudFront + Lambda@Edge or multi-region duplication to match it).

Lambda wins or is necessary when the workload is genuinely CPU-bound and runs the CPU continuously (Vantage’s image-recognition case, Lambda 20–25% cheaper), you need more than 128 MB of in-process memory (model loading, large in-process data; Workers can’t, at any price), you need more than 5 minutes of CPU per invocation (Lambda allows up to 15 minutes of wall-clock per invocation), you’re deep in AWS already (re-architecture and data-extraction egress may exceed the savings), or you have a Compute Savings Plan / EDP commitment (up to ~17% off Lambda duration and Provisioned Concurrency, though request charges get no discount).
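
The crossover between the two models can be estimated from the unit rates alone. The sketch below compares marginal compute cost only, ignoring request fees, included allowances, the Workers base fee, and every supporting service, so treat it as a first-order screen and not a full bill. Names are illustrative.

```typescript
// $0.02 per million CPU-ms is $0.00002 per CPU-second.
const WORKERS_USD_PER_CPU_SECOND = 0.02 / 1000;
const LAMBDA_USD_PER_GB_SECOND = 0.0000166667;

// Fraction of wall-clock time that must be real CPU work before Workers'
// marginal compute cost equals Lambda's at the given memory size.
// A value above 1 means Workers stays cheaper even at 100% CPU.
function breakEvenCpuFraction(lambdaMemoryGb: number): number {
  return (LAMBDA_USD_PER_GB_SECOND * lambdaMemoryGb) / WORKERS_USD_PER_CPU_SECOND;
}

// Marginal compute cost of one request on each platform.
function marginalUsd(cpuSeconds: number, wallSeconds: number, lambdaMemoryGb: number) {
  return {
    workers: cpuSeconds * WORKERS_USD_PER_CPU_SECOND,
    lambda: wallSeconds * lambdaMemoryGb * LAMBDA_USD_PER_GB_SECOND,
  };
}
```

At 1 GB the break-even is about 83% CPU utilization: the worked example's 15 ms of CPU in 1.5 s (1%) sits far below it, while a task that burns CPU for its entire run sits above it. Smaller Lambda memory sizes pull the break-even down, which is why CPU-bound work on a lean function favors Lambda.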

Architecture and performance trade-offs are in the Workers vs Lambda comparison; reserved-vs-serverless GPU economics are in the GPU cost math. This piece stays on the platform bill.

The decision, by workload

Pricing a new IO-bound AI app (the model lives elsewhere): build the orchestrator on Workers Paid, put assets and embeddings backups on R2 (egress savings alone justify it), and put AI Gateway in front of every model call for free caching and observability. Trigger to reconsider: a single function legitimately needing >128 MB or >5 min CPU — peel that off to Lambda or a container and keep the orchestrator on Workers.
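
For orientation, a thin IO-bound orchestrator of the kind described here might look like the sketch below. It is not a reference implementation: the `Env` binding names, the request payload shape, and the `handle` helper are all hypothetical, and `MODEL_ENDPOINT` stands in for whatever URL you route model calls through (an AI Gateway URL or a provider endpoint). Error handling for malformed JSON is omitted.

```typescript
// Hypothetical bindings: these names are illustrative, not taken from the article.
interface Env {
  MODEL_ENDPOINT: string; // URL of the hosted model API
  MODEL_API_KEY: string;
}

// fetchImpl is injectable so the handler can be exercised without a network call.
async function handle(
  request: Request,
  env: Env,
  fetchImpl: typeof fetch = fetch,
): Promise<Response> {
  if (request.method !== "POST") {
    return new Response("POST a JSON body with a prompt", { status: 405 });
  }

  // CPU work: parse the request and build the upstream payload.
  // This is the part that counts toward billed CPU time.
  const body = (await request.json()) as { prompt?: unknown };
  if (typeof body.prompt !== "string" || body.prompt.length === 0) {
    return new Response("missing prompt", { status: 400 });
  }

  // Network wait: the seconds spent awaiting the model are wall-clock, not CPU.
  // The payload shape here is a placeholder; real providers define their own.
  const upstream = await fetchImpl(env.MODEL_ENDPOINT, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      authorization: `Bearer ${env.MODEL_API_KEY}`,
    },
    body: JSON.stringify({ prompt: body.prompt, stream: true }),
  });

  // Pass the upstream body through as a stream instead of buffering it in memory.
  return new Response(upstream.body, {
    status: upstream.status,
    headers: { "content-type": upstream.headers.get("content-type") ?? "text/plain" },
  });
}

// Module-syntax Worker entry point; in a real project this object is the default export.
const worker = {
  fetch: (request: Request, env: Env): Promise<Response> => handle(request, env),
};
```

The design choice worth noting is that nothing is held in memory beyond the parsed prompt, which is what keeps this shape comfortably inside the 128 MB limit and its CPU time in the low milliseconds.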

Already on Lambda and bleeding: instrument first (segment Lambda vs. API Gateway vs. CloudWatch vs. NAT vs. S3 — if supporting services exceed Lambda, you’re the case study), take the quick AWS-side wins (HTTP API or Function URLs over REST, CloudWatch retention, VPC endpoints over NAT, memory right-sizing), and migrate when an IO-bound function’s bill clears ~$200/month — the ROI is days, not months.

CPU-bound workload (using the CPU the whole time): don’t migrate to Workers for cost; it’ll be similar or worse. Migrate only for global distribution, no cold starts, or simpler ops — and price it honestly. Trigger to revisit: if you swap in-process inference for a hosted model API, the workload becomes IO-bound and Workers economics kick in.

For any AI app on either platform, the single largest line item is usually the model token bill, not the platform bill. Put a caching layer in front of every paid model call, because cache hits are free dollars wherever your compute runs. (The GPU cost math covers that side.)

Frequently asked

Is the 40–50x delta real or marketing?
It's a worked example using primary-source pricing applied to a realistic IO-bound workload (5M req/month, 1.5 s wait, 15 ms CPU). The wider the gap between CPU time and wall-clock, the bigger the advantage. For a CPU-bound shape the delta inverts and Lambda is ~20–25% cheaper.
Doesn't the Lambda free tier change the math?
Only at tiny scale. At 1 GB × 1.5 s per request, the 400K GB-second free tier is used up after the first ~267,000 requests, about 5% of a 5M-request month, and it is worth roughly $6.67 against a $118 duration line. It's a rounding error on a real bill.
Does SnapStart fix the INIT cost?
It largely eliminates billable INIT for supported runtimes (Java, .NET, Python) and is the right move if you're staying on Lambda — but it doesn't touch the wall-clock × memory billing of the INVOKE phase, which is where the IO-bound cost problem actually lives.
Can I run Workers in front of AWS Bedrock?
Yes, and it's a sensible pattern — Workers' CPU-time billing for the orchestrator, AWS's model catalog for inference. You pay AWS egress on the response leaving AWS (usually small), and AI Gateway caching can recoup it on cacheable prompts.
Won't Cloudflare raise prices once everyone migrates?
Maybe — AWS did exactly that with INIT billing. Re-verify pricing before any commitment. But today's IO-bound delta is large enough that even a meaningful Cloudflare hike wouldn't close it.

Working with Truvisory

Need an operator’s read on whether to migrate, refactor, or stay put? Truvisory ships working software on Cloudflare-native architecture for founders who’d rather pay for an engineer than a strategy deck. We do the bill math against your actual traffic shape — IO-bound or CPU-bound — and ship the migration in weeks, not quarters.

Truvisory is a Denver-based AI and automation consultancy run by a senior operator — a combat veteran and former PE-backed operating executive — who ships working software, not strategy decks. Cloudflare-native by default, for both AI delivery and the back-office automation where the ROI lives.

If you’re weighing a Cloudflare-native AI build — mid-market or federal — start with a scoping call. For the wider platform argument, the Cloudflare pillar is the index.