§ Cloudflare-Native

Cloudflare-native AI, by an engineer who lives on the platform.

We design, build, and ship production AI systems on Cloudflare's edge (Workers, Workers AI, Agents SDK, Durable Objects, Vectorize, AI Gateway, R2, D1, Queues, Containers), with ~50 ms latency to about 95% of internet users from 330+ cities.

Start a Cloudflare Discovery → · See HotCopy in action →

~50 ms p50 to 95% of users

330+ cities

~20% of the web

avg GPU util · industry

§ 01 / Why Cloudflare for AI in 2026

Three reasons the math works.

REASON / 01

Pay-per-inference, not pay-per-idle-GPU.

Per Cast AI's 2026 State of Kubernetes Optimization Report, average GPU utilization is just 5% across 23,000 production clusters. "At 5% utilization, the math doesn't work." Workers AI bills for actual inference — not reserved capacity.

5% · industry GPU util
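The utilization argument comes down to one division. The sketch below is illustrative arithmetic only: the function names are ours, and the hourly rate is a placeholder, not a quoted price from any provider.

```typescript
// Effective cost of one *useful* GPU-hour when capacity is reserved
// but only a fraction of it does real work.
// hourlyRate is a hypothetical placeholder, not a real price.
function effectiveCostPerUsefulHour(hourlyRate: number, utilization: number): number {
  if (utilization <= 0 || utilization > 1) {
    throw new RangeError("utilization must be in (0, 1]");
  }
  return hourlyRate / utilization;
}

// How many times the list rate you pay per useful hour.
function idleMultiplier(utilization: number): number {
  return effectiveCostPerUsefulHour(1, utilization);
}

// 0.05 is the 5% average utilization figure cited above.
const atFivePercent = idleMultiplier(0.05);
console.log(`At 5% utilization, a useful GPU-hour costs ~${atFivePercent.toFixed(0)}x the list rate`);
```

Under per-inference billing the multiplier is 1 by construction, because idle time is not billed to you.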

REASON / 02

Edge by default.

Data centers in 330+ cities deliver ~50 ms latency to about 95% of internet users, on the same infrastructure that proxies ~20% of the web. Region-pin for sovereignty when a federal mission requires it.

330+ cities · ~50 ms

REASON / 03

State and compute in one primitive.

Durable Objects give each agent its own SQL database, persistent memory, and hibernation — no separate data tier, no cold start, no glue code. Multi-agent systems become a primitive, not an architecture project.

1× primitive · agent + state

§ 02 / Reference architectures

Four patterns we ship, end to end.

Pattern · A · RAG

Retrieval-Augmented Generation over private data

Hybrid semantic+keyword retrieval, AI-Gateway-fronted models, R2-backed source corpus, audit trail end to end.

// Use case: agency knowledge bases, contract repositories, customer support corpora

Worker (edge entry) → Vectorize (hybrid search) → AI Gateway (guardrails) → Workers AI (inference)

Backed by: R2 corpus · D1 metadata · Audit log
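The page does not say how semantic and keyword results are merged. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than a description of our pipeline. The document IDs are hypothetical, and `k = 60` is a conventional default, not a tuned value.

```typescript
// Reciprocal rank fusion: merge several ranked lists of document IDs.
// Each list contributes 1 / (k + rank) to a document's score.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map((entry) => entry[0]);
}

// Hypothetical results from a vector query and a keyword query.
const semanticHits = ["doc-7", "doc-2", "doc-9"];
const keywordHits = ["doc-2", "doc-4", "doc-7"];

const fused = reciprocalRankFusion([semanticHits, keywordHits]);
console.log(fused.join(", ")); // doc-2, doc-7, doc-4, doc-9
```

Documents that appear in both lists rise to the top, which is the behavior you want when a query mixes exact terms (contract numbers, names) with fuzzy intent.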

Pattern · B · AGENT

Stateful agent with tool use

Each agent is a Durable Object with its own SQL state and lifecycle. An MCP tool surface keeps external integrations clean, and the agent hibernates when idle.

The full agents & MCP playbook →

// Use case: customer agents, internal assistants, long-running automations

Agents SDK (lifecycle) ↔ Durable Object (SQL · memory) ↔ MCP (tool surface)

Calls out to: Workers AI · OpenAI / Anthropic / Gemini · External tools
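To make the shape of this pattern concrete, here is a platform-free sketch of "one instance per agent, private state, allow-listed tools". Everything in it is a stand-in: `AgentSketch`, `getAgent`, and the `upper` tool are hypothetical names, an array stands in for the Durable Object's SQL storage, and an in-process map stands in for the MCP tool surface.

```typescript
// A tool is a named function the agent may call.
type Tool = (input: string) => string;

interface Turn {
  role: "user" | "tool";
  content: string;
}

// One instance per agent: private history plus the tools it is allowed to use.
class AgentSketch {
  private readonly turns: Turn[] = [];
  private readonly tools: Map<string, Tool>;

  constructor(tools: Map<string, Tool>) {
    this.tools = tools;
  }

  remember(content: string): void {
    this.turns.push({ role: "user", content });
  }

  useTool(name: string, input: string): string {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`tool not allowed: ${name}`);
    const output = tool(input);
    this.turns.push({ role: "tool", content: `${name} -> ${output}` });
    return output;
  }

  history(): readonly Turn[] {
    return this.turns;
  }
}

// Agents are addressed by name and each name maps to exactly one instance.
// This is the role a Durable Object ID plays on the platform.
const agents = new Map<string, AgentSketch>();

function getAgent(name: string, tools: Map<string, Tool>): AgentSketch {
  let agent = agents.get(name);
  if (!agent) {
    agent = new AgentSketch(tools);
    agents.set(name, agent);
  }
  return agent;
}

const tools = new Map<string, Tool>([["upper", (s) => s.toUpperCase()]]);
const alice = getAgent("alice", tools);
alice.remember("hello");
alice.useTool("upper", "ship it");
const bob = getAgent("bob", tools);

console.log(alice.history().length, bob.history().length); // 2 0
```

The point of the sketch is isolation: what one agent remembers never leaks into another, and a tool call that is not on the allow-list fails loudly.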

Pattern · C · RLM

Recursive multi-agent orchestrator

Kimi k2.6 (1T MoE) as the orchestrator; small Gemma scout workers run decomposed sub-tasks. Derived from MIT CSAIL's Recursive Language Models work (arXiv:2512.24601). This is the HotCopy architecture.

// Use case: code transformation, deep document understanding, complex multi-step research

Kimi k2.6 (orchestrator · 1T MoE)

Spawns: Scout · Gemma, Scout · Gemma, Scout · Gemma, +N

Substrate: Workers AI · Durable Objects · Queues
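The control flow of this pattern (split, fan out, merge) can be sketched without any model behind it. This is a deliberate simplification, not the method from the cited paper and not HotCopy's implementation. The `Roles` interface, `solve`, and the stub functions are hypothetical; real roles would call the orchestrator and scout models.

```typescript
// The three roles in the pattern, injected as functions.
interface Roles {
  decompose: (task: string) => Promise<string[]>; // orchestrator splits a task
  scout: (task: string) => Promise<string>; // small worker answers a leaf task
  combine: (task: string, parts: string[]) => Promise<string>; // orchestrator merges
}

async function solve(task: string, roles: Roles, depth = 0, maxDepth = 2): Promise<string> {
  // Stop splitting at maxDepth so the recursion always terminates.
  const subtasks = depth < maxDepth ? await roles.decompose(task) : [task];
  if (subtasks.length <= 1) {
    return roles.scout(task); // nothing left to split: hand it to a scout
  }
  // Fan out: sub-tasks are solved independently and concurrently.
  const parts = await Promise.all(
    subtasks.map((subtask) => solve(subtask, roles, depth + 1, maxDepth)),
  );
  return roles.combine(task, parts);
}

// Stand-in roles so the sketch runs as written.
const stubRoles: Roles = {
  decompose: async (task) => task.split(" and "),
  scout: async (task) => `[${task}]`,
  combine: async (_task, parts) => parts.join(" + "),
};

solve("parse and translate and test", stubRoles).then((answer) => console.log(answer));
// [parse] + [translate] + [test]
```

The depth cap is the design choice that matters most here: without it, a decomposer that keeps finding sub-tasks would spawn scouts without bound.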

Pattern · D · FED

Federal-friendly edge deployment

Same Cloudflare primitives, hardened: AI Gateway guardrails, region-pinning for data sovereignty, immutable audit logging, role-based access, FedRAMP-aware deployment patterns.

// Use case: agency RAG, intra-agency assistants, OMB AI Action Plan rollouts

Zero Trust (auth) → Worker (region-pinned) → AI Gateway (guardrails · log) → Workers AI (inference)

Audit + sovereignty: Immutable log → R2 · Region-pin (US) · Role-based access · CMMC posture
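One way to make an audit log verifiable, independent of where it is stored, is to chain entries by hash so that any edit breaks every later link. The sketch below is an assumption about how that could look, not a description of a Cloudflare feature. The entry shape and function names are hypothetical. It uses `node:crypto`, which on Workers requires the Node.js compatibility flag; Web Crypto's `crypto.subtle.digest` would be the native alternative.

```typescript
import { createHash } from "node:crypto";

interface AuditEntry {
  seq: number;
  event: string;
  prevHash: string; // hash of the previous entry, or GENESIS for the first
  hash: string; // SHA-256 over seq, prevHash, and event
}

const GENESIS = "0".repeat(64);

function entryHash(seq: number, event: string, prevHash: string): string {
  return createHash("sha256").update(`${seq}\n${prevHash}\n${event}`).digest("hex");
}

// Returns a new log with one more entry; the input is never mutated.
function appendEntry(log: readonly AuditEntry[], event: string): AuditEntry[] {
  const last = log[log.length - 1];
  const prevHash = last ? last.hash : GENESIS;
  const seq = log.length;
  return [...log, { seq, event, prevHash, hash: entryHash(seq, event, prevHash) }];
}

// Recomputes every link; false means something was edited, dropped, or reordered.
function verifyLog(log: readonly AuditEntry[]): boolean {
  let expectedPrev = GENESIS;
  return log.every((entry, index) => {
    const ok =
      entry.seq === index &&
      entry.prevHash === expectedPrev &&
      entry.hash === entryHash(entry.seq, entry.event, entry.prevHash);
    expectedPrev = entry.hash;
    return ok;
  });
}

let auditLog: AuditEntry[] = [];
auditLog = appendEntry(auditLog, "user=alice action=query"); // hypothetical events
auditLog = appendEntry(auditLog, "user=alice action=export");
console.log(verifyLog(auditLog)); // true
```

Storing the chained entries in a write-once bucket gives two independent checks: the storage layer resists modification, and the chain detects it if it happens anyway.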

§ 03 / Data control

Sovereignty isn't a checkbox. Here's the mechanism.

“Region-pinning” is three concrete controls, not a slogan:

Regional Services

Pins where High and Moderate impact data is processed, without losing edge performance.

Metadata Boundary

Keeps all government data inside the defined FedRAMP region.

FIPS-validated encryption

Every hop between edge and core, always.

We compose these into the deployment so CUI stays where your authorization says it stays.

Data compliance →

§ 04 / Post-quantum

Quantum-safe by default — no config, no penalty.

Every site and API we serve through Cloudflare is protected against “harvest now, decrypt later” with TLS 1.3 + ML-KEM — automatically, no configuration changes. Post-quantum protection extends to Zero Trust access and the Secure Web Gateway, so encrypted traffic stays inspectable as you migrate. Built ahead of NIST's 2030–2035 deprecation deadlines.

PQC topic page → · PQC solution brief →

§ 05 / AI security architecture

Guardrails, not vibes.

The AI Gateway is the control plane between your application and any model provider:

Threat detection

Prompt injection, model poisoning, and excessive-use abuse, caught at the proxy.

Bi-directional data control

What the user submits and what the model returns, both inspected and redacted.

MCP server portal

Accessible servers behind one URL, with OAuth authorization and least-privilege access per agent.

Credentials at the edge

API keys and secrets never touch the client; rotation stays simple.

RAG helps the model know more. MCP helps it do more. The Gateway makes both safe.
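As an illustration of bi-directional control, the sketch below applies the same redaction pass to the prompt going out and the completion coming back. It is a toy stand-in, not how AI Gateway implements inspection. The two rules, the labels, and the synchronous `model` callback are assumptions made to keep the example small; a real model call would be asynchronous.

```typescript
interface RedactionRule {
  label: string;
  pattern: RegExp;
}

// Two illustrative rules; a real deployment would carry a reviewed rule set.
const RULES: RedactionRule[] = [
  { label: "EMAIL", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: "SSN", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
];

function redact(text: string, rules: RedactionRule[] = RULES): string {
  return rules.reduce((out, rule) => out.replace(rule.pattern, `[${rule.label}]`), text);
}

// Same filter in both directions around a model call.
function guardedCall(prompt: string, model: (cleanPrompt: string) => string): string {
  const cleanPrompt = redact(prompt); // what the user submits
  return redact(model(cleanPrompt)); // what the model returns
}

// Hypothetical model that leaks an address in its reply.
const leakyModel = (p: string) => `You said: ${p}. Contact bob@corp.example for help.`;

console.log(guardedCall("My SSN is 123-45-6789", leakyModel));
// You said: My SSN is [SSN]. Contact [EMAIL] for help.
```

Running the filter at the proxy rather than in each application means every model provider behind it gets the same treatment.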

AI security → · AI Blueprint →

§ 06 / FedRAMP architecture

Every service. Every location. No enclave.

Some vendors carve out a special FedRAMP enclave and make you wait for capabilities to land in it. Cloudflare runs the same software in every data center, including the FedRAMP processing locations — so the authorized Cloudflare for Government environment carries nearly the entire platform, not a stripped-down subset. For federal engagements we provision there; commercial engagements run on the standard network. Either way: no rip-and-replace, no capability lag.

FedRAMP architecture brief →

§ 07 / Capability matrix

Every Cloudflare primitive we work in, with its real use.

Workers // edge HTTP entry · routing · middleware
Workers AI // pay-per-inference LLM & embedding
Agents SDK // agent lifecycle & orchestration
Durable Objects // per-agent SQL state · hibernation
Vectorize // vector search · hybrid retrieval
AI Gateway // guardrails · cost · cache · obs
AI Search // semantic search-as-a-service
R2 // object storage · zero egress
D1 // edge SQLite · serverless DB
KV // global config · feature flags
Queues // durable async work
Containers // long-running side workloads
Hyperdrive // accelerated legacy DB access
Browser Rendering // headless browser · scraping
Workflows // durable multi-step pipelines
Sandbox // isolated code execution

§ 08 / Live proof

Run a Worker. From the closest edge.

This page hits a Truvisory®-deployed Cloudflare Worker on first paint. The latency you see is your latency to the closest of 330+ Cloudflare data centers — typically the same region you live in.

→ Edge ping with location, RTT, and colo
→ Live JSON from a Workers AI inference call
→ Source on GitHub, verbatim, deployable in 90 seconds

View source → · Start a Discovery →

truvisory-edge.workers.dev ● live

// 1. Ping the closest Cloudflare edge

› curl https://edge.truvisory.com/whoami
{
  "colo": "DEN",
  "city": "Denver, CO, US",
  "rtt_ms": 38,
  "region": "WNAM",
  "ts": "2026-05-05T14:22:11Z"
}

// 2. Inference on Workers AI · Llama 3.3

› POST /infer {"q":"summarize CMMC L2 in one line"}
→ "CMMC L2 = 110 NIST 800-171 controls, third-party assessed for CUI."
// model: @cf/meta/llama-3.3-70b · 41ms p50 · cached: false
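A handler that produces a response like the `/whoami` example could look roughly like the sketch below. It is not the deployed source. The field mapping is our assumption: it reads from the `request.cf` object that Cloudflare attaches to incoming requests, `clientTcpRtt` may be absent on some connections, and the `region` field is left out because the page does not say where that value comes from.

```typescript
// Subset of the request.cf properties this sketch reads.
interface CfLike {
  colo?: string;
  city?: string;
  regionCode?: string;
  country?: string;
  clientTcpRtt?: number;
}

interface WhoAmI {
  colo: string;
  city: string;
  rtt_ms: number | null;
  ts: string;
}

// Pure function so the shaping logic can be tested without a Worker runtime.
function whoami(cf: CfLike | undefined, now: Date): WhoAmI {
  const place = [cf?.city, cf?.regionCode, cf?.country]
    .filter((part) => Boolean(part))
    .join(", ");
  return {
    colo: cf?.colo ?? "unknown",
    city: place || "unknown",
    rtt_ms: cf?.clientTcpRtt ?? null,
    ts: now.toISOString(),
  };
}

// In a Worker project this object would be the module's default export.
const worker = {
  async fetch(request: Request): Promise<Response> {
    const cf = (request as Request & { cf?: CfLike }).cf;
    return new Response(JSON.stringify(whoami(cf, new Date())), {
      headers: { "content-type": "application/json" },
    });
  },
};
```

Outside the Cloudflare runtime `request.cf` is undefined, so the handler degrades to `"unknown"` values instead of throwing, which keeps local development usable.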

§ Field notes

Cloudflare field notes

Architecture decisions from active builds on the edge.

// Pillar guide · May 2026

Why We Build AI on Cloudflare: The Mid-Market and Federal Case for a Cloudflare-Native AI Stack

Why the Cloudflare Developer Platform is the best default backbone for production AI in 2026 — for mid-market operators and federal buyers. The business case across cost, speed, security, and ownership, with honest limits on where hyperscalers still win.

The pillar of our Cloudflare-vs-hyperscalers series — start here, then go deep on inference, latency, and dollars.

10 min read · Read →


§ Field notes

AI observability field notes

See it, cost it, trust it — instrumenting AI on Cloudflare with AI Gateway, evals, and an audit trail.

// Pillar guide · Jun 2026

AI Observability, Cost, and Evaluation on Cloudflare: How AI Gateway Stops You Flying Blind

Ship an AI feature and you go blind on three things: what the model is doing, what it costs, and whether it's any good. Here's how Cloudflare AI Gateway and native analytics instrument all three.

Start with the observability-cost-evals hub, then go deep on each layer.

8 min read · Read →

Jun 2026

Cloudflare AI Gateway as Your Observability Layer: Every LLM Request, Logged and Queryable

9 min read · Read →

Jun 2026

Controlling AI Model Costs on Cloudflare: The Levers That Actually Reduce Token Spend

9 min read · Read →

Jun 2026

Cloudflare Audit Logs for AI: A Tamper-Evident, Compliance-Grade Record of Every AI Request

12 min read · Read →

§ 09 / Partner posture

Honest about today. Transparent about the path.

Today

Cloudflare-Native Engineer

What we are right now.

Daily-driver builder on Workers, Workers AI, Durable Objects, Vectorize, AI Gateway, R2, D1, Agents SDK, MCP. Production deployments on file (HotCopy, PresEngage). The engineer-on-the-platform claim is true and defensible.

Pursuing

▣ Cloudflare ASDP · Application Services

What we're earning, not claiming.

Cloudflare's ASDP designation requires rigorous technical validation of security, performance, and reliability. We're in the process — and we won't surface a partner badge on the site that hasn't been earned. When it lands, you'll see it.

§ 10 / FAQ

What teams ask before they commit to the platform.

Why build AI on Cloudflare instead of AWS or Azure?

For inference and application workloads (not model training), one integrated edge platform beats stitching together a dozen hyperscaler services. You get pay-per-inference instead of idle-GPU billing, ~50 ms latency to about 95% of internet users from 330+ cities, and state plus compute in one primitive. The math works where 5% average GPU utilization makes reserved capacity a losing bet.

Is Cloudflare lock-in a risk?

Less than the hyperscaler alternative. Workers are standards-based, R2 is S3-compatible, and model routing is provider-agnostic. The real asterisk is Cloudflare-specific bindings — Durable Objects, D1, Vectorize — which would be a port, not a copy, if you ever leave. We keep your application logic and data model portable and say so up front.

What does Cloudflare-native cost versus a Lambda-style stack?

Structurally less for inference and application workloads: zero egress fees, no idle-GPU reservations, and one comprehensible bill instead of charges spread across compute, storage, data transfer, and a separate vector database. For sustained GPU training it's the wrong tool — use a hyperscaler with reserved capacity there.

Do you migrate existing workloads onto Cloudflare?

Yes, incrementally. We typically front your current system with Workers and Hyperdrive to accelerate legacy database access, move the highest-value paths to the edge, and leave the rest in place until it earns the move. No rip-and-replace, and the code lands in your repo at each step.
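The incremental approach amounts to a routing decision made at the edge for every request. A minimal sketch, assuming a prefix list that grows as paths are migrated; the prefixes and the `routeFor` name are hypothetical:

```typescript
// Paths already moved to the edge (hypothetical examples).
const MIGRATED_PREFIXES = ["/api/search", "/api/chat"];

type Target = "edge" | "origin";

// Decide per request whether the edge handles it or it passes through
// to the existing system untouched.
function routeFor(pathname: string, migrated: readonly string[] = MIGRATED_PREFIXES): Target {
  const hit = migrated.some(
    (prefix) => pathname === prefix || pathname.startsWith(prefix + "/"),
  );
  return hit ? "edge" : "origin";
}

console.log(routeFor("/api/search/docs")); // edge
console.log(routeFor("/billing/invoices")); // origin
```

Matching on a whole path segment keeps `/api/searching` from being captured by the `/api/search` prefix, so a migration never silently takes over a neighboring route.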

What parts of the stack do you actually use?

Daily drivers are Workers, Workers AI, Agents SDK, Durable Objects, Vectorize, AI Gateway, R2, D1, and Queues — with Hyperdrive, Containers, and Workflows where the workload calls for them. The §07 capability matrix lists every primitive we work in with its real use; we pick the smallest set that ships your system.

Cloudflare field notes

Cloudflare Workers vs AWS Lambda for AI Inference (2026)

Why Edge AI Beats Centralized Inference for User-Facing Features

The Real Cost of Cloudflare Workers vs AWS Lambda for an AI App
