Truvisory
§ Cloudflare-Native

Cloudflare-native AI, by an engineer who lives on the platform.

We design, build, and ship production AI systems on Cloudflare's edge (Workers, Workers AI, Agents SDK, Durable Objects, Vectorize, AI Gateway, R2, D1, Queues, Containers), served from data centers in 330+ cities at roughly 50ms network latency to about 95% of internet users.

~50ms p50 to 95% of users
330+ cities
~20% of the web
5% avg GPU util · industry
§ 01 / Why Cloudflare for AI in 2026

Three reasons the math works.

REASON / 01

Pay-per-inference, not pay-per-idle-GPU.

Per Cast AI's 2026 State of Kubernetes Optimization Report, average GPU utilization is just 5% across 23,000 production clusters. "At 5% utilization, the math doesn't work." Workers AI bills for actual inference — not reserved capacity.

5% industry GPU util
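The arithmetic behind "the math doesn't work" can be sketched as follows. This is a minimal illustration, not a pricing model: the function names are hypothetical and the $2 hourly rate is an invented figure, not a quoted price.

```typescript
// Effective cost of one *useful* GPU-hour when you pay for reserved capacity
// but only part of it does work. hourlyRate is a hypothetical figure.
function effectiveCostPerUsefulHour(hourlyRate: number, utilization: number): number {
  if (utilization <= 0 || utilization > 1) {
    throw new RangeError("utilization must be in (0, 1]");
  }
  return hourlyRate / utilization;
}

// Multiplier on the sticker price: how much more each useful hour costs
// than the listed hourly rate.
function idleMultiplier(utilization: number): number {
  return effectiveCostPerUsefulHour(1, utilization);
}

// At the 5% utilization cited above, each useful hour costs about 20x the list rate.
const multiplierAt5Percent = idleMultiplier(0.05);
// With an assumed (illustrative) $2/hour reservation, that is about $40 per useful hour.
const costAt2Dollars = effectiveCostPerUsefulHour(2, 0.05);
```

Pay-per-inference billing sidesteps the division entirely, because there is no reserved hour to amortize.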
REASON / 02

Edge by default.

Data centers in 330+ cities deliver ~50ms latency to about 95% of internet users, on the same infrastructure that proxies ~20% of the web. Region-pin for sovereignty when a federal mission requires it.

330+ cities · ~50ms
REASON / 03

State and compute in one primitive.

Durable Objects give each agent its own SQL database, persistent memory, and hibernation — no separate data tier, no cold start, no glue code. Multi-agent systems become a primitive, not an architecture project.

primitive · agent + state
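A minimal sketch of per-agent SQL state, assuming the SQLite-backed Durable Object storage API (`ctx.storage.sql.exec(...)` returning a cursor with `toArray()`). The `AgentMemory` class, the `messages` table, and the `SqlStorageLike` interface are hypothetical names for illustration; in a real Worker the class would extend `DurableObject` from `cloudflare:workers` and pass `ctx.storage.sql` in.

```typescript
// Structural stand-ins for the Durable Object SQL storage surface used here.
// Assumption: only exec(query, ...bindings) and cursor.toArray() are needed.
interface SqlCursorLike<T> {
  toArray(): T[];
}
interface SqlStorageLike {
  exec<T = Record<string, unknown>>(query: string, ...bindings: unknown[]): SqlCursorLike<T>;
}

interface MemoryRow {
  role: string;
  content: string;
}

// Hypothetical per-agent memory: one instance per agent, each with its own database.
class AgentMemory {
  private readonly sql: SqlStorageLike;

  constructor(sql: SqlStorageLike) {
    this.sql = sql;
    // Idempotent schema setup, run each time the object wakes from hibernation.
    this.sql.exec(
      "CREATE TABLE IF NOT EXISTS messages (id INTEGER PRIMARY KEY AUTOINCREMENT, role TEXT NOT NULL, content TEXT NOT NULL)"
    );
  }

  // Append one message to this agent's private history.
  remember(role: string, content: string): void {
    this.sql.exec("INSERT INTO messages (role, content) VALUES (?, ?)", role, content);
  }

  // Return the most recent `limit` messages, oldest first.
  recent(limit: number): MemoryRow[] {
    return this.sql
      .exec<MemoryRow>("SELECT role, content FROM messages ORDER BY id DESC LIMIT ?", limit)
      .toArray()
      .reverse();
  }
}
```

Because the storage handle is injected, the same class can be exercised against an in-memory fake in unit tests before it is wired to a real Durable Object.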
§ 02 / Reference architectures

Four patterns we ship, end to end.

Pattern A · RAG

Retrieval-Augmented Generation over private data

Hybrid semantic+keyword retrieval, AI-Gateway-fronted models, R2-backed source corpus, audit trail end to end.

// Use case: agency knowledge bases, contract repositories, customer support corpora
Worker · Edge entry
Vectorize · Hybrid search
AI Gateway · Guardrails
Workers AI · Inference
Backed by: R2 corpus · D1 metadata · Audit log
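One common way to combine semantic and keyword results in a RAG pipeline like the one above is reciprocal rank fusion (RRF). The page does not say which fusion method is used, so treat this as an illustrative assumption; the document IDs and the function name are hypothetical.

```typescript
// Reciprocal rank fusion (RRF): merge several ranked ID lists into one.
// Each document scores sum(1 / (k + rank)) over the lists it appears in.
// k = 60 is a commonly used default; tune it for your corpus.
function reciprocalRankFusion(rankings: string[][], k: number = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1]) // highest fused score first
    .map((entry) => entry[0]);
}

// Hypothetical inputs: IDs from a vector query and from a keyword index.
const semanticHits = ["doc-7", "doc-2", "doc-9"];
const keywordHits = ["doc-2", "doc-4", "doc-7"];
const fused = reciprocalRankFusion([semanticHits, keywordHits]);
// fused is ["doc-2", "doc-7", "doc-4", "doc-9"]: documents found by both
// retrievers rise above documents found by only one.
```

RRF needs only ranks, not comparable scores, which is why it is a convenient default when the two retrievers score on different scales.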
Pattern B · AGENT

Stateful agent with tool use

Each agent is a Durable Object with its own SQL state and lifecycle. An MCP tool surface keeps external integrations clean. The agent hibernates when idle.

// Use case: customer agents, internal assistants, long-running automations
Agents SDK · Lifecycle
Durable Object · SQL · Memory
MCP · Tool surface
Calls out to: Workers AI · OpenAI / Anthropic / Gemini · External tools
Pattern C · RLM

Recursive multi-agent orchestrator

Kimi k2.6 (1T MoE) as the orchestrator; small Gemma scout workers run decomposed sub-tasks. Derived from MIT CSAIL's Recursive Language Models work (arXiv:2512.24601). This is the HotCopy architecture.

// Use case: code transformation, deep document understanding, complex multi-step research
Kimi k2.6 · Orchestrator · 1T MoE
Spawns: Scout · Gemma, Scout · Gemma, Scout · Gemma, +N
Substrate: Workers AI · Durable Objects · Queues
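The orchestrator-and-scouts shape described above can be sketched as a small recursive function. This is a generic illustration under stated assumptions, not the HotCopy implementation: the `Roles` interface, `solve`, and the depth limit are hypothetical, and the model calls are injected rather than bound to any specific provider.

```typescript
// A model call: takes a prompt, resolves to text. Injected so the sketch
// stays independent of any particular inference API.
type ModelCall = (prompt: string) => Promise<string>;

interface Roles {
  // Orchestrator: split a task into sub-tasks, or return [] if it is a leaf.
  decompose: (task: string) => Promise<string[]>;
  // Scout: a small worker model that solves one leaf task.
  scout: ModelCall;
  // Orchestrator: merge the sub-results into an answer for the parent task.
  combine: (task: string, parts: string[]) => Promise<string>;
}

async function solve(
  task: string,
  roles: Roles,
  depth: number = 0,
  maxDepth: number = 2
): Promise<string> {
  // Stop decomposing once the depth budget is spent, so recursion always terminates.
  const subTasks = depth < maxDepth ? await roles.decompose(task) : [];
  if (subTasks.length === 0) {
    return roles.scout(task);
  }
  // Scouts run concurrently; each sub-task may itself recurse.
  const parts = await Promise.all(
    subTasks.map((sub) => solve(sub, roles, depth + 1, maxDepth))
  );
  return roles.combine(task, parts);
}
```

The sketch runs all scouts inside one invocation with `Promise.all`; fanning work out across Queues and Durable Objects, as the substrate line suggests, is not modeled here.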
Pattern D · FED

Federal-friendly edge deployment

Same Cloudflare primitives, hardened: AI Gateway guardrails, region-pinning for data sovereignty, immutable audit logging, role-based access, FedRAMP-aware deployment patterns.

// Use case: agency RAG, intra-agency assistants, OMB AI Action Plan rollouts
Zero Trust · Auth
Worker · Region-pinned
AI Gateway · Guardrails · log
Workers AI · Inference
Audit + sovereignty: Immutable log → R2; Region-pin · US; Role-based access; CMMC posture
§ 03 / Data control

Sovereignty isn't a checkbox. Here's the mechanism.

“Region-pinning” is three concrete controls, not a slogan:

Regional Services

Pins where High- and Moderate-impact data is processed, without losing edge performance.

Metadata Boundary

Keeps all government data inside the defined FedRAMP region.

FIPS-validated encryption

Every hop between edge and core, always.

We compose these into the deployment so CUI stays where your authorization says it stays.

§ 04 / Post-quantum

Quantum-safe by default — no config, no penalty.

Every site and API we serve through Cloudflare is protected against “harvest now, decrypt later” with TLS 1.3 + ML-KEM — automatically, no configuration changes. Post-quantum protection extends to Zero Trust access and the Secure Web Gateway, so encrypted traffic stays inspectable as you migrate. Built ahead of NIST's 2030–2035 deprecation deadlines.

§ 05 / AI security architecture

Guardrails, not vibes.

The AI Gateway is the control plane between your application and any model provider:

Threat detection

Prompt injection, model poisoning, and excessive-use abuse, caught at the proxy.

Bi-directional data control

What the user submits and what the model returns, both inspected and redacted.

MCP server portal

Accessible servers behind one URL, with OAuth authorization and least-privilege access per agent.

Credentials at the edge

API keys and secrets never touch the client; rotation stays simple.

RAG helps the model know more. MCP helps it do more. The Gateway makes both safe.
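As a rough sketch of "credentials at the edge", a Worker can build the upstream request itself and send it through the AI Gateway endpoint, so the provider key lives only in a Worker secret. The URL shape follows AI Gateway's `gateway.ai.cloudflare.com/v1/{account}/{gateway}/{provider}` convention; the `GatewayEnv` binding names and the `buildGatewayRequest` helper are hypothetical.

```typescript
// Hypothetical secret and variable names; set them in the Worker's configuration.
interface GatewayEnv {
  ACCOUNT_ID: string;
  GATEWAY_ID: string;
  PROVIDER_API_KEY: string; // stored as a Worker secret, never sent to the browser
}

interface UpstreamRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build a provider request that is routed through the gateway instead of
// going to the provider directly.
function buildGatewayRequest(
  env: GatewayEnv,
  provider: string,
  path: string,
  payload: unknown
): UpstreamRequest {
  const base = `https://gateway.ai.cloudflare.com/v1/${env.ACCOUNT_ID}/${env.GATEWAY_ID}`;
  return {
    url: `${base}/${provider}/${path.replace(/^\/+/, "")}`,
    method: "POST",
    headers: {
      "content-type": "application/json",
      // The key is attached server-side; the client only ever talks to the Worker.
      authorization: `Bearer ${env.PROVIDER_API_KEY}`,
    },
    body: JSON.stringify(payload),
  };
}
```

Inside a fetch handler the result would be passed to `fetch(req.url, req)`. Rotating the key then means updating one secret, with no client release.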

§ 06 / FedRAMP architecture

Every service. Every location. No enclave.

Some vendors carve out a special FedRAMP enclave and make you wait for capabilities to land in it. Cloudflare runs the same software in every data center, including the FedRAMP processing locations — so the authorized Cloudflare for Government environment carries nearly the entire platform, not a stripped-down subset. For federal engagements we provision there; commercial engagements run on the standard network. Either way: no rip-and-replace, no capability lag.

§ 07 / Capability matrix

Every Cloudflare primitive we work in, with its real use.

Workers
// edge HTTP entry · routing · middleware
Workers AI
// pay-per-inference LLM & embedding
Agents SDK
// agent lifecycle & orchestration
Durable Objects
// per-agent SQL state · hibernation
Vectorize
// vector search · hybrid retrieval
AI Gateway
// guardrails · cost · cache · obs
AI Search
// semantic search-as-a-service
R2
// object storage · zero egress
D1
// edge SQLite · serverless DB
KV
// global config · feature flags
Queues
// durable async work
Containers
// long-running side workloads
Hyperdrive
// accelerated legacy DB access
Browser Rendering
// headless browser · scraping
Workflows
// durable multi-step pipelines
Sandbox
// isolated code execution
§ 08 / Live proof

Run a Worker. From the closest edge.

This page hits a Truvisory®-deployed Cloudflare Worker on first paint. The latency you see is your latency to the closest of 330+ Cloudflare data centers — typically the same region you live in.

  • Edge ping with location, RTT, and colo
  • Live JSON from a Workers AI inference call
  • Source on GitHub — verbatim, deployable in 90 seconds
truvisory-edge.workers.dev ● live
// 1. Ping the closest Cloudflare edge
curl https://edge.truvisory.com/whoami
{
  "colo": "DEN",
  "city": "Denver, CO, US",
  "rtt_ms": 38,
  "region": "WNAM",
  "ts": "2026-05-05T14:22:11Z"
}
// 2. Inference on Workers AI · Llama 3.3
POST /infer {"q":"summarize CMMC L2 in one line"}
"CMMC L2 = 110 NIST 800-171 controls, third-party assessed for CUI."
// model: @cf/meta/llama-3.3-70b · 41ms p50 · cached: false
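For readers who want a feel for how an edge ping like the one above could be assembled, here is a minimal sketch. It is not the published source: `whoami` and its types are hypothetical, it assumes the `request.cf` properties `colo`, `city`, `regionCode`, `country`, and `clientTcpRtt`, and it omits the `region` field shown in the sample output because this sketch does not know how that value is derived.

```typescript
// Subset of the `request.cf` object that Cloudflare attaches to incoming requests.
interface CfLike {
  colo?: string;
  city?: string;
  regionCode?: string;
  country?: string;
  clientTcpRtt?: number; // smoothed TCP round-trip time to the client, in ms
}

interface WhoAmI {
  colo: string;
  city: string;
  rtt_ms: number | null;
  ts: string;
}

// Build the JSON payload from request metadata. The clock is passed in so the
// function stays deterministic and easy to test.
function whoami(cf: CfLike | undefined, now: Date): WhoAmI {
  const place = [cf?.city, cf?.regionCode, cf?.country].filter(
    (part): part is string => Boolean(part)
  );
  return {
    colo: cf?.colo ?? "unknown",
    city: place.join(", "),
    rtt_ms: cf?.clientTcpRtt ?? null,
    // Drop milliseconds to match the timestamp style in the sample output.
    ts: now.toISOString().replace(/\.\d{3}Z$/, "Z"),
  };
}
```

In a Worker, the fetch handler would return something like `Response.json(whoami(request.cf, new Date()))`; fields are optional because `request.cf` can be absent in local development.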
§ 09 / Partner posture

Honest about today. Transparent about the path.

Today
Cloudflare-Native Engineer

What we are right now.

Daily-driver builder on Workers, Workers AI, Durable Objects, Vectorize, AI Gateway, R2, D1, Agents SDK, MCP. Production deployments on file (HotCopy, PresEngage). The engineer-on-the-platform claim is true and defensible.

Pursuing
▣ Cloudflare ASDP · Application Services

What we're earning, not claiming.

Cloudflare's ASDP designation requires rigorous technical validation of security, performance, and reliability. We're in the process — and we won't surface a partner badge on the site that hasn't been earned. When it lands, you'll see it.

§ 10 / FAQ

What teams ask before they commit to the platform.

Why build AI on Cloudflare instead of AWS or Azure?

For inference and application workloads (not model training), one integrated edge platform beats stitching together a dozen hyperscaler services. You get pay-per-inference instead of idle-GPU billing, roughly 50ms latency to about 95% of internet users from 330+ cities, and state plus compute in one primitive. The math works where 5% average GPU utilization makes reserved capacity a losing bet.

Is Cloudflare lock-in a risk?

Less than the hyperscaler alternative. Workers are standards-based, R2 is S3-compatible, and model routing is provider-agnostic. The real asterisk is Cloudflare-specific bindings — Durable Objects, D1, Vectorize — which would be a port, not a copy, if you ever leave. We keep your application logic and data model portable and say so up front.

What does Cloudflare-native cost versus a Lambda-style stack?

Structurally less for inference and application workloads: zero egress fees, no idle-GPU reservations, and one comprehensible bill instead of charges spread across compute, storage, data transfer, and a separate vector database. For sustained GPU training it's the wrong tool — use a hyperscaler with reserved capacity there.

Do you migrate existing workloads onto Cloudflare?

Yes, incrementally. We typically front your current system with Workers and Hyperdrive to accelerate legacy database access, move the highest-value paths to the edge, and leave the rest in place until it earns the move. No rip-and-replace, and the code lands in your repo at each step.
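The incremental approach above resembles a strangler-style router: migrated paths are served at the edge, everything else is proxied to the existing system. A minimal sketch, assuming a hypothetical prefix table, handler names, and origin URL; Hyperdrive and the actual proxying are not modeled.

```typescript
// Where a request should go: an edge handler, or the untouched legacy origin.
type Route =
  | { kind: "edge"; handler: string }
  | { kind: "origin"; url: string };

// Hypothetical migration table: path prefixes already moved to the edge.
const migratedPrefixes: Record<string, string> = {
  "/api/search": "searchHandler",
  "/api/chat": "chatHandler",
};

function routeRequest(
  pathname: string,
  originBase: string,
  migrated: Record<string, string> = migratedPrefixes
): Route {
  for (const [prefix, handler] of Object.entries(migrated)) {
    // Match the prefix itself or anything nested under it, but not
    // look-alike paths such as "/api/searching".
    if (pathname === prefix || pathname.startsWith(prefix + "/")) {
      return { kind: "edge", handler };
    }
  }
  // Not migrated yet: pass through to the legacy system unchanged.
  return { kind: "origin", url: originBase.replace(/\/+$/, "") + pathname };
}
```

Moving another path to the edge is then a one-line change to the table, which keeps each migration step small and reversible.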

What parts of the stack do you actually use?

Daily drivers are Workers, Workers AI, Agents SDK, Durable Objects, Vectorize, AI Gateway, R2, D1, and Queues — with Hyperdrive, Containers, and Workflows where the workload calls for them. The §07 capability matrix lists every primitive we work in with its real use; we pick the smallest set that ships your system.