Cloudflare AI Gateway as Your Observability Layer: Every LLM Request, Logged and Queryable

Tony Adams · 9 min read · June 2026

A call to a model provider is, by default, a black box. You send a request, you get a response, and unless you built your own instrumentation you have no idea what prompt actually went out, how many tokens it burned, how long it took, whether it errored, or how often you’re paying to ask the same question twice. Cloudflare AI Gateway turns that black box into a complete operational view — every request’s prompt and response, its token counts, its latency, an estimated cost, its status, and whether it hit cache — surfaced in a dashboard, stored as queryable per-request logs, and exportable as OTel traces, all from a one-line change to your SDK’s base URL. It’s a genuinely strong observability layer for the “see it” job. It also has real edges: it observes your LLM API calls specifically, not your whole application; its cost figure is an estimate; and its logging has a storage ceiling with a silent failure mode you need to design around.

This is the foundational spoke of our AI observability, cost, and evaluation cluster — the “see it” layer that the cost, evaluation, and routing pieces all build on.

The short version

Point your SDK’s base URL at AI Gateway and observability is on by default: a dashboard, per-request logs, custom-metadata segmentation, a queryable analytics API, and OpenTelemetry trace export — across more than twenty model providers, normalized into one schema.
The dashboard shows requests, token usage, an estimated cost, errors, and cache-hit rate. Latency lives one level down, in the per-request logs, alongside the full prompt and response.
The single most important operational gotcha: each gateway stores a finite number of logs, and when it fills, new logs silently stop saving — and stop exporting — until you delete old ones or turn on automatic deletion. Configure around it or you’ll go dark without noticing.
Know the boundaries. The cost number is an estimate, not your bill; seeing requests is not the same as knowing they’re good; and this is an LLM-traffic tool, not a full application-monitoring or security layer.

It’s genuinely one line

AI Gateway is a proxy that sits between your application and your model providers, and you turn it on by changing where your existing SDK points — from the provider’s endpoint to the gateway’s. It’s been generally available (GA) since Developer Week in 2024. After that one change, every request flows through a control plane that records it, with no further code; logging and analytics are on by default for each gateway. You can route through a provider-specific endpoint, through an OpenAI-compatible endpoint, or through the newer REST API on Cloudflare’s own API host — and it normalizes more than twenty providers, so OpenAI, Anthropic, Google, Workers AI, and the rest land in one dashboard with one log shape.
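As a minimal sketch of that one-line change, the helper below builds a provider-specific gateway URL. The URL shape is the one this article quotes for OpenAI; the function name and the placeholder IDs are illustrative, and the idea that other providers follow the same pattern with a different final path segment is an assumption to check against the docs for your provider.

```python
from urllib.parse import quote

GATEWAY_HOST = "https://gateway.ai.cloudflare.com/v1"


def gateway_base_url(account_id: str, gateway_id: str, provider: str = "openai") -> str:
    """Build a provider-specific AI Gateway base URL (hypothetical helper)."""
    if not account_id or not gateway_id:
        raise ValueError("account_id and gateway_id are required")
    # Percent-encode each segment so a stray slash cannot change the path.
    parts = [quote(p, safe="") for p in (account_id, gateway_id, provider)]
    return "/".join([GATEWAY_HOST, *parts])


# Placeholder IDs; substitute your own account and gateway.
base_url = gateway_base_url("my-account-id", "my-gateway")
print(base_url)
```

With the OpenAI Python SDK, that string is what you would pass as `base_url` when constructing the client; nothing else in the calling code needs to change.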

What the dashboard shows you

The headline view tracks five things, filterable by time and broken down by model and provider: total requests, token usage, an estimated cost, errors, and the percentage of responses served from cache. That’s the at-a-glance health of your AI traffic — enough to spot a spike in errors, a runaway in token volume, or a cache that isn’t earning its keep.

One thing worth knowing up front, because it trips people up: latency is not one of those headline metrics. Request duration is captured per request, down in the logs, where it’s also a filter. So the dashboard answers “how much, how often, how healthy,” and the logs answer “what exactly happened on this call.” For programmatic access — custom dashboards, alerting — there’s a GraphQL analytics API over the same dataset that powers the dashboard, queryable by model, provider, gateway, and time. One honest gap: Cloudflare does not publish an analytics-retention window, so I won’t invent one — the logs, by contrast, are capped by count rather than by time, which the limits section gets to.
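Because latency lives in the logs and not in the headline view, a latency summary is something you compute yourself. The sketch below assumes you already have log records in hand as dictionaries with a numeric `duration` field in milliseconds; that field name and the sample values are assumptions for illustration, not a documented schema.

```python
import math


def percentile(values: list, pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(logs: list) -> dict:
    """Summarize request duration from log records.

    Assumes each record carries a numeric "duration" in milliseconds
    (an assumed field name for this sketch).
    """
    durations = [float(r["duration"]) for r in logs if r.get("duration") is not None]
    return {
        "count": len(durations),
        "p50_ms": percentile(durations, 50),
        "p95_ms": percentile(durations, 95),
        "max_ms": max(durations),
    }


# Made-up sample durations, one slow outlier at the end.
sample = [{"duration": d} for d in (120, 135, 150, 180, 210, 240, 300, 420, 650, 1900)]
print(latency_summary(sample))
```

Nearest-rank is used here because it always returns a duration that actually occurred, which is easier to trace back to a specific log entry than an interpolated value.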

The logs are the real asset

Each request is logged with its user prompt, the model’s response, the provider and model, a timestamp, the request status, token usage, the cost estimate, and the duration. In the dashboard you can filter that set by status, cache hit, provider, model, cost thresholds, token counts, duration, feedback, and, usefully, custom metadata. Two headers give you control over what’s stored. cf-aig-collect-log toggles logging for a single request: set it to false to drop the entry entirely, or set it to true to capture one log when gateway logging is otherwise off. cf-aig-collect-log-payload controls only whether the raw request and response bodies are stored: set it to false and you still keep the metadata (tokens, model, provider, status, cost, duration) without retaining the prompt and completion text. That second header matters for teams who want operational metrics without persisting sensitive payloads.
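A small sketch of how those two headers might be assembled per request. The header names come from the paragraph above; the helper name and the choice to serialize booleans as the lowercase strings "true" and "false" are assumptions of this example.

```python
from typing import Optional


def log_control_headers(collect_log: Optional[bool] = None,
                        collect_payload: Optional[bool] = None) -> dict:
    """Build per-request logging headers; None leaves the gateway's setting untouched."""
    headers = {}
    if collect_log is not None:
        headers["cf-aig-collect-log"] = "true" if collect_log else "false"
    if collect_payload is not None:
        headers["cf-aig-collect-log-payload"] = "true" if collect_payload else "false"
    return headers


# Keep tokens, cost, and duration, but do not store prompt or completion text.
metrics_only = log_control_headers(collect_payload=False)
# Drop the log entry for this one request entirely.
no_log = log_control_headers(collect_log=False)
print(metrics_only, no_log)
```

With the OpenAI Python SDK, a dictionary like this can be passed as `default_headers` on the client to apply everywhere, or as `extra_headers` on a single call.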

Every request also comes back with a log ID (cf-aig-log-id), and that ID is the thread that ties observability to evaluation: you attach a score or human feedback to a specific call by its log ID. That bridge — turning a logged request into an evaluated one — is the subject of the evals spoke, so I’ll point there rather than build it here.

Custom metadata is how you segment

By default the logs tell you what happened across all your traffic. To answer whose traffic, or which feature’s, you tag requests with custom metadata via the cf-aig-metadata header — a small JSON object of string, number, or boolean values, up to five entries per request. Stamp each request with a user ID, a team, an environment, or a feature flag, and you can then filter and analyze your logs by those dimensions: cost per customer, error rate by feature, token volume by environment. It’s the difference between “our AI spend is up” and “our AI spend is up because this one feature for this one segment regressed,” and it’s the lever most teams underuse.
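The constraints in that paragraph (at most five entries, flat string, number, or boolean values) are easy to enforce before a request leaves your code. The following is a sketch under those stated constraints; the helper name is hypothetical, and rejecting bad metadata client-side rather than letting the gateway handle it is a design choice of this example.

```python
import json

MAX_ENTRIES = 5  # per-request limit described above


def metadata_header(metadata: dict) -> dict:
    """Validate custom metadata and serialize it into the cf-aig-metadata header."""
    if len(metadata) > MAX_ENTRIES:
        raise ValueError(
            f"at most {MAX_ENTRIES} metadata entries per request, got {len(metadata)}"
        )
    for key, value in metadata.items():
        if not isinstance(key, str):
            raise TypeError(f"metadata key {key!r} must be a string")
        # Only flat scalar values: no None, lists, or nested objects.
        if not isinstance(value, (str, int, float, bool)):
            raise TypeError(
                f"metadata value for {key!r} must be a string, number, or boolean"
            )
    # Compact separators keep the header short; allow_nan=False rejects NaN/Infinity.
    body = json.dumps(metadata, separators=(",", ":"), allow_nan=False)
    return {"cf-aig-metadata": body}


headers = metadata_header({"team": "support", "user": 12345, "env": "prod"})
print(headers["cf-aig-metadata"])
```

Failing fast on a sixth key or a nested object keeps a tagging mistake from turning into logs that are quietly missing the dimension you meant to filter on.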

The request lifecycle, and tracing it

For teams already running distributed tracing, AI Gateway exports trace spans for AI requests in OpenTelemetry format, following the GenAI semantic conventions — model, provider, input and output tokens, the prompt and completion, and cost, plus your custom metadata as span attributes. You can propagate trace context so the gateway’s spans nest inside your application’s existing traces, giving you one connected picture from your app down through the model call.

The one limitation to flag plainly: the export is OTLP over JSON only. Backends that require OTLP protobuf — Datadog is the notable one — won’t work with it; Honeycomb, Braintrust, and Langfuse are documented as compatible. If your observability stack is protobuf-only, that’s a real constraint to know before you plan around it.

Composing with native Workers observability

AI Gateway observes your model traffic; it does not observe your application code. The complete picture pairs it with Cloudflare’s native platform tools. Workers Logs captures your Worker’s own logs and system events. Workers Tracing instruments your code’s I/O and subrequests in OpenTelemetry — worth noting that, as of March 1, 2026, tracing spans bill as observability events under the same quota and pricing as Workers Logs, so it’s no longer free at volume. And Analytics Engine lets you write your own custom metrics for the business-level dimensions neither the gateway nor the Worker captures automatically. The division of labor is clean: the gateway sees the LLM calls, Workers observability sees the code around them, and together they give you end-to-end visibility that neither provides alone.

Getting logs out

When you need logs in your own systems — for retention beyond the gateway, for your SIEM, or for an audit archive — Logpush exports them to external object storage. It’s a Workers Paid feature, includes ten million log events a month with a small per-million charge beyond that, and allows up to four export jobs with a one-megabyte-per-log cap; the exported logs are encrypted with a public key you supply. That export path is also the bridge to a compliance-grade audit trail — pairing it with immutable, tamper-evident storage — but that’s a deliberate design with its own requirements, and it lives in the Cloudflare audit logs spoke rather than here. One interaction to carry forward, though, connects directly to the limits below: when a gateway hits its log ceiling, logs stop exporting through Logpush too.

The limits, and the one that will bite you

This is the part that earns trust, so here it is without softening. Each gateway stores up to ten million logs on the paid plan (a hundred thousand per account on the free plan), and the behavior when you hit that ceiling is the gotcha: in Cloudflare’s own words, “new logs will stop being saved.” No error in your application, no failed request — the calls keep working, they just stop being recorded, and they stop exporting via Logpush. You can go operationally blind and not find out until you go looking.

10M logs: the per-gateway storage ceiling on the paid plan (100K per account on free). When it fills, new logs silently stop saving and stop exporting via Logpush. Source: Cloudflare AI Gateway limits docs.

The mitigation is built in — automatic log deletion drops the oldest logs once you hit the limit, keeping a rolling window — but it’s off until you turn it on, so turning it on (or setting a per-gateway storage limit) is step one of running this in production.
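If you leave automatic deletion off on purpose, for example to keep a fixed archive, the arithmetic for "how long until we go dark" is worth automating. The sketch below uses the plan limits quoted above; the stored count and daily volume are inputs you would supply from your own tracking, and the seven-day alert threshold is an arbitrary choice for illustration.

```python
PAID_LIMIT_PER_GATEWAY = 10_000_000
FREE_LIMIT_PER_ACCOUNT = 100_000


def log_headroom(stored: int, logs_per_day: float,
                 limit: int = PAID_LIMIT_PER_GATEWAY) -> dict:
    """Estimate how full a log store is and how many days remain at the current rate."""
    if stored < 0 or logs_per_day < 0 or limit <= 0:
        raise ValueError("stored and logs_per_day must be >= 0 and limit > 0")
    remaining = max(limit - stored, 0)
    days_left = remaining / logs_per_day if logs_per_day > 0 else float("inf")
    return {
        "used_fraction": min(stored / limit, 1.0),
        "remaining": remaining,
        "days_until_full": days_left,
    }


def should_alert(stored: int, logs_per_day: float,
                 limit: int = PAID_LIMIT_PER_GATEWAY, min_days: float = 7.0) -> bool:
    """True when the store would fill in fewer than min_days (threshold is arbitrary)."""
    return log_headroom(stored, logs_per_day, limit)["days_until_full"] < min_days


# 8.5M logs stored with 250K new logs a day leaves 1.5M entries, or 6 days.
print(log_headroom(8_500_000, 250_000))
print(should_alert(8_500_000, 250_000))
```

Alerting on days of headroom, not on a fixed percentage, accounts for traffic growth: a gateway at 60% full with a tenfold traffic spike is in more trouble than one at 90% with flat volume.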

The rest of the limits are more mundane but worth a glance: a free tier that holds far fewer logs before you must delete or upgrade, a ten-megabyte cap per log, a rate limit of five hundred logs per second per gateway, and a cap on gateways per account (ten free, twenty paid). None are show-stoppers; all are worth knowing before you architect around them.

And the conceptual boundaries, stated so you don’t mistake this tool for one it isn’t:

The cost number is an estimate. It’s computed from token counts, and Cloudflare is explicit that you should treat it as an estimate and reconcile against your provider’s dashboard for the real figure. It’s excellent for spotting trends and runaway spend; it is not your invoice. Turning estimates into actual cost control — caching, routing, model selection — is the cost spoke’s job, and for the separate question of infrastructure cost (the Workers-versus-Lambda billing math), see our existing cost comparison.
Observability is not evaluation. This is the one that matters most. Seeing every request — its tokens, its latency, its errors — tells you what happened, not whether the answer was any good. Quality is a different discipline with different tooling, and skipping it is how regressions ship silently. The evals spoke covers it.
It adds a proxy hop. Routing through the gateway adds latency — third-party integration docs put it in the ten-to-fifty-millisecond range, often partly offset by cache hits — but it’s real, and latency engineering is the caching-and-latency spoke’s subject.
It’s LLM-scoped, not a full APM or security tool. It observes model API calls, not your entire application or infrastructure, and it is not a security or egress firewall — it doesn’t police what your code reaches out to. Pair it with Workers observability for the app layer, and treat securing AI as its own subject.
Several adjacent features are beta, and consolidating observability, routing, and caching on one provider is convenient while also concentrating a dependency. Both are worth weighing honestly.
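To make the first boundary concrete: a token-based estimate has roughly the shape below, and reconciling it means comparing that figure with what the provider actually billed. This is a sketch of the general technique, not Cloudflare’s formula; the per-million-token prices and the billed amount are hypothetical numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per one million tokens. The values used below are hypothetical."""
    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens: int, output_tokens: int, price: Price) -> float:
    """Token-based cost estimate in USD."""
    total = (input_tokens * price.input_per_million
             + output_tokens * price.output_per_million)
    return total / 1_000_000


def drift(estimated: float, billed: float) -> float:
    """Relative gap between an estimate and the billed figure (negative = underestimate)."""
    if billed == 0:
        raise ValueError("billed amount must be non-zero")
    return (estimated - billed) / billed


example_price = Price(input_per_million=2.0, output_per_million=8.0)
est = estimate_cost(1_200_000, 300_000, example_price)
print(round(est, 2))
print(round(drift(est, billed=5.0), 3))
```

Tracking the drift over time is more useful than any single comparison: a stable gap usually means a pricing detail the estimate does not model, while a growing one is worth investigating.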

The per-provider and per-model breakdowns the dashboard gives you are also the raw material for routing decisions — but the mechanics of failover and multi-provider routing are the routing spoke’s territory; here they’re just another thing you can see.

Concrete patterns

The integration itself is the base-URL swap — point the OpenAI SDK at https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai (or use the REST API on Cloudflare’s API host for new builds) and logging begins. To segment, add a metadata header — cf-aig-metadata: {"team":"support","user":12345,"env":"prod"} — and filter your logs by those keys. To keep operational metrics without storing sensitive text, send cf-aig-collect-log-payload: false. To pull custom metrics programmatically, query the aiGatewayRequestsAdaptiveGroups GraphQL dataset by model, provider, and time window. And to connect a logged request to a quality score, patch it by its cf-aig-log-id — the handoff to evaluation. Keep the metadata deliberate and the payload-collection setting matched to your data-retention posture, and you have a production-grade operational view in an afternoon.
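For the programmatic-metrics pattern, the sketch below builds (but does not send) a request against Cloudflare’s GraphQL endpoint for the aiGatewayRequestsAdaptiveGroups dataset. The dataset name comes from this article; the specific field and filter names in the query (`count`, `dimensions`, `datetimeHour_geq`, and so on) are assumptions to verify against the live schema, and the account tag and token are placeholders.

```python
import json
import urllib.request

GRAPHQL_URL = "https://api.cloudflare.com/client/v4/graphql"


def build_query(account_tag: str, start: str, end: str, limit: int = 100) -> str:
    """Build a grouped-requests query; field and filter names are assumptions."""
    # json.dumps produces a quoted, escaped string literal for each value.
    return f"""
    query {{
      viewer {{
        accounts(filter: {{ accountTag: {json.dumps(account_tag)} }}) {{
          aiGatewayRequestsAdaptiveGroups(
            limit: {int(limit)}
            filter: {{ datetimeHour_geq: {json.dumps(start)}, datetimeHour_leq: {json.dumps(end)} }}
          ) {{
            count
            dimensions {{ model provider gateway }}
          }}
        }}
      }}
    }}
    """


def build_request(api_token: str, query: str) -> urllib.request.Request:
    """Wrap the query in a POST request; call urllib.request.urlopen(req) to send it."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder account tag, token, and time window.
req = build_request(
    "YOUR_API_TOKEN",
    build_query("your-account-tag", "2026-06-01T00:00:00Z", "2026-06-02T00:00:00Z"),
)
print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the example free of network calls and makes the query easy to inspect or log before you run it against a real account.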

Frequently asked

What does AI Gateway actually show me?

Per request: the prompt, the response, the model and provider, status, token counts, an estimated cost, and duration — in a dashboard with requests, tokens, cost, errors, and cache-hit rate up top, and full detail in queryable logs. It's the operational view of your model traffic.

Is latency in the dashboard?

No — that's the common surprise. The five headline metrics are requests, tokens, cost, errors, and cache-hit rate. Request duration is captured per request in the logs, where it's also a filter.

What happens when I hit the log limit?

New logs silently stop saving, and stop exporting via Logpush, while your application keeps working normally — so you can go blind without an error. Turn on automatic log deletion (or set a storage limit) to keep a rolling window. This is the one limit to configure before you ship.

Can I keep metrics without storing the actual prompts and responses?

Yes. Send cf-aig-collect-log-payload: false and the gateway logs the metadata — tokens, model, provider, status, cost, duration — without retaining the request and response bodies. Useful when you want operational visibility but not payload retention.

Does this replace my application monitoring?

No. AI Gateway observes your LLM API calls; it doesn't see your application code or infrastructure, and it isn't a security tool. Pair it with Cloudflare's native Workers observability for the full picture.

Working with Truvisory

If you’ve shipped AI and you can’t see what it’s doing, see how we build instrumented, observable AI systems on Cloudflare — with the logging configured correctly from the first deploy, including the limit that quietly bites teams who don’t.

Truvisory is a Denver-based AI and automation consultancy run by a senior operator — a combat veteran and former PE-backed operating executive — who ships working software, not strategy decks. Cloudflare-native by default, for both AI delivery and the back-office automation where the ROI lives.

Tony Adams is the founder of Truvisory®. He builds Cloudflare-native AI systems for federal and commercial clients. SBA-verified SDVOSB and VOSB, SAM.gov-registered.

More in this series

AI Observability, Cost, and Evaluation on Cloudflare: How AI Gateway Stops You Flying Blind

Cloudflare Audit Logs for AI: A Tamper-Evident, Compliance-Grade Record of Every AI Request

Latency Engineering for AI on Cloudflare: Cache the Hot Path, Stream the Rest, Route to Faster Models

Controlling AI Model Costs on Cloudflare: The Levers That Actually Reduce Token Spend

AI Evals on Cloudflare: How to Measure Whether Your AI Is Actually Good

Multi-Provider AI Routing on Cloudflare: Fallback, Retries, and BYOK That Keep Your App Up
