Cloudflare · Architecture

Durable Objects are the missing primitive for production agents.

Tony Adams · 11 min read · March 2026

Most of what gets called “agent infrastructure” in 2026 is plumbing that exists to compensate for the absence of one specific primitive: a stateful, durable, addressable execution context that is cheap enough to run one per agent instance.

If you have built a multi-agent system on Kubernetes or on a generic serverless platform, you have personally written some version of every workaround for the missing primitive. You have built a queue to hand work between agents. You have built a state store to remember what an agent was doing across requests. You have built a scheduler to wake agents up. You have built a routing layer to send a specific request to a specific agent’s state. You have built reconnection logic for WebSocket clients to find their way back to the same agent instance after the load balancer rerolls. You have, in other words, spent a substantial percentage of your engineering budget rebuilding what Durable Objects give you for free.

I have shipped three production agent systems on Durable Objects in the last six months. The pattern has held in all three. The data tier doesn’t exist as a separate concern. The queue doesn’t exist as a separate concern. The scheduler doesn’t exist as a separate concern. The infrastructure stack collapses to the application code, and the application code gets dramatically smaller. This essay is the architectural argument for why that happens, and what it looks like in practice.

The shape of the missing primitive

The agent problem, abstracted, is this: a long-running computational entity with an identity, internal state that persists across calls, the ability to receive events from multiple sources (HTTP requests, WebSocket messages, scheduled triggers, inbound email, other agents), the ability to wake itself up on a schedule, and the ability to go dormant when nothing is happening so you’re not paying to keep it warm.

In actor-model terms, this is just an actor. In object-oriented terms, it’s a long-lived object instance. In operating-system terms, it’s a process with a known PID. None of those abstractions are exotic. The thing that makes them hard at production scale is that you want millions of them, addressed by name, available globally, with state that survives the host machine being rebooted, and you do not want to pay to keep them all warm in memory.

Standard cloud primitives don’t give you this. A pod doesn’t have durable state by default. A function-as-a-service invocation is stateless and amnesiac. A database row has state but no compute. A queue has compute-on-arrival but no addressable identity. You end up gluing these together — a pod with a sidecar reading from a queue, persisting to a database, registered in a service mesh — and the glue is the work that consumes the engineering team.

A Durable Object is a Cloudflare Worker that uniquely combines compute with storage. Each one has a globally unique name. Each one has its own private, embedded SQLite database (up to 10 GB on Workers Paid). Each one is automatically provisioned geographically close to where it is first requested, starts up quickly when needed, and shuts down when idle. They migrate among healthy servers without the application caring. There is no node pool, no cluster, no scheduler to configure, and no warm-pool capacity to reserve. You write a class, give each real-world thing a stable name, and the platform handles everything else.
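The addressing contract is easier to see in miniature. What follows is a rough in-memory model in plain TypeScript, not Cloudflare code: `Namespace`, `Instance`, `getByName`, and `evictIdle` are hypothetical names, and a `Map` stands in for each object's private SQLite database. It only illustrates the three properties described above: one object per name, private state per object, and state that outlives the in-memory instance.

```typescript
// In-memory model of name-addressed instances with private, durable state.
// Plain TypeScript, not Cloudflare code: every name here is illustrative.

// Stand-in for one object's private database.
type Store = Map<string, string[]>;

// The live, in-memory half of an object. It can be dropped at any time.
class Instance {
  readonly name: string;
  private readonly store: Store;

  constructor(name: string, store: Store) {
    this.name = name;
    this.store = store;
  }

  // Append to this object's transcript and return the new length.
  append(message: string): number {
    const transcript = this.store.get("transcript") ?? [];
    transcript.push(message);
    this.store.set("transcript", transcript);
    return transcript.length;
  }

  transcript(): string[] {
    return [...(this.store.get("transcript") ?? [])];
  }
}

class Namespace {
  private readonly durable = new Map<string, Store>(); // survives eviction
  private readonly live = new Map<string, Instance>(); // dropped when idle
  starts = 0; // how many times an instance had to be started

  // Same name, same object: start it lazily if it is not in memory.
  getByName(name: string): Instance {
    let instance = this.live.get(name);
    if (instance === undefined) {
      let store = this.durable.get(name);
      if (store === undefined) {
        store = new Map();
        this.durable.set(name, store);
      }
      instance = new Instance(name, store);
      this.live.set(name, instance);
      this.starts += 1;
    }
    return instance;
  }

  // Model every idle instance being shut down; stored state is untouched.
  evictIdle(): void {
    this.live.clear();
  }
}

const decks = new Namespace();
decks.getByName("deck-4471").append("What state are you holding?");
decks.evictIdle(); // the instance goes away, its store does not
const resumed = decks.getByName("deck-4471").append("I'm back.");
console.log(resumed); // 2
```

In the real platform the routing, eviction, and storage are the runtime's job; the point of the model is only that application code addresses things by name and never sees a host.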

That is the missing primitive. The reason it matters is that it changes what shape of system one person can build.

What the Agents SDK adds on top

The raw Durable Object primitive is general-purpose. The Cloudflare Agents SDK is an opinionated wrapper for the agent-shaped subset of what Durable Objects can do, and it is where most production agent code should live.

The class hierarchy is worth understanding because it tells you what’s free and what you write. The full hierarchy is DurableObject > Server > Agent > AIChatAgent. DurableObject gives you compute-plus-storage and an addressable identity. Server, from the partyserver package, adds URL routing, named-instance addressing, an onStart lifecycle callback, and WebSocket session management. Agent extends Server with scheduling (one-time, cron, and on-delay), an MCP client for outbound tool calls, AsyncLocalStorage-based context propagation, email handling via Cloudflare Email Routing, and a built-in SQL table for managed schedules. AIChatAgent extends Agent with the streaming chat patterns most production agents need.

If you are building a chat agent, you start with AIChatAgent and you are 80% done before you write a line of business logic. If you are building a workflow agent or a background-processing agent, you start with Agent. If you need fully custom behavior, you go down to Server or directly to DurableObject. The descent into lower layers is rare in practice. The default opinions are correct for most agent workloads, which is what you want from an SDK: be right by default, get out of the way when you need it to.

A specific feature of the Agent class is worth calling out because it solves a problem that has cost me weeks on previous platforms. Durable Objects allow only one alarm at a time per object. The Agent class works around this by managing multiple schedules in a SQL table (cf_agents_schedules) and using a single Durable Object alarm to fire whichever schedule is next. Cron schedules automatically reschedule themselves after execution. One-time schedules delete themselves. You write this.schedule(...) and the entire scheduler is handled. There is no scheduler service to run. There is no scheduler service to monitor. There is no scheduler service to debug at 2 AM.
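The multiplexing trick described above can be sketched in a few dozen lines. This is a simplified model in plain TypeScript, not the SDK's implementation: `ScheduleTable`, `scheduleOnce`, `scheduleEvery`, and `onAlarm` are hypothetical names, an array stands in for the SQL table, and a fixed interval stands in for cron expressions so the example needs no cron parser.

```typescript
// Model of many schedules multiplexed onto one alarm slot.
// Plain TypeScript, not the Agents SDK: every name below is an assumption.

interface ScheduleRow {
  id: number;
  callback: string; // name of the method to invoke
  time: number;     // next due time, in ms
  everyMs?: number; // set for repeating rows (stand-in for cron)
}

class ScheduleTable {
  private rows: ScheduleRow[] = [];
  private nextId = 1;
  private alarmAt: number | null = null; // the single alarm slot
  private readonly run: (callback: string) => void;

  constructor(run: (callback: string) => void) {
    this.run = run;
  }

  scheduleOnce(time: number, callback: string): number {
    return this.insert({ callback, time });
  }

  scheduleEvery(firstTime: number, everyMs: number, callback: string): number {
    if (everyMs <= 0) throw new RangeError("everyMs must be positive");
    return this.insert({ callback, time: firstTime, everyMs });
  }

  nextAlarm(): number | null {
    return this.alarmAt;
  }

  pending(): number {
    return this.rows.length;
  }

  // Called when the single alarm fires: run everything that is due.
  onAlarm(now: number): void {
    const due = this.rows
      .filter((row) => row.time <= now)
      .sort((a, b) => a.time - b.time);
    for (const row of due) {
      this.run(row.callback);
      if (row.everyMs === undefined) {
        // One-time rows delete themselves.
        this.rows = this.rows.filter((other) => other.id !== row.id);
      } else {
        // Repeating rows move to their next slot after `now`.
        while (row.time <= now) row.time += row.everyMs;
      }
    }
    this.rearm();
  }

  private insert(row: Omit<ScheduleRow, "id">): number {
    const id = this.nextId++;
    this.rows.push({ id, ...row });
    this.rearm();
    return id;
  }

  // Point the one alarm at the earliest pending row, or clear it.
  private rearm(): void {
    this.alarmAt =
      this.rows.length === 0 ? null : Math.min(...this.rows.map((r) => r.time));
  }
}

const fired: string[] = [];
const schedules = new ScheduleTable((callback) => { fired.push(callback); });
schedules.scheduleOnce(100, "sendReminder");
schedules.scheduleEvery(200, 1000, "sendWeeklyReport");
schedules.onAlarm(schedules.nextAlarm() ?? 0); // runs "sendReminder", re-arms for 200
```

However many rows are pending, the object only ever holds one alarm, which is the constraint the real scheduler has to live within.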

That pattern — “the platform absorbs the variance, the application code shrinks” — repeats throughout the SDK. WebSocket hibernation is handled. Reconnection routing is handled. State synchronization to connected clients is handled. The application code is what you write because there’s nobody else to write it. Everything else is the platform.

What I’ve shipped on this primitive

Three production systems over the last six months, briefly, because the pattern matters more than the specifics:

Per-deck AI agents for a SaaS product. Each presentation deck in the platform gets its own Durable Object, addressed by deck ID. The DO holds the per-deck conversation state, embeddings index pointers, RAG namespace assignment, and the audience interaction history for that deck. When a presenter starts the deck, the DO wakes. When audience members SMS in questions, the DO routes them through the appropriate retrieval pipeline. When the presenter ends the session, the DO hibernates. The data isolation guarantee is architectural, not policy — a different deck is a different DO with a different SQLite database, full stop. There is no shared multi-tenant database to leak across. The deployment is one Worker and one DO class. There is no separate API service, no Redis, no per-tenant database, no queue. The whole system is the DO and the Workers AI calls it makes.

The chat surface below is an illustration of exactly that shape — one Durable Object, addressed by a deck ID, holding its own SQLite-backed session state. It is a static demo, not a live agent, but the architecture it sketches is the one shipping in production:

DO agent · deck-4471 · Live · SQLite-backed

You: What state are you holding for this session right now?

Durable Object: Everything for session deck-4471 lives in my own SQLite database: the conversation transcript, the RAG namespace pointer, and the audit log of every model call. Nobody else can touch it; a different deck is a different Durable Object.

You: If I close the tab and come back in three days?

Durable Object: I hibernate when idle and wake on reconnect. The SQLite database sits in durable storage the whole time, so the session resumes from exactly where it stopped. Resumability is a property of the architecture, not a feature I had to build.

A recursive coding agent CLI. Each user session is a DO, addressed by session ID. The orchestrator runs in the DO. The scouts are fan-out Workers AI calls that the orchestrator dispatches in parallel. The DO holds the full session transcript, the current task state, the tool authorization tokens, the file modification log, and the rollback points. The user can quit the CLI and resume the session three days later — the DO has been hibernated, comes back online when the CLI reconnects, and the session continues from exactly where it left off, because the SQLite database has been sitting there the entire time. Resumability is not a feature I built. It is a property of the architecture.

A federal-pilot document-processing agent. Each document submitted by an end user becomes a DO, addressed by document ID. The DO holds the parsing state, the OCR results, the per-page redaction decisions, the audit log of every model call made against the document, and the human-review status. The audit log is the deliverable — every inference, with prompt, model, latency, and output, written to the DO’s SQLite database at the moment it happens. When the agency wants the audit trail for any document, they query the DO. The audit trail is a property of the runtime, not a separate compliance system bolted on the side.
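As a rough illustration of the audit pattern in that third system, here is a sketch in plain TypeScript. It is not the production code: `AuditLog`, `AuditEntry`, `record`, and `trail` are hypothetical names, an array stands in for the per-document SQLite table, and the inference call is synchronous to keep the example small (a real model call would be asynchronous).

```typescript
// Sketch of "every inference is logged where it happens".
// Hypothetical names throughout; an array stands in for the SQLite table.

interface AuditEntry {
  at: number;        // when the call started (ms)
  model: string;
  prompt: string;
  output: string;
  latencyMs: number;
}

class AuditLog {
  private readonly entries: AuditEntry[] = [];
  private readonly clock: () => number;

  // The clock is injectable so latency can be tested deterministically.
  constructor(clock: () => number = () => Date.now()) {
    this.clock = clock;
  }

  // Run one inference and write its audit row before returning the output.
  record(model: string, prompt: string, infer: (prompt: string) => string): string {
    const start = this.clock();
    const output = infer(prompt);
    const latencyMs = this.clock() - start;
    this.entries.push({ at: start, model, prompt, output, latencyMs });
    return output;
  }

  // The audit trail for this document, oldest first.
  trail(): AuditEntry[] {
    return this.entries.slice();
  }
}

const documentLog = new AuditLog();
documentLog.record("example-model", "Redact names on page 1", (p) => `done: ${p}`);
console.log(documentLog.trail().length); // 1
```

Because the only way to call the model is through `record`, there is no code path that produces an inference without an audit row.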

Three systems. Same primitive. The infrastructure code I wrote across all three would fit in a hundred lines, generously. The remaining code is application logic.

Why this collapses the architecture stack

Let me make the architecture argument explicitly, because the “we ship faster” claim is downstream of a specific structural property.

In a traditional production agent architecture, the data tier is a separate system. You run a database. You connect to the database from the agent process. You manage the connection pool. You handle the schema migrations. You worry about transaction isolation. You worry about read replicas. You back the database up. You restore it when it breaks. The database is somebody’s job — at scale, it’s a team’s job.

On Durable Objects, the data tier is the agent. Each DO has its own embedded SQLite database, isolated by construction, transactionally consistent with the agent’s own execution, and persisted to durable storage by the platform. You issue a SQL statement against the DO’s own storage.sql handle and you are querying an instance-private database that nobody else can touch. The connection pool doesn’t exist. The migration story is whatever your application does on startup. The transaction isolation is single-writer-per-DO: only that one object ever writes to its database, so there is no cross-writer contention to reason about. The replication is the platform’s problem.
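"Whatever your application does on startup" can be as small as a versioned list of statements. The sketch below is one possible shape, in plain TypeScript: `SchemaDb`, `RecordingDb`, `MIGRATIONS`, and `migrate` are hypothetical names, the table definitions are invented for illustration, and `RecordingDb` only records the SQL it is given rather than executing it against a real SQLite handle.

```typescript
// Startup migrations as an ordered list; hypothetical names throughout.
// RecordingDb records statements instead of running them against SQLite.

interface SchemaDb {
  userVersion: number;       // how many migrations have been applied
  exec(sql: string): void;
}

class RecordingDb implements SchemaDb {
  userVersion = 0;
  readonly executed: string[] = [];

  exec(sql: string): void {
    this.executed.push(sql);
  }
}

// Append-only: never edit or reorder an entry that has already shipped.
const MIGRATIONS: readonly string[] = [
  "CREATE TABLE messages (id INTEGER PRIMARY KEY, role TEXT, body TEXT)",
  "CREATE TABLE audit (id INTEGER PRIMARY KEY, model TEXT, latency_ms INTEGER)",
  "ALTER TABLE messages ADD COLUMN created_at INTEGER",
];

// Apply every migration this database has not seen yet; return how many ran.
function migrate(db: SchemaDb, migrations: readonly string[] = MIGRATIONS): number {
  let applied = 0;
  migrations.forEach((sql, index) => {
    if (index < db.userVersion) return; // already applied on an earlier start
    db.exec(sql);
    db.userVersion = index + 1;
    applied += 1;
  });
  return applied;
}

const freshDb = new RecordingDb();
console.log(migrate(freshDb)); // 3
console.log(migrate(freshDb)); // 0, a second start is a no-op
```

Each object migrates itself the next time it wakes, so there is no fleet-wide migration window to coordinate; the cost is that old and new schema versions coexist until every object has been touched.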

The queue tier collapses similarly. In a traditional system, you have a queue between services so that work can be handed off durably. On Durable Objects, work directed at a specific agent is just an RPC call against that DO’s name. Each DO is single-threaded, so calls against it are handled by one instance on one thread rather than racing across workers. The one caveat is that handlers can interleave where they await external I/O, so a read-modify-write should not span such an await. You don’t need a queue to coordinate “exactly one worker handling this thing at a time”: the runtime is already that.
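The serialization property is the part a queue was buying you. Here is a deliberately small model of it in plain TypeScript, with synchronous handlers only: `Mailbox` and `send` are hypothetical names, and nothing here is a Cloudflare API. A message sent while another is being handled waits its turn instead of running re-entrantly.

```typescript
// Run-to-completion mailbox: one message at a time, in arrival order.
// A simplified model with synchronous handlers; names are illustrative.

class Mailbox<M> {
  private readonly queue: M[] = [];
  private busy = false;
  private readonly handler: (message: M, self: Mailbox<M>) => void;

  constructor(handler: (message: M, self: Mailbox<M>) => void) {
    this.handler = handler;
  }

  send(message: M): void {
    this.queue.push(message);
    if (this.busy) return; // the running drain loop will pick this up
    this.busy = true;
    try {
      while (this.queue.length > 0) {
        const next = this.queue.shift() as M;
        this.handler(next, this);
      }
    } finally {
      this.busy = false;
    }
  }
}

const trace: string[] = [];
const agent = new Mailbox<string>((message, self) => {
  trace.push(`${message}:begin`);
  if (message === "plan") self.send("execute"); // queued, not run re-entrantly
  trace.push(`${message}:end`);
});
agent.send("plan");
console.log(trace.join(" ")); // plan:begin plan:end execute:begin execute:end
```

The "execute" message never starts inside "plan", which is the guarantee you would otherwise stand up a queue and a lock to get.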

The scheduler tier collapses similarly. The Agent class’s built-in schedule management means you write this.schedule('0 9 * * 1', 'sendWeeklyReport') and the scheduling is done. There is no cron service. There is no scheduler control plane. There is no missed-trigger investigation.

The session-affinity tier collapses similarly. WebSocket clients connect by Durable Object name. Reconnection finds the same DO because the addressing is the same. There is no sticky-session configuration. There is no Redis-backed session lookup. There is no “which pod is this user on” question.

What’s left, when those four layers collapse, is the application code. Which is what should have been the whole job all along.

The honest tradeoffs

I would not be writing this if Durable Objects were unconditionally the right answer. They are not. The tradeoffs that matter:

Single-threading per object. Each DO is single-threaded. If your “agent” is actually a high-throughput aggregator that needs to handle a thousand concurrent requests against shared state, the single-DO model is going to bottleneck. The answer is usually to shard the workload across many DOs and have a coordinator (often another DO) route requests to them. If your mental model is “one big database with a hot row,” you need to rethink the data model before you reach for Durable Objects.
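The sharding answer usually comes down to a stable hash from key to shard name. A minimal sketch, assuming a naming scheme of `<prefix>-shard-<n>` (my own convention for this example, not a platform one): `fnv1a`, `shardName`, and `ShardedCounter` are hypothetical helpers, the hash is FNV-1a applied to UTF-16 code units, and a `Map` stands in for the per-shard objects.

```typescript
// Spread a hot key space across N named shards with a stable hash.
// Hypothetical helpers; a Map stands in for the per-shard objects.

// FNV-1a (32-bit) over the string's UTF-16 code units.
function fnv1a(key: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned
}

// The same key always maps to the same shard name.
function shardName(prefix: string, key: string, shardCount: number): string {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new RangeError("shardCount must be a positive integer");
  }
  return `${prefix}-shard-${fnv1a(key) % shardCount}`;
}

// Coordinator: routes writes to one shard, fans in across all for reads.
class ShardedCounter {
  private readonly shards = new Map<string, number>(); // shard name -> count
  private readonly prefix: string;
  private readonly shardCount: number;

  constructor(prefix: string, shardCount: number) {
    this.prefix = prefix;
    this.shardCount = shardCount;
  }

  increment(key: string): string {
    const name = shardName(this.prefix, key, this.shardCount);
    this.shards.set(name, (this.shards.get(name) ?? 0) + 1);
    return name;
  }

  total(): number {
    let sum = 0;
    this.shards.forEach((count) => { sum += count; });
    return sum;
  }
}

const clicks = new ShardedCounter("clicks", 8);
for (const user of ["ada", "grace", "ada", "linus"]) clicks.increment(user);
console.log(clicks.total()); // 4
```

Writes for one key stay ordered because they always land on the same shard; the price is that any global read becomes a fan-in across shards, and changing the shard count remaps keys.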

Cold-start latency. A DO that has been hibernated takes a beat to wake. For most agent workloads this is fine — agents are not latency-sensitive at the millisecond level. For some workloads (live voice agents, interactive editing) the cold-start can be noticeable, and you architect around it by keeping high-traffic DOs warm with light periodic activity.

Vendor concentration. Durable Objects run on Cloudflare. The primitive does not have a drop-in equivalent on AWS or GCP today, though the actor model exists in various forms (Orleans, Akka, etc.) elsewhere. If your procurement requirements demand multi-cloud, you will need an abstraction layer above the primitive, which gives back some of what the primitive was buying you. For most commercial workloads this is not a real constraint. For some federal workloads it is, and I write to that constraint explicitly when I’m structuring the procurement (portability of the application logic and the data model, even when the runtime is single-vendor).

Operational visibility. “Where is my agent right now and what is it doing” is an operational question that the platform handles for you in normal operation, but when something goes wrong, the debugging story is different from “ssh into the box and look.” You lean on logging, on tail-streamed traces, on the platform’s observability tools. This is mostly a learning curve, not a real limitation, but it is real and worth budgeting time for.

None of those tradeoffs change the architectural claim. They are the cost of admission to the architectural model, and for production agent workloads in my experience the cost is paid back inside the first month of shipping.

The pattern, generalized

If I had to compress the architectural lesson of the last six months into a single sentence, it would be: the right unit of deployment for a production agent is the agent, not the service that hosts the agent.

Every conventional production architecture has you deploying the host — the API service, the worker pool, the database cluster — and then arranging for the agents to live somewhere inside those hosts. That’s backward. The agent is the noun. The host should be invisible. Durable Objects make the host invisible, and that’s why the architecture collapses to the application code.

Three systems shipped on this primitive in six months. The data tier doesn’t exist as a separate concern. The queue doesn’t exist as a separate concern. The scheduler doesn’t exist as a separate concern. What’s left is the work.

This is what production agent infrastructure looks like in 2026. The teams that have figured this out are going to keep getting faster. The teams that are still standing up Kubernetes clusters to run their agents are going to spend the next two years explaining why their AI cost per feature keeps climbing.

The primitive has been there the whole time. Most of the industry just hasn’t picked it up yet.

Tony Adams is the founder of HotCopy and Truvisory®. He builds Cloudflare-native AI systems for federal and commercial clients. Verified SDVOSB and VOSB, SAM.gov-registered.