The dominant framing of AI's collision with the grid goes something like this: AI compute is exploding, hyperscaler load curves are climbing into territory utility planners didn't sketch a decade ago, and the response is supply-side. Build more generation. Expedite interconnections. Hope the grid catches up. Almost every recent industry headline reads from this script.

It's not wrong. It's just incomplete.

The framing treats AI compute as fixed demand. The premise is that whatever energy the GPUs need to keep doing their space heater trick, the grid has to deliver. From there, every conversation is about supply: how to bring more on, how fast, with what generation mix, at what cost. The grid is the variable; the load is the constant.

There's a different framing available, and it's worth taking seriously: AI compute isn't fixed demand. It's a controllable resource. Treat it as a Distributed Energy Resource, and the architecture conversation changes substantially.

The reframe

DERs in grid vocabulary are generation, storage, and flexible load on the edges of the network — solar PV, batteries, EV charging, demand response, behind-the-meter generation. What unifies them is that they're active, dispatchable participants in grid balancing rather than passive endpoints. They show up in operator screens. They get coordinated. They're part of the system, not external to it.

AI compute fits the same template, with three properties most current discourse skips over:

  • It's controllable. Workloads can be throttled, batched, deferred, or paused. Not all of them, not all the time, but a meaningful fraction.
  • It's geographically distributed. Hyperscale fleets span dozens of regions across multiple time zones, climate zones, and utility footprints. The same workload can run in any of them.
  • It's increasingly co-located with storage. Battery installations on data-center sites are becoming standard for power quality and resilience. Those batteries are bidirectional-capable.

A grid that has 50 GW of dispatchable AI compute coordinating with 30 GW of on-site battery storage looks structurally different from one trying to feed 50 GW of fixed AI load. The bits in the hyperscaler aren't supply — but they're the next-best thing: a load that can be shaped to match supply, with storage right next to it.

Three controllability vectors

The flexibility isn't generic. AI workloads come in shapes with very different time and place tolerances:

Time-shifting. Batch workloads — training runs, batch inference, evaluation suites, periodic recompilation of large models — are latency-insensitive by design. They can defer to off-peak hours, run during cheap-energy windows, or schedule to consume excess renewable generation that would otherwise be curtailed. This is the lowest-friction lever and the one with the most immediate value to a grid operator.
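
To make the lever concrete, here's a minimal sketch of deferral logic in Python, assuming a hypothetical hourly price-and-curtailment forecast feed. The field names and thresholds are illustrative assumptions, not any provider's or grid operator's actual API.

    # Minimal sketch: defer a latency-insensitive batch job into the cheapest
    # feasible window. Assumes the forecast is hourly and sorted by time;
    # the data shapes are invented for illustration.
    from dataclasses import dataclass
    from datetime import datetime, timedelta

    @dataclass
    class HourlyForecast:
        start: datetime
        price_usd_per_mwh: float   # day-ahead energy price
        curtailment_risk: float    # 0..1: chance renewables would be curtailed

    @dataclass
    class BatchJob:
        name: str
        est_hours: int
        deadline: datetime         # job must finish by this time

    def pick_start(job: BatchJob, forecast: list[HourlyForecast]) -> datetime:
        """Choose the cheapest contiguous window that finishes by the deadline."""
        def effective_price(h: HourlyForecast) -> float:
            # Hours with likely curtailment are treated as near-free: running
            # then absorbs energy that would otherwise be wasted.
            return h.price_usd_per_mwh * (1.0 - h.curtailment_risk)

        best_start, best_cost = None, float("inf")
        for i in range(len(forecast) - job.est_hours + 1):
            window = forecast[i : i + job.est_hours]
            if window[-1].start + timedelta(hours=1) > job.deadline:
                break  # forecast is sorted, so later windows finish later
            cost = sum(effective_price(h) for h in window)
            if cost < best_cost:
                best_start, best_cost = window[0].start, cost
        # No feasible deferral window: run immediately.
        return best_start if best_start is not None else forecast[0].start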

Geographic-shifting. Latency-tolerant inference — search, content moderation, ranking, recommendation, much of the long tail of "AI-assisted" web requests — can route across data centers based on current grid conditions. This is uniquely cloud-shaped. Most demand response is "use less or use later"; cloud compute has the additional dimension of "use elsewhere." A request that would land in a constrained region in California can route instead to one with energy headroom in Oregon or Washington with single-digit-millisecond impact on user experience. The architecture for this exists — it's just optimized today for cost and latency, not grid coordination.
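
A routing decision along those lines is small enough to sketch. The per-region grid-headroom signal below is the piece no cloud exposes today; the region names, fields, and latency budget are illustrative assumptions.

    # Minimal sketch: send a latency-tolerant request to the region with the
    # most grid headroom that still meets the latency budget. The headroom
    # signal and all values here are invented for illustration.
    from dataclasses import dataclass

    @dataclass
    class Region:
        name: str
        rtt_ms: float            # network round-trip from the caller
        grid_headroom_mw: float  # spare capacity reported by the local grid
        has_capacity: bool       # free accelerators to serve the request

    def route(regions: list[Region], latency_budget_ms: float) -> Region:
        eligible = [r for r in regions
                    if r.has_capacity and r.rtt_ms <= latency_budget_ms]
        if not eligible:
            # Nothing fits the budget: fall back to plain latency routing.
            return min(regions, key=lambda r: r.rtt_ms)
        # Inside the budget, prefer the healthiest grid, not the nearest site.
        return max(eligible, key=lambda r: r.grid_headroom_mw)

    regions = [
        Region("california", rtt_ms=12, grid_headroom_mw=50.0,  has_capacity=True),
        Region("oregon",     rtt_ms=18, grid_headroom_mw=900.0, has_capacity=True),
    ]
    print(route(regions, latency_budget_ms=25).name)  # -> oregon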

Magnitude-shifting. When neither time nor place is flexible, the workload itself can sometimes flex — running with smaller batch sizes, lower model precision, or temporarily reduced concurrency during grid stress. This is the most invasive lever, but for a hyperscaler with thousands of concurrent jobs, even modest magnitude flexibility aggregates into meaningful capacity.
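
What the knob-turning could look like, sketched with invented stress levels and controls. Real jobs expose different dials; the thresholds and the concurrency, batch-size, and precision knobs here are placeholders.

    # Minimal sketch: map a grid-stress signal onto per-job knobs. The levels
    # and the specific knobs are illustrative assumptions, not real job APIs.
    from dataclasses import dataclass

    @dataclass
    class JobKnobs:
        max_concurrency: int
        batch_size: int
        precision: str  # e.g. "bf16" normally, "int8" under stress

    def throttle(base: JobKnobs, grid_stress: float) -> JobKnobs:
        """grid_stress in 0..1: 0 is normal, 1 is emergency curtailment."""
        if grid_stress < 0.3:
            return base
        if grid_stress < 0.7:
            # Moderate stress: shed concurrency, keep output quality.
            return JobKnobs(max(1, base.max_concurrency // 2),
                            base.batch_size, base.precision)
        # Severe stress: shed concurrency and trade quality for energy by
        # dropping to a cheaper precision.
        return JobKnobs(max(1, base.max_concurrency // 4),
                        max(1, base.batch_size // 2), "int8")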

The bidirectional-storage punchline lands here. When on-site batteries can flow energy back to the grid during peak stress and recharge during off-peak, the hyperscaler stops being a load with a UPS and becomes a coordinated participant — a virtual power plant whose substrate happens to be silicon.
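
One way to picture that coordination is a per-interval dispatch rule that decides jointly what the flexible load sheds and what the battery does. The rule below is a toy with made-up thresholds, not an actual virtual-power-plant control scheme.

    # Toy sketch: one control interval of joint load-and-battery dispatch.
    # Thresholds and the priority order are invented for illustration.
    def site_dispatch(grid_stress: float, battery_soc: float,
                      flexible_load_mw: float) -> dict:
        """grid_stress: 0..1 from the operator; battery_soc: 0..1 state of charge."""
        if grid_stress > 0.8 and battery_soc > 0.2:
            # Peak stress: shed all flexible load AND export from the battery.
            return {"shed_mw": flexible_load_mw, "battery": "discharge_to_grid"}
        if grid_stress > 0.5:
            # Elevated stress: shed some load, hold the battery in reserve.
            return {"shed_mw": flexible_load_mw * 0.5, "battery": "hold"}
        if grid_stress < 0.2 and battery_soc < 0.9:
            # Slack grid, e.g. a renewable surplus: absorb energy.
            return {"shed_mw": 0.0, "battery": "charge"}
        return {"shed_mw": 0.0, "battery": "hold"}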

The workload taxonomy

For this reframe to be more than a thoughtful slide, the controllability has to be modeled honestly. Most AI fleets break down into something like:

  • A fraction is real-time inference with latency budgets in tens of milliseconds. Not very flexible.
  • A larger fraction is batch processing — inference, evaluation, content generation pipelines. Highly flexible.
  • A meaningful fraction is training, often the largest concentrated power draw. Time-flexible at the day-or-week granularity, but committed once a job is running.

The mix matters. A platform that's mostly real-time inference has limited DER capability. A platform that's mostly training and batch has substantial capability. The composition of the fleet — which is largely a function of the customers being served, not the operator's choice — determines how much flexibility is actually on the table.
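
A back-of-envelope version makes the sensitivity visible. The shares and flexibility factors below are invented for illustration; the point is only that the product of mix and flexibility, not fleet size alone, sets the dispatchable capacity.

    # Back-of-envelope sketch: workload mix -> dispatchable capacity.
    # All shares and flexibility factors are made-up illustrations.
    fleet_mw = 1000.0  # total fleet draw

    mix = {  # workload class: (share of draw, assumed flexibility factor)
        "realtime_inference": (0.40, 0.05),  # barely flexible
        "batch":              (0.35, 0.80),  # highly flexible
        "training":           (0.25, 0.30),  # flexible between jobs, not mid-run
    }

    dispatchable_mw = fleet_mw * sum(share * flex for share, flex in mix.values())
    print(f"{dispatchable_mw:.0f} MW dispatchable of {fleet_mw:.0f} MW")  # 375 MW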

What would have to change

For "AI compute as DER" to graduate from reframing to operational reality, several things have to converge:

  • Provider exposure. Cloud providers have to expose workload-flexibility signals to customers, and customers have to opt into them. The pricing surface has to reflect grid conditions, not just instance types and reserved capacity.
  • Schedulers that understand grid conditions. Workload orchestration today is optimized for cost, latency, and locality. Grid-aware scheduling is a new dimension that has to land in the dispatcher logic of major orchestration platforms.
  • DR programs that include compute as a participant class. Existing demand-response programs were built for industrial loads, HVAC, and EV charging. They have to be extended (or duplicated) to recognize compute as a class with its own characteristics.
  • Coordination infrastructure between hyperscalers and grid operators. The signaling layer between a data center's workload scheduler and a utility's dispatch system doesn't exist at scale. It's not technically hard to build — but it's a coordination problem across organizations with very different operating tempos and incentive structures. (A sketch of what one such signal could look like follows this list.)
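
To make that missing layer concrete, here's a hypothetical shape for one grid event and the scheduler's acknowledgement. The schema is invented for illustration; a real program would more plausibly extend an existing demand-response standard such as OpenADR than start from scratch.

    # Hypothetical signaling sketch: a grid-operator event and the fleet's
    # commitment back. Every field name here is invented for illustration.
    from dataclasses import dataclass, asdict
    from datetime import datetime, timezone
    import json

    @dataclass
    class GridEvent:
        event_id: str
        starts_at: str            # ISO 8601 timestamp
        duration_min: int
        requested_mw: float       # load reduction the operator is asking for
        price_usd_per_mwh: float  # compensation on offer

    @dataclass
    class SchedulerAck:
        event_id: str
        committed_mw: float       # what the fleet can actually deliver
        method: str               # "defer_batch" | "reroute" | "battery_export"

    event = GridEvent("ev-001", datetime.now(timezone.utc).isoformat(),
                      duration_min=120, requested_mw=40.0,
                      price_usd_per_mwh=250.0)
    ack = SchedulerAck(event.event_id, committed_mw=32.5, method="defer_batch")
    print(json.dumps({"event": asdict(event), "ack": asdict(ack)}, indent=2))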

None of these is a research-level problem. They're engineering and policy problems, which makes them tractable but slow.

What this reframe does and doesn't do

What it doesn't do is solve the AI demand crisis. Compute as DER doesn't add generation, doesn't extend transmission, doesn't shorten interconnection queues. The supply-side problem stays a supply-side problem.

What it does do is change the shape of the problem. Instead of one-way — how much generation must we build to meet fixed AI demand? — it becomes two-way: how do we balance a system where compute is itself a controllable participant? That's a meaningfully easier system to operate than one where compute is fixed.

Here's a third through-line in the AI-grid story. The familiar two are AI driving demand higher, and AI getting applied inside grid operations to manage that pressure. This one — what this post argues for — is AI compute as a resource the grid can dispatch, not just a load it has to feed. The grid isn't just powering AI. It can also put it to work.