My AI Stack Will Soon Cost Too Much

I ran a script on my AI usage for the first time since March.

$3,203.89 USD. 67 days. $47 a day.

I stared at that number for a while.

On April 20 I spent $346 in a single day. I don't remember what I was doing. The data shows it was mostly development work — agents running long context windows, heavy generation. Whatever I was building that day, I was probably in the zone.

That's what it costs to run a personal AI agent stack at my current level. And until Anthropic announced they were moving programmatic users to consumption-based pricing, I'd never looked. But this is going to now impact alot of people building on top of Claude.

I was paying $170 a month for a subscription and using it for considerably more than $170 worth of work.

So I dug a little deeper into the numbers to help understand more about how I am using AI agents. 65% of my spend in those 67 days went to Opus — Anthropic's most capable and most expensive model. I'd been defaulting to it for nearly everything.

The reasoning felt sound at the time. My tasks are asynchronous. I drop something into the todo list, an agent picks it up, and I come back to the result later. Whether the response takes 30 seconds or 5 minutes doesn't matter because I'm not watching it happen. So I reached for the best model and stopped thinking about it.

The thing I missed is that not all Opus calls are buying the same thing.

AI models charge differently for reading versus generating. Input tokens — the context the model reads — are cheaper. Output tokens — what the model writes back — are more expensives. And that asymmetry matters a lot depending on the task.

Writing a long-form article like this one or building a HTML page - thousands of output tokens. Running Opus on tasks might be inefficient. A capable cheaper model may produce comparable output as additional reasoning power may not actually move the needle.

Drafting a difficult email is completely different. The output might be 200 words. But getting it right requires genuine reasoning — reading tone, choosing what to say and what to leave out, matching register to relationship. Short output, high thinking cost.

Another data point that stuck with me as I went through it: Haiku usage, $9.45 across the entire 67 days. The cheapest, fastest model barely shows up in my system. It has a place, I just never gave it one.

Which model gets called is decided by the stack above it

Building Sky System has taught me that agentic systems have three distinct layers before you reach the LLM or the model.

At the top is the application — in my case, a human and AI interaction layer centred around the todo list. Tasks enter the system, agents pick them up, and work happens against a task rather than a conversation thread. The application defines what gets done, and who does it (human or agents).

Get notified when the next article drops

Below that is the agent platform: the infrastructure that manages agents, memory, communication, and routing. This is where I've done most of my customisation. Sky System runs persistent agents that accumulate context across sessions and temporary worker agents for one-off tasks. I've built out tool provisioning — each agent gets access only to what it needs. Skills that can be loaded on demand. Instruction files that shape how different agents think and respond.

Two things I've built here took the most thought. The first is ToME — Sky's self-learning model of me. It builds and refines a picture of how I think, what I prioritise, how I communicate. Updated through interaction rather than manual configuration. The second is a Knowledge Wiki — a self-curating layer where Sky maintains and organises concepts, context, and decisions that might matter across different tasks and agents. All of that sits at the platform level.

Below that is the agent runtime — the execution layer that runs each agent's turn, handles tool calls, and manages the context window. Mine is the Claude Agent SDK, largely out of the box. That's the layer I haven't needed to touch yet.

And at the bottom: the LLM model. Opus, by default, for nearly everything.
The model is downstream of all of this. Which highlights that picking the model is only one factor of overall costs.

The pricing shift is changing that calculus. The agent runtime — the layer I haven't touched yet — is now the most consequential decision in the stack. Anthropic is moving programmatic users to consumption-based pricing. The runtime itself starts showing up on the bill.

❝

I've been running the numbers on what that looks like. I'm not confident I can run at $1,500 a month in API costs. But I also know what the last 67 days of productivity felt like. I'd struggle to give that back. So I need to find a way to make the numbers hold.

There are a few options I'm thinking through at the moment.

The most obvious is model routing and a more flexible Agentic runtime layer — right model for the right task. It also allows me to leverage Claude models where appropriate, but possibly much more cost effective models from other providers at the same time.

The second is local models. A lot of my work is asynchronous. Tasks run while I'm doing other things and surface results when I get back to them. If latency doesn't matter, the model probably doesn't need to be fast or expensive either. The quality question is still open.

The third is rethinking how a single task gets worked. Instead of one expensive call producing a finished result, cheaper models could approach the problem from multiple angles — generating, critiquing, improving — before anything costly touches it. More calls, lower unit cost, possibly better output. The economics might invert in interesting ways.

None of these are mutually exclusive. The real answer is probably some combination, and the audit will tell me which levers actually move the number.

The subsidised era is ending. For anyone building on top of these models at real scale, the bill is now a design input.

The audit will tell me how many of those $346 days were worth it.

Get notified when the next article drops

This is Sky System #3. The previous articles: "I'm watching my agent watch me" and "I put AI on the list".

Sky System (Sky) is the bespoke AI multi-agent stack I'm building for myself. Part productivity tool, part live experiment in how humans and AI actually collaborate. This series is the running notebook: what's working, what isn't, what's surprising me.

Join a waitlist for access to Sky System

My AI Stack Will Soon Cost Too Much

Which model gets called is decided by the stack above it

Keep Reading

No Playbook

Home