The Harness Is the Moat: AI's emerging moat
Frontier models are commoditizing. See why the buying decision moved to the runtime, the harness, and routing, and who captures the value.
The features buyers are evaluating have shifted from the model to the runtime: the harness, routing layer, eval system, and distribution. The weights are getting cheaper and more interchangeable by the month. Everything wrapped around them is where the moats are emerging and where value is captured.
That’s the throughline tying together a busy week: Microsoft shipping seven of its own frontier models and naming a rival framework more than two dozen times, Anthropic winding down the all-you-can-eat token era, Salesforce posting triple-digit growth on an agent product, and a 95-minute film that cost half a million dollars to generate. Different stories, same lesson. The model is the electricity. Nobody gets rich selling electricity.
1. Microsoft’s new AI strategy isn’t OpenAI
At Build, Microsoft did two things at once. It unveiled seven in-house models under the MAI family, including its first homegrown reasoning model, and it built its always-on agent, Scout, on top of OpenClaw, the open framework Satya Nadella had called a “virus” three months earlier. Read the keynote and the subtext is loud: Nadella wants every company to graduate from consuming frontier models to building with them, and Microsoft is going first. The pitch is “your own model, in your environment, on your data, under your control.”
What’s interesting to us is who got name-checked and who didn’t. The open framework came up constantly. The longtime partner barely registered. When the most strategic company in software starts treating your model as one swappable input among several, that tells you where it thinks the durable value sits. Hint: not in the weights.
This is the part people keep missing. We’re now in phase three. Phase one was the model. Phase two was the harness, code generation being the first real one. Phase three is distribution: how fast you get an audience and scale, and you have to do it while you’re still building the product, not after. Product-led growth still matters. It just isn’t the moat anymore.
2. Anthropic ends the all-you-can-eat token buffet
The Golden Corral era is closing. Anthropic is moving its headless and agent-SDK usage off the bundled plans and onto separate, metered, API-rate credit pools. If you’ve been running anything autonomous through a third-party harness or the agent SDK and assuming it draws from your flat subscription, that assumption just expired.
Here’s the thing nobody wants to say out loud: the subsidies were never as generous as the internet claimed, and they’re getting thinner. There’s no magic margin to give away. GPUs are expensive, the labs are lighting money on fire, and a flat-fee buffet only works until enough people eat. Interactive use self-limits because a human is sitting there watching. Unattended agent jobs do not, which is exactly why they got their own meter.
We’ve watched this movie before. It’s cloud, beat for beat. The buzz, the “you can do anything” phase, then the bill arrives and suddenly you’re standing up a FinOps org, buying savings plans, hunting zombie VMs, and feeding everything into a cost dashboard. The technology isn’t bad. It just needs a whole second layer of systems and guardrails to run it sanely. We’re about to get the AI version of Cost Explorer, and somebody is going to build a very good business selling it.
3. The token-maxxing reckoning is here
For months the feeds have been wall-to-wall with “how to spend fewer tokens.” That conversation just stopped being theoretical. When Uber blew through its entire 2026 AI budget in four months, the response was a $1,500 monthly cap per engineer, per tool, tracked separately for Claude Code and Cursor. That’s not a ban. It’s budget governance arriving the way it always does, late and all at once.
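For anyone wondering what a cap like that looks like in practice, here is a minimal sketch, not a description of Uber's actual system. It assumes spend is tracked per (engineer, tool) pair against the $1,500 monthly figure above; the `SpendLedger` class, its method names, and the example engineer and tool names are all hypothetical.

```python
from collections import defaultdict

MONTHLY_CAP_USD = 1500.0  # per engineer, per tool (figure from the text; everything else is assumed)


class SpendLedger:
    """Hypothetical month-to-date ledger keyed by (engineer, tool)."""

    def __init__(self, cap_usd: float = MONTHLY_CAP_USD) -> None:
        self.cap_usd = cap_usd
        self._spend = defaultdict(float)

    def record(self, engineer: str, tool: str, cost_usd: float) -> bool:
        """Record a charge if it fits under the cap; return False if it would exceed it."""
        key = (engineer, tool)
        if self._spend[key] + cost_usd > self.cap_usd:
            return False
        self._spend[key] += cost_usd
        return True

    def remaining(self, engineer: str, tool: str) -> float:
        """Budget left this month for one engineer on one tool."""
        return self.cap_usd - self._spend[(engineer, tool)]
```

Because the key is the pair, hitting the cap on one tool leaves the other tool's budget untouched, which is what "tracked separately" implies.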
We think the caps are healthy. You always gated your expensive infrastructure. You’d never hand an intern unlimited access to your Spark clusters to run Hadoop jobs overnight, because the overruns were real. AI got a temporary FOMO exemption from that discipline, and it was never going to last.
The fix isn’t austerity, it’s routing. Stop sending every request to the frontier. Set your default to a mid-tier model, send maybe 20% to the top, and use deterministic workflows where a workflow will do. If you’re asking how to do something in Linux, that’s a Google search, not an Opus call. The emerging pattern we keep hearing about: a cheaper open-source worker model does the bulk of the work, then calls a frontier model through a tool call as an advisor or reviewer, rather than letting the expensive model run unsupervised. Harvey runs open-source workers and calls a top model in as a tool. Replit and Lovable, both brutally cost-sensitive, lean on the same trick: generate with a cheaper model, then point a second model at the output to check and fix it.
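The routing idea above can be sketched in a few lines. This is an illustration under stated assumptions, not anyone's production router: the `Request` type, the intent names, the `difficulty` score (imagined as coming from an upstream classifier), and the 0.8 threshold are all hypothetical. If difficulty scores were spread evenly, a 0.8 threshold would send roughly 20% of traffic to the top tier.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical intents that a fixed workflow can handle with no model call.
DETERMINISTIC_INTENTS = {"linux_howto", "password_reset"}


@dataclass
class Request:
    intent: str
    difficulty: float  # assumed 0..1 score from an upstream classifier


def route(req: Request, frontier_threshold: float = 0.8) -> str:
    """Pick a tier: workflow first, mid-tier by default, frontier only when hard."""
    if req.intent in DETERMINISTIC_INTENTS:
        return "workflow"
    if req.difficulty >= frontier_threshold:
        return "frontier"
    return "mid-tier"


def generate_with_review(
    prompt: str,
    worker: Callable[[str], str],
    reviewer: Callable[[str, str], str],
) -> str:
    """Cheap worker drafts; a second model checks and fixes the draft."""
    draft = worker(prompt)
    return reviewer(prompt, draft)
```

The `worker` and `reviewer` arguments are plain callables so any model client can be plugged in; the sketch deliberately names no vendor API.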
But here’s our worry, and it’s the cloud parallel again. The routers have intelligence, sure. How do you know the router is optimizing for your business and not just for cost? To trust it you need an eval system, people who understand the eval system, and people who understand what good output even looks like so they can confirm the evals are tuned. Congratulations: we just reinvented model drift from traditional ML inference. We’re stuck in the same loop, one layer up.
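One way to make that worry concrete is to report quality and cost side by side for each tier the router picks. The sketch below assumes you already have graded examples, where `passed` is a verdict from a grader that a subject-matter expert has checked. The `EvalCase` fields, the function names, and the pass-rate floor are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    tier: str        # tier the router chose for this request
    cost_usd: float  # what the request cost
    passed: bool     # verdict from a human-validated grader (assumed to exist)


def summarize(cases: list[EvalCase]) -> dict[str, dict[str, float]]:
    """Per-tier pass rate and average cost, so cost is never read alone."""
    totals: dict[str, dict[str, float]] = {}
    for case in cases:
        t = totals.setdefault(case.tier, {"n": 0, "passed": 0, "cost_usd": 0.0})
        t["n"] += 1
        t["passed"] += int(case.passed)
        t["cost_usd"] += case.cost_usd
    return {
        tier: {
            "pass_rate": t["passed"] / t["n"],
            "avg_cost_usd": t["cost_usd"] / t["n"],
        }
        for tier, t in totals.items()
    }


def tiers_below(report: dict[str, dict[str, float]], min_pass_rate: float) -> list[str]:
    """Tiers whose quality fell under the floor, regardless of how cheap they are."""
    return sorted(t for t, r in report.items() if r["pass_rate"] < min_pass_rate)
```

A router that looks great on `avg_cost_usd` while a tier shows up in `tiers_below` is optimizing for cost, not for your business. The hard part, as noted, is trusting the `passed` column.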
4. Salesforce proves the headless agent business is real
Agentforce crossed $1.2 billion in ARR, up 205% year over year, the first time it cleared a billion. Stack that against the combined ARR of basically every AI-native startup and it’s a genuinely impressive number from a company whose core is growing single digits. Salesforce is pivoting hard into agents, and ServiceNow, Zendesk, HubSpot, and everyone else are right behind it.
So here’s the question that matters for GTM: are these agent-focused, headless platforms going to grow single digits, double digits, or triple digits? Salesforce just answered with triple. The future is headless. You’ll still have a UI and seats for humans, but you now have to assume agents are coming through the API and through MCP, in volume.
Now sit with the scoping conversation that creates. How many agents do you have? How many need access to Salesforce and your database? What’s the concurrent-call load? Ask most orgs today and you’ll get a blank stare, because the systems probably aren’t even throwing off the telemetry you’d need to answer. We talk endlessly about observability, and we still don’t know if these stacks emit the signal required to make an informed decision. We are not there yet. That gap is a business waiting to be built.
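If the logs do exist, the concurrent-call question is answerable with very little code. The sketch below assumes each agent call is logged as a (start, end) timestamp pair in seconds, which is exactly the telemetry many stacks may not emit today. The function name and log shape are hypothetical.

```python
def peak_concurrency(calls: list[tuple[float, float]]) -> int:
    """Return the maximum number of calls in flight at once.

    Each call is a (start, end) pair. A call that ends at time t is treated
    as finished before another call that starts at the same t.
    """
    events: list[tuple[float, int]] = []
    for start, end in calls:
        events.append((start, 1))   # call opens
        events.append((end, -1))    # call closes
    # Tuples sort by time, then by delta, so -1 (close) comes before +1 (open) on ties.
    events.sort()

    current = 0
    peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak
```

Run it per agent or per downstream system to get the number a vendor will ask for in the scoping call.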
5. A new 100% AI film is a $400,000 argument for the harness
A 95-minute fully AI-generated film called Hell Grind reportedly cost about $500,000 to make, of which roughly $400,000 was compute. (It was shown at an industry event in Cannes, not the official festival program, despite some breathless early coverage.) The number we can’t stop thinking about isn’t the budget. It’s that the average prompt ran around 3,000 words, and the team needed real cinematography knowledge to write them. Shot composition, sequencing, why you can’t run two close-ups back to back.
That is the whole thesis in one production. The startup behind it doesn’t even make the video model; it uses Google’s Veo 3 and builds the layer that keeps lighting and characters consistent across a feature. The model was the commodity input. The harness, plus the human who knew what to ask for, was the product. You can’t hand a toddler Claude Code and get a business, and you can’t hand us a camera rig and get a blockbuster. The orchestrator, the subject-matter expert, is still load-bearing.
If AI is a utility, how fun is selling electricity?
Here’s the comparison everyone keeps circling back to. LLMs are a general-purpose technology, like electricity. Fundamentally transformative, genuinely life-changing, but how many of us dream of quitting our jobs to go sell electricity? It isn’t fun, and it isn’t differentiated. So the question for every operator this year is: are you selling electricity, or are you the electrician who wires the building, the one who brings domain knowledge to solve a specific problem in a way that’s never been done before?
Because the macro hasn’t moved to save you. We did not get twice as many buyers or twice the budget this year. GDP grows around 2%, which means next year’s pie is roughly 2% bigger, full stop. So you either take revenue from someone else, find real productivity, or figure out how to stand out from the noise. The building of software got cheap. Distribution didn’t. The harness is the moat, and the value goes to whoever owns the runtime and knows what to do with it.
Keep your head down and keep building. There are still real, unsexy problems to solve in this supposedly pre-AGI world, and the people who solve them are the ones who’ll capture the value. We’ll see you next week.
Fringe Lines covers the business layer of the AI industry for builders and GTM operators. Reply and tell us where you think the value is getting captured.




