Intent Doesn't Parse
Stripe’s AI agents ship 1,300 pull requests every week. The secret has nothing to do with AI.
Six years ago, Stripe’s developer productivity team built comprehensive docs, blessed paths for common tasks, and robust CI/CD. When AI agents arrived, they followed the same documentation that human engineers use. No parallel infrastructure. No special prompts.
Steve Kaliski, the engineer behind Stripe’s Minions system, explains this in Lenny Rachitsky’s “How I AI” series like it’s the most boring observation in the world. Of course the agents use the same docs. What else would they use?
That “of course” is doing a lot of work. Because most organizations treat agent infrastructure as a separate problem from developer experience. Separate teams. Separate budgets. Separate docs. Stripe’s insight is that the separation is the problem.
The new hire who never leaves onboarding
The reason Stripe’s docs work for agents isn’t that they’re “good.” Plenty of companies have good docs. The reason is specific: Stripe’s docs were written for someone who knows nothing.
Every blessed path assumes the reader just joined. Here’s the repo. Here’s how you add an API field. Here’s what CI checks before your code ships. The docs don’t assume you were in the room when the architecture was decided. They don’t assume you know which Slack channel has the context. They externalize what most teams leave implicit.
AI agents are the extreme version of that new hire. Zero institutional memory. Zero hallway conversations. Zero sense of “we tried that in Q3 and it didn’t work.” An agent reads what you wrote down and nothing else.
Stripe’s docs passed the agent test because they’d already been passing the new-hire test for years. The investment that mattered most was made before agents existed in their workflow.
One detail from the same conversation deserves its own paragraph. Every one of those 1,300 PRs still requires human review. The bottleneck didn’t vanish. It moved. Code production is cheap now. Judgment is the constraint. Good DX didn’t eliminate expensive work; it relocated it.
Where intent disappears
I build MCP servers. Each one exposes tools that AI agents discover and call. The agent never sees the Go code, the handler logic, or the tests. It sees a name, a description, and an input schema. That’s the entire relationship.
Here’s a tool description from my Miro MCP server:
Create a connector line between two items. Styles: straight, elbowed (default), curved. Caps: none, arrow, stealth, diamond, filled_diamond, oval, filled_oval, triangle, filled_triangle.
It used to be better. The original version had USE WHEN clauses, parameter guidance, the works. Then I trimmed it for token efficiency. Shorter descriptions, smaller payloads, faster tool loading. A reasonable trade-off, except I never measured what I lost. The knowledge that made the tool selectable disappeared into a git diff, and I didn’t notice because I still had it in my head.
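The part I skipped was the measurement. Before trimming, I could have put a number on what the token savings actually bought. A minimal sketch, using a crude characters-per-token heuristic (the ratio is an assumption, and the restored clause here is hypothetical, not the text the original actually contained):

```go
package main

import "fmt"

// roughTokens is a crude estimate, roughly four characters per token for
// English prose. Real tokenizers differ, but it is enough to compare two
// versions of the same description before deleting one of them.
func roughTokens(s string) int {
	return (len(s) + 3) / 4
}

func main() {
	trimmed := "Create a connector line between two items. " +
		"Styles: straight, elbowed (default), curved."
	// Hypothetical restored version for comparison; not the deleted text.
	restored := trimmed +
		" USE WHEN: User says \"connect A to B\", \"draw an arrow from X to Y\"."
	fmt.Printf("trimmed ~%d tokens, restored ~%d tokens, savings ~%d\n",
		roughTokens(trimmed), roughTokens(restored),
		roughTokens(restored)-roughTokens(trimmed))
}
```

A comparison like this turns “trimmed for token efficiency” into a trade-off with two sides instead of one.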
When Claude encounters this description, it has a decision to make: connector or shape? The description doesn’t help. Both “draw a line” and “connect two boxes” are reasonable user requests. Both could plausibly map to either tool. So the agent guesses.
On my test board, there are three shapes in a row: “Start Test”, “Pass?”, and “Success.” I asked Claude to draw a line from Start Test to Success. The connector description says nothing about when to use it. The shape description, which already had USE WHEN clauses, explicitly mentioned “draw a...” requests. Claude picked miro_create_shape and placed a long, thin rectangle between the two boxes. Visually, it looked like a line. In the data model, it was a rectangle. Not a connector. Not part of the diagram’s logical structure. The board looked right. The data was wrong.
I blamed the model. The blame was misplaced. The description was technically correct and operationally useless.
Murphy Trueman, writing about design systems, identified the same pattern in a different domain. Designers create buttons in Figma and see buttons. The design system sees structured data. The designer’s intent is embedded in layers of implicit structure that only become visible when someone who wasn’t in the room tries to consume them.
MCP tools have the same hidden architecture. The server author sees a function call. The agent sees a data object: name as semantic label, description as intent signal, input schema as parameter contract, required-versus-optional as constraint boundary. Yes, the JSON schema contributes too; it constrains what’s valid. But the description determines whether the agent shows up at all. Selection happens before validation. The code behind the description might as well not exist.
The gap between what you know your tool does and what the description communicates to a consumer with zero context is where agent failures live. I call it “intent doesn’t parse.” Your intent is real. It just didn’t survive the journey from your head to the description string.
Trueman has a D&D analogy for this: a Dungeon Master who shuts down player actions because “it’s not in the book” doesn’t just fail that session. Players learn to stop attempting anything creative. The documented world shrinks to the documented edges. Vague tool descriptions train agents the same way. Over a session, the agent learns which tools produce predictable results and avoids the rest. Ambiguity doesn’t just cause errors. It narrows the agent’s world.
What the agent reads when it gets it right
A different tool on the same server:
Create a shape on a Miro board.
USE WHEN: User says “add a rectangle”, “draw a circle”, “create a box for X”
SHAPE TYPES:
Basic: rectangle, round_rectangle, circle, triangle, rhombus
Flow: parallelogram, trapezoid, pentagon, hexagon, star
Flowchart: flow_chart_predefined_process, wedge_round_rectangle_callout
PARAMETERS:
board_id: Required. Get from list_boards or find_board
shape: Shape type (required, default: rectangle)
content: Text inside shape
color: Fill color (e.g., “#FF5733” or color name)
x, y: Position (default: 0, 0)
width, height: Size (default: 200, 200)
RETURNS: Item ID, shape type, position, size, and view link.
RELATED: For flowchart-specific stencil shapes (experimental API), use miro_create_flowchart_shape instead.
These two tools sit in the same file, definitions.go, thirty-eight lines apart. The connector used to look like this too. I trimmed it for token efficiency and improved other tools in the same month without going back to restore it. Two weeks between commits. The gap between “I optimized this” and “I should fix that” is where selection debt accumulates.
The difference isn’t length. It’s explicitness. USE WHEN tells the agent which user utterances should route here. SHAPE TYPES tells it what’s legal. PARAMETERS tells it what to bring and where to find prerequisites. RELATED tells it where to go instead.
The shape description passes the new-hire test. Someone who has never seen the codebase can read it and act correctly. The connector description doesn’t. Both are mine.
The realization that changed how I write tool descriptions: I’m not documenting a function. I’m writing onboarding material for a colleague who will never ask a follow-up question. Every implicit assumption is a place where that colleague will guess. And guessing compounds. When you have dozens of tools across several servers, each ambiguous description makes every other tool slightly harder to choose correctly. That compound tax is what I’ve started calling selection debt: the accumulated cost of descriptions that are accurate but opaque.
Write for the reader who can’t ask
Stripe didn’t set out to build agent infrastructure. They set out to make their platform legible to someone who just walked in the door.
I didn’t set out to build agent-friendly tool descriptions. I set out to stop Claude from picking the wrong tool.
Same mechanism. Same fix. Write for the reader who has no context, and you accidentally write for every future consumer, including the ones that don’t exist yet.
Good DX = Good Agent DX. Not because agents are special. Because agents are the most demanding version of the reader you should have been writing for all along.
Pick one tool description. Read it as someone who has never seen your codebase, has infinite patience, and cannot ask a clarifying question. Does the intent survive? That’s your health check.
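The health check can even run in CI. A minimal sketch of a description linter; the two rules mirror the sections argued for above, and they’re my assumptions, not a standard, so adapt them to whatever conventions your descriptions use.

```go
package main

import (
	"fmt"
	"strings"
)

// lintDescription runs the health check mechanically: does the
// description tell a zero-context reader when to route here, and where
// to go for near-miss requests?
func lintDescription(name, desc string) []string {
	var issues []string
	if !strings.Contains(desc, "USE WHEN") {
		issues = append(issues, name+": no USE WHEN clause, so selection is a guess")
	}
	if !strings.Contains(desc, "RELATED") {
		issues = append(issues, name+": no RELATED pointer for near-miss requests")
	}
	return issues
}

func main() {
	desc := "Create a connector line between two items."
	for _, issue := range lintDescription("miro_create_connector", desc) {
		fmt.Println(issue)
	}
}
```

It won’t catch a vague USE WHEN clause, but it would have caught my connector regression the day I committed it.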

