NOTE

The model vs the harness

#ai #agentic-engineering #llm

When people compare AI tools, the conversation often starts with “which model does it use?”

The model is one component in a larger system, and usually not the component that explains the differences you actually feel when using the tool. The more useful distinction is between the model and the harness.

What’s a model?

The model is the LLM itself (Opus 4.6, GPT-5.3, Composer 1.5, etc.). It takes a sequence of tokens in and produces a sequence of tokens out. It has no memory between calls. It can’t browse your filesystem, run your tests, or decide when to ask a clarifying question. It’s a stateless function: f(context) -> completion
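A minimal sketch of that statelessness, with `complete` as a hypothetical stand-in for any provider’s completion API (not a real SDK call): same context in, completion out, and nothing carried over between calls — any “memory” has to be replayed in the context itself.

```python
def complete(context: str) -> str:
    # Stub for illustration: a real implementation would call a provider's
    # API here. The key property is that the function holds no state.
    return "…completion tokens…"

# Two calls share nothing. To continue a conversation, the harness must
# replay the entire history back into the context on every call.
first = complete("User: What is 2 + 2?")
second = complete("User: What is 2 + 2?\nAssistant: 4\nUser: Double it.")
```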

Models differ from each other in meaningful ways (reasoning quality, instruction following, code generation strength), but those differences are smaller than most people assume, and they’re shrinking over time.

What’s a harness?

The harness is everything around the model that turns a raw completion engine into a useful tool. This includes:

  • Context management: What gets stuffed into the model’s limited context window before each call? Which files, which conversation history, which documentation? This is the most consequential design decision in any AI tool.
  • Tool use orchestration: Can the model read files, write files, run shell commands, call APIs? How are those capabilities exposed and sequenced?
  • Agentic loop design: When the model produces output, does the harness feed that output back in for another pass? Under what conditions does it stop? How does it recover from errors?
  • Retrieval and indexing: How does the harness find relevant code across a large codebase? Does it use embeddings, AST parsing, dependency graphs, or just grep?
  • User interaction patterns: When does the tool ask for confirmation vs. act autonomously? How does it present diffs? How does it handle multi-file changes?
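Several of these pieces can be sketched together in one hypothetical loop. Everything here (`call_model`, the tool names, the step budget) is illustrative, not any real product’s API — but it shows how context management, tool exposure, and the stop conditions of the agentic loop are all harness decisions layered around a stateless model:

```python
import subprocess

def call_model(context: str) -> dict:
    # Placeholder: a real harness would call an LLM here and parse the
    # completion into an action. This stub finishes immediately.
    return {"action": "done", "output": "finished"}

TOOLS = {
    # Tool use orchestration: which capabilities the model can invoke.
    "read_file": lambda path: open(path).read(),
    "run_shell": lambda cmd: subprocess.run(
        cmd, shell=True, capture_output=True, text=True
    ).stdout,
}

def agent_loop(task: str, max_steps: int = 10) -> str:
    context = f"Task: {task}\n"       # context management: what the model sees
    for _ in range(max_steps):        # loop design: bounded autonomy
        step = call_model(context)
        if step["action"] == "done":  # stop condition
            return step["output"]
        result = TOOLS[step["action"]](step["args"])
        # Feed tool output back in for the next pass.
        context += f"Tool {step['action']} returned:\n{result}\n"
    return "step budget exhausted"    # recovery: fail gracefully, not forever
```

Each branch in this loop is a product decision: which tools exist, how results are folded back into context, and when the loop gives up.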

Claude Code and Cursor can both use Opus 4.6 (for example) as the underlying model. But they feel like different tools because the two products have made different harness decisions: how they manage context, what tools they expose to the model, how much autonomy the agentic loop gets, how they present changes to you, and when they ask for confirmation or act on their own.

Every difference you notice between them traces back to the harness, not the engine underneath.

When does the difference matter?

Evaluating tools: When someone says “Tool X is better than Tool Y,” ask what, specifically, makes it better. Often the answer is a harness decision (better context selection, smarter retrieval, tighter feedback loops), not a model difference. Knowing this helps you evaluate tools more precisely.

Building with AI: When you integrate AI into your product or internal tools, you’re building harnesses. The model selection matters, but the quality of what you build around it matters more. The leverage comes from how you construct prompts, what context you include (or exclude), how you handle errors and edge cases, how you integrate human feedback, etc.

Staying model-agnostic: If you understand that most of the value comes from the harness, you also understand why tight coupling to a single model provider is high risk. Good harness design lets you swap models as the landscape shifts.
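One common way to keep that coupling loose is a thin adapter layer: the harness depends on a narrow interface, and each provider gets its own adapter behind it. The class and method names below are illustrative, not real SDK calls:

```python
from typing import Protocol

class Model(Protocol):
    def complete(self, context: str) -> str: ...

class AnthropicAdapter:
    def complete(self, context: str) -> str:
        # Would wrap the Anthropic SDK in a real harness.
        return f"[anthropic] {context}"

class OpenAIAdapter:
    def complete(self, context: str) -> str:
        # Would wrap the OpenAI SDK in a real harness.
        return f"[openai] {context}"

def run_harness(model: Model, task: str) -> str:
    # All harness logic (context assembly, loop, tools) talks only to the
    # Model interface, so swapping providers touches one line at the call site.
    return model.complete(f"Task: {task}")
```

The harness code never imports a provider SDK directly, which is exactly what lets you swap models as the landscape shifts.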

The model is the engine. The harness is the application. Most of the interesting engineering — and most of what determines whether a tool is actually good — lives in the harness.