NOTE

Goodhart's law and engineering metrics

#software-engineering #metrics

Engineering leaders reach for numbers because numbers feel legible. The trouble is that people get clever about anything that shows up in a review, on a roadmap slide, or in a quarterly business review.

When a measure becomes a target, it ceases to be a good measure.

That line is the usual shorthand for Goodhart’s law. The idea itself came out of macroeconomics. Charles Goodhart argued in the mid-1970s that stable statistical relationships in UK monetary policy broke down once policymakers leaned on them as control levers. The more tightly a regime treated a quantity as its control instrument, the faster markets and institutions routed around it. Goodhart’s original formulation was blunter: “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.” Later writers widened the point. Keith Hoskin called it out in a 1996 chapter on accountability, and Marilyn Strathern quoted that same “measure becomes a target” wording while writing about audit in British universities in 1997, tying it back to Goodhart’s work on monetary instruments. The English Wikipedia article on Goodhart’s law walks through that chain with pointers into the primary literature.

You do not need to care about central banking to see the pattern at work in software organizations. The usage skew behind the Pareto principle is a cousin problem: a few inputs drive most outcomes, until you promote one of those inputs to the scoreboard and the skew stops cooperating.

Velocity and story points. Story points are a planning aid, not a productivity unit. Once a team’s “throughput” becomes a target, estimates inflate, work gets sliced so tickets look smaller, and refactors quietly slip off the board. You still have a chart. You no longer have an honest picture of risk or delivery.

Code coverage. Coverage answers a narrow question about which lines ran during tests. Treat a percentage as a mandate and you get tests that touch code without asserting behavior, generated fixtures that exist to lift the needle, and a false sense of safety about the paths that actually matter in production.

OKRs. A good key result names an outcome you would recognize if it happened without anyone telling you. A brittle key result names an output you can tick off on a spreadsheet. Teams under pressure hit the spreadsheet version first. Ship count goes up, customer pain stays flat, and the quarter still reads green.

The fix is not “never measure.” It is to hold measures lightly, pair them with qualitative review, and redesign the scoreboard when you catch people (including yourself) optimizing the ink instead of the system. Goodhart’s point was never that indicators lie. It is that targets change behavior, and the indicator you liked yesterday may not survive that pressure tomorrow.