Let me save you some time: your KPIs don't measure what you think they measure. They can't. And the sooner you accept that, the better your systems thinking will get.
This isn't nihilism. It's the entry point to DMAIC done right.
The Uncomfortable Truth About Measurement
Here's the thing about KPIs that most measurement frameworks won't tell you: every metric is a proxy. It's a shadow cast by a system you can't fully see. And the gap between what you measure and what actually matters? That's where the real work lives.
If you don't have a model of your system's error sources—one grounded in the specific domain science—then of course your KPIs aren't truth. They're approximations of approximations. A velocity metric in a software team isn't measuring productivity; it's measuring a behavior that someone hoped would correlate with productivity. Conversion rate isn't measuring customer value; it's measuring the one decision point your analytics stack can see.
The problem isn't that KPIs are imperfect. The problem is pretending they're perfect.
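To make that concrete, here's a minimal sketch (invented numbers, not a real system) of a KPI as a proxy: the observed value is the true quantity plus systematic bias plus noise. The bias term, covering what the instrument can't see plus any pressure to game the number, is exactly the error-source model most dashboards skip.

```python
# Minimal sketch (invented numbers): a KPI as a proxy for the quantity
# you actually care about. observed = true_value + systematic_bias + noise.
import random
import statistics

random.seed(7)

def observed_kpi(true_value, gaming_pressure=0.0):
    """One reading of a proxy metric.

    systematic_bias: what the instrument / analytics stack can't see,
    plus any incentive to game the number (assumed here, not measured).
    noise: ordinary sampling variation.
    """
    systematic_bias = 0.3 * gaming_pressure - 0.1  # assumed error sources
    noise = random.gauss(0, 0.2)
    return true_value + systematic_bias + noise

# The same underlying performance, read with and without pressure to game it.
true_values = [random.uniform(0.4, 0.9) for _ in range(200)]
honest = [observed_kpi(v, gaming_pressure=0.0) for v in true_values]
gamed = [observed_kpi(v, gaming_pressure=1.0) for v in true_values]

print("mean true value:      %.2f" % statistics.mean(true_values))
print("mean KPI (no gaming): %.2f" % statistics.mean(honest))
print("mean KPI (gamed):     %.2f" % statistics.mean(gamed))
# The gamed KPI rises even though the underlying value did not change.
```

Run it and the gamed readings climb while the underlying values stay put. That gap, not the KPI's absolute value, is what the rest of this piece is about.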
DMAIC as a Learning Framework
Most people treat DMAIC (Define, Measure, Analyze, Improve, Control) as a process improvement checklist. Define the problem, measure the current state, analyze root causes, implement improvements, establish controls. Done.
But here's the reframing: DMAIC is a learning framework disguised as an improvement framework.
The Define phase isn't about getting the "right" KPIs. It's about understanding the gap between what you can measure and what actually matters. It's about surfacing the assumptions baked into your metrics and interrogating whether those assumptions survive contact with reality.
If you skip this—or rush through it—you're not doing DMAIC. You're doing theatrical improvement. You're measuring the things that are easy to measure and hoping they're the things that matter.
Where You Should Spend Your Time
Early in a project, roughly 80% of your energy should go toward understanding what your metrics are missing. Not refining them. Not optimizing them. Not adding more of them. Understanding what they don't capture.
This is the Define cycle of DMAIC iterations. Not a one-time phase you complete and move past, but a return-to-baseline that you keep circling back to as you learn more about the system.
The questions to ask:
- What would an ideal measurement look like if you had perfect information?
- What's the gap between that ideal and what you can actually observe?
- What assumptions are you making about what your metrics represent?
- What error sources does your measurement model ignore?
- What would have to be true for your KPIs to mean what you think they mean?
If you can't answer these questions, you don't understand your system well enough to improve it. You might improve something—but it probably won't be the thing you intended.
The Gap Is the Signal
Here's a pattern I've seen repeatedly: the organizations that get measurement right are the ones that treat their KPIs as hypotheses, not ground truth.
They ask: "If this metric is going up, what do we think is happening? What else could be happening? What would falsify that interpretation?"
They track leading indicators and lagging indicators and try to understand why they diverge.
They care about the shape of the gap between measurement and reality—not because they can close it perfectly, but because understanding its shape tells them something about the system they're trying to improve.
A Practical Example: Software
Imagine you're measuring deployment frequency as a KPI. (DORA metrics, right?) The goal is to improve software delivery performance.
But what does "deployment frequency" actually measure?
It measures how often code ships. It doesn't measure:
- Whether the code does what users need
- Whether the team is burning out to achieve that frequency
- Whether the deployments are safe or risky
- Whether the features are valuable or just voluminous
The metric is useful if you understand its limits. It becomes dangerous when you treat it as the goal itself.
The Define phase work here isn't "choose better metrics." It's: understand what deployment frequency misses. What would you need to observe to know if high deployment frequency is actually a good thing? How does it interact with other measures? What error sources (team pressure to ship, gaming the metric, deploying unfinished work) does your model account for—or not?
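Here's a hedged sketch of what that interrogation can look like in practice. The deploy records, field names, and numbers below are hypothetical; the point is only that the same log that produces a healthy deployment frequency can also produce the signals the KPI ignores, if you bother to compute them.

```python
# Hedged sketch (hypothetical deploy records, invented fields and numbers):
# the same log can make deployment frequency look great while the
# companion signals it ignores look worse.
from collections import Counter

# Assumed schema: one dict per production deploy.
deploys = [
    {"week": 1, "failed": False, "hotfix_for": None},
    {"week": 1, "failed": True,  "hotfix_for": None},
    {"week": 1, "failed": False, "hotfix_for": "deploy-2"},  # rework deploy
    {"week": 2, "failed": False, "hotfix_for": None},
    {"week": 2, "failed": True,  "hotfix_for": None},
    {"week": 2, "failed": False, "hotfix_for": "deploy-5"},  # rework deploy
]

per_week = Counter(d["week"] for d in deploys)
deploy_frequency = sum(per_week.values()) / len(per_week)

failures = sum(d["failed"] for d in deploys)
change_failure_rate = failures / len(deploys)

rework_share = sum(d["hotfix_for"] is not None for d in deploys) / len(deploys)

print(f"deploys per week:     {deploy_frequency:.1f}")    # looks healthy
print(f"change failure rate:  {change_failure_rate:.0%}")  # what the KPI can't see
print(f"share that is rework: {rework_share:.0%}")
```

None of these companion numbers is "the right metric" either. They're just a reminder that one KPI read in isolation is a choice about what not to look at.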
A Practical Example: Manufacturing
Now consider yield. Every manufacturing organization tracks it. "We hit 94% yield this quarter." Great. But what does yield actually measure?
At the line level, yield is the ratio of good units to total units. Seems straightforward. But here's what it misses:
- Scrap that never gets counted. If your inspection process lets defects through, your "yield" includes units that will fail at the customer.
- Rework that hides the real failure rate. A unit that takes three passes to get right still counts toward yield. It's not scrap. But it consumed three times the labor and materials.
- The upstream pressure you're creating. When the coating department optimizes for their yield metric, they might apply thicker coats than necessary. Their yield goes up. The machining department downstream? Their tool wear increases. Their cycle times go up. Their yield drops. But coating's numbers look great.
This is the departmental silo problem in KPI form. Each function optimizes for the metric they own. The system as a whole degrades. The Define phase work here is understanding how local optimization creates global suboptimization—and what your measurement model is systematically ignoring.
What would you need to know to actually improve yield? Not just measure it:
- Where does the defect actually originate? (Often not where it's detected.)
- What upstream decisions are pushing costs downstream?
- What's the total cost of quality—including rework, inspection, warranty claims, customer trust?
- What would a system-level yield metric look like, and what data would you need to compute it? (A minimal sketch follows this list.)
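Here's that sketch, hedged: the step names and counts are invented, and a real line would have more stages and messier data. It contrasts each department's reported yield with its first-pass yield, then rolls the first-pass numbers up into a system-level view.

```python
# Hedged sketch (invented counts): reported yield vs. first-pass yield,
# and a rolled throughput yield across the whole line.
steps = {
    # units in, units good on the first pass, units good after rework
    "coating":   {"in": 1000, "first_pass_good": 930, "good_after_rework": 985},
    "machining": {"in": 985,  "first_pass_good": 900, "good_after_rework": 955},
}

rolled_first_pass = 1.0
for name, s in steps.items():
    reported_yield = s["good_after_rework"] / s["in"]   # what the department reports
    first_pass_yield = s["first_pass_good"] / s["in"]   # what rework is hiding
    rolled_first_pass *= first_pass_yield
    print(f"{name:10s} reported {reported_yield:.1%}  first-pass {first_pass_yield:.1%}")

end_of_line = steps["machining"]["good_after_rework"] / steps["coating"]["in"]
print(f"end-of-line reported yield:                 {end_of_line:.1%}")
print(f"rolled throughput yield (first-pass, line): {rolled_first_pass:.1%}")
# Each department's reported number looks fine; the system-level,
# rework-free view is considerably worse.
```

The end-of-line number is the kind of figure that goes on a slide. The rolled first-pass number is closer to what the line actually costs you, because it counts the rework, inspection burden, and downstream pressure the headline yield hides.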
The organizations that get this right stop asking "how do we improve our yield?" and start asking "what is yield actually telling us, and what is it hiding?"
The Point
George Box told us that all models are wrong, but some are useful. What he forgot to tell us is which ones. That's the work. That's the Define phase. Not picking metrics—interrogating them.
The useful models are the ones you've interrogated. The ones where you've done the hard work of understanding:
- What they capture and what they miss
- What error sources your model doesn't account for
- What assumptions you're smuggling in without noticing
- What the gap between measurement and reality looks like
- What local optimization is doing to the system as a whole
This is what the Define phase is actually for. Not getting the metrics "right"—getting honest about how wrong they are and what that means for your improvement work.
If you're spending your Define phase selecting KPIs without interrogating them, you're skipping the most important part. The metrics will tell you something. But it probably won't be what you think.
The best measurement frameworks don't eliminate uncertainty—they make it visible. DMAIC done right is a practice in epistemic humility, not a promise of perfect metrics.
Next in this series: When departments optimize for different metrics, you don't get a better system—you get organizational dysfunction. The psychology of team dynamics in problem-solving matters as much as the metrics themselves.