
Supervised Agency: A Day in the Life of Human-AI Collaboration

How a 37-point Lighthouse improvement revealed the patterns that make human-AI partnerships work.

Tags: AI-Augmented Work, AI Engineering, Systems Thinking
[Image: A looking glass zoomed in on a graph network, representing supervised agency and structured human-AI collaboration.]

If you've worked with AI assistants, you've probably experienced both extremes:

Fully autonomous: "Just fix my site." The agent makes changes, you come back to find... mixed results. Some improvements, some regressions, no clear reasoning trail. You spend as much time fixing problems as you would have doing it yourself.

Fully manual: "Do exactly this." You specify every step, review every change, and wonder why you're using AI at all. The cognitive overhead negates the benefit.

Both extremes fail. Autonomy loses context and direction. Manual oversight doesn't scale.

The middle ground is what I call supervised agency: a structured partnership where humans provide direction, judgment, and guardrails, and agents execute within clear boundaries. The session described below put this model into practice and gained 37 combined Lighthouse points (accessibility +16, performance +21), with Largest Contentful Paint (LCP) cut from 16.66s to 4.66s.

Here's what that looks like in practice.

What is Supervised Agency?

Supervised agency is a workflow pattern where:

  • Humans retain authority over strategic decisions, quality thresholds, and risk acceptance
  • Agents have autonomy within well-defined boundaries — implementation approach, tool selection, iteration details
  • Communication happens at structured intervals, not continuous interruption

The key insight: the human operates as a force multiplier, not a micromanager. Agents execute, iterate, and propose — but the human makes judgment calls at decision points.

This session revealed three distinct zones:

  1. Human Decision Points — Where judgment matters (prioritization, thresholds, risk)
  2. Agent Autonomy Zones — Where clear boundaries enable independent work
  3. Collaboration Points — Where structured communication keeps things on track

Session Walkthrough: The Evidence

Morning Context

The session started with a specific problem: the mobile hamburger menu wasn't working on iOS Safari. This triggered the first pattern.

Pattern 1: Iterative Debugging

The agent identified an iOS Safari stacking context issue — the menu button was being covered by another element due to z-index conflicts. What followed was a classic supervised agency cycle:

Agent: "I found the stacking context conflict. I'll adjust the z-index and test."

Human: "Go ahead."

Agent implements fix, tests on Chrome and Safari, reports success.

Human: "Looks good. What's next?"

What happened here: The agent had autonomy within the debugging loop. The human approved the approach but didn't prescribe each step. Multiple test cycles were acceptable because each cycle added precision. The human validated direction, not implementation details.

Pattern 2: Human Decision Points

After the mobile fix, the question became: what to optimize next?

Agent: "Lighthouse scores are: Accessibility 79, Performance 54. Which should we prioritize?"

Human: "Accessibility first. Let's get that to 95+."

This was a strategic decision — prioritization based on hiring funnel goals. The agent gathered options, the human chose direction.

Then something interesting happened. The agent worked through accessibility fixes autonomously: ARIA labels, focus management, color contrast, semantic markup. Each Lighthouse run showed progress. 79 → 82 → 88 → 92 → 95.

Human: "95. Good enough. Performance next."

What happened here: The human set the quality threshold. The agent decided how to reach it. This separation matters: humans define "done," agents define "how."

Pattern 3: Agent Autonomy Zones

Performance optimization revealed another pattern — tasks with clear acceptance criteria enable autonomy.

Agent: "LCP is 16.66s. I'll profile and identify bottlenecks."

Agent runs performance profiling, identifies image loading and JavaScript blocking as primary issues.

Agent: "I can defer JavaScript loading and add lazy loading to images. Estimated impact: LCP reduction of 10-12 seconds."

Human: "Go ahead. Analytics can load late."

Notice the decision point: "Should I pursue this approach?" The human approved the direction. Within that direction, the agent worked independently: testing specific image optimizations, measuring each change, and iterating until LCP reached 4.66s.
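The two techniques the agent proposed correspond to standard HTML attributes: `loading="lazy"` on `<img>` and `defer` (or `async`) on external `<script>` tags. Before changing anything, an agent could list candidates with a small audit. Below is a minimal Python sketch using only the standard library's `html.parser`; the `LoadingAudit` class and `audit` helper are hypothetical names, not tools from the session. One caveat belongs in the loop: the LCP image itself (usually the above-the-fold hero) should generally stay eager, because lazy-loading it delays the very paint that LCP measures.

```python
from html.parser import HTMLParser


class LoadingAudit(HTMLParser):
    """Collect lazy-loading and script-deferral candidates from an HTML page.

    Flags <img> tags without loading="lazy" and external <script> tags that
    block parsing (no defer/async, not a module). These are candidates for a
    human or agent to review, not automatic fixes.
    """

    def __init__(self) -> None:
        super().__init__()
        self.eager_images: list[str] = []
        self.blocking_scripts: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; boolean attributes such as
        # `defer` arrive with value None, so membership is what matters.
        attributes = dict(attrs)
        if tag == "img" and attributes.get("loading") != "lazy":
            self.eager_images.append(attributes.get("src") or "?")
        elif tag == "script" and attributes.get("src"):
            deferred = "defer" in attributes or "async" in attributes
            # type="module" scripts are deferred by default.
            is_module = attributes.get("type") == "module"
            if not (deferred or is_module):
                self.blocking_scripts.append(attributes["src"])


def audit(html: str) -> LoadingAudit:
    parser = LoadingAudit()
    parser.feed(html)
    parser.close()
    return parser


page = """
<img src="hero.jpg" fetchpriority="high">
<img src="gallery-1.jpg" loading="lazy">
<script src="analytics.js"></script>
<script src="app.js" defer></script>
"""
report = audit(page)
print(report.eager_images)      # ['hero.jpg']
print(report.blocking_scripts)  # ['analytics.js']
```

In this toy page the audit flags `hero.jpg`, which is exactly the image to leave eager if it is the LCP element. That call stays with whoever owns the decision, while `analytics.js` is the kind of script the human said could "load late."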

What happened here: The agent had an autonomy zone defined by:

  • Clear scope (performance optimization)
  • Measurable outcome (LCP improvement)
  • Acceptance criteria (target score or diminishing returns)
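These acceptance criteria combine two stop conditions: a human-set target and diminishing returns. Here is a minimal Python sketch of that rule. `should_stop` and the `min_gain` threshold are illustrative assumptions; the score series is the accessibility progression from the session (79 → 82 → 88 → 92 → 95) against the human's target of 95.

```python
def should_stop(scores: list[float], target: float, min_gain: float) -> bool:
    """Acceptance check for a higher-is-better metric such as a Lighthouse score.

    Stop when the latest measurement meets the human-set target, or when the
    last iteration gained less than `min_gain` (a regression also stops).
    """
    if not scores:
        return False
    if scores[-1] >= target:
        return True
    return len(scores) >= 2 and scores[-1] - scores[-2] < min_gain


runs = [79, 82, 88, 92, 95]  # accessibility scores from the session
for i in range(1, len(runs) + 1):
    if should_stop(runs[:i], target=95, min_gain=1):
        print(f"Stop after run {i}: score {runs[i - 1]}")  # Stop after run 5: score 95
        break
```

The target comes from the human and `min_gain` encodes what "diminishing returns" means. The agent only evaluates the rule. For a lower-is-better metric like LCP, pass negated values.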

Metrics Summary

Metric                 Before    After    Delta
Accessibility score    79        95       +16
Performance score      54        75       +21
LCP                    16.66s    4.66s    -12.00s
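The Delta column is plain subtraction, with one wrinkle: LCP is lower-is-better, so a negative delta is the improvement. The small Python sketch below recomputes the table and the combined 37 points from the before/after values. The `Metric` class is a hypothetical helper, not part of Lighthouse's output format.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    before: float
    after: float
    higher_is_better: bool = True

    @property
    def delta(self) -> float:
        return self.after - self.before

    @property
    def improved(self) -> bool:
        # For lower-is-better metrics (like LCP), a drop is an improvement.
        return self.delta > 0 if self.higher_is_better else self.delta < 0


metrics = [
    Metric("Accessibility score", 79, 95),
    Metric("Performance score", 54, 75),
    Metric("LCP (s)", 16.66, 4.66, higher_is_better=False),
]

for m in metrics:
    print(f"{m.name}: {m.before:g} → {m.after:g} ({m.delta:+.2f}, improved={m.improved})")

# The headline number sums only the two Lighthouse category scores.
combined = sum(m.delta for m in metrics if m.higher_is_better)
print(f"Combined Lighthouse points: {combined:+.0f}")  # Combined Lighthouse points: +37
```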

Reusable Patterns for Developers

Pattern 1: The Metrics Feedback Loop

The most effective pattern: start with a measurable baseline.

  1. Baseline measurement — Lighthouse scores, test coverage, load time, whatever metric matters
  2. Agent proposes fixes with predicted impact — "This change should improve LCP by ~3s"
  3. Verify with same measurement tool — Run Lighthouse again
  4. Repeat until diminishing returns — When each fix yields smaller improvements, stop

This works because metrics are objective — no debate about whether something improved, just measurement.
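The four steps form a loop: baseline, propose with a predicted gain, re-measure with the same tool, and stop on diminishing returns. Below is a minimal Python sketch. It assumes a `measure` callable that returns a higher-is-better number (for LCP you would pass negated seconds). `Fix`, `feedback_loop`, and the simulated scores in the usage are hypothetical, not data from the session. Logging predicted gains next to actual ones is the point, because it shows how well the agent's estimates track reality.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Fix:
    description: str
    predicted_gain: float       # the agent's estimate, stated before applying
    apply: Callable[[], None]   # performs the change


def feedback_loop(measure: Callable[[], float], fixes: list[Fix],
                  min_gain: float) -> list[tuple[str, float, float]]:
    """Apply proposed fixes in order, re-measuring after each one.

    Returns (description, predicted_gain, actual_gain) rows and stops early
    once a fix's measured gain falls below `min_gain`.
    """
    log = []
    current = measure()                  # 1. baseline measurement
    for fix in fixes:                    # 2. proposal with predicted impact
        fix.apply()
        new = measure()                  # 3. verify with the same tool
        actual = new - current
        log.append((fix.description, fix.predicted_gain, actual))
        current = new
        if actual < min_gain:            # 4. diminishing returns: stop
            break
    return log


# Simulated system: each fix nudges a made-up score by a fixed amount.
state = {"score": 50.0}


def nudge(points: float) -> Callable[[], None]:
    def apply() -> None:
        state["score"] += points
    return apply


fixes = [
    Fix("defer analytics script", 8, nudge(10)),
    Fix("lazy-load below-the-fold images", 5, nudge(6)),
    Fix("compress icons", 2, nudge(0.5)),
    Fix("never reached", 1, nudge(1)),
]
log = feedback_loop(lambda: state["score"], fixes, min_gain=1)
for row in log:
    print(row)
```

The third simulated fix gains only 0.5, so the loop stops before the fourth, which is the "repeat until diminishing returns" step in code form.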

Pattern 2: Decision Point Templates

Common decision points where human judgment matters:

Decision point    Example question             Human role
Prioritization    "Which problem first?"       Set priorities based on goals
Threshold         "When do we stop?"           Define "good enough"
Architecture      "Which approach?"            Choose direction at crossroads
Risk              "Is this safe to deploy?"    Accept or reject risk

The pattern: agent gathers options, human chooses.
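The table reduces to a routing rule: decisions of the four listed kinds go to the human, and everything else stays with the agent. Here is a minimal Python sketch. `Decision`, `resolve`, and the `ask_human`/`agent_pick` callables are hypothetical; in a real tool `ask_human` would be a prompt or approval gate. The agent always supplies the options, and only the chooser changes.

```python
from enum import Enum, auto
from typing import Callable


class Decision(Enum):
    PRIORITIZATION = auto()   # "Which problem first?"
    THRESHOLD = auto()        # "When do we stop?"
    ARCHITECTURE = auto()     # "Which approach?"
    RISK = auto()             # "Is this safe to deploy?"
    IMPLEMENTATION = auto()   # detail inside an autonomy zone


HUMAN_OWNED = {Decision.PRIORITIZATION, Decision.THRESHOLD,
               Decision.ARCHITECTURE, Decision.RISK}


def resolve(kind: Decision, options: list[str],
            ask_human: Callable[[str, list[str]], str],
            agent_pick: Callable[[list[str]], str]) -> str:
    """The agent gathers `options`; who picks depends on the decision kind."""
    if kind in HUMAN_OWNED:
        return ask_human(kind.name, options)
    return agent_pick(options)


# Stand-ins: the "human" prioritizes as in the session, the agent takes option 0.
def human(question: str, options: list[str]) -> str:
    return "accessibility" if question == "PRIORITIZATION" else options[-1]


def agent(options: list[str]) -> str:
    return options[0]


print(resolve(Decision.PRIORITIZATION, ["performance", "accessibility"], human, agent))  # accessibility
print(resolve(Decision.IMPLEMENTATION, ["adjust z-index", "restructure DOM"], human, agent))  # adjust z-index
```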

Pattern 3: Autonomy Zone Definition

A good autonomy zone has:

Task + Acceptance Criteria + Context = Autonomy

Example: "Remove duplicate templates, verify no regressions in tests."

  • Task: Remove duplicates
  • Criteria: No test regressions
  • Context: Template files in /templates/

The agent can work independently because success is verifiable.
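The formula can be made concrete as a data structure whose acceptance criteria are executable checks, so "done" is verified rather than asserted. Below is a minimal Python sketch built on the template example. `AutonomyZone` and `duplicate_groups` are hypothetical, "duplicate" is assumed to mean byte-identical contents, and the test-suite check is a placeholder for whatever runner the project actually uses.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable


def duplicate_groups(templates: dict[str, str]) -> list[list[str]]:
    """Group template names whose contents are byte-for-byte identical."""
    by_hash: dict[str, list[str]] = {}
    for name, body in templates.items():
        digest = hashlib.sha256(body.encode()).hexdigest()
        by_hash.setdefault(digest, []).append(name)
    return [names for names in by_hash.values() if len(names) > 1]


@dataclass
class AutonomyZone:
    task: str
    criteria: list[Callable[[], bool]]           # every check must pass
    context: list[str] = field(default_factory=list)

    def is_done(self) -> bool:
        return all(check() for check in self.criteria)


templates = {
    "post.html": "<article>{{ body }}</article>",
    "post_old.html": "<article>{{ body }}</article>",
    "page.html": "<main>{{ body }}</main>",
}

zone = AutonomyZone(
    task="Remove duplicate templates",
    criteria=[
        lambda: not duplicate_groups(templates),
        lambda: True,  # hypothetical placeholder for "test suite passes"
    ],
    context=["/templates/"],
)
print(zone.is_done())  # False: post.html and post_old.html are identical
del templates["post_old.html"]
print(zone.is_done())  # True
```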

Pattern 4: Communication Cadence

Structured updates keep humans informed without overwhelming:

Good update:

"Accessibility 79 → 95 (+16). Mobile menu fixed on iOS Safari. Next: performance optimization (LCP is currently 16.66s)."

Bad update:

"I changed line 45 in CSS to z-index: 1000, then I ran Lighthouse, it showed 79, then I added ARIA labels to the nav..."

Humans need outcomes and metrics, not implementation details.
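One way to keep updates at the outcome level is to give them a fixed shape: metrics as before/after pairs, outcomes as short phrases, then blockers and the next step. Here is a minimal Python sketch. `StatusUpdate` is a hypothetical structure, and its field order simply follows the good example above.

```python
from dataclasses import dataclass, field


@dataclass
class StatusUpdate:
    outcomes: list[str]
    metrics: dict[str, tuple[float, float]] = field(default_factory=dict)
    blockers: list[str] = field(default_factory=list)
    next_step: str = ""

    def render(self) -> str:
        # Metrics first, as "name before → after (delta)".
        parts = [f"{name} {before:g} → {after:g} ({after - before:+g})"
                 for name, (before, after) in self.metrics.items()]
        parts += self.outcomes
        if self.blockers:
            parts.append("Blocked: " + "; ".join(self.blockers))
        if self.next_step:
            parts.append("Next: " + self.next_step)
        return ". ".join(parts) + "."


update = StatusUpdate(
    outcomes=["Mobile menu fixed on iOS Safari"],
    metrics={"Accessibility": (79, 95)},
    next_step="performance optimization",
)
print(update.render())
# Accessibility 79 → 95 (+16). Mobile menu fixed on iOS Safari. Next: performance optimization.
```

Because there is no field for individual implementation steps, the "bad update" style has nowhere to go.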

The Communication Layer

Effective supervised agency depends on communication at the right granularity:

What to include:

  • Context: What problem you're solving
  • Metrics: Before/after evidence
  • Blockers: What's stuck
  • Next steps: What's coming

What to skip:

  • Implementation details unless requested
  • Every small decision within an autonomy zone
  • Progress updates more frequent than task completion, apart from scheduled heartbeats on long tasks

Heartbeat frequency: For tasks longer than 30 minutes, check in every 10-15 minutes. For multi-hour projects, every 30-60 minutes. The goal: humans stay informed without being interrupted.
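The heartbeat rule of thumb is simple enough to encode directly. Below is a minimal Python sketch. The function name is hypothetical, the two-hour cutoff for "multi-hour" is an assumption, and returning no heartbeat for tasks of 30 minutes or less (report at completion instead) is an inference from the "what to skip" list rather than something stated in the rule.

```python
from typing import Optional

# Assumption: "multi-hour" starts at two hours; the rule of thumb leaves
# the boundary between 30 minutes and "multi-hour" open.
MULTI_HOUR_MINUTES = 120


def heartbeat_window(expected_minutes: float) -> Optional[tuple[int, int]]:
    """Return the (min, max) minutes between check-ins for a task.

    None means no heartbeat: for short tasks, a report at completion is
    assumed to be enough.
    """
    if expected_minutes <= 30:
        return None
    if expected_minutes < MULTI_HOUR_MINUTES:
        return (10, 15)
    return (30, 60)


for minutes in (20, 45, 180):
    print(minutes, heartbeat_window(minutes))
```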

Practical Takeaways

If you're using AI assistants:

  1. Define autonomy zones explicitly — Task + criteria + context. The agent will work independently within those boundaries.
  2. Reserve judgment for decision points — Don't prescribe implementation. Approve direction, define "done," then let the agent execute.
  3. Use metrics as the common language — Objective measurements keep both human and agent aligned on progress.
  4. Structure your updates — Outcomes and metrics, not implementation details. Humans have limited bandwidth; make updates scannable.

If you're building AI tools:

  1. Build for autonomy zones — Clear task boundaries, measurable outcomes, progress signals.
  2. Design for decision points — Prompt for human input at crossroads, not every step.
  3. Make metrics first-class — Lighthouse, test coverage, load time — whatever the domain measures, make it visible.

What's Next

This session crystallized a pattern I've been developing: supervised agency as a reusable workflow. I've documented it as a skill (templates for charters, session notes, and heartbeat checks) and as a zettel cluster, a set of linked notes, in my knowledge base.

The zettel cluster includes:

  • Decision boundaries (what humans decide vs. what agents decide)
  • Communication patterns (heartbeats, approval gates)
  • Trust calibration (how autonomy expands over time)
  • Anti-patterns (what goes wrong and how to fix it)

If you're exploring human-AI collaboration, start with the question: Where are my decision points, and where are my autonomy zones? The answer defines your supervised agency model.


This session demonstrated that the most effective human-AI workflows aren't about "let AI do everything" or "micromanage every step" — they're structured partnerships with clear boundaries, measurable outcomes, and judgment at the right moments. 37 Lighthouse points later, I'm convinced.