# Agent-Human Task Routing
How to decide which tasks to delegate to agents versus keeping for human engineers based on risk, complexity, and context availability.
## Overview
Agent-Human Task Routing is a decision framework for systematically assigning development tasks to the right executor — fully autonomous agents, human-assisted agents, or human engineers working alone. The framework applies the Hybrid Engineering pillar from the Agentic Development Handbook to build a routing matrix that evaluates each task on four dimensions: spec completeness, pattern coverage, blast radius, and novelty.
Without a routing framework, teams default to gut feel. Some engineers over-delegate, sending ambiguous or high-risk work to agents that burn through Token Budget without producing usable output. Others under-delegate, manually implementing tasks that agents could handle in a fraction of the time. A structured routing approach maximizes Operator Leverage Ratio by directing each task to the executor best equipped to handle it.
The routing decision happens before agent execution begins — during the Daily Flow Sync or the weekly Spec Engineering Block — and feeds directly into the Live Spec assignment process. The Core Nucleus (Context Architect, Flow Manager, and lead Agent Operator) collaborates on routing decisions, with the Flow Manager tracking outcomes to refine the matrix over time.
## Problem
Teams adopting Agentic Engineering face a persistent allocation problem: which tasks should agents handle, and which should humans handle?
- **Over-delegation wastes tokens and time.** When agents receive tasks with vague specs, missing context, or high novelty, they produce low-quality output that requires extensive human correction. The Correction Ratio spikes. Rescue Mission frequency increases. The team spends more time fixing agent output than they would have spent writing the code themselves.
- **Under-delegation wastes human capacity.** When engineers manually implement well-patterned, fully specified tasks — CRUD endpoints, component variations, test generation, documentation — they leave agent capacity unused. The Operator Leverage Ratio stagnates because humans are doing work that agents handle reliably.
- **No feedback loop.** Without tracking which routing decisions led to good outcomes and which did not, the team cannot improve over time. The same misrouted tasks recur. The same engineers keep rescuing the same categories of stuck agents.
- **Inconsistent decisions.** Different team members apply different mental models for task assignment. One engineer delegates aggressively; another barely uses agents at all. The team lacks a shared vocabulary for discussing delegation boundaries.
## Solution
Implement a four-dimensional routing matrix that scores each task and maps the score to one of three execution modes:
- Auto-agent — Agent executes autonomously with standard Eval Harness gates. Minimal human involvement beyond final review.
- Assisted-agent — Agent executes with active human monitoring. The Agent Operator reviews output at checkpoints and provides mid-task corrections.
- Human-only — Human engineer implements the task directly. Agent may assist with subtasks (test generation, documentation) but does not own the implementation.
The four routing dimensions are:
| Dimension | Question | Low Score (1) | High Score (5) |
|---|---|---|---|
| Spec Completeness | How complete is the Live Spec? | Vague requirements, no acceptance criteria | Full behavioral contract, testable acceptance criteria, defined scope |
| Pattern Coverage | Does the codebase contain similar implementations? | No precedent, first-of-its-kind | Multiple Golden Samples exist for this exact pattern |
| Blast Radius | What is the impact if the output is wrong? | Breaks critical paths, affects production data, security implications | Isolated change, easily reverted, no downstream dependencies |
| Novelty | How much creative judgment is required? | Requires architectural decisions, trade-off analysis, ambiguous requirements | Straightforward application of known patterns and rules |
Score each dimension from 1 to 5. Sum the scores:
| Total Score | Routing Decision |
|---|---|
| 16-20 | Auto-agent |
| 10-15 | Assisted-agent |
| 4-9 | Human-only |

Two hard overrides apply regardless of the total: a blast radius score of 1, or a spec completeness score of 2 or lower, forces the task to human-only.
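In practice, teams capture these scores per task type in a classification catalog. The router script in the Implementation section reads such a catalog from `task-routing-config.yaml`; the file itself is not shown there, so the following is a minimal sketch consistent with that script's `RoutingConfig` shape — the task names, scores, and notes are illustrative, not prescriptive:

```yaml
# task-routing-config.yaml — hypothetical classification catalog
task_classifications:
  auto_agent:
    - name: "CRUD endpoint"
      spec_completeness: 5
      pattern_coverage: 5
      blast_radius: 4
      novelty: 5
      notes: "Golden Samples exist for this pattern"
  assisted_agent:
    - name: "service refactor"
      spec_completeness: 4
      pattern_coverage: 3
      blast_radius: 3
      novelty: 3
  human_only:
    - name: "auth flow"
      spec_completeness: 3
      pattern_coverage: 2
      blast_radius: 1
      novelty: 2
      notes: "Security-critical; blast radius override"
```

Start small — 5-10 entries covering the team's most common task types — and grow the catalog as routing outcomes accumulate.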
## Implementation
### Code Examples
```typescript
// scripts/task-router.ts
import { readFileSync } from "fs";
import { parse } from "yaml";

interface RoutingConfig {
  task_classifications: {
    auto_agent: TaskType[];
    assisted_agent: TaskType[];
    human_only: TaskType[];
  };
}

interface TaskType {
  name: string;
  spec_completeness: number;
  pattern_coverage: number;
  blast_radius: number;
  novelty: number;
  notes?: string;
}

interface TaskScore {
  specCompleteness: number;
  patternCoverage: number;
  blastRadius: number;
  novelty: number;
}

type RouteDecision = "auto-agent" | "assisted-agent" | "human-only";

function loadConfig(path: string): RoutingConfig {
  const raw = readFileSync(path, "utf-8");
  return parse(raw) as RoutingConfig;
}

function scoreToRoute(score: TaskScore): RouteDecision {
  // Hard overrides: very high blast radius or an underspecified task
  // goes to a human regardless of the total score.
  if (score.blastRadius <= 1) return "human-only";
  if (score.specCompleteness <= 2) return "human-only";
  const total =
    score.specCompleteness +
    score.patternCoverage +
    score.blastRadius +
    score.novelty;
  if (total >= 16) return "auto-agent";
  if (total >= 10) return "assisted-agent";
  return "human-only";
}

function findSimilarTask(
  description: string,
  config: RoutingConfig
): { route: RouteDecision; match: TaskType } | null {
  // Flatten the three classification buckets into one list,
  // tagging each entry with the route it belongs to.
  const allTasks = [
    ...config.task_classifications.auto_agent.map((t) => ({
      ...t,
      route: "auto-agent" as RouteDecision,
    })),
    ...config.task_classifications.assisted_agent.map((t) => ({
      ...t,
      route: "assisted-agent" as RouteDecision,
    })),
    ...config.task_classifications.human_only.map((t) => ({
      ...t,
      route: "human-only" as RouteDecision,
    })),
  ];
  // Naive keyword match: any word of a catalog entry's name appearing
  // in the description counts. Good enough for a suggestion, not a verdict.
  const descLower = description.toLowerCase();
  const match = allTasks.find((t) =>
    t.name.toLowerCase().split(" ").some((word) => descLower.includes(word))
  );
  return match ? { route: match.route, match } : null;
}

// Usage
const config = loadConfig("task-routing-config.yaml");
const suggestion = findSimilarTask("Add new CRUD endpoint for orders", config);
if (suggestion) {
  console.log(`Suggested route: ${suggestion.route}`);
  console.log(`Based on: ${suggestion.match.name}`);
  console.log(`Notes: ${suggestion.match.notes || "None"}`);
} else {
  console.log("No matching classification found. Score manually.");
}
```

```yaml
# routing-log.yaml — Track decisions for retrospective analysis
entries:
  - date: "2026-02-20"
    task: "Add pagination to /api/products endpoint"
    scores:
      spec_completeness: 5
      pattern_coverage: 5
      blast_radius: 4
      novelty: 5
    route: "auto-agent"
    outcome: "passed-first-attempt"
    correction_needed: false
  - date: "2026-02-20"
    task: "Implement OAuth2 PKCE flow"
    scores:
      spec_completeness: 4
      pattern_coverage: 2
      blast_radius: 1
      novelty: 2
    route: "human-only"
    outcome: "completed-by-human"
    notes: "Security-critical path, blast radius override"
  - date: "2026-02-21"
    task: "Refactor UserService to repository pattern"
    scores:
      spec_completeness: 4
      pattern_coverage: 3
      blast_radius: 3
      novelty: 3
    route: "assisted-agent"
    outcome: "completed-after-2-corrections"
    correction_needed: true
    notes: "Agent missed edge case in error handling"
```

## Considerations
- **Higher token efficiency.** By routing only well-specified, well-patterned tasks to autonomous agents, teams avoid burning [[token-budget]] on tasks where agents are likely to fail. Token spend correlates with successful output rather than wasted retries.
- **Fewer rescue missions.** Systematic routing prevents the most common cause of [[rescue-mission]] escalations — agents receiving tasks they are not equipped to handle given the available context. When rescue missions do occur, the routing log helps diagnose why.
- **Humans focus on high-value work.** Engineers spend their time on architectural decisions, novel problem-solving, and security-critical implementations — work where human judgment adds the most value. Routine implementations flow to agents.
- **Measurable improvement over time.** The routing log creates a feedback loop. Teams can see which task types consistently succeed as auto-agent and which consistently require human intervention. Over months, the routing matrix becomes increasingly accurate.
- **Shared delegation vocabulary.** The four-dimension scoring system gives the team a common language for discussing task assignment. "This task scores 2 on pattern coverage" is more actionable than "I do not think the agent can handle this."
- **Initial classification effort.** Building the task classification catalog and calibrating scores requires a time investment during the first two weeks. Start with 5-10 common task types and expand gradually.
- **Edge cases resist clean scoring.** Some tasks score well on three dimensions but poorly on one — a well-specified, well-patterned task with a high blast radius. The hard rules (blast radius override) help, but edge cases still require human judgment during the Daily Flow Sync.
- **Routing boundaries shift as context improves.** A task type that starts as "human-only" may become "assisted-agent" after the team builds relevant [[golden-samples]] and context. Review the classification catalog monthly to keep it current.
- **Over-reliance on the matrix.** The routing matrix is a decision aid, not a replacement for engineering judgment. If an experienced engineer's instinct conflicts with the matrix score, investigate why. The instinct may be incorporating a factor the matrix does not capture.
- **Requires consistent scoring.** Different team members may score the same task differently. Calibrate by scoring 10 recent tasks together as a team during the first Spec Engineering Block. Discuss disagreements to align on scoring standards.
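The calibration exercise can be made concrete with tooling. A small helper — hypothetical, and separate from the router script above — can take each rater's scores for the same task and flag dimensions where the spread is two or more points, marking exactly where the team needs to align:

```typescript
// Hypothetical calibration helper: given several raters' scores for one
// task, return the dimensions whose max-min spread meets the threshold.
type DimensionScores = Record<string, number>;

function disagreements(ratings: DimensionScores[], threshold = 2): string[] {
  if (ratings.length < 2) return [];
  // Assume every rater scored the same set of dimensions.
  const dims = Object.keys(ratings[0]);
  return dims.filter((dim) => {
    const values = ratings.map((r) => r[dim]);
    return Math.max(...values) - Math.min(...values) >= threshold;
  });
}

// Example: three raters score the same task during a calibration session.
const flagged = disagreements([
  { spec_completeness: 4, pattern_coverage: 5, blast_radius: 3, novelty: 4 },
  { spec_completeness: 4, pattern_coverage: 2, blast_radius: 3, novelty: 4 },
  { spec_completeness: 5, pattern_coverage: 3, blast_radius: 3, novelty: 4 },
]);
// flagged === ["pattern_coverage"] — discuss this dimension as a team
```

Running this over the 10 calibration tasks turns a vague "we score differently" feeling into a short list of specific dimensions to discuss.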