# Gate-Based Governance
How to implement automated and human-in-the-loop quality gates for agent-generated code.
## Overview
Gate-Based Governance is a quality control pattern that routes agent-generated code through a tiered system of automated checks and human review points before it reaches the main codebase. Instead of reviewing all agent output with the same scrutiny, this pattern separates routine validations (which machines handle well) from judgment-heavy decisions (which require human attention), directing each to the appropriate gate.
The pattern implements the Agentic Development Handbook's Gate-Based Governance pillar. Automated gates — powered by an Eval Harness — catch formatting violations, test failures, type errors, and architectural conformance issues. Human In The Loop gates catch design trade-offs, security implications, and decisions that require business context. Together, they ensure consistent quality without creating bottlenecks.
## Problem
Teams that introduce AI agents into their development workflow face a governance dilemma:
- No gates at all. Agent code goes directly into pull requests. Reviewers are overwhelmed by volume and inconsistent quality. Architectural violations, security issues, and subtle logic errors slip through because human reviewers cannot scale to match agent output speed.
- One-size-fits-all review. Every agent-generated change gets the same manual review process. Trivial formatting fixes consume the same reviewer attention as security-critical authentication changes. Reviewers burn out and start rubber-stamping.
- After-the-fact review only. Quality checks happen after the agent has finished. By that point, an architectural violation may be deeply embedded in the implementation. Rework is expensive. The agent may have built five components on top of a flawed foundation.
- No escalation path. When an automated check fails, there is no defined process for what happens next. Does the agent retry? Does a human intervene? Does the task get reassigned? Without a clear escalation ladder, failures stall the workflow.
## Solution
Implement a tiered gate system with four components:
- **Automated gates** — Fast, deterministic checks that run on every agent output. These include linting, type checking, test execution, security scanning, and architectural conformance validation. Configure these as part of the Eval Harness with pass/fail criteria drawn from Live Spec acceptance criteria.
- **Agent retry loop** — When an automated gate fails, the agent receives the failure diagnostics and attempts to fix the issue. Set a retry limit (typically 2-3 attempts) to prevent infinite loops.
- **Human In The Loop gates** — Changes that pass automated gates but meet certain criteria (security-critical paths, architectural decisions, high blast radius) are flagged for human review before merging. These gates focus human attention where it has the most impact.
- **Escalation ladder** — A defined four-phase process for handling failures that automated retries cannot resolve. This follows the handbook's Four-Phase Escalation Ladder: automated retry, parameter adjustment, Blocker Flag and pause, Rescue Mission with human intervention.
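As a sketch, the retry loop and the Four-Phase Escalation Ladder can be modeled together as an explicit state machine, so tooling can record which rung a failed task sits on. The phase identifiers and the `EscalationState` shape below are illustrative assumptions, not names from the handbook:

```typescript
// Illustrative state machine for the Four-Phase Escalation Ladder.
// Phase names and the EscalationState shape are assumptions for this sketch.
type Phase =
  | "automated-retry"      // agent re-runs with the failure diagnostics
  | "parameter-adjustment" // adjust prompts/limits, then retry
  | "blocker-flag"         // pause the task and flag it for attention
  | "rescue-mission";      // a human takes over

const LADDER: Phase[] = [
  "automated-retry",
  "parameter-adjustment",
  "blocker-flag",
  "rescue-mission",
];

interface EscalationState {
  phase: Phase;
  attempts: number;
  maxRetries: number; // typically 2-3, per the retry-loop guidance above
}

// On each failure: retry within the current phase until the limit is hit,
// then advance to the next rung. Rescue Mission is terminal.
function escalate(state: EscalationState): EscalationState {
  if (state.attempts < state.maxRetries) {
    return { ...state, attempts: state.attempts + 1 };
  }
  const next = LADDER[LADDER.indexOf(state.phase) + 1];
  if (!next) return state;
  return { phase: next, attempts: 0, maxRetries: state.maxRetries };
}
```

Making the ladder explicit like this also gives you the escalation-rate data discussed under Considerations for free, since every phase transition is observable.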
The Eval Harness is the engine behind the automated gates. It reads acceptance criteria from the Live Spec, runs the configured checks, and produces a pass/fail report. The Context Architect and Evaluation Engineer collaborate to define which checks run at each gate.
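A minimal sketch of that engine, assuming a simple `Gate` interface (the names here are illustrative): the harness runs each configured check in order and collects a pass/fail report, stopping at the first failure so the agent gets one concrete diagnostic to act on.

```typescript
// Minimal sketch of an Eval Harness gate runner. The Gate and GateReport
// shapes are assumptions for this example, not a fixed API.
interface Gate {
  name: string;
  run: () => { passed: boolean; details?: string };
}

interface GateReport {
  passed: boolean;
  results: { gate: string; passed: boolean; details?: string }[];
}

function runGates(gates: Gate[], failFast = true): GateReport {
  const results: GateReport["results"] = [];
  for (const gate of gates) {
    const { passed, details } = gate.run();
    results.push({ gate: gate.name, passed, details });
    if (!passed && failFast) break; // stop at the first failing gate
  }
  return { passed: results.every((r) => r.passed), results };
}
```

In practice each `run` would shell out to the lint, test, and scan commands shown in the workflow under Code Examples; the report feeds the agent retry loop described above.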
## Implementation

### Code Examples
```yaml
# .github/workflows/agent-gates.yml
name: Agent Output Gates

on:
  pull_request:
    branches: [main]
    # Only run on PRs labeled as agent-generated
    types: [labeled]

jobs:
  automated-gates:
    if: contains(github.event.pull_request.labels.*.name, 'agent-generated')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Gate 1 — Lint and Format
        run: |
          npm run lint
          npm run format:check
      - name: Gate 2 — Type Check
        run: npm run typecheck
      - name: Gate 3 — Unit Tests
        run: npm run test:unit -- --coverage
      - name: Gate 4 — Security Scan
        run: |
          npm audit --audit-level=high
          npx semgrep --config p/typescript src/
      - name: Gate 5 — Architecture Conformance
        run: npx tsx scripts/check-architecture.ts

  hitl-gate-triage:
    needs: automated-gates
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check HITL triggers
        id: hitl
        run: |
          npx tsx scripts/check-hitl-triggers.ts \
            --pr=${{ github.event.pull_request.number }}
      - name: Request human review
        if: steps.hitl.outputs.required == 'true'
        uses: actions/github-script@v7
        with:
          script: |
            await github.rest.pulls.requestReviewers({
              owner: context.repo.owner,
              repo: context.repo.repo,
              pull_number: context.issue.number,
              // `reviewers` is emitted by the triage script as a JSON array
              reviewers: ${{ steps.hitl.outputs.reviewers }}
            });
```

```typescript
// scripts/check-architecture.ts
import { readFileSync, readdirSync } from "fs";
import { join } from "path";

interface ConformanceRule {
  name: string;
  check: (filePath: string, content: string) => string | null;
}

const rules: ConformanceRule[] = [
  {
    name: "no-direct-db-access-in-components",
    check: (filePath, content) => {
      if (filePath.includes("/components/") && content.includes("prisma")) {
        return `Components must not access the database directly. Use a service layer. Found in ${filePath}`;
      }
      return null;
    },
  },
  {
    // Enforces layering: the service layer must not reach up into the UI layer.
    name: "no-service-to-component-imports",
    check: (filePath, content) => {
      if (
        filePath.includes("/services/") &&
        content.includes("from '../components/")
      ) {
        return `Services must not import from components. Found in ${filePath}`;
      }
      return null;
    },
  },
  {
    name: "test-co-location",
    check: (filePath) => {
      if (
        filePath.endsWith(".tsx") &&
        !filePath.endsWith(".test.tsx") &&
        filePath.includes("/components/")
      ) {
        const testPath = filePath.replace(".tsx", ".test.tsx");
        try {
          readFileSync(testPath);
        } catch {
          return `Missing co-located test file for ${filePath}`;
        }
      }
      return null;
    },
  },
];

function checkConformance(directory: string): string[] {
  const violations: string[] = [];

  function walk(dir: string) {
    for (const entry of readdirSync(dir, { withFileTypes: true })) {
      const fullPath = join(dir, entry.name);
      if (entry.isDirectory() && entry.name !== "node_modules") {
        walk(fullPath);
      } else if (entry.name.endsWith(".ts") || entry.name.endsWith(".tsx")) {
        const content = readFileSync(fullPath, "utf-8");
        for (const rule of rules) {
          const violation = rule.check(fullPath, content);
          if (violation) violations.push(`[${rule.name}] ${violation}`);
        }
      }
    }
  }

  walk(directory);
  return violations;
}

const violations = checkConformance("src/");
if (violations.length > 0) {
  console.error("Architectural conformance violations found:");
  violations.forEach((v) => console.error(`  - ${v}`));
  process.exit(1);
} else {
  console.log("All architectural conformance checks passed.");
}
```

```typescript
// scripts/check-hitl-triggers.ts
import { execSync } from "child_process";
import { appendFileSync } from "fs";

interface HITLTrigger {
  condition: string;
  reason: string;
  reviewer: string;
}

const triggers: HITLTrigger[] = [
  {
    condition: "auth",
    reason: "Security-critical path modified",
    reviewer: "security-lead",
  },
  {
    condition: "migration",
    reason: "Database migration detected",
    reviewer: "dba-team",
  },
  {
    condition: "package.json",
    reason: "Dependencies modified",
    reviewer: "tech-lead",
  },
];

function evaluateTriggers(changedFiles: string[]): HITLTrigger[] {
  return triggers.filter((trigger) =>
    changedFiles.some((file) => file.includes(trigger.condition))
  );
}

// The PR number arrives via --pr=<n>; this sketch derives the changed files
// from git instead (requires a full checkout, e.g. fetch-depth: 0).
const changedFiles = execSync("git diff --name-only origin/main...HEAD")
  .toString()
  .split("\n")
  .filter(Boolean);

const fired = evaluateTriggers(changedFiles);
const reviewers = [...new Set(fired.map((t) => t.reviewer))];

fired.forEach((t) => console.log(`HITL trigger: ${t.reason} -> ${t.reviewer}`));

// Expose `required` and `reviewers` as step outputs for the workflow above.
if (process.env.GITHUB_OUTPUT) {
  appendFileSync(
    process.env.GITHUB_OUTPUT,
    `required=${fired.length > 0}\nreviewers=${JSON.stringify(reviewers)}\n`
  );
}
```

## Considerations
- **Consistent quality at scale.** Automated gates catch the same classes of issues every time, regardless of how many agents are producing code. Human reviewers do not need to check for formatting, type errors, or known security patterns.
- **Focused human attention.** [[human-in-the-loop]] gates direct reviewer effort to decisions that genuinely require human judgment — architectural trade-offs, security implications, business logic correctness. This reduces reviewer fatigue and improves review quality.
- **Measurable governance.** Gate pass rates, retry success rates, and escalation rates provide concrete data on agent output quality and governance effectiveness. Teams can identify systemic issues and track improvement over time.
- **Clear escalation path.** The Four-Phase Escalation Ladder eliminates ambiguity about what happens when something fails. Every team member knows the process, reducing ad-hoc interventions and blocked work.
- **Early failure detection.** Gates catch issues before they compound. An architectural violation caught at the gate level is far cheaper to fix than one discovered after five dependent components have been built on top of it.
- **Initial gate configuration effort.** Defining which checks run, what thresholds to set, and which HITL triggers to configure requires upfront investment. Start with a minimal set and expand based on data from the first few weeks.
- **False positive management.** Overly strict automated gates produce false positives that slow the workflow and erode trust. Monitor the false positive rate and tune thresholds regularly. An advisory (non-blocking) tier helps identify where gates are too aggressive.
- **Balancing speed and thoroughness.** Too many gates slow the development loop. Too few gates let quality issues through. The right balance depends on the codebase, the team's risk tolerance, and the maturity of the agents. Review gate configuration monthly.
- **Escalation culture.** The escalation ladder only works if the team uses it consistently. If engineers bypass gates or skip phases, the system degrades. Leadership must reinforce that escalation is a normal part of the workflow, not a sign of failure.
- **Gate maintenance.** As the codebase evolves, gates must evolve with it. New architectural rules need new conformance checks. New security patterns need new scanning rules. Assign ownership of gate maintenance to the Evaluation Engineer role.
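The governance metrics mentioned above (pass rate, retry success rate, escalation rate) can be computed from a simple log of gate runs. The `GateRun` record shape here is an assumption for illustration:

```typescript
// Illustrative governance metrics over a log of gate runs.
// The GateRun record shape is an assumption, not a prescribed schema.
interface GateRun {
  gate: string;
  passed: boolean;   // final outcome after any retries
  retries: number;   // automated retry attempts before this outcome
  escalated: boolean; // reached the Blocker Flag / Rescue Mission phases
}

function governanceMetrics(runs: GateRun[]) {
  const total = runs.length;
  const passed = runs.filter((r) => r.passed).length;
  const retried = runs.filter((r) => r.retries > 0);
  const retriedAndPassed = retried.filter((r) => r.passed).length;
  const escalated = runs.filter((r) => r.escalated).length;
  return {
    passRate: total ? passed / total : 0,
    // of the runs that needed retries, how many the agent recovered on its own
    retrySuccessRate: retried.length ? retriedAndPassed / retried.length : 0,
    escalationRate: total ? escalated / total : 0,
  };
}
```

Tracking these three numbers week over week is usually enough to spot gates that are too strict (rising escalation rate with a healthy pass rate) or agents that are regressing (falling retry success rate).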