# Software Factory
An automated development pipeline where AI agents execute spec-driven tasks under human governance, scaling output through higher operator leverage ratios.
## Definition
A software factory is an automated development pipeline in which AI agents execute structured tasks — code generation, testing, refactoring, deployment — under continuous human governance. The Agentic Engineering model treats agents as execution capacity that scales through higher Operator Leverage Ratio values (more agents per human operator), not through removing humans from the process.
The defining characteristic of a software factory is that agents work from Live Spec documents rather than ad-hoc prompts. Specifications define what to build; agents determine how to build it; and an Eval Harness validates the result against machine-readable acceptance criteria. Humans remain responsible for specification authoring, evaluation design, and architectural decisions.
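The spec-driven flow above can be sketched in a few lines. Everything here is an illustrative assumption, not a real API: `LiveSpec`, `eval_harness`, and the criteria are hypothetical stand-ins showing how a spec defines *what* to build while machine-readable acceptance criteria validate whatever the agent produces.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical structures illustrating the spec -> agent -> eval flow.
@dataclass
class LiveSpec:
    """Defines WHAT to build: a goal plus machine-readable acceptance criteria."""
    goal: str
    # Each criterion inspects the agent's output and returns pass/fail.
    criteria: dict[str, Callable[[str], bool]] = field(default_factory=dict)

def eval_harness(spec: LiveSpec, output: str) -> dict[str, bool]:
    """Validate an agent's output against every acceptance criterion."""
    return {name: check(output) for name, check in spec.criteria.items()}

# Example spec: the agent decides HOW; the harness checks the result.
spec = LiveSpec(
    goal="Add a slugify(text) helper",
    criteria={
        "defines function": lambda out: "def slugify(" in out,
        "has docstring": lambda out: '"""' in out,
    },
)

# Stand-in for whatever code the agent generated against the spec.
agent_output = (
    'def slugify(text):\n'
    '    """Lowercase and hyphenate."""\n'
    '    return text.lower().replace(" ", "-")\n'
)
results = eval_harness(spec, agent_output)
print(results)  # every criterion passes -> no line-by-line human review needed
```

The point of the sketch is the division of labor: the human authors `spec.criteria`, the agent authors `agent_output`, and the harness (not a human) decides whether the two agree.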
## Maturity Levels
Software factory maturity is measured by the operator leverage ratio — the number of concurrent agent tasks a single human can effectively govern — not by how completely humans are removed from the process.
| Level | Name | Description | Operator Leverage |
|---|---|---|---|
| L0 | Manual | No agent involvement. Developers write all code directly. | N/A |
| L1 | Assisted | Agents provide inline suggestions and completions. A developer reviews each suggestion before accepting it. | 1:1 |
| L2 | Copilot | Agents generate multi-file changes from natural-language prompts. Developers review outputs before committing. | 1:1 to 1:3 |
| L3 | Spec-Driven | Agents execute against Live Specs with automated evaluation. Human review focuses on spec quality and eval results rather than line-by-line code inspection. | 1:3 to 1:10 |
| L4 | Governed Autonomy | Agents operate continuously on queued specs with Gate Based Governance. Humans define gates, review exceptions, and handle escalations. Routine tasks flow through without manual intervention, but governance gates ensure human oversight at defined checkpoints. | 1:10 to 1:50 |
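The leverage bands in the table can be read as a simple lookup. The boundaries below come directly from the table; `maturity_level` itself is an illustrative helper, not part of any defined model, and boundary values (exactly 1:3 or 1:10) are arbitrarily assigned to the lower band.

```python
def maturity_level(agents_per_operator: int) -> str:
    """Map an operator leverage ratio (concurrent agent tasks per human)
    to the maturity bands in the table above. Boundary values fall into
    the lower band; L1 and L2 overlap at 1:1."""
    if agents_per_operator <= 0:
        return "L0 Manual"
    if agents_per_operator == 1:
        return "L1 Assisted / L2 Copilot"  # both operate near 1:1
    if agents_per_operator <= 3:
        return "L2 Copilot"
    if agents_per_operator <= 10:
        return "L3 Spec-Driven"
    return "L4 Governed Autonomy"

print(maturity_level(5))   # -> L3 Spec-Driven
print(maturity_level(25))  # -> L4 Governed Autonomy
```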
At every maturity level, Human In The Loop oversight is present. The nature of that oversight shifts from reviewing individual lines of code (L1–L2) to reviewing specifications and evaluation results (L3) to defining governance policies and handling exceptions (L4). The goal is not to eliminate human judgment but to apply it where it has the highest leverage — at the specification and evaluation layers rather than the implementation layer.
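At L4, the "define gates, review exceptions" pattern might be routed as below. This is a minimal sketch under stated assumptions: the `Task` fields, gate names, and policies are hypothetical examples of governance checkpoints, not a prescribed gate set.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of gate-based governance at L4: each queued task
# passes through gates, and any gate failure escalates to a human operator.

@dataclass
class Task:
    spec_id: str
    evals_passed: bool
    touches_protected_paths: bool

# Gates are predicates over a task; these two policies are illustrative only.
GATES: dict[str, Callable[[Task], bool]] = {
    "evals green": lambda t: t.evals_passed,
    "no protected paths": lambda t: not t.touches_protected_paths,
}

def route(task: Task) -> str:
    """Auto-approve routine work; escalate anything that trips a gate."""
    failed = [name for name, gate in GATES.items() if not gate(task)]
    if failed:
        return f"escalate to operator: {', '.join(failed)}"
    return "auto-approve"

print(route(Task("spec-101", evals_passed=True, touches_protected_paths=False)))
# -> auto-approve
print(route(Task("spec-102", evals_passed=True, touches_protected_paths=True)))
# -> escalate to operator: no protected paths
```

The human's leverage comes from authoring `GATES` once rather than reviewing every task: routine tasks flow through, and judgment is spent only on the escalations.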
## Relationship to Vibe Coding
A software factory is distinct from Vibe Coding, which relies on conversational, ad-hoc interaction with AI models. Vibe coding can be productive for exploration and prototyping but does not scale to multi-agent, multi-task execution because it lacks the structured specifications and automated evaluation that a factory pipeline requires.