Spec-Driven Development
Replace ad-hoc prompting with structured Live Specs and Context Packets to produce deterministic, evaluable agent outputs through the Specify-Execute-Evaluate cycle.
Overview
Spec-Driven Development is a workflow pattern in which every unit of agent work begins with a structured specification — a Live Spec — rather than an ad-hoc natural-language prompt. The Context Architect authors the spec, an agent executes against it, and an Eval Harness validates the result. This three-phase loop — Specify, Execute, Evaluate — is the Spec-Driven Development methodology described in the Agentic Development Handbook, and it is the primary alternative to Vibe Coding.
Problem
Teams that rely on ad-hoc prompting encounter predictable failure modes:
- Non-reproducible outputs. The same intent phrased differently produces different code. There is no stable artifact to version, diff, or review.
- Missing context. Prompts rarely carry the full system context an agent needs — architecture constraints, interface contracts, quality standards — so the agent guesses, and guesses diverge.
- No evaluation anchor. Without machine-readable acceptance criteria, there is no way to automatically verify whether agent output satisfies the requirement. Review becomes a manual, subjective process.
- Drift across sessions. Knowledge evaporates between agent sessions. Each new conversation starts from zero unless the developer manually re-supplies context.
These problems compound as teams scale the number of agents and tasks. What works for a single developer chatting with a copilot breaks down when multiple agents execute in parallel across a codebase.
Solution
Replace the ad-hoc prompt with a formal specification layer composed of two artifacts:
- Live Spec — A versioned, machine-readable document that defines what the agent must build, including behavioral contracts, acceptance criteria, and references to relevant context.
- Context Packet — A bundled set of files, schemas, examples, and instructions that the agent receives alongside the spec. Context Packets supply the how — architecture decisions, coding standards, API contracts, and Golden Samples that demonstrate expected output quality.
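One way to make the Context Packet concrete is as a small data structure that bundles standards, contracts, and Golden Samples into a single prompt-ready payload. This is an illustrative sketch, not a standard schema: `ContextPacket`, its field names, and `render` are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ContextPacket:
    """Bundle of context an agent receives alongside a Live Spec.

    Field names are illustrative, not a standard schema.
    """
    standards: list[str] = field(default_factory=list)       # coding standards docs
    contracts: list[str] = field(default_factory=list)       # API/interface contracts
    golden_samples: list[str] = field(default_factory=list)  # exemplar outputs

    def render(self) -> str:
        """Flatten the packet into a single block of prompt context,
        emitting a titled section per non-empty category."""
        sections = [
            ("Coding standards", self.standards),
            ("Interface contracts", self.contracts),
            ("Golden samples", self.golden_samples),
        ]
        parts = []
        for title, docs in sections:
            for doc in docs:
                parts.append(f"## {title}\n{doc}")
        return "\n\n".join(parts)

# Assemble a packet and render it for inclusion in the agent's input.
packet = ContextPacket(
    standards=["Use functional React components with TypeScript."],
    golden_samples=["// ExampleCard.tsx\nexport function ExampleCard() { /* ... */ }"],
)
prompt_context = packet.render()
```

The point of the structure is that the packet is assembled once by the Context Architect and reused verbatim across sessions, rather than re-typed into each conversation.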
The Context Architect authors and maintains these artifacts. Execution follows the Triangular Workflow:
- Specify — The Context Architect writes or updates the Live Spec with clear acceptance criteria and attaches the relevant Context Packet.
- Execute — The agent receives the spec and context, then produces code, tests, or documentation.
- Evaluate — The Eval Harness runs automated checks against the acceptance criteria defined in the spec. Failures loop back to the Execute phase with diagnostic context; passes advance the output to human review gates.
This pattern applies Context Engineering principles: the bottleneck in agent performance is not model capability but the quality and completeness of context provided to the model.
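The three phases above can be sketched as a retry loop in which evaluation failures feed back into the next execution attempt. This is a hypothetical skeleton: `run_agent` and `run_checks` stand in for a real agent call and a real Eval Harness.

```python
def run_agent(spec: dict, context: str, feedback: list[str]) -> str:
    """Placeholder for the Execute phase: a real implementation would
    call an agent with the spec, the Context Packet, and any diagnostics
    from previous failed attempts."""
    return f"output for {spec['id']} ({len(feedback)} prior failures)"

def run_checks(spec: dict, output: str) -> list[str]:
    """Placeholder for the Evaluate phase: a real harness would run the
    spec's acceptance criteria and return the ids of failed checks."""
    return []  # empty list means every acceptance criterion passed

def specify_execute_evaluate(spec: dict, context: str, max_attempts: int = 3):
    """Run the Execute/Evaluate loop until the spec passes or attempts run out."""
    feedback: list[str] = []
    for attempt in range(1, max_attempts + 1):
        output = run_agent(spec, context, feedback)   # Execute
        failures = run_checks(spec, output)           # Evaluate
        if not failures:
            return output, attempt                    # advance to human review
        feedback = failures                           # loop diagnostics back
    raise RuntimeError(f"spec {spec['id']} failed after {max_attempts} attempts")

result, attempts = specify_execute_evaluate({"id": "product-card-v2"}, "packet...")
```

The key design choice is that failures return structured diagnostics rather than terminating the run, so the agent's next attempt starts with more context than the last.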
Implementation
Code Examples
An ad-hoc prompt for this task might read:

```text
Can you create a React product card component? It should show the
product image, name, price, and have an add-to-cart button. Use
TypeScript and Tailwind. Make it responsive.
```

This prompt lacks architecture context, has no acceptance criteria, and produces non-reproducible results.
The same request as a Live Spec:

```yaml
# specs/product-card.spec.yaml
spec:
  id: product-card-v2
  title: Product Card Component
  status: active
  author: "@context-architect"

  behavioral_contract:
    description: >
      A presentational React component that displays a single product
      with image, name, formatted price, and an add-to-cart action.
    inputs:
      - name: product
        type: "Product"
        source: "src/types/product.ts"
    outputs:
      - rendered ProductCard component
      - onAddToCart callback invocation with product ID

  acceptance_criteria:
    - id: ac-1
      description: Renders product image with lazy loading
      validation: unit-test
    - id: ac-2
      description: Displays formatted price using currency util
      validation: unit-test
    - id: ac-3
      description: Calls onAddToCart with product.id on button click
      validation: unit-test
    - id: ac-4
      description: Passes axe accessibility audit with zero violations
      validation: a11y-check
    - id: ac-5
      description: Responsive layout at 320px, 768px, and 1024px breakpoints
      validation: visual-regression

  context_references:
    - path: context/frontend-standards.md
    - path: context/component-patterns.md
    - path: src/types/product.ts
    - path: src/components/ExampleCard.tsx # golden sample

  scope:
    includes:
      - ProductCard component implementation
      - Unit tests for all acceptance criteria
    excludes:
      - Cart state management
      - API integration
```

Considerations
- **Reproducibility.** The same spec produces consistent agent output regardless of phrasing, session, or agent model.
- **Evaluability.** Machine-readable acceptance criteria enable automated validation through the [[eval-harness]], reducing reliance on manual review.
- **Knowledge accumulation.** Specs and Context Packets are versioned artifacts that capture institutional knowledge. They survive developer turnover and agent model changes.
- **Parallelization.** Multiple agents can execute against different specs simultaneously because each spec is self-contained with its own context.
- **Governance integration.** Specs provide a natural gate for [[gate-based-governance]] — review the spec before authorizing execution, then review agent output against the spec criteria.
- **Measurable improvement.** Teams can track spec pass rates over time and identify which context gaps cause the most failures.
- **Upfront investment.** Writing a Live Spec takes more time than typing a prompt. The payoff comes from reuse, reproducibility, and reduced rework — but teams must commit to the practice before seeing returns.
- **Spec maintenance.** Specs must evolve with the codebase. Stale specs produce incorrect agent output. Teams need processes (or agents) to keep specs current.
- **Context Packet curation.** Assembling and maintaining high-quality Context Packets requires ongoing effort from the [[context-architect]]. Under-specified context leads to the same problems as ad-hoc prompting.
- **Tooling maturity.** The ecosystem for spec-driven agent workflows is still developing. Teams may need to build custom tooling for spec parsing, context assembly, and eval harness integration.
- **Cultural shift.** Developers accustomed to direct coding or conversational prompting may resist the overhead of writing specs. Leadership must reinforce that specs are the primary engineering artifact in an agentic workflow.
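The evaluability and governance points above assume specs are machine-checkable, which implies tooling that can reject malformed specs before any agent runs. A minimal sketch of such a check, assuming a spec already parsed from YAML into a dict (the required fields and validation types mirror the example spec, but the schema and function names are illustrative):

```python
# Fields and validation types assumed from the example spec; not a standard.
REQUIRED_FIELDS = ("id", "behavioral_contract", "acceptance_criteria", "scope")
ALLOWED_VALIDATIONS = {"unit-test", "a11y-check", "visual-regression"}

def validate_spec(spec: dict) -> list[str]:
    """Return a list of problems; an empty list means the spec is executable."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in spec]
    for ac in spec.get("acceptance_criteria", []):
        # Every criterion must name a validation type the harness knows how to run.
        if ac.get("validation") not in ALLOWED_VALIDATIONS:
            problems.append(f"{ac.get('id', '?')}: unknown validation type")
    return problems

# A well-formed spec passes; a stub missing required fields is rejected.
spec = {
    "id": "product-card-v2",
    "behavioral_contract": {"description": "presentational product card"},
    "acceptance_criteria": [
        {"id": "ac-1", "validation": "unit-test"},
        {"id": "ac-4", "validation": "a11y-check"},
    ],
    "scope": {"includes": [], "excludes": []},
}
```

Running such a check at spec-review time gives the governance gate something concrete to enforce.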