Design Patterns
Workflow · Intermediate

Spec-Driven Development

Replace ad-hoc prompting with structured Live Specs and Context Packets to produce deterministic, evaluable agent outputs through the Specify-Execute-Evaluate cycle.

Overview

Spec-Driven Development is a workflow pattern in which every unit of agent work begins with a structured specification, called a Live Spec, rather than an ad-hoc natural-language prompt. The Context Architect authors the spec, an agent executes against it, and an Eval Harness validates the result. This three-phase loop (Specify, Execute, Evaluate) is the Spec-Driven Development methodology described in the Agentic Development Handbook, and it is the primary alternative to Vibe Coding.

Problem

Teams that rely on ad-hoc prompting encounter predictable failure modes:

  • Non-reproducible outputs. The same intent phrased differently produces different code. There is no stable artifact to version, diff, or review.
  • Missing context. Prompts rarely carry the full system context an agent needs — architecture constraints, interface contracts, quality standards — so the agent guesses, and guesses diverge.
  • No evaluation anchor. Without machine-readable acceptance criteria, there is no way to automatically verify whether agent output satisfies the requirement. Review becomes a manual, subjective process.
  • Drift across sessions. Knowledge evaporates between agent sessions. Each new conversation starts from zero unless the developer manually re-supplies context.

These problems compound as teams scale the number of agents and tasks. What works for a single developer chatting with a copilot breaks down when multiple agents execute in parallel across a codebase.

Solution

Replace the ad-hoc prompt with a formal specification layer composed of two artifacts:

  1. Live Spec — A versioned, machine-readable document that defines what the agent must build, including behavioral contracts, acceptance criteria, and references to relevant context.
  2. Context Packet — A bundled set of files, schemas, examples, and instructions that the agent receives alongside the spec. Context Packets supply the how — architecture decisions, coding standards, API contracts, and Golden Samples that demonstrate expected output quality.
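
The relationship between the two artifacts can be sketched as plain data types. This is a minimal illustration, not a published schema: the interface names, the field names, and the `missingContext` helper are assumptions, loosely mirroring the YAML keys used in this document's example spec. The helper reports which files a spec references that the Context Packet does not actually contain.

```typescript
// Hypothetical shapes: field names loosely mirror the YAML keys in this
// document's example spec; they are not a published schema.
interface AcceptanceCriterion {
  id: string;
  description: string;
  validation: string; // e.g. "unit-test", "a11y-check", "visual-regression"
}

interface LiveSpec {
  id: string;
  title: string;
  acceptanceCriteria: AcceptanceCriterion[];
  contextReferences: string[]; // paths the agent must receive with the spec
}

interface ContextPacket {
  files: { [path: string]: string }; // path -> file contents
}

// Returns every path the spec references that the packet does not contain.
function missingContext(spec: LiveSpec, packet: ContextPacket): string[] {
  return spec.contextReferences.filter(
    (path) => !Object.prototype.hasOwnProperty.call(packet.files, path)
  );
}

const productCardSpec: LiveSpec = {
  id: "product-card-v2",
  title: "Product Card Component",
  acceptanceCriteria: [
    { id: "ac-1", description: "Renders product image with lazy loading", validation: "unit-test" },
  ],
  contextReferences: ["context/frontend-standards.md", "src/types/product.ts"],
};

// Placeholder packet that deliberately omits one referenced file.
const packet: ContextPacket = {
  files: { "context/frontend-standards.md": "# Frontend standards (placeholder text)" },
};

const gaps = missingContext(productCardSpec, packet);
console.log(gaps);
```

Running a check like this before execution is one way to catch the missing-context failure mode early, before the agent starts guessing.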

The Context Architect authors and maintains these artifacts. Execution follows the Triangular Workflow:

  1. Specify — The Context Architect writes or updates the Live Spec with clear acceptance criteria and attaches the relevant Context Packet.
  2. Execute — The agent receives the spec and context, then produces code, tests, or documentation.
  3. Evaluate — The Eval Harness runs automated checks against the acceptance criteria defined in the spec. Failures loop back to the Execute phase with diagnostic context; passes advance the output to human review gates.
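
The Execute and Evaluate phases, including the loop back on failure, can be sketched as a small control loop. Everything here is an illustrative assumption: the `Executor` and `Evaluator` signatures, the `runSpec` name, and the retry budget of three attempts are not defined by the source. The stubs stand in for a real agent and a real Eval Harness.

```typescript
interface CheckResult {
  criterionId: string;
  passed: boolean;
  diagnostic?: string;
}

// Hypothetical signatures. Real agents and harnesses would be asynchronous;
// synchronous functions keep the control flow easy to read.
type Executor = (specId: string, diagnostics: string[]) => string;
type Evaluator = (artifact: string) => CheckResult[];

interface LoopOutcome {
  status: "ready-for-review" | "exhausted";
  attempts: number;
  artifact: string;
  failures: CheckResult[];
}

function runSpec(
  specId: string,
  execute: Executor,
  evaluate: Evaluator,
  maxAttempts: number = 3 // assumed retry budget; the source does not set one
): LoopOutcome {
  let diagnostics: string[] = [];
  let artifact = "";
  let failures: CheckResult[] = [];
  let attempts = 0;

  while (attempts < maxAttempts) {
    attempts += 1;
    artifact = execute(specId, diagnostics); // Execute
    failures = evaluate(artifact).filter((r) => !r.passed); // Evaluate
    if (failures.length === 0) {
      // All criteria passed: hand the artifact to the human review gate.
      return { status: "ready-for-review", attempts, artifact, failures };
    }
    // Failures loop back to Execute with diagnostic context.
    diagnostics = failures.map((f) => `${f.criterionId}: ${f.diagnostic ?? "failed"}`);
  }
  return { status: "exhausted", attempts, artifact, failures };
}

// Stub agent: produces a corrected artifact only once it has seen diagnostics.
const stubExecute: Executor = (_specId, diagnostics) =>
  diagnostics.length === 0 ? "draft" : "fixed";

// Stub harness: a single criterion that passes only for the corrected artifact.
const stubEvaluate: Evaluator = (artifact) => [
  { criterionId: "ac-1", passed: artifact === "fixed", diagnostic: "image is not lazy-loaded" },
];

const outcome = runSpec("product-card-v2", stubExecute, stubEvaluate);
console.log(outcome.status, outcome.attempts);
```

Bounding the number of attempts is a design choice in this sketch: an agent that cannot satisfy the criteria ends in an explicit `exhausted` state that a person can inspect, instead of retrying indefinitely.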

This pattern applies Context Engineering principles: the bottleneck in agent performance is not model capability but the quality and completeness of context provided to the model.

Implementation

Code Examples

Ad-Hoc Prompt (Before)
Can you create a React product card component? It should show the
product image, name, price, and have an add-to-cart button. Use
TypeScript and Tailwind. Make it responsive.

This prompt lacks architecture context, has no acceptance criteria, and produces non-reproducible results.

Live Spec (After)
# specs/product-card.spec.yaml
spec:
  id: product-card-v2
  title: Product Card Component
  status: active
  author: "@context-architect"

behavioral_contract:
  description: >
    A presentational React component that displays a single product
    with image, name, formatted price, and an add-to-cart action.
  inputs:
    - name: product
      type: "Product"
      source: "src/types/product.ts"
  outputs:
    - rendered ProductCard component
    - onAddToCart callback invocation with product ID

acceptance_criteria:
  - id: ac-1
    description: Renders product image with lazy loading
    validation: unit-test
  - id: ac-2
    description: Displays formatted price using currency util
    validation: unit-test
  - id: ac-3
    description: Calls onAddToCart with product.id on button click
    validation: unit-test
  - id: ac-4
    description: Passes axe accessibility audit with zero violations
    validation: a11y-check
  - id: ac-5
    description: Responsive layout at 320px, 768px, and 1024px breakpoints
    validation: visual-regression

context_references:
  - path: context/frontend-standards.md
  - path: context/component-patterns.md
  - path: src/types/product.ts
  - path: src/components/ExampleCard.tsx  # golden sample

scope:
  includes:
    - ProductCard component implementation
    - Unit tests for all acceptance criteria
  excludes:
    - Cart state management
    - API integration
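
One way an Eval Harness might consume the `acceptance_criteria` list above is to dispatch each criterion to a check registered for its `validation` value. The sketch below is an assumption about how such a harness could be organized, not a description of an existing tool: `evaluateCriteria`, the `Check` type, and the always-passing placeholder checks are hypothetical. A real harness would invoke a test runner, an accessibility audit, or a visual-regression tool in their place.

```typescript
interface Criterion {
  id: string;
  description: string;
  validation: string;
}

interface Verdict {
  id: string;
  passed: boolean;
  detail: string;
}

// A check receives the criterion and reports whether the output satisfied it.
type Check = (criterion: Criterion) => boolean;

function evaluateCriteria(
  criteria: Criterion[],
  checks: { [validation: string]: Check }
): Verdict[] {
  return criteria.map((criterion): Verdict => {
    // A criterion with no registered check counts as a failure, so a spec
    // can never pass simply because nothing was able to verify it.
    if (!Object.prototype.hasOwnProperty.call(checks, criterion.validation)) {
      return {
        id: criterion.id,
        passed: false,
        detail: `no check registered for "${criterion.validation}"`,
      };
    }
    const passed = checks[criterion.validation](criterion);
    return { id: criterion.id, passed, detail: passed ? "ok" : `${criterion.validation} failed` };
  });
}

// Three of the criteria from the spec above.
const criteria: Criterion[] = [
  { id: "ac-1", description: "Renders product image with lazy loading", validation: "unit-test" },
  { id: "ac-4", description: "Passes axe accessibility audit with zero violations", validation: "a11y-check" },
  { id: "ac-5", description: "Responsive layout at 320px, 768px, and 1024px breakpoints", validation: "visual-regression" },
];

// Placeholder checks that always pass; "visual-regression" is deliberately unregistered.
const checks: { [validation: string]: Check } = {
  "unit-test": () => true,
  "a11y-check": () => true,
};

const verdicts = evaluateCriteria(criteria, checks);
console.log(verdicts);
```

Here `ac-5` fails with a "no check registered" detail, which surfaces a gap in the harness itself and not a defect in the agent's output.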

Considerations

Benefits
  • Reproducibility. The same spec yields consistent agent output across sessions and agent models, because the requirement no longer depends on how a prompt happens to be phrased.
  • Evaluability. Machine-readable acceptance criteria enable automated validation through the Eval Harness, reducing reliance on manual review.
  • Knowledge accumulation. Specs and Context Packets are versioned artifacts that capture institutional knowledge. They survive developer turnover and agent model changes.
  • Parallelization. Multiple agents can execute against different specs simultaneously because each spec is self-contained with its own context.
  • Governance integration. Specs provide a natural gate for Gate-Based Governance: review the spec before authorizing execution, then review agent output against the spec criteria.
  • Measurable improvement. Teams can track spec pass rates over time and identify which context gaps cause the most failures.
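
Tracking pass rates needs little more than a log of evaluation runs. The following sketch assumes a hypothetical `EvalRun` record and made-up run data. It computes the share of runs in which every criterion passed, and ranks criteria by how often they fail, as a rough proxy for where context is missing.

```typescript
interface EvalRun {
  specId: string;
  failedCriteria: string[]; // empty when every criterion passed
}

// Fraction of runs in which no criterion failed (0 when there are no runs).
function passRate(runs: EvalRun[]): number {
  if (runs.length === 0) return 0;
  const passed = runs.filter((run) => run.failedCriteria.length === 0).length;
  return passed / runs.length;
}

// Failure counts per "specId#criterionId", most frequent first.
function failureHotspots(runs: EvalRun[]): Array<[string, number]> {
  const counts: { [key: string]: number } = {};
  for (const run of runs) {
    for (const criterionId of run.failedCriteria) {
      const key = `${run.specId}#${criterionId}`;
      counts[key] = (counts[key] || 0) + 1;
    }
  }
  return Object.keys(counts)
    .map((key): [string, number] => [key, counts[key]])
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
}

// Illustrative data only; not measurements from a real project.
const runs: EvalRun[] = [
  { specId: "product-card-v2", failedCriteria: ["ac-4"] },
  { specId: "product-card-v2", failedCriteria: ["ac-4", "ac-5"] },
  { specId: "product-card-v2", failedCriteria: [] },
  { specId: "product-card-v2", failedCriteria: [] },
];

const rate = passRate(runs);
const hotspots = failureHotspots(runs);
console.log(rate, hotspots);
```

With this sample data, `ac-4` fails most often, which would suggest checking whether the Context Packet says enough about accessibility requirements.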

Challenges
  • Upfront investment. Writing a Live Spec takes more time than typing a prompt. The payoff comes from reuse, reproducibility, and reduced rework, but teams must commit to the practice before seeing returns.
  • Spec maintenance. Specs must evolve with the codebase. Stale specs produce incorrect agent output. Teams need processes (or agents) to keep specs current.
  • Context Packet curation. Assembling and maintaining high-quality Context Packets requires ongoing effort from the Context Architect. Under-specified context leads to the same problems as ad-hoc prompting.
  • Tooling maturity. The ecosystem for spec-driven agent workflows is still developing. Teams may need to build custom tooling for spec parsing, context assembly, and eval harness integration.
  • Cultural shift. Developers accustomed to direct coding or conversational prompting may resist the overhead of writing specs. Leadership must reinforce that specs are the primary engineering artifact in an agentic workflow.
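
Part of the spec maintenance burden can be automated with a staleness check. The sketch below flags context files that changed after the spec was last updated. The `lastUpdated` field does not appear in the example spec, so it is an assumption here, as are the `staleReferences` helper and the illustrative timestamps. In practice the modification times might come from version control history.

```typescript
interface SpecRecord {
  id: string;
  lastUpdated: string; // ISO 8601 timestamp; a hypothetical field
  contextReferences: string[];
}

// Returns referenced paths that were modified after the spec was last updated.
// Paths with no known modification time are skipped, not flagged.
function staleReferences(
  spec: SpecRecord,
  lastModified: { [path: string]: string }
): string[] {
  const specTime = Date.parse(spec.lastUpdated);
  return spec.contextReferences.filter((path) => {
    if (!Object.prototype.hasOwnProperty.call(lastModified, path)) return false;
    return Date.parse(lastModified[path]) > specTime;
  });
}

// Illustrative timestamps only.
const specRecord: SpecRecord = {
  id: "product-card-v2",
  lastUpdated: "2024-03-01T00:00:00Z",
  contextReferences: ["context/frontend-standards.md", "src/types/product.ts"],
};

const modifiedAt: { [path: string]: string } = {
  "context/frontend-standards.md": "2024-02-10T00:00:00Z",
  "src/types/product.ts": "2024-03-15T00:00:00Z",
};

const stale = staleReferences(specRecord, modifiedAt);
console.log(stale);
```

A non-empty result does not prove the spec is wrong. It only signals that the Context Architect should re-review the spec against the changed files.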