Glossary
Evaluation · Emerging

Architectural Violation Rate

The metric tracking how often agent-generated code attempts to violate domain boundaries, dependency rules, or structural constraints.

Definition

The Architectural Violation Rate tracks how often agent-generated code attempts to violate domain boundaries, dependency rules, or structural constraints defined by the Principal Systems Architect. It is calculated as:

Architectural Violation Rate = Architecture test failures / Total agent PRs submitted
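The calculation above can be sketched as a small helper. This is a minimal illustration, assuming a hypothetical per-PR record with an `arch_tests_passed` flag; it is not the Eval Harness's actual schema.

```python
def architectural_violation_rate(pr_results):
    """Architecture test failures / total agent PRs submitted.

    pr_results: list of dicts with an "arch_tests_passed" bool
    (illustrative field name, not a real harness schema).
    """
    if not pr_results:
        return 0.0  # no PRs submitted yet; avoid division by zero
    failures = sum(1 for pr in pr_results if not pr["arch_tests_passed"])
    return failures / len(pr_results)


# Example: one architecture-test failure across four agent PRs.
results = [
    {"pr_id": 101, "arch_tests_passed": True},
    {"pr_id": 102, "arch_tests_passed": False},
    {"pr_id": 103, "arch_tests_passed": True},
    {"pr_id": 104, "arch_tests_passed": True},
]
print(architectural_violation_rate(results))  # 0.25
```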

Architecture tests check structural properties: dependency direction between modules, layer boundary compliance, naming convention adherence, and API contract conformance. These tests run as part of the Eval Harness before agent output reaches human review, catching violations early in the pipeline.
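One of the structural properties named above, dependency direction between layers, can be expressed as a tiny architecture test. The layer names and allowed-dependency map here are illustrative assumptions, not the architecture actually enforced by the Eval Harness.

```python
# Hypothetical layer map: each layer lists the layers it MAY depend on.
# Dependencies must point inward (infrastructure -> application -> domain).
ALLOWED_DEPENDENCIES = {
    "domain": set(),                          # domain depends on nothing
    "application": {"domain"},
    "infrastructure": {"domain", "application"},
}


def dependency_allowed(from_layer, to_layer):
    """Return True if code in from_layer may import from to_layer."""
    return to_layer in ALLOWED_DEPENDENCIES[from_layer]


# Inward dependency: permitted.
print(dependency_allowed("infrastructure", "domain"))   # True
# Outward dependency: an architecture test failure.
print(dependency_allowed("domain", "infrastructure"))   # False
```

A real architecture-test suite would derive the actual import graph from the codebase and assert this property over every edge; the check itself reduces to the set lookup shown here.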

The rate falls into three interpretive ranges:

  1. Below 0.05 — constraints are well-defined and agents are consistently producing structurally sound code. The Golden Samples and architectural documentation in the Context Index are effectively guiding agent behavior.
  2. 0.05 to 0.15 — moderate violation rate indicating that some constraints are unclear or some context is missing. Common causes include newly introduced architectural rules that have not yet been reflected in Golden Samples, or domain boundaries that are ambiguous in the documentation. Update the relevant samples and constraint definitions.
  3. Above 0.15 — systemic problem. Agents are regularly producing code that violates structural rules, indicating either that the architectural constraints are not reaching agents through Context Packets, or that the constraints themselves are inconsistent with the codebase's actual structure. This requires a focused Boundary Audit to realign rules with reality.
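The three bands above map directly to a classification function. The thresholds come from the ranges in the list; the returned labels are shortened paraphrases for illustration.

```python
def interpret_violation_rate(rate):
    """Map an Architectural Violation Rate to its interpretive band."""
    if rate < 0.05:
        return "healthy: constraints well-defined"
    if rate <= 0.15:
        return "moderate: update Golden Samples and constraint definitions"
    return "systemic: run a focused Boundary Audit"


print(interpret_violation_rate(0.03))   # healthy: constraints well-defined
print(interpret_violation_rate(0.20))   # systemic: run a focused Boundary Audit
```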

The Architectural Violation Rate complements the Pattern Consistency Score: the violation rate measures hard constraint failures (code that breaks rules), while the consistency score measures soft adherence (code that follows patterns). A low violation rate with a low consistency score indicates that agents are respecting boundaries but not following stylistic conventions — a Golden Samples quality issue rather than an architectural rules issue.
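The two-metric diagnosis described above can be sketched as a simple decision rule. The 0.05 rate threshold comes from the healthy band earlier in this entry; the 0.8 consistency-score threshold is an illustrative assumption, since this entry does not define the Pattern Consistency Score's bands.

```python
def diagnose(violation_rate, consistency_score):
    """Combine hard-constraint and soft-adherence signals.

    violation_rate threshold (0.05) is from this entry's bands;
    consistency_score threshold (0.8) is an assumed placeholder.
    """
    if violation_rate >= 0.05:
        # Hard constraints failing: rules or context delivery problem.
        return "architectural rules issue"
    if consistency_score < 0.8:
        # Boundaries respected but conventions not followed.
        return "Golden Samples quality issue"
    return "healthy"


print(diagnose(0.02, 0.6))  # Golden Samples quality issue
print(diagnose(0.20, 0.9))  # architectural rules issue
```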

This metric is reviewed during the monthly Boundary Audit and tracked on the AgentOps Dashboard.

Last updated: 3/11/2026