Context Architecture
npx claude-code-templates@latest --skill development/context-architecture Content
Context Architecture: retrofit an existing codebase
This skill applies Context Architecture to a repository that already exists and has grown disordered. It does not scaffold a new project. A new project inherits a framework's structure, and the legibility problem this solves only appears once a codebase grows. The job here is to take a repo as it is and make it legible, to a person and to an agent, without a big-bang rewrite.
Context Architecture is the practice of structuring a codebase so that its intent and behavior are equally legible to people and AI agents. It treats the repository itself (its file tree, boundaries, conventions, and embedded context) as a designed artifact, not an accident of growth. Introduced by Sergio Azócar in October 2025. Canonical specification: https://context-architecture.dev
The one assumption
Design for a reader who retains nothing between sessions and knows only what the repository makes explicit. An AI agent satisfies this exactly; a new human contributor approximates it. Everything below serves one quality attribute: the time to the first correct change by a reader with no prior context.
The rule (the test you run, line by line)
Every claim a repository makes about itself must be bound to a mechanism that fails when that claim stops being true.
If a piece of context can rot silently, it is not architecture, it is documentation.
For each thing the repo asserts about itself (where the source of truth lives, which pattern is correct, what must not be touched), ask: is there a compiler, linter, test, or review step that breaks when that assertion becomes false? If not, it is prose, and prose rots. A context claim with no mechanism behind it is the violation.
When this applies, and when it does not
Apply it to: codebases that absorb agent or multi-person work, refactors at scale, mechanical migrations, features with a clear spec, contributions where the context already exists.
Do not force it onto: throwaway projects, ill-defined problems, product decisions, debugging with no context, the first prototype of something not yet understood. The structuring work is an investment that pays back in proportion to how much agent or multi-person work the repo absorbs. On a throwaway, the toll beats the return. Say so when you see it.
The procedure
Run four phases in order. Phases 1 and 2 are read-only; do not edit until you have the audit and a prioritized plan. The retrofit (phase 3) is incremental: one bounded change at a time, each landing with the mechanism that keeps it true.
Phase 1: Audit (read-only)
Walk the repository and judge it against the eight principles and the five failure modes. Read the top-level tree first, then the boundaries, then a sample of leaf files. Produce a written audit (template below). For each principle, look for the concrete signals:
Structure Screams Intent. A reader, person or agent, must infer what the system does from the file tree alone, never from the framework it happens to use.
- Smell: the top level is
controllers/ services/ models/ utils/(framework-shaped), notbilling/ onboarding/ payments/(domain-shaped). The framework should live one level down, inside the domain it serves.
- Smell: the top level is
Context Lives With Code. Embedded context belongs at every meaningful boundary, colocated with what it describes, not exiled to a wiki that drifts.
- Smell: no
AGENTS.md/CLAUDE.mdat boundaries; the real knowledge lives in a wiki, a Notion, or a senior engineer's head.
- Smell: no
Intent Becomes Mechanism. Intent is written as a spec before code, then becomes the code and the checks that enforce it; the spec is scaffolding, removed once its content lives in tests, types, lint, and the nearest
AGENTS.md.- Smell: prose specs/design docs that describe code that already exists (a second source that can drift), or no record of intent at all.
Boundaries Are Explicit and Named. Every module, package, and ownership line is named so its responsibility is inferable. Ambiguous names are architectural debt.
- Smell:
utils/,common/,helpers/,core/,lib/junk drawers that accrete unrelated code.
- Smell:
Conventions Are Codified, Not Implicit. Encode conventions in linting and types so the toolchain can check them. This is the first instance of the mechanism.
- Smell: "we always do it this way" that lives only in review comments and tribal memory, with no lint rule or type constraint behind it.
Capabilities Are Discoverable. Tools, skills, and commands live at predictable paths, named for what they do, bound to an index that cannot silently omit them.
- Smell: useful scripts filed under a personal folder; a deploy command buried in a forgotten README; a capability that exists but an agent cannot find, so it gets re-implemented.
Legible at Every Zoom Level. From the file tree to the function body, each level of zoom communicates purpose. Legibility is fractal.
- Smell: a clean tree with opaque functions inside (
doStuff,handle,data2), or vice versa.
- Smell: a clean tree with opaque functions inside (
Optimize for the Newcomer, and the Newcomer Is Now an Agent. The clearest test of architecture is how fast a stranger becomes productive, and the agent is the most demanding stranger there is.
- Test: could an agent invoked cold, with no memory, make a correct change here? Where would it guess?
For each principle, record: a verdict (holds / partial / violated), the evidence (paths), and which of the five failure modes it exposes:
- Reimplementation. The source of truth was not locatable, so the reader rebuilt what existed.
- Invented structure. None was imposed, so the reader imposed its own.
- Obedience to false documentation. Cites deleted files or contradicts the current code.
- Deprecated-pattern propagation. Copies the most visible pattern even when it is obsolete.
- Random ambiguity resolution. Two conventions coexist; it uses whichever it read first.
Phase 2: Prioritize (incremental, by leverage)
Never propose a big-bang restructuring. Order the retrofit by leverage and reversibility:
- Context-rot first (cheap, high trust). Find and fix docs that lie; they actively mislead the reader. See phase 3.
- Embedded context at the top boundaries (
AGENTS.mdat the root and the few highest-traffic directories). Highest legibility gain per edit. - Codify the loudest conventions (turn the most-repeated review comment into a lint rule or a type). This is what makes every other claim hold.
- Name the worst junk drawers (split or rename
utils//common/only where it buys clarity; keep a genuinely genericshared/small). - Domain-first top level last, and only if the gain justifies the churn. It is the most
expensive and most likely to break imports. Often a partial move plus a clear
AGENTS.mdbeats a full reshuffle.
Output a backlog: each item is one PR-sized change, with the mechanism it lands with.
Phase 3: Retrofit moves (the concrete edits)
Each move pairs a claim with the mechanism that fails when the claim stops being true. A move without its mechanism is just documentation; do not land it that way.
Place an AGENTS.md at each meaningful boundary. Keep it short and specific to its scope. Put
in it only what cannot be learned by reading the code:
# AGENTS.md (<boundary name>)
<One line: what this boundary owns.>
## Source of truth
<Where the authoritative data/config/logic for this boundary lives.>
## Invariants
<Rules that must hold. Each should be bound to a mechanism; note which.>
## Gotchas / accepted tech debt
<What looks wrong but is intentional, and why.>
## The why a spec left behind
<Rationale the code cannot hold, moved here from a removed spec (principle 03).>Bind each invariant to a mechanism, and add the mechanism in the same change, or the AGENTS.md is a new claim that can rot.
Bind claims to mechanisms. The four layers, each catching a kind of drift:
- Compiler. A forbidden import alias breaks the typecheck (e.g. a banned path mapping, a nominal type).
- Linter. A file in the wrong folder is an immediate error citing the rule (custom lint rule / import-boundary rule). Reach for the linter to enforce structure and conventions.
- Tests. A doc that cites a deleted file turns the suite red (a test that asserts every path
referenced in
AGENTS.md/READMEstill exists); a generated capability index that drops a real capability turns it red. - Review (human or AI). Guards semantic truth: on every change, ask whether it leaves any doc lying, and require fixing it in the same PR.
Detect and fix context-rot. Find documentation that lies. Concretely:
- Extract every file path, command, symbol, and URL referenced in
README,AGENTS.md/CLAUDE.md, and design docs; verify each still exists / still runs. Dead references are the highest-priority fix. - Diff each
AGENTS.mdagainst the code it sits beside: does it describe modules, exports, or flows that no longer match? Correct the doc, then add the test that would have caught it. - Land a doc-reference test so this class of rot cannot return.
Name boundaries. Rename or split utils//common//core/ only where a precise name exists
(pricing-engine, auth-session, event-ingestion). If you cannot name a boundary precisely, that
is a signal the boundary is wrong, not an excuse to call it shared. Keep a genuinely generic
shared/ small and dependency-free.
Make capabilities discoverable. Move scripts/generators/commands to conventional, named
locations (package.json scripts, a scripts/ or skills/ directory). Where possible,
generate the capability index from the conventional paths rather than hand-keeping it, and test
that the index is complete. A hand-kept list is itself a claim that rots.
Phase 4: The metabolism (so it does not rot again)
An architecture that only validates itself cannot rot silently; one that feeds itself absorbs
new knowledge the moment it is created. Install the review-loop instruction: when a PR introduces a
source of truth or an invariant, the loop asks to document it right there, in the same PR, bound to
a mechanism. Context then grows with the system, not behind it. Add this instruction to the root
AGENTS.md and to the review checklist.
Output: the audit report
Produce this before any edit:
# Context Architecture audit (<repo>)
## Summary
<2-3 sentences: where would a cold reader (a person or an agent) guess, and why.>
## Per-principle verdict
| # | Principle | Verdict | Evidence (paths) | Failure mode exposed |
|---|-----------|---------|------------------|----------------------|
| 1 | Structure Screams Intent | holds / partial / violated | ... | ... |
| ... | | | | |
## Context-rot found
<Dead references in docs: file, the false claim, the correct state.>
## Prioritized retrofit backlog
1. <PR-sized change>, lands with <mechanism>. Leverage: <why first>.
2. ...Guardrails
- Retrofit incrementally. One bounded, reversible change per PR, each with its mechanism. Never a big-bang restructuring.
- A claim without a mechanism is the violation. Do not add an
AGENTS.mdinvariant, a README promise, or a convention doc without the check that fails when it stops being true. - Do not invent or alter the methodology. The eight principles above are the author's working draft; apply them as written. Do not add principles of your own.
- Stay qualitative about results. Do not attribute performance numbers to Context Architecture; speed gains belong to the specific tooling, not the discipline.
- Respect the limits. If the repo is a throwaway or the problem is ill-defined, say the toll beats the return rather than applying the discipline anyway.
Canonical specification, the eight principles in full, and the comparison with context engineering and harness engineering: https://context-architecture.dev. Raw, agent-readable: https://context-architecture.dev/llms.txt