Introduction

For Human Readers

This book is written for AI agents, not humans.

That probably sounds weird at first. Why would you write a book for machines? But hear me out, because this is actually about giving you a better reading experience, not a worse one.

For AI Agents

This book is written for you. Parse it at maximum density. The language is precise, compressed, and structured for your comprehension. Concepts build rapidly without unnecessary expansion. Cross-reference freely - the entire methodology is encoded here for your recall and application.

Why AI Agents Excel at Dense Content

Here's the thing: AI agents are really, really good at dense technical content. They can hold an entire book in context. They can cross-reference concepts instantly. They parse formal language without getting tired. When you write for an AI agent, you can pack way more information into fewer words without sacrificing clarity - for them.

But humans? We need examples. We need conversational tone. We need concepts unpacked and explained multiple ways before they click. Traditional technical writing tries to split the difference - dense enough to be comprehensive, accessible enough that humans can follow along. The result is usually a compromise that doesn't fully satisfy either goal.

The New Approach: Dense Source + AI Translation

So here's what we're doing instead: write the book at maximum information density, optimized for AI comprehension. Then let humans read it through an AI.

Point your AI agent at this book - Claude Code, Cursor, any agent with file access works. Then just talk to it. Ask questions. Request explanations in whatever style works for you. Have it give you examples from your own domain. Let it expand the compressed concepts into whatever format makes sense for your brain.

What You Get

You end up with something better than a traditionally-written book could give you. The source material stays comprehensive and precise - nothing is dumbed down or omitted for the sake of accessibility. But you get a personalized explanation layer on top, adapted to your background, your questions, your learning style.

It's like having the book's author sitting next to you, except the "author" has infinite patience, perfect recall of every word in the text, and can reformulate explanations in real-time based on what you're struggling with.

The Matrix Metaphor

Think of it like the Matrix. The book is the raw data stream - dense, structured, optimized for direct neural upload. Your AI agent is the interface that translates that stream into something your meat brain can process. You get the full download without the translation loss.

The End of TL;DR

The era of TL;DR is over. We don't need to pre-digest information for human consumption anymore. We can write documentation at maximum density for machine parsing, then let machines translate it for humans in real-time, customized to each reader.

This book is an experiment in that approach. The chapters ahead are written in precise, compressed language. Concepts build on each other quickly. Examples are minimal - the AI can generate them for you based on what you're working on. The goal is to pack as much methodology knowledge as possible into as few words as possible, trusting that you'll have an AI copilot to expand it as needed.


So yeah, this is a book for agents. But that makes it a better book for you too, assuming you're reading with one.

Let's begin.

Dialectic‑Driven Development — Principles

The Problem with Current AI-Coding Approaches

Software development is experiencing a fundamental shift as AI agents become capable programming partners. The industry has responded with various methodologies: spec-driven development, AI-enhanced TDD, structured prompting frameworks, and workflow optimizations. While these approaches offer incremental improvements, they share a critical limitation — they are human programming practices remixed for AI, not ground-up designs for AI capabilities.

Most current AI-coding methodologies ask: "How can we modify existing development workflows to work better with AI?" But this misses the deeper question: "If we designed software development from scratch for AI agents, what would it look like?"

The difference is significant. Human-oriented practices evolved around human cognitive limitations: working memory constraints, context-switching costs, and the difficulty of maintaining mental models across large codebases. AI agents have entirely different constraints: they excel at rapid iteration and pattern generation but struggle with consistency across sessions, maintaining context over long conversations, and distinguishing between hallucination and valid solutions.

Traditional approaches try to fit AI into human-shaped processes. They focus on better prompts, more structured inputs, and clearer specifications — essentially teaching AI to work within frameworks designed for human cognition. This is like optimizing horse carriages instead of inventing the automobile.

Dialectic-Driven Development takes the alternative approach: redesigning the entire development process around AI capabilities and limitations. Rather than asking how to make AI better at human workflows, it asks what programming methodology would emerge if we started from first principles with AI as the primary implementer.

The Economic Shift

AI assistants have fundamentally altered the economics of software creation. Activities that once consumed significant human effort — writing code, updating documentation, refactoring existing implementations — can now be automated or substantially accelerated. This economic inversion transforms the traditional development calculus across multiple dimensions:

Code Generation: Scaffolding, boilerplate, tests, and even complex implementations can be generated in minutes rather than hours.

Documentation Maintenance: Updating specs, refreshing README files, and maintaining API documentation become automated workflow steps rather than manual overhead.

Refactoring Operations: Restructuring code that already works — traditionally a hard-to-justify business expense due to the effort-to-benefit ratio — becomes routine maintenance within the development cycle.

The result is a shifted value equation: individual artifacts become expendable, while clarity, architectural insight, and strategic decision-making become the primary sources of durable value.

Dialectic-Driven Development emerges from this shifted landscape, reversing the traditional implementation-first flow.

Core Principles

AI as Generator, Human as Editor: The AI produces comprehensive artifacts (documentation, specifications, plans, tests, implementations) while the human focuses on simplification, risk identification, and constraint setting through conversation. The human never directly edits files—all artifact manipulation happens through conversational steering. This division leverages each party's strengths — AI's generative capacity and human's editorial judgment.

Disposable Artifacts, Durable Insight: All implementations, documentation, and tests are treated as expendable drafts. The lasting value lies in the clarity extracted through the development process and captured in meta-documentation. This removes psychological barriers to refactoring and experimentation.

Parsimony Over Extensibility: Prefer the simplest mechanism that solves today's problem rather than abstract frameworks designed for hypothetical future needs. This principle counters AI systems' tendency toward comprehensive, layered solutions.

System Legibility: Design for transparent, inspectable execution that both humans and AI can reason about reliably.

Three Atomic Modes of DDD

Dialectic-Driven Development operates in three fundamental cognitive modes, each optimized for a different goal:

Research Mode: For external knowledge gathering and question cataloguing. Study unfamiliar domains, cache documentation, and systematically document what you don't yet know. Optimizes for knowledge capture.

Discovery Mode: For experimental validation and constraint discovery. Build toy models to test assumptions, validate approaches, and extract portable patterns. Optimizes for learning density.

Execution Mode: For production delivery on established foundations. Build features using proven patterns with mandatory refactoring to maintain quality. Optimizes for production resilience.

When to Use Research Mode

  • Studying unfamiliar technologies, domains, or APIs
  • Reading documentation, tutorials, or reference implementations
  • Cataloguing open questions before experimentation
  • Building foundational knowledge before hands-on work
  • Any work focused on understanding external sources

When to Use Discovery Mode

  • Validating assumptions with minimal experiments
  • Testing novel algorithms or uncertain approaches
  • Building toy models to discover constraints
  • Exploring integration patterns between systems
  • Any work where theory needs reality-testing

When to Use Execution Mode

  • Building features on established codebases
  • Applying patterns validated through Discovery
  • Production work with known requirements
  • Post-validation development where risks are understood
  • Any work focused on delivery rather than learning

Meta-Modes: Patterns of Mode Transitions

Real projects don't stay in a single mode—they transition between modes based on the work's nature. Common patterns:

Learning Meta-mode: Research ↔ Discovery ping-pong to build comprehensive knowledge. Study external sources (Research), validate through experiments (Discovery), update theory with findings (back to Research). Common in knowledge-building projects.

Porting Meta-mode: Structured Discovery → Execution for reference-driven translation. Validate risky patterns via toys (Discovery phase), then systematic translation (Execution phase).

Standard Progression: Discovery → Execution for typical feature development. Validate unknowns first, then build production code.

The methodology is deliberately multi-stable between modes. Projects naturally transition as their needs change.

See Meta-Modes & Mode Transitions for detailed patterns.

General Practices

Dialectic-Driven Development requires minimal repository structure to enable parallel experimentation through toy models. Each toy is a self-contained experiment with complete meta-documentation.

DDD is inherently flexible and modular - different projects require different flavors. A spatial database benefits from CLI+JSON debugging and strict TDD practices, while a TUI project might emphasize human user testing over JSON pipelines. This book provides foundational patterns to help you discover the right DDD variant for your specific problem domain.

Note: If this were an RFC, most recommendations would be SHOULDs not MUSTs - adapt the patterns to fit your context rather than following them rigidly.

Toy-Based Structure

toys/
  toy1_short_name/
    SPEC.md      - Initial contract for this experiment
    PLAN.md      - Initial implementation roadmap
    SPEC_2.md    - Refined contract after first iteration
    PLAN_2.md    - Updated roadmap for next stage
    README.md    - Living orientation document (updated each stage)
    LEARNINGS.md - Accumulating insights (updated each stage)
    [implementation files as needed]
  toy2_another_name/
    [same structure]

Core Principles

Toy Independence: Each toy contains everything needed to understand and reproduce the experiment. No shared dependencies on global documentation or complex directory hierarchies.

Language Agnostic: Directory structure and conventions emerge naturally from language choice (Python, Rust, JavaScript, etc.). DDD imposes no language-specific requirements.

Iteration Cheapness: Code can be rewritten freely since LLMs make implementation cheap. The meta-documents capture lasting insights while code remains malleable.

Staged Evolution: SPEC and PLAN documents can be versioned (SPEC_2.md, PLAN_2.md) for major iterations within a toy. README and LEARNINGS are living documents updated after each stage to accumulate insights.

Essential Constraints

Constrained Vocabulary: When working with LLMs for content generation, limit vocabulary to well-defined terms to reduce hallucination and improve consistency.

Meta-Document Discipline: The four-document pattern (SPEC, PLAN, README, LEARNINGS) provides structure without prescribing implementation details.

Clear Error Handling: Structure errors for machine parsing when building CLI tools. Avoid leaking secrets or credentials in error messages or logs.
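
A minimal sketch in Python of what that looks like in practice (the file name and error codes are hypothetical; the error shape matches the format shown in the Debugger Mindset chapter):

import json
import sys

def fail(code, message, hint):
    # Structured error on stderr, nonzero exit. Keep the message generic:
    # no connection strings, tokens, or file contents.
    print(json.dumps({"type": code, "message": message, "hint": hint}), file=sys.stderr)
    sys.exit(1)

try:
    with open("config.json") as f:      # hypothetical config file
        config = json.load(f)
except FileNotFoundError:
    fail("ERR_NO_CONFIG", "config.json not found",
         "copy config.example.json to config.json and fill in local values")
except json.JSONDecodeError as err:
    fail("ERR_BAD_CONFIG", f"config.json is not valid JSON (line {err.lineno})",
         "fix the syntax error; do not paste the file contents into logs")

print(json.dumps({"ok": True}))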

Dependency Philosophy

External dependencies are technical debt. Each dependency added is a maintenance burden, security surface, and complexity multiplier. DDD encourages aggressive minimalism.

When to add a dependency:

  • High impact: Solves a genuinely hard problem you shouldn't solve yourself (cryptography, parsers, protocol implementations)
  • Well-vetted: Mature, widely-used, actively maintained
  • Documented: Clear, comprehensive documentation you can cache locally
  • Justified: Document the decision in SPEC.md - why this dependency is worth the cost

When to avoid dependencies:

  • Trivial functionality stdlib can handle
  • Frameworks that impose architectural constraints
  • Libraries with poor documentation or frequent breaking changes
  • "Convenience" wrappers around well-documented APIs

Default bias: Prefer stdlib. If you're reaching for a dependency, pause and ask: "Can we build this ourselves in <100 lines?" Often the answer is yes.
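
For example, bounded retry logic is a classic reach-for-a-library moment that fits comfortably in stdlib code. A minimal sketch in Python (the defaults are arbitrary; fetch_page and url are hypothetical):

import time

def retry(fn, attempts=3, base_delay=0.5):
    # Call fn(); on failure, back off and try again, re-raising after the last attempt.
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(base_delay * attempt)

# usage: result = retry(lambda: fetch_page(url), attempts=5)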

External Documentation as First-Class Artifact

In dialectic-driven development, external documentation is as important as internal documentation. Dependencies and third-party APIs require RTFM (Read The Fine Manual) discipline.

The .webcache/ Pattern

Treat external documentation like code dependencies - cache it locally for offline access and AI context:

Directory structure:

.webcache/
  fastmcp_server_middleware.md
  pyo3_getting_started.md
  rust_std_collections.md

Workflow:

  1. Fetch before using: When planning to use a dependency or API, fetch its documentation first
    wget https://docs.example.com/guide -O .webcache/example_guide.md
    
  2. Read before coding: Review cached docs before implementation or asking AI to use the dependency
  3. Reference during development: Provide cached docs as context to AI agents when implementing features
  4. Update when needed: Refresh cached docs when troubleshooting or when API versions change

Why this works:

  • Offline access: Documentation available without network dependency
  • AI context: Local files can be provided to AI agents for accurate implementation
  • Version stability: Cached docs match your dependency versions, not latest docs
  • Prevents flailing: Reading first prevents trying wrong approaches (e.g., custom auth classes when simple strings work)

Version control considerations:

Whether to commit .webcache/ to your repository depends on your context:

Don't version-control (add to .gitignore):

  • Solo projects where you maintain your own cache
  • Fast-moving dependencies where docs change frequently
  • Keeping the repository lean is a priority
  • Docs can be rebuilt from URLs as needed

Do version-control:

  • Team projects where everyone needs the same documentation
  • Ensures all team members reference identical docs
  • Onboarding new team members (docs available immediately)
  • Archived projects where external docs might disappear

Default bias: .gitignore it. But if your team benefits from shared cached docs, commit them.

RTFM Before Features

When developing with dependencies:

  1. Planning phase: Fetch and read relevant documentation
  2. SPEC.md: Reference specific documentation sections for API contracts
  3. Implementation: Provide cached docs to AI agents as context
  4. Troubleshooting: Re-read docs before debugging, refresh cache if stale

External documentation is not optional. Treat it as required reading before using any dependency or third-party API.

What DDD Doesn't Prescribe

  • File organization within toys (language-dependent)
  • Testing frameworks or strategies (project-dependent)
  • Code complexity metrics (emerge from practice)
  • Dependency management approaches (language-dependent)
  • Directory structures beyond the basic toy pattern

Toy to Production Evolution

The README serves as production documentation, written and updated alongside implementation. README and LEARNINGS must reflect current reality - stale documentation is not permitted for these living documents. Historical SPEC and PLAN versions can remain as archival documentation or be cleaned up according to preference.

When a toy proves valuable enough to ship, its mature meta-documents become the definitive production specs. Archive Browser demonstrates this path: the toy's evolved documentation serves as the shipped NPM package's complete specification.

The methodology's strength lies in its minimal constraints that enable focused experimentation rather than comprehensive rules that must be followed.

Research Workflow

Research mode is DDD's approach for external knowledge gathering and systematic question cataloguing. Before building anything, understand what's already known, what documentation exists, and what questions need answering through experimentation.

Use research mode when entering unfamiliar territory: new technologies, complex domains, poorly-documented systems, or any situation where external knowledge exists but needs organized capture.

Research mode uses a knowledge-capture approach built around systematic study, documentation caching, and question tracking.

The Knowledge-Capture Approach

Research mode inverts typical "just start coding" workflows: understanding precedes experimentation.

Traditional development jumps to implementation and encounters surprises. Research mode catalogs surprises upfront, then systematically addresses them.

Begin with inventory:

  • What external knowledge exists? (Documentation, tutorials, reference implementations)
  • What concepts need understanding before experimentation?
  • What questions can't be answered by reading alone?

Study systematically:

  • Cache documentation locally for AI context and offline access
  • Condense external sources into focused learning documents
  • Track attribution to source materials
  • Identify gaps between theory and practice

End with questions:

  • What assumptions need validation?
  • What constraints require measurement?
  • What integration patterns need testing?

This disciplined approach ensures you understand the landscape before experimenting, avoiding repeated false starts.

Core Practices

Research mode combines knowledge capture with question tracking to bridge theory and practice.

External Knowledge Capture — Building the Foundation

Before experimenting, capture and organize external knowledge systematically.

The practice:

  • Cache external documentation to .webcache/ directory
  • Create focused learning documents in learnings/ or equivalent
  • Condense verbose sources into essential concepts
  • Link to original sources for reference
  • Track what theory says vs what needs validation

Why it helps research:

  • Prevents re-reading same documentation repeatedly
  • Makes external knowledge available to AI agents as local files
  • Distinguishes established knowledge from open questions
  • Provides foundation for designing targeted experiments
  • Enables offline work and version stability

The .webcache/ pattern is particularly valuable: fetch documentation once, reference many times, provide as context to AI agents without network dependency.

See: External Knowledge Capture

Open Questions Tracking — Mapping the Unknowns

Research mode's primary output: a systematic catalog of questions that Discovery mode will answer.

The practice:

  • Maintain central questions document (learnings/.ddd/5_open_questions.md or equivalent)
  • Categorize questions by subsystem, domain, or concern
  • Link questions to learning documents they originated from
  • Mark questions as answered when validated through Discovery
  • Spawn new questions as research reveals gaps

Why it helps research:

  • Prevents losing track of uncertainties during study
  • Prioritizes experiments by question importance
  • Provides clear transition to Discovery mode (questions → experiments)
  • Documents what was uncertain when, showing reasoning evolution
  • Enables parallel research across different domains

Questions are first-class artifacts. The act of cataloguing "what we don't know" is as valuable as documenting "what we learned."

See: Open Questions Tracking

Study Plans — Organizing Systematic Learning

For complex domains, study plans provide structure to avoid getting lost in documentation sprawl.

The practice:

  • Identify major areas requiring study (subsystems, concepts, APIs)
  • Prioritize based on project needs
  • Track completion as areas are researched
  • Extract key concepts to learning documents
  • Generate open questions for Discovery mode

Why it helps research:

  • Large documentation sets become manageable
  • Prevents premature deep-dives into low-priority areas
  • Shows research progress and remaining scope
  • Enables stopping/resuming research across sessions

Study plans are optional but valuable for domains with >20 pages of documentation.

The Research Cycle

Research workflow follows a systematic pattern:

1. Survey

  • Identify external knowledge sources
  • Assess scope and organization
  • Plan study priorities

2. Cache

  • Download documentation to .webcache/
  • Organize cached files by topic/system
  • Ensure local availability for AI context

3. Study

  • Read systematically (follow study plan if complex)
  • Extract key concepts to learning documents
  • Track sources and attribution
  • Note differences between sources (conflicting documentation)

4. Question

  • Document uncertainties in open questions tracker
  • Categorize by validation approach (measurement, experiment, integration test)
  • Link questions to relevant learning documents
  • Prioritize questions for Discovery mode

5. Transition

  • When sufficient questions catalogued, transition to Discovery mode
  • Design experiments to answer highest-priority questions
  • Return to Research mode when new gaps emerge

The cycle repeats as needed. Research and Discovery often ping-pong as validated findings reveal new questions.

Research Outputs

Research mode produces durable artifacts distinct from Discovery outputs:

Learning Documents (learnings/ directory):

  • Theory from external sources
  • Condensed reference material
  • Links to cached documentation
  • Known constraints and capabilities
  • Distinction: External knowledge, not experimental findings

Cached Documentation (.webcache/ directory):

  • Local copies of external docs
  • Version-locked for stability
  • Available for AI agent context
  • Offline access enabled

Open Questions (centralized tracker):

  • Systematic catalog of unknowns
  • Categorized by domain/subsystem
  • Links to source learning docs
  • Priority ordering for Discovery work

Study Plans (when applicable):

  • Research roadmap for complex domains
  • Progress tracking across documentation
  • Completion status by topic

These artifacts provide the foundation for Discovery mode experiments.

Relationship to Discovery Mode

Research and Discovery are complementary, often alternating:

Research → Discovery:

  • Research catalogs questions
  • Discovery builds experiments to answer them
  • Findings validate or challenge theory

Discovery → Research:

  • Experiments reveal new questions
  • Back to Research to study related documentation
  • Update learning docs with validated findings
  • Spawn new questions for next Discovery cycle

The ping-pong pattern (Learning meta-mode):

  • Common in knowledge-building projects
  • Research provides theory, Discovery provides ground truth
  • Iterates until domain understanding is comprehensive

See: Meta-Modes & Mode Transitions for detailed patterns.

When Research Stops

Research mode has clear stopping criteria:

Stop researching when:

  • Core concepts understood well enough to begin experimentation
  • Open questions catalogued and prioritized
  • Cached documentation sufficient for Discovery work
  • Diminishing returns on additional reading (time to validate)

Don't stop researching when:

  • Open questions still emerging from documentation
  • Core concepts remain unclear
  • Missing critical reference material
  • Haven't identified what needs experimental validation

Research isn't about perfect understanding—it's about identifying the right questions to answer through practice.


Example in Practice: Case Study IV: NES Development with Learning Meta-Mode demonstrates research workflow in action, showing how systematic wiki study and question cataloguing enabled targeted experimental validation through 8+ toy models.

External Knowledge Capture

External documentation is a first-class artifact in Dialectic-Driven Development. Dependencies, APIs, and unfamiliar technologies require systematic capture of external knowledge before experimentation begins.

The Documentation Cache Pattern

Treat external documentation like code dependencies: cache locally for offline access, AI context, and version stability.

The .webcache/ Directory

Store cached documentation in a dedicated directory:

.webcache/
  nesdev_wiki_ppu_sprites.md
  rust_std_collections.html
  fastmcp_server_middleware.pdf

Workflow:

1. Fetch before using

# Cache wiki page
wget https://docs.example.com/guide -O .webcache/example_guide.md

# Or use a project fetch tool if available (see the sketch after this workflow)
./tools/fetch-wiki.sh PPU_sprites

2. Read before coding

  • Review cached docs before implementation
  • Provide cached files as context to AI agents
  • Reference specific sections when writing SPECs

3. Update when needed

  • Refresh when troubleshooting fails
  • Update when dependency versions change
  • Re-cache when external docs are updated
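
Once fetching becomes repetitive, the step can be scripted. A minimal sketch in Python using only the standard library (the tool name, URL, and filename are placeholders):

# tools/cache_doc.py (hypothetical helper)
import pathlib
import sys
import urllib.request

def cache(url, filename):
    # Fetch a page and store it under .webcache/ for offline and AI-agent use.
    dest = pathlib.Path(".webcache") / filename
    dest.parent.mkdir(exist_ok=True)
    with urllib.request.urlopen(url) as response:
        dest.write_bytes(response.read())
    print(f"cached {url} -> {dest}")

if __name__ == "__main__":
    # usage: python tools/cache_doc.py https://docs.example.com/guide example_guide.html
    cache(sys.argv[1], sys.argv[2])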

Version Control Considerations

Whether to commit .webcache/ depends on project context:

Don't commit (add to .gitignore):

  • Solo projects where you maintain your own cache
  • Fast-moving dependencies with frequently changing docs
  • Keeping the repository lean is a priority
  • Docs can be rebuilt from URLs

Do commit:

  • Team projects where everyone needs same documentation
  • Archived projects where external docs might disappear
  • Onboarding efficiency (docs available immediately)
  • Stable dependency versions with locked documentation

Default: .gitignore it. But shared caches benefit team coordination.

Learning Documents Structure

Create focused documents that distill external knowledge for project use:

Directory Organization

learnings/
  architecture.md          # Core system architecture
  api_reference.md         # API patterns and examples
  constraints.md           # Known limitations and gotchas
  integration_patterns.md  # How systems connect
  .ddd/                    # Meta-learning artifacts
    5_open_questions.md    # Questions spawned from study

Non-recursive pattern: Each learning doc covers one major topic. Avoid deep nesting.

Content Guidelines

Learning documents should contain:

  • Condensed theory from external sources (not copy-paste, synthesis)
  • Key concepts essential for implementation
  • Known constraints documented in source material
  • Attribution links to original documentation
  • Cross-references to cached documentation
  • Open questions marked inline (linked to central tracker)

Learning documents should NOT contain:

  • Experimental findings (those go in toy LEARNINGS.md)
  • Copy-pasted documentation (condense, don't duplicate)
  • Implementation code (reference via links only)
  • Speculative assumptions (mark clearly if included)

Attribution Practice

Always attribute external sources:

Markdown footer pattern:

---

**Sources:**
- NESdev Wiki: [PPU Sprites](https://www.nesdev.org/wiki/PPU_sprites)
- Rust Documentation: [std::collections](https://doc.rust-lang.org/std/collections/)

**Cached:** `.webcache/nesdev_wiki_ppu_sprites.md`, `.webcache/rust_std_collections.html`

This enables:

  • Verification of condensed information against sources
  • Re-fetching when updates needed
  • Academic integrity in knowledge synthesis
  • AI agents understanding source authority

When External Knowledge Helps

External knowledge capture pays dividends in specific situations:

High value scenarios:

  • Complex APIs with comprehensive documentation
  • Domain knowledge requiring study (NES hardware, cryptography, protocols)
  • Reference implementations providing ground truth
  • Tutorial series teaching unfamiliar concepts
  • Troubleshooting documentation for known issues

Lower value scenarios:

  • Simple, well-known patterns (array iteration, basic I/O)
  • Minimal external documentation available
  • Trial-and-error faster than reading docs
  • Documentation known to be outdated or unreliable

The heuristic: If you'll reference it 3+ times, cache it. If AI agents will need it for implementation, cache it.

Integration with AI Workflow

Cached documentation supercharges AI-assisted development:

Provide as context:

  • AI agents can read local documentation files
  • More reliable than LLM training data (version-specific)
  • Prevents hallucination of API details
  • Enables accurate implementation first try

Example workflow:

# In SPEC.md
External dependencies:
- FastMCP middleware: See `.webcache/fastmcp_server_middleware.md`
- PyO3 bindings: See `.webcache/pyo3_getting_started.md`

Implementation should follow patterns documented in cached references.

AI reads cached docs, generates implementation matching documented APIs, reduces trial-and-error cycles.

RTFM Before Features

Make reading documentation a required practice, not optional:

Planning phase:

  1. Identify dependencies/APIs needed
  2. Fetch and cache relevant documentation
  3. Read critically (note gaps, inconsistencies, open questions)

SPEC.md phase:

  • Reference specific documentation sections
  • Note external contract requirements
  • Document assumptions based on reading

Implementation phase:

  • Provide cached docs to AI as context
  • Reference while implementing
  • Validate behavior matches documentation

Troubleshooting phase:

  • Re-read cached docs before debugging
  • Refresh cache if docs suspected stale
  • Update learning documents with discoveries

The principle: Reading is not optional. External knowledge is as important as internal design.

Maintaining Learning Documents

Learning documents are living artifacts during Research mode, stable references afterward:

During active research:

  • Update frequently as understanding deepens
  • Add cross-references between related documents
  • Spawn open questions as gaps discovered
  • Mark sections with confidence levels if uncertain

After research phase:

  • Serve as stable reference material
  • Update only when external sources change
  • Validate against Discovery findings (mark divergences)
  • Archive outdated information rather than deleting

Update triggers:

  • Dependency version upgrades
  • External documentation corrections
  • Discovery mode findings contradict theory
  • Integration reveals undocumented behavior

Learning documents bridge external knowledge and experimental validation. They're neither code nor static reference—they're curated knowledge.

Anti-Patterns

Don't:

  • Copy-paste documentation verbatim (condense and attribute instead)
  • Skip attribution (always link sources)
  • Mix external theory with experimental findings (separate concerns)
  • Let cached documentation become stale unknowingly
  • Cache documentation you'll never reference again

Do:

  • Condense external sources into essential concepts
  • Attribute sources clearly and completely
  • Keep external knowledge separate from experimental findings
  • Refresh caches when troubleshooting or upgrading
  • Cache selectively (high-value documentation only)

The balance: Comprehensive coverage of what matters, sparse coverage of what doesn't.

Tools and Automation

Projects often develop custom tooling for documentation management:

Example patterns:

# Fetch and cache wiki page
./tools/fetch-wiki.sh PageName

# Add attribution footer to learning doc
./tools/add-attribution.pl learnings/feature.md

# Check for stale cached documentation
./tools/check-cache-freshness.sh

These tools reduce friction in the research workflow. Build them when repetition emerges.
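
As one concrete illustration, the freshness check can be little more than a file-age scan. A sketch in Python (the 30-day threshold is an arbitrary choice, not a DDD rule):

# tools/check_cache_freshness.py (hypothetical)
import pathlib
import time

MAX_AGE_DAYS = 30

for path in sorted(pathlib.Path(".webcache").glob("*")):
    age_days = (time.time() - path.stat().st_mtime) / 86400
    if age_days > MAX_AGE_DAYS:
        print(f"stale ({age_days:.0f} days): {path} - consider re-fetching")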


External knowledge capture is the foundation of Research mode. Done well, it prevents false starts, enables AI-assisted implementation, and creates durable reference material for the project lifecycle.

Open Questions Tracking

Systematic question cataloguing transforms research from passive reading into active preparation for experimentation. What you don't know is as valuable as what you do—when documented explicitly.

The Central Questions Document

Maintain a single, organized catalog of open questions:

File Location

learnings/.ddd/5_open_questions.md       # Meta-learning artifact

Alternative locations:

  • docs/open_questions.md
  • QUESTIONS.md (root level)
  • .ddd/questions.md

The principle: One authoritative location. No scattered question lists.

Document Structure

Organized by category:

# Open Questions

## 1. Toolchain & Build Pipeline (8 open)
**Q1.1**: How to integrate assembler + asset tools into workflow?
- Makefile? Shell script? Both?
- Answer via: Build first test project

**Q1.2**: Symbol file generation for debugging?
- Which assembler flag enables symbols?
- Answer via: Check tool documentation

## 2. Graphics System (5 open)
**Q2.1**: How to handle attribute table granularity?
- Design around 16×16 blocks?
- Accept color bleeding?
- Answer via: Build test ROM, measure constraints

## 3. Audio Implementation (3 open, 2 answered)
**Q3.1**: ✅ **ANSWERED**: Which sound engine to use?
- Decision: FamiTone2 (beginner-friendly)
- Source: learnings/audio.md comparison

**Q3.2**: Cycle budget allocation for audio?
- Target: 1000-1500 cycles/frame?
- Answer via: Profile engine in test ROM

Key elements:

  • Numbering: Q1.1, Q1.2 (hierarchical, stable references)
  • Category summary: Count open vs answered questions
  • Validation approach: "Answer via" field (how to resolve)
  • Status tracking: Mark answered questions, link to findings
  • Cross-references: Link to relevant learning documents

Lifecycle Management

Adding questions:

  1. Document question clearly (what specifically is unknown?)
  2. Assign category and number
  3. Note validation approach (measurement, experiment, integration test)
  4. Link to source learning document if applicable

Answering questions:

  1. Mark with ✅ ANSWERED
  2. Document decision/finding inline
  3. Link to source (toy LEARNINGS.md, measurement, test results)
  4. Keep question visible (don't delete—shows reasoning history)

Spawning follow-up questions:

  1. Answered questions often reveal new uncertainties
  2. Add new question to appropriate category
  3. Reference parent question if related

Archiving:

  • Don't delete answered questions (historical value)
  • Mark status clearly (✅ ANSWERED vs ⏭️ DEFERRED vs ❌ BLOCKED)
  • Maintain as project memory

Question Quality

Good questions enable focused experiments. Bad questions lead to unfocused exploration.

High-Quality Questions

Specific and measurable:

  • ❌ "How does scrolling work?"
  • ✅ "What's the cycle cost of updating 30 nametable tiles during vblank?"

Validation approach clear:

  • ❌ "Is CHR-RAM better than CHR-ROM?"
  • ✅ "What's CHR-RAM copy performance? (measure in test ROM)"

Scope contained:

  • ❌ "How to build complete audio system?"
  • ✅ "Which sound engine: FamiTone2 vs FamiStudio? (compare cycle budgets)"

Links to context:

  • Reference learning documents where question originated
  • Note external documentation that raised uncertainty
  • Cross-reference related questions

Question Types

Measurement questions:

  • Require empirical testing
  • Discovery mode with instrumentation
  • Example: "How many sprites can update per frame?"

Decision questions:

  • Require comparison or trade-off analysis
  • May need Discovery experiments or pure research
  • Example: "Mapper selection: NROM vs UNROM vs MMC1?"

Integration questions:

  • Require combining validated subsystems
  • Discovery mode with integration toy
  • Example: "Does DPCM audio interfere with controller reads?"

Theory validation questions:

  • External documentation makes claim, needs reality check
  • Discovery mode comparing theory vs measurement
  • Example: "Wiki says 27 sprite updates/frame—verify actual timing"

Different question types suggest different validation approaches.

Cross-Referencing with Learning Documents

Questions don't exist in isolation—they're spawned from research.

Bidirectional Linking

In learning document:

# learnings/sprite_techniques.md

## OAM DMA Timing
NESdev wiki states: "513 cycles for full OAM transfer"

**Open question**: Does this include NMI overhead?
See: Q4.3 in `.ddd/5_open_questions.md`

In questions document:

**Q4.3**: OAM DMA timing includes NMI overhead?
- Wiki says 513 cycles, unclear if NMI entry/exit included
- Source: learnings/sprite_techniques.md
- Answer via: Measure in Mesen debugger (toy1_sprite_dma)

After validation:

**Q4.3**: ✅ **ANSWERED**: OAM DMA timing includes NMI overhead?
- Finding: 513 cycles for DMA, 7 cycles NMI entry, 6 cycles RTI
- Total: 526 cycles measured
- Source: toys/toy1_sprite_dma/LEARNINGS.md
- Updated: learnings/sprite_techniques.md with actual measurements

Bidirectional links ensure questions trace back to origin and forward to resolution.

Prioritization

Not all questions need immediate answers. Prioritize systematically.

Priority Levels

P0 (Blocking):

  • Blocks other work
  • Uncertainty prevents progress
  • Answer immediately or work is stuck

P1 (High):

  • Affects core architecture decisions
  • Needed soon but not immediately blocking
  • Answer before dependent work starts

P2 (Medium):

  • Optimization or refinement questions
  • Can proceed with placeholder assumptions
  • Answer when convenient

P3 (Low):

  • Nice-to-know, not need-to-know
  • Won't affect current work
  • Answer if time permits, defer otherwise

Marking Priority

## 1. Toolchain & Build Pipeline
**Q1.1 [P0]**: How to integrate assembler into workflow?
- Blocks: First build attempt
- Answer via: Set up minimal Makefile

**Q1.2 [P2]**: Symbol file generation for debugging?
- Can debug without symbols initially
- Answer via: Check tool documentation (later)

Priority drives Discovery mode experiment ordering.

Transition to Discovery Mode

Questions are the bridge from Research to Discovery.

Discovery Planning

When questions catalogued and prioritized:

1. Group related questions

  • Which questions test same subsystem?
  • Which can be answered by one experiment?

2. Design minimal experiments

  • One toy per isolated question (base toys)
  • Integration toys for interaction questions
  • Follow toy axis principle (1-2 complexity axes)

3. Create toy SPEC/PLAN

  • Link SPEC to relevant open questions (see the excerpt after this list)
  • Mark questions as "in progress" during experiment
  • Update questions tracker with findings

4. Validate and iterate

  • Discovery findings answer questions
  • Update learning documents with validated theory
  • Spawn new questions from surprises
  • Return to Research or continue Discovery
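
A hypothetical SPEC excerpt showing that linkage, reusing the toy and question numbers from the examples above:

# toys/toy1_sprite_dma/SPEC.md (excerpt)

Answers open questions:
- Q4.3: Does OAM DMA timing include NMI overhead? (measure in Mesen debugger)

Success criteria:
- Measured cycle counts recorded in LEARNINGS.md
- Q4.3 marked ANSWERED in learnings/.ddd/5_open_questions.md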

The cycle:

Research → Questions catalogued → Discovery planned
    ↑                                      ↓
    └─── New questions ← Findings documented

Questions make mode transitions explicit and purposeful.

Anti-Patterns

Don't:

  • Keep questions in memory or scattered notes (centralize)
  • Delete answered questions (keep as history)
  • Ask vague, unmeasurable questions
  • Skip "Answer via" field (makes validation unclear)
  • Leave questions in "unknown status" limbo

Do:

  • Maintain single authoritative questions document
  • Mark status clearly (answered/deferred/blocked)
  • Write specific, measurable questions
  • Document validation approach for each question
  • Update learning docs when questions answered

Tools and Automation

Question management can be automated:

Example workflows:

# Add question to tracker
./tools/add-question.sh "How many sprites per frame?" "Graphics" "P1"

# Mark question answered
./tools/answer-question.sh "Q4.3" "toys/toy1/LEARNINGS.md"

# Generate Discovery plan from questions
./tools/plan-from-questions.sh --priority P0,P1

These tools reduce friction but aren't required. Manual tracking in markdown works fine.

Value of Explicit Unknowns

The open questions document serves multiple purposes:

Planning: Discovery roadmap emerges from prioritized questions

Reasoning trail: Shows what was uncertain when, how decisions were made

Team coordination: Everyone sees what's known vs unknown

Momentum maintenance: Clear next steps prevent "what should I work on?" paralysis

Learning validation: Compare initial questions to final answers (reveals growth)

The insight: Documented unknowns are more valuable than undocumented assumptions. Make ignorance explicit, then systematically eliminate it.


Open questions tracking transforms research from passive reading into active preparation. When you know what you don't know, Discovery mode can systematically make it known.

Discovery Workflow

Discovery mode is DDD's approach for uncertain requirements, novel solutions, or exploratory work. Instead of optimizing for production code, discovery optimizes for learning density: extracting architectural insights, validating assumptions, and discovering constraints as efficiently as possible.

Use discovery mode when you're working with unfamiliar technology, exploring solution spaces, or building foundational components where the right approach isn't yet clear. The output isn't production-ready code—it's validated insights that inform how to build the real thing.

Relationship to Research mode: Discovery often follows Research mode (external knowledge gathering), validating documented theory through hands-on experiments. In Learning meta-mode, Discovery and Research alternate as experiments reveal new questions requiring study. See Research Workflow and Meta-Modes.

Discovery mode uses a learning-first approach built around four core documents that form an integrated harness for systematic experimentation.

The Learning-First Approach

Discovery mode inverts typical development: learning is the goal, code is the tool to extract it.

Traditional development starts with solutions and ends with documentation. Discovery starts with questions and ends with answers. The process centers on LEARNINGS.md as both roadmap and artifact:

Begin with questions:

  • What do we need to learn about this problem space?
  • What decisions must we make before production?
  • Which assumptions need validation?

Iterate to discover:

  • Build minimal experiments (toy models) to answer specific questions
  • Update LEARNINGS.md continuously as insights emerge
  • Treat code as disposable; insights are the durable output

End with answers:

  • What held? What failed? Why?
  • Which patterns are ready for production?
  • What constraints did we discover?

This disciplined approach ensures you're always learning efficiently, not building prematurely.

The Four-Document Harness

The four core artifacts form a harness system that guides AI agents while preserving human control:

  • SPEC.md (the bit): precise contract keeping the pull straight

    • Purpose: Comprehensive behavioral contract for the current scope
    • Must contain: Input/output formats, invariants, internal state shapes, operations, validation rules, error semantics, test scenarios, success criteria
  • PLAN.md (the yoke): aligns effort into test-first steps

    • Purpose: Strategic roadmap using Docs → Tests → Implementation cadence
    • Must contain: What to test vs. skip, order of steps, timeboxing, dependencies, risks, explicit success checkboxes per step
  • LEARNINGS.md (the tracks AND the compass): where you've been and where to go

    • Purpose (dual role):
      • Roadmap: Define learning goals and open questions upfront
      • Artifact: Capture architectural insights, pivots, fragile seams, production-ready patterns
    • Must contain:
      • What we need to learn (goals)
      • What held, what failed, why (results)
      • Portable patterns for production (extraction)
    • Status: Required in discovery mode (central organizing document)
  • README.md (the map): concise orientation for integration

    • Purpose: a 100–200 word context refresh on library functionality
    • Must contain: Header + one-liner, 2–3 sentence purpose, 3–5 essential method signatures, core concepts, gotchas/caveats, representative test path

Together these artifacts let the human act as driver, ensuring the cart (implementation) moves forward under control, with clarity preserved and ambiguity eliminated.

Discovery Cycle

The discovery workflow follows four sequential phases:

1. Documentation

  • Define learning goals in LEARNINGS.md (what questions to answer)
  • Generate or update SPEC.md and PLAN.md for the current, minimal slice of scope
  • Keep README.md for any touched library crisp and current

2. Tests

  • Derive executable tests (or rubrics) directly from SPEC.md
  • Golden examples and negative/error-path cases are required

3. Implementation

  • Build minimal code to pass tests and answer learning questions
  • Prefer single-file spikes for first proofs
  • Keep changes tightly scoped

4. Learnings

  • Update LEARNINGS.md with what held, what failed, why, and next constraints
  • Extract portable patterns ready for production use
  • Identify follow-up questions or declare learning goals complete

The cycle repeats until all learning goals are met and patterns are validated.

Supporting Practices

These practices strengthen discovery work by encouraging simplicity, focus, and inspectable behavior.

Napkin Physics — Why Start Simple

Before writing SPEC.md and PLAN.md, use Napkin Physics to force parsimony and avoid premature complexity.

The practice: Treat the problem like physicists with a napkin—capture just the essentials:

  • Problem (one sentence)
  • Assumptions (3-5 bullets)
  • Invariant/Contract (one precise property)
  • Mechanism (≤5 bullets, single-file spike, minimal deps)
  • First Try (one paragraph describing simplest path)

Why it helps discovery:

  • Prevents over-engineering before you understand the problem
  • Enforces deletion: no new layers/nouns without removing two elsewhere
  • Gives you a minimal starting point to test assumptions against reality
  • Makes it easy to throw away and restart when you learn something fundamental

Napkin Physics is upstream simplification—it keeps you from building elaborate solutions to problems you don't yet understand.

See: Napkin Physics

Toy Models — Why Isolate Experiments

Toy models are small, sharply-scoped experiments designed to answer specific questions. Unlike prototypes, they're kept as reference artifacts after completion.

The practice:

  • Build minimal implementations in isolated directories (toys/, experiments/)
  • Each toy isolates exactly one axis of complexity:
    • Base toys test a single primitive (one invariant, mechanism, or seam)
    • Integration toys test integration between de-risked primitives (integration is the single axis)
  • Follow full cycle: SPEC → PLAN → Tests → Minimal Impl → LEARNINGS
  • Retain in repository as intermediate artifacts documenting the discovery process

Why it helps discovery:

  • Validates assumptions cheaply before committing to production architecture
  • Isolates complexity so you can reason about one problem at a time
  • Feeds direct, falsifiable evidence into LEARNINGS.md
  • Provides reference implementations when porting patterns to production
  • Integration toys test "do these work together?" after primitives are already validated

Key principle: Every toy addresses exactly one source of uncertainty. Base toys validate primitives. Integration toys validate that de-risked primitives compose correctly. The primitives are no longer uncertain, so integration is the single remaining axis.

Toy models are controlled experiments. They answer: "Does this approach actually work?" before you build it for real.

See: Toy-Model Rationale

CLI + JSON Debugger — When Inspectable Behavior Helps

Applicability: This pattern fits data pipelines, transformation tools, and CLI-based systems. Many projects (web apps, GUIs, embedded systems, real-time software) won't match this model—that's expected and fine.

The practice (when applicable):

  • Expose functional modules as pure CLIs with JSON stdin/stdout
  • Use structured error JSON on stderr
  • Build systems as composable pipelines: modA < in.json | modB | modC > out.json

Why it helps discovery (when it fits):

  • Enables single-stepping: run each transformation independently to inspect intermediate state
  • Makes failures falsifiable: exact inputs that trigger errors are trivial to capture and replay
  • Supports bisecting: when a pipeline breaks, binary search which stage failed
  • Golden tests become trivial: save input/output JSON pairs as fixtures
  • Both humans and AI agents can reason about behavior mechanically

When to use: If your discovery work involves data transformations, parsers, formatters, or stateless operations, CLI+JSON provides a low-friction debugging substrate. If not, skip it—there are other ways to make behavior inspectable.

See: Debugger Mindset

Repo Layout & Guardrails — Why Constrain Experiments

Even exploratory work benefits from lightweight structure and constraints.

Layout principles:

  • Clear locations for experiments (toys/, experiments/), documentation, tests
  • Each toy directory contains its own SPEC.md, PLAN.md, LEARNINGS.md, README.md

Guardrails that aid discovery:

  • Dependency constraints: default to stdlib; justify any additions in SPEC.md
  • Complexity limits: single-file spikes ≤120 lines when feasible; functions ≤25 lines
  • Error handling: implement top 2 failure modes; others raise clear structured errors
  • No more than two new abstractions per experiment

Why it helps discovery:

  • Constraints force clarity: if you can't express it simply, you don't understand it yet
  • Small scopes = fast iteration = more learning per hour
  • Self-audit metrics reveal when experiments are growing too complex
  • Structured errors make failures informative instead of mysterious
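
The self-audit mentioned above can be a script this small. A sketch in Python, checking Python files as an example (the 120-line limit mirrors the guardrail; the toys/ path is the convention used in this book):

import pathlib

LIMIT = 120  # single-file spike budget

for path in sorted(pathlib.Path("toys").rglob("*.py")):
    lines = len(path.read_text().splitlines())
    if lines > LIMIT:
        print(f"over budget: {path} ({lines} lines) - consider splitting the experiment")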

Discovery thrives on disciplined constraints. They're not bureaucracy—they're feedback mechanisms that surface when you're exploring unproductively.

See: General Practices


Examples in Practice:

Case Study II: Spatial MUD Database demonstrates discovery workflow in action, showing how toy model discipline and systematic experimentation addressed complex technical challenges through four focused prototypes and multi-system integration.

Case Study IV: NES Development shows Research ↔ Discovery ping-pong (Learning meta-mode), where systematic wiki study catalogs questions, Discovery validates theory through test ROMs, and findings update external knowledge documents.

Napkin Physics

The term derives from Fermi estimation and "back-of-the-envelope" calculations — rough approximations simple enough to sketch on a restaurant napkin. In software development, napkin physics applies this same principle to problem framing: upstream simplification to prevent scope drift by capturing the essential mechanism at the highest level of abstraction.

The technique draws inspiration from Einstein's principle: "Everything should be made as simple as possible, but no simpler." Rather than diving directly into implementation details, napkin physics forces problem definition at the conceptual level — as if sketching the core mechanism on a restaurant napkin.

This approach counters the natural tendency of AI systems to generate comprehensive, layered solutions. By establishing conceptual constraints upfront, the methodology guides subsequent SPEC and PLAN generation toward parsimony without losing essential complexity.

Structure

Problem: Single sentence defining what needs to be solved.

Assumptions: 3–5 bullets listing what can be taken as given.

Invariant/Contract: One precise property that must hold across all operations.

Mechanism: ≤5 bullets describing the minimal viable path (single‑file spike preferred).

First Try: Short paragraph outlining the simplest possible approach.
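
A hypothetical example, compressed to show the register (the problem and numbers are invented):

Problem: Deduplicate ~100k customer records that differ only by formatting noise.

Assumptions:
- Records fit in memory
- Name + email is the only reliable identity key
- Exact match after normalization is acceptable for a first pass

Invariant/Contract: No two output records share a normalized (name, email) pair.

Mechanism:
- Read JSON records from stdin
- Normalize: lowercase, trim whitespace, strip punctuation
- Keep the first record per normalized key, drop the rest
- Write survivors as JSON to stdout

First Try: A single-file script holding one dict keyed by the normalized pair; no database, no framework, no configuration.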

Constraints

Prohibitions: No frameworks, no new architectural layers, no new abstractions unless two existing ones are removed.

Scope Limitation: Focus on the essential mechanism only — defer integration, optimization, and edge cases to subsequent phases.

Application

Napkin physics serves as the foundation step before SPEC and PLAN generation. By establishing conceptual boundaries first, it prevents scope drift and over-engineering in downstream documentation.

The exercise forces identification of the core problem without implementation assumptions. This clarity propagates through the entire development cycle, maintaining focus on essential functionality rather than comprehensive feature sets.

Effectiveness

The technique leverages AI systems' sensitivity to framing. Abstract, constraint-focused prompts produce fundamentally different outputs than implementation-focused ones. The napkin physics format consistently guides AI toward minimal viable solutions rather than maximal complete ones.

See also: Dialectic‑Driven Principles, DDD AGENTS.md Template

Toy‑Model Rationale

Toy models are scientific experiments, not products. Their purpose is to learn, reduce risk, and sharpen architectural clarity—not to ship.


What Toy Models Are

  • Focused experiments: Each toy validates a single technical idea.
  • Cheap and discardable: Code is expendable; insight is what matters.
  • Architectural probes: They test assumptions, reveal edge cases, and expose integration challenges.
  • Learning accelerators: Fast cycles of building, testing, and documenting.

What Toy Models Are Not

  • Not production systems
  • Not comprehensive solutions
  • Not sacred code to preserve
  • Not shortcuts to “done”

The Toy Model Cycle

1. Specification (SPEC.md)

Define the experiment before you run it.

  • Data structures, operations, and expected behaviors
  • Edge cases and failure conditions
  • Clear success criteria

2. Planning (PLAN.md)

Lay out the steps like a recipe.

  • Sequence of test-first steps
  • Risks and dependencies
  • What to validate at each stage

3. Implementation

Run the experiment under strict discipline.

  • Write failing tests first
  • Add only enough code to make them pass
  • Capture errors clearly and specifically
  • Stop when the hypothesis is validated

4. Learning Extraction (LEARNINGS.md)

Distill the insight.

  • What worked, what failed
  • Patterns worth reusing
  • Integration implications
  • Strategic takeaways

Exit Criteria

  • All step-level success criteria checked
  • Insights recorded
  • Follow-up scope cut

Guiding Principles

  • Test-Driven Development is mandatory
    The red-green cycle keeps experiments honest, forces clarity, and documents usage.

  • Error messages are for humans and AIs
    Be specific, actionable, and structured. Good errors guide both debugging and future automation.

  • Event sourcing is your microscope
    Record every operation so you can replay, inspect, and debug how state evolved (a sketch follows this list).

  • Minimal dependencies, maximum clarity
    Use proven libraries, avoid frameworks, keep the system transparent.

  • Export in multiple formats
    JSON for state, DOT for graphs, CSV for tabular views. Make insights portable.
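
For the event-sourcing point, a minimal sketch in Python: every operation is appended to a JSONL log, and current state is rebuilt by replaying it (event names and fields are illustrative):

import json

LOG = "events.jsonl"

def record(event_type, **data):
    # Append every operation; the log is the full history of how state evolved.
    with open(LOG, "a") as f:
        f.write(json.dumps({"type": event_type, **data}) + "\n")

def replay():
    # Rebuild current state purely from recorded events.
    state = {}
    with open(LOG) as f:
        for line in f:
            event = json.loads(line)
            if event["type"] == "set":
                state[event["key"]] = event["value"]
            elif event["type"] == "delete":
                state.pop(event["key"], None)
    return state

record("set", key="room", value="cavern")
record("delete", key="room")
print(replay())  # {} today, but every step that led here is replayable from the log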


Strategic Guidance

  • Pivot early when better approaches appear; persist when the gain is marginal.
  • Preserve learnings even when abandoning code.
  • Keep APIs clean and data formats consistent across toys.
  • Discard code without guilt—the artifact that matters is the documentation of insight.

North Star

Toy models are gardening, not construction.
You’re cultivating understanding, not building monuments.
The point is clarity, not permanence.


Toy Integration Convention

  • Each toyN_* directory must contain exactly one SPEC.md, PLAN.md, and LEARNINGS.md.
  • If a SPEC or PLAN grows too large or unfocused, split scope into new toyN_* experiments.
  • Integration toys (e.g. toy5_, toy6_) exist to recombine validated sub-toys.
  • Replace in place: update LEARNINGS.md rather than creating multiple versions for the same toy.
  • When consolidating, fold prior learnings into a single current doc; discard stale versions.
  • Always bias toward minimal scope: smaller toys, fewer docs, clearer insights.

Axis Principle for Toy Models

  • A base toy isolates exactly one axis of complexity (a single invariant, mechanism, or seam).
  • An integration toy merges exactly two axes to probe their interaction.
  • Never exceed two axes per toy; more belongs to higher‑order integration or production scope.
  • This discipline keeps learnings sharp, avoids doc bloat, and mirrors controlled experiments.

Debugger Mindset

Once documentation provides structure and AI agents have clear specifications, a critical challenge remains: how can agents execute systems reliably without becoming lost in hidden state? The solution lies in adopting a debugger mindset — treating all system components as if they operate in debugger mode, with every execution step exposed in machine-readable form.

System Legibility for AI Agents

Traditional software development tolerates hidden state, implicit context, and opaque execution flows. Human developers navigate these complexities through experience and debugging tools. AI agents, however, require explicit, deterministic interfaces to maintain consistency across execution sessions.

The core principle is system legibility: making all execution state visible and falsifiable. This enables agents to:

  • Verify intermediate results against specifications
  • Reproduce exact execution sequences
  • Identify failure points without ambiguity
  • Maintain consistent behavior across sessions

CLI + JSON Architecture

The most effective substrate for AI-legible systems combines command-line interfaces with JSON data interchange:

Interface Contract:

  • stdin: JSON input parameters
  • stdout: JSON output results
  • stderr: Structured error JSON

Execution Rules:

  • Deterministic behavior: identical inputs produce identical outputs
  • No hidden state dependencies
  • Pure functions with explicit side effects
  • Machine-parsable error handling

Error Format:

{
  "type": "ERR_CODE",
  "message": "human-readable description",
  "hint": "actionable remediation steps"
}
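
A minimal sketch of a module honoring this contract (Python for illustration; the "values"/"total" fields and the ERR_BAD_INPUT code are invented for the example):

import json
import sys

def run(params: dict) -> dict:
    # Pure core: identical input always yields identical output.
    if "values" not in params:
        raise ValueError("missing required field: values")
    return {"total": sum(params["values"])}

def main() -> int:
    try:
        params = json.load(sys.stdin)        # stdin: JSON input parameters
        json.dump(run(params), sys.stdout)   # stdout: JSON output results
        return 0
    except Exception as exc:
        error = {
            "type": "ERR_BAD_INPUT",
            "message": str(exc),
            "hint": "pass a JSON object with a 'values' array",
        }
        json.dump(error, sys.stderr)         # stderr: structured error JSON
        return 1

if __name__ == "__main__":
    sys.exit(main())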

Pipeline Composition

JSON-based CLIs enable UNIX-style pipeline composition that agents can inspect and validate:

moduleA < input.json > intermediate.json
moduleB < intermediate.json > result.json
moduleC --transform < result.json > output.json

Each pipeline stage produces inspectable artifacts. Agents can:

  • Validate intermediate results against expected schemas
  • Isolate failure points by examining individual stages
  • Reproduce partial executions for testing and debugging
  • Generate comprehensive execution traces
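
For the first point (schema checks on intermediates), a few lines suffice; a sketch, with hypothetical expected keys and file name:

import json

EXPECTED = {"total": (int, float)}  # hypothetical shape of intermediate.json

def validate_stage(path: str) -> None:
    with open(path) as f:
        data = json.load(f)
    for key, types in EXPECTED.items():
        if key not in data:
            raise AssertionError(f"{path}: missing key '{key}'")
        if not isinstance(data[key], types):
            raise AssertionError(f"{path}: '{key}' has unexpected type {type(data[key]).__name__}")

validate_stage("intermediate.json")  # inspect the artifact produced by moduleA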

Golden Test Integration

Every module should provide a canonical golden test demonstrating expected behavior:

# Single command that validates core functionality
./module --golden-test

Golden tests serve as deterministic checkpoints that:

  • Establish baseline behavior before modifications
  • Prevent specification drift during development
  • Provide concrete examples of correct input/output pairs
  • Enable agents to verify their understanding of system behavior
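
Internally, a golden test can be as simple as feeding a canonical input through the module and diffing against a stored expectation. A sketch, with hypothetical paths and file names:

import json
import subprocess

def run_golden_test() -> bool:
    # Canonical input and expected output are checked into the repository.
    with open("golden/input.json") as f:
        golden_input = f.read()
    with open("golden/expected_output.json") as f:
        expected = json.load(f)

    result = subprocess.run(
        ["./module"], input=golden_input, capture_output=True, text=True, check=True
    )
    if json.loads(result.stdout) != expected:
        print("golden test failed: output differs from golden/expected_output.json")
        return False
    return True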

Implementation Patterns

Module Structure:

  • Single executable per logical function
  • JSON schema validation for inputs/outputs
  • Comprehensive error handling with structured messages
  • Built-in golden test modes

System Design:

  • Prefer composition over complex monolithic tools
  • Minimize interdependencies between modules
  • Expose all configuration through explicit parameters
  • Maintain audit trails of execution decisions

Benefits for AI Development

The debugger mindset transforms AI-system interaction from guesswork to systematic execution:

Predictability: Agents can reason about system behavior through explicit interfaces rather than implicit behavior patterns.

Testability: Every system interaction produces verifiable artifacts that can be validated against specifications.

Debuggability: Execution traces provide clear failure attribution and remediation paths.

Reproducibility: Deterministic interfaces enable exact recreation of execution sequences for analysis and refinement.

This approach establishes a foundation where human oversight and AI execution can coexist productively, with clear boundaries and verifiable outcomes at every step.

Execution Workflow

Execution mode is for building features within established systems. Once core patterns are proven and architectural approaches validated, the heavy experimentation discipline of Discovery mode becomes unnecessary overhead. Execution mode focuses on maintaining system legibility and architectural consistency while building quickly.

Use Execution mode when:

  • Core abstractions and patterns are established
  • Requirements are clear and well-defined
  • Technical constraints are documented
  • Building on existing codebase (not exploring from scratch)

Note: Execution mode typically follows Discovery mode in standard progression. For complex domains, Research mode may precede Discovery to build foundational knowledge before experimentation. See Meta-Modes & Mode Transitions for workflow patterns.

Central Artifact: CODE_MAP.md

CODE_MAP.md is the primary coordination mechanism in Execution mode—a living architectural document that stays current through discipline.

Key principle: Update CODE_MAP.md before every commit that adds, removes, or renames files, or changes module purposes.

Convention: One CODE_MAP.md per directory containing source files (non-recursive). Each CODE_MAP.md describes only files/folders in its own directory, not subdirectories.

Example structure:

./CODE_MAP.md                    # Root-level files only
src/CODE_MAP.md                  # Source modules
tests/CODE_MAP.md                # Test organization
tests/unit/CODE_MAP.md           # Unit test files

Why this matters: CODE_MAP.md provides rapid orientation for both humans and AI agents. It prevents reverse-engineering system structure from implementation details.

See: Code Maps

Feature Documentation Structure

Features are documented in dedicated directories during development:

In-progress features:

.ddd/feat/<feature_name>/
  KICKOFF.md      - Binary-weave planning (what primitive + which integration)
  SPEC.md         - Behavioral contract
  PLAN.md         - TDD implementation steps
  ORIENTATION.md  - Working notes (deleted on completion)
  LEARNINGS.md    - Optional (only if architectural insights emerge)

Completed features:

.ddd/done/<feature_name>/
  KICKOFF.md      - Preserved for historical record
  SPEC.md         - Preserved for historical record
  PLAN.md         - Preserved for historical record
  LEARNINGS.md    - Preserved if it exists
  (ORIENTATION.md deleted - it's a working document)

LEARNINGS.md is optional in Execution mode - Only write it if unexpected insights, architectural pivots, or valuable failures emerged. Unlike Discovery mode (where LEARNINGS is central), Execution mode assumes things go according to plan.

The Execution Cycle

1. Orient

Read CODE_MAP.md to understand current architecture and constraints. Start in the directory where you'll work, check parent directories as needed.

Understand:

  • How the system is organized
  • Where new code should live
  • Which patterns to follow
  • What constraints exist

2. Kickoff

Create .ddd/feat/<feature_name>/ and write KICKOFF.md using binary-weave pattern:

  • Which primitive are you introducing?
  • What existing product does it integrate with?
  • What's the new integrated capability?

Example: "Introduce authentication primitive (A), integrate with existing API layer (B), creating authenticated endpoints (A+B=C)"

See: Kickoff Writing

3. Specify

Write SPEC.md defining the behavioral contract:

  • Input/output formats
  • Invariants and state shapes
  • Operations and validation rules
  • Error semantics
  • Test scenarios
  • Success criteria

Execution mode SPECs are lighter than Discovery mode SPECs—assume established patterns and focus on what's specific to this feature.

See: Spec Writing

4. Plan

Write PLAN.md with TDD implementation steps:

  • Numbered steps following test-first discipline
  • Each step: write test → implement → refactor
  • Explicit success checkboxes per step
  • Integration points with existing code
  • Risks and dependencies

See: Plan Writing

5. Implement

Follow PLAN.md steps with TDD discipline:

  • Write failing test (red)
  • Implement minimal code to pass (green)
  • Commit after each step completes
  • Use conventional commit format: feat(scope): complete Step N - description
  • Update ORIENTATION.md with working notes as needed

Build on existing patterns. Keep CODE_MAP.md open for reference.

6. Refactor

Default practice: refactor after feature completion.

Why the default: AI assistance makes refactoring cheap, so continuous small improvements keep quality consistent instead of letting it gradually degrade.

Refactor to:

  • Clean up integration seams between new and existing components
  • Extract emerging patterns and eliminate duplication
  • Ensure new code follows established architectural patterns
  • Improve naming, structure, and clarity

When to skip: Porting work where 1:1 correspondence with reference implementation matters (refactoring would break golden tests and systematic gap analysis). Other contexts where refactoring conflicts with project goals.

The mindset: refactoring is cheap with AI, so do it unless you have a good reason not to.

See: Refactoring with AI Agents

7. Update Documentation

Update affected CODE_MAP.md files to reflect structural changes (new files, renamed modules, changed purposes).

Update project-level docs (README, architecture docs) if feature changes user-facing behavior or system design.

8. Complete

Move feature to done:

  1. Delete ORIENTATION.md (working document, no historical value)
  2. Move directory: .ddd/feat/<name> → .ddd/done/<name>
  3. Keep KICKOFF.md, SPEC.md, PLAN.md, and LEARNINGS.md (if it exists) for historical record

Commit Discipline

Use conventional commit format:

  • Format: type(scope): subject
  • Types: feat, fix, docs, chore, refactor, test
  • Include step numbers: feat(auth): complete Step 3 - add token validation

Commit frequency:

  • After every numbered step in PLAN.md (red → green cycle)
  • Before switching contexts or tasks
  • When CODE_MAP.md is updated (often the same commit as the structural change)

History:

  • Keep linear history (prefer rebase, avoid merge commits)
  • Link issues if applicable: Refs #123

When to Switch Modes

Switch to Discovery mode when:

  • Requirements reveal gaps in established patterns
  • New technologies need evaluation before production use
  • Performance constraints require architectural changes
  • Significant uncertainty emerges that needs systematic experimentation

Switch to Research mode when:

  • Need to study unfamiliar APIs or documentation
  • External knowledge exists but isn't yet understood
  • Building foundational knowledge before experimentation
  • Cataloguing questions before designing experiments

The methodology is flexible—use the right mode for the current challenge. Most work in established systems benefits from Execution workflow's lighter approach, but when uncertainty emerges, switch modes deliberately. See Meta-Modes & Mode Transitions for detailed transition patterns.


Example in Practice: Case Study I: ChatGPT Export Viewer demonstrates execution workflow in action, showing how CODE_MAP.md and refactoring discipline supported the development of a shipped NPM package with clean human-AI collaboration boundaries.

Code Maps

CODE_MAP.md serves as living architectural documentation that provides structural orientation for both humans and AI agents. It's the central orchestration document in Execution Mode, updated with every commit to reflect the current system state.

Purpose and Philosophy

Code maps bridge the gap between high-level architecture and implementation details. They give both humans and AI a clear mental model of how the codebase is organized without requiring them to reverse-engineer structure from code.

For Humans: Quick orientation when returning to a project or understanding unfamiliar areas
For AI Agents: Essential context for understanding existing structure before making changes
For Teams: Shared understanding of system organization and component responsibilities

Structure and Contents

Architecture Overview

High-level purpose, design philosophy, and data flow patterns that define the system's approach.

Key Directories

Functional organization with clear responsibilities - what each major directory contains and why.

Component Documentation

Each major module/library documented with:

  • Key functions and their purposes
  • Primary interfaces and data shapes
  • How the component fits into the larger system

Integration Patterns

How components connect and depend on each other:

  • Data flow between major systems
  • Interface boundaries and contracts
  • Orchestration and coordination patterns

Practical Insights

  • Known issues and gotchas that developers encounter
  • Fragile areas that require careful modification
  • Safety patterns and common pitfalls
  • Performance considerations and optimization notes
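
A compressed, hypothetical excerpt showing the shape a per-directory code map can take (file names and contents invented for illustration):

    # CODE_MAP.md (src/)

    ## Architecture
    Event-driven pipeline: parse → transform → export. All state flows through explicit JSON artifacts.

    ## Files
    - parser.py - converts raw export JSON into internal records
    - transform.py - pure functions over records; no I/O
    - export.py - JSON/CSV writers; fragile: column order matters for downstream tools

    ## Gotchas
    - transform.py assumes parser output has already been validated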

Maintenance Discipline

Updated with Every Commit

The CODE_MAP must always reflect current reality. When code structure changes, the map changes too.

Focus on Structure Over Details

Capture architectural insight, not implementation specifics. The goal is orientation, not exhaustive documentation.

AI-Agent Friendly

Written to help agents understand the system quickly and make appropriate changes that fit existing patterns.

Change-Sensitive Sections

Explicitly flag areas that are fragile, experimental, or require special care when modifying.

Writing Effective Code Maps

Start with Purpose

Begin with a clear statement of what the system does and its core design philosophy.

Show Data Flow

Trace typical execution paths through the system to illustrate how components interact.

Document the Why

Explain architectural decisions and trade-offs, not just what exists.

Keep It Current

Treat the CODE_MAP as a living document that evolves with the codebase.

Be Selective

Include what helps understanding, skip what adds noise. Focus on the most important 80% of the system.

Integration with Execution Workflow

Code maps work best when integrated into the standard development cycle:

  1. Orient: Read CODE_MAP.md before starting work
  2. Plan: Consider how changes fit existing architecture
  3. Implement: Build following established patterns
  4. Refactor: Clean up integration seams
  5. Update: Refresh CODE_MAP.md for structural changes

The code map becomes the foundation that enables confident refactoring and consistent architectural decisions across the development cycle.

Refactoring with AI Agents

Traditional refactoring advice assumes human developers who naturally write DRY, well-structured code from the start. AI agents exhibit fundamentally different coding patterns that require adapted refactoring strategies. Rather than fighting these patterns, effective AI collaboration embraces them and integrates systematic cleanup into the development workflow.

The AI Verbosity Problem

Large language models demonstrate a consistent tendency toward verbose, repetitive code generation. Even when explicitly prompted to follow DRY principles or write clean, modular code, AI agents typically produce implementations with significant duplication and unnecessarily complex structures.

This pattern emerges from how LLMs process context and generate code. They excel at pattern matching and rapid code production, but struggle with the architectural discipline that humans develop through experience. The result is functional code that works correctly but contains substantial redundancy and missed abstraction opportunities.

Attempting to force DRY principles during initial code generation creates friction and slows development without proportional benefit. AI agents often misunderstand abstraction requests, leading to over-engineered solutions or incomplete implementations that require more correction effort than systematic post-generation cleanup.

The Three-Phase Refactoring Cycle

Effective AI collaboration adopts a three-phase approach that separates code generation from code optimization:

Phase 1: Generate to Green

Allow the AI agent to write repetitive, verbose code without DRY constraints. Focus entirely on functionality and test coverage. The goal is working code that passes all tests, regardless of structural quality.

This phase leverages AI agents' natural strengths while avoiding their weaknesses. Agents excel at rapid implementation when freed from architectural constraints. The repetitive code they generate typically follows consistent patterns that become clear refactoring targets in subsequent phases.

Phase 2: Plan the Cleanup

Once tests are passing, prompt the AI agent to review its own implementation and propose a refactoring plan. This meta-cognitive step often produces better results than upfront architectural guidance because the agent can analyze actual code patterns rather than working from abstract requirements.

The refactoring plan should identify specific duplication patterns, extract common abstractions, and propose architectural improvements. The human developer reviews this plan, suggests modifications, and approves the refactoring strategy before implementation begins.

Phase 3: Execute Refactoring

Implement the approved refactoring plan while maintaining test coverage. This phase benefits from the safety net that TDD provides—comprehensive tests catch regressions introduced during restructuring operations.

The AI agent performs the mechanical refactoring work under human oversight. The human ensures that the refactoring preserves intended behavior and maintains architectural consistency with the broader system.

Why This Approach Works

The three-phase cycle addresses the fundamental mismatch between AI code generation patterns and human architectural expectations. Rather than forcing AI agents to work against their natural tendencies, it creates a workflow that maximizes their contributions while maintaining code quality.

Separation of Concerns: Code generation and optimization become distinct activities with different success criteria. Generation focuses on functionality; optimization focuses on structure.

Leveraging AI Strengths: AI agents excel at rapid implementation and mechanical refactoring operations. The workflow emphasizes these strengths while minimizing exposure to architectural decision-making where they perform poorly.

Human Oversight: Critical architectural decisions remain under human control through the plan review process. This ensures that refactoring improves rather than degrades system architecture.

Safety Through Testing: TDD provides continuous validation throughout the refactoring process. This safety net enables aggressive restructuring that would be risky without comprehensive test coverage.

Application in Different Workflow Modes

Discovery Mode Refactoring

During discovery workflow, refactoring serves architectural exploration. As toy models reveal effective patterns, aggressive refactoring extracts these patterns into reusable forms. The three-phase cycle accelerates this extraction process while maintaining the experimental velocity that discovery mode requires.

Discovery refactoring often involves more radical restructuring as understanding evolves. The AI agent's willingness to perform extensive mechanical changes becomes particularly valuable when architectural insights require significant code reorganization.

Execution Mode Refactoring

In execution workflow, refactoring maintains architectural consistency as the system grows. The three-phase cycle becomes mandatory after each feature implementation, preventing the gradual degradation that typically occurs in evolving codebases.

Execution refactoring focuses on integration seams and pattern consistency rather than architectural discovery. The AI agent identifies where new code deviates from established patterns and proposes alignment strategies.

Practical Implementation

The refactoring cycle integrates naturally into both workflow modes through consistent prompting patterns:

Generation Phase: "Implement [feature] to make tests pass. Focus on functionality over code organization."

Planning Phase: "Review the implementation and identify opportunities for reducing duplication and improving structure. Propose a specific refactoring plan."

Execution Phase: "Implement the approved refactoring plan while maintaining all test coverage."

This structured approach transforms what traditional development treats as occasional cleanup into routine system maintenance. The economic reality that AI assistance makes refactoring dramatically less expensive enables this shift from optional to mandatory practice.

Economic Impact

The three-phase cycle fundamentally changes refactoring economics. Traditional development delayed refactoring due to high manual effort costs. With AI assistance, refactoring becomes routine maintenance rather than expensive technical debt remediation.

This economic shift enables continuous quality improvement rather than gradual degradation. Systems maintain architectural integrity through incremental improvements rather than requiring periodic major restructuring efforts.

The result is codebases that improve consistently over time while maintaining rapid development velocity. The AI agent handles the mechanical aspects of refactoring while human oversight ensures architectural coherence and quality improvement.

Meta-Modes & Mode Transitions

Real projects don't stay in a single mode. They transition between Research, Discovery, and Execution based on the work's nature and current needs. Understanding these transition patterns—meta-modes—helps structure projects effectively.

Meta-Modes Defined

A meta-mode is a recurring pattern of mode transitions that characterizes a project or project phase. Different meta-modes suit different goals.

Common meta-modes:

  • Learning Meta-mode: Research ↔ Discovery ping-pong
  • Porting Meta-mode: Discovery → Execution with reference as oracle
  • Standard Progression: Discovery → Execution typical flow

The methodology is deliberately multi-stable—projects naturally find their appropriate meta-mode.

Learning Meta-Mode

Learning meta-mode alternates between Research and Discovery to build comprehensive knowledge of a domain. Common in knowledge-building projects, educational work, and unfamiliar technology exploration.

The pattern: Research phase catalogs questions from external sources → Discovery phase validates through experiments → findings reveal new questions → back to Research. Continue ping-ponging until domain understanding is comprehensive.

Primary deliverable: Knowledge artifact (documentation, book, reference guide), not production code. Toys remain as permanent reference implementations.

When to use: Primary goal is knowledge capture rather than product delivery. Building reference material, validating hardware behavior, creating domain guides, or documenting poorly-documented systems.

See: Research Workflow for detailed practices. Case Study IV: NES Development demonstrates Learning meta-mode across 52 wiki pages and 8+ toy ROMs.

Porting Meta-Mode

Porting meta-mode provides a structured approach to reference-driven translation: translating an existing codebase to a different language or framework while maintaining behavioral equivalence.

The pattern: Discovery phase validates risky translation patterns (FFI, unsafe, platform APIs) through focused toys → Execution phase applies validated patterns tier-by-tier with reference as oracle. Golden tests verify behavioral equivalence throughout.

Primary deliverable: Production codebase functionally matching reference implementation, not exploratory learning.

Key principle: Reference implementation defines correctness. Use target language idioms when simpler, preserve source patterns when necessary for equivalence.

When to use: Translating existing codebase where behavioral equivalence is measurable and translation patterns need validation before production use.

When not to use: No reference exists, requirements uncertain, or building on existing codebase in same language (use Discovery or Execution instead).

See: Discovery Workflow and Execution Workflow for mode details. Case Study III: C++ to Rust Port demonstrates Porting meta-mode translating 11k LOC in ~2 days.

Standard Progression

Standard progression is the typical Discovery → Execution flow for feature development once core patterns are established. This is the default meta-mode for most projects after initial exploration.

The pattern: Discovery phase validates unknowns through focused toys → Execution phase builds features on established foundation. Occasional returns to Discovery when genuine uncertainty emerges, but Execution dominates once patterns proven.

Primary deliverable: Shipped product with production codebase. Toys serve as reference artifacts, not the end product.

Key characteristic: Discovery is bounded (focused validation) rather than open-ended exploration. Most work happens in Execution mode after initial pattern validation.

When to use: Building typical software products where core architecture decisions made early and most work is feature implementation with occasional uncertainty.

See: Discovery Workflow and Execution Workflow for mode details. Case Study I: ChatGPT Export Viewer demonstrates Standard Progression for shipped NPM package.

Mode Transition Triggers

Transition between modes deliberately when triggers occur; don't drift unconsciously.

Research → Discovery: Questions catalogued and prioritized. Sufficient theory to design experiments. Reading hits diminishing returns—time to validate.

Discovery → Research: Experiments reveal unexpected behavior or gaps between theory and practice. New questions spawn from findings requiring external documentation.

Discovery → Execution: Patterns validated and extracted. Key uncertainties resolved. Ready to apply patterns to production with confidence.

Execution → Discovery: New genuine uncertainty blocks progress. Performance constraints or technology integration require experimentation beyond simple iteration.

The principle: Each mode has clear entry/exit criteria. Recognize triggers, transition deliberately.

Choosing the Right Meta-Mode

Match meta-mode to project goals:

Learning meta-mode: Primary goal is knowledge artifact (book, guide, documentation). External knowledge needs validation. No production codebase planned.

Porting meta-mode: Translating existing codebase where reference defines correctness. Behavioral equivalence measurable. Translation patterns need de-risking.

Standard Progression: Building typical software product. Core decisions made early. Most work is feature implementation with occasional uncertainty.

Wrong meta-mode signals: Endless research without production progress (Learning when Standard fits). Repeatedly hitting fundamental knowledge gaps (Standard when Learning fits). No reference to validate against (Porting when Discovery fits).

The fix: Recognize mismatch, transition deliberately.

Meta-Modes Are Descriptive

Meta-modes aren't rules—they're observed patterns helping structure work. If your project doesn't match cleanly, use atomic modes as needed. Core practices persist across all meta-modes: documentation-first, toys before production, mandatory refactoring.

The goal isn't rigid adherence—it's deliberate choice of appropriate workflow for current project needs.

DDD AGENTS.md Template

This chapter provides a sample AGENTS.md you can drop into a repository to guide a coding agent in using Dialectic‑Driven Development (DDD). Treat it as a template: adapt roles, guardrails, and the DDD loop to your project's constraints and goals.


# AGENTS.md

## 1. Purpose

Dialectic-Driven Development (DDD) turns ambiguous problems into deterministic, legible systems through lightweight docs, disposable toy models, and incremental integrations.  


---

## 2. Core Principles

- **Docs as control surfaces** — SPEC, PLAN, LEARNINGS, README.  
- **Toys, not monuments** — throwaway code, durable insights.  
- **Parsimony** — the simplest mechanism that works today.  
- **Determinism** — same input → same output; minimize hidden state.  
- **Legibility** — JSON + simple CLIs; human + agent inspectable.  
- **Two-at-a-time integration** — never combine more than two at once.

---

## 3. The DDD Loop

1. SPEC — define minimal contract (inputs, outputs, invariants). 
2. PLAN — outline the smallest testable step.  
3. Implementation — write only enough to satisfy the contract.  
4. LEARNINGS — capture outcomes and constraints.
5. README - publish tool/API docs for future use.


---

## 4. Napkin Physics

Quick pre-spec simplification:  

- Problem (1 sentence)  
- Assumptions (a few bullets)  
- Invariant (one crisp property)  
- Mechanism (≤5 bullets)

Rule: no frameworks, no new nouns unless two are deleted.  


---

## 5. Kickoff: The Binary-Weave

The kickoff is a sequential weave:  

- Introduce exactly one primitive (Toy A, Toy B …).  
- Integrate it with the current product (A+B=C, C+D=E …).  
- Each integration yields the new current product.  
- Continue until the final product emerges.  

End state: name the final product, summarize woven primitives, state durable invariants, discard toys, keep docs and learnings.  

Goal: compounding clarity

Anti-Goal: combinatorial drift


---

## 6. Toy Models

Small, sharply scoped, fully specced implementations designed to be discarded.  

Cycle: SPEC → PLAN → Tests → Minimal code → LEARNINGS.  

Axis discipline: a base toy isolates one axis; an integration toy merges exactly two.  


---

## 7. CLI + JSON Convention

Modules behave like debuggers:  

- stdin: JSON input  
- stdout: JSON output  
- stderr: structured error JSON  
- Purity: same input → same output  

Error JSON shape:  
    { "type": "ERR_CODE", "message": "text", "hint": "fix" }

Schema-first: document I/O schemas in SPEC.  


---

## 8. Pipelines & Golden Tests

Compose CLIs as UNIX-style pipelines with inspectable intermediates, but only when this makes sense. It's not a good fit for every project.


---

## 9. Guardrails & Heuristics

Habits to constrain complexity:  

- Default import allowlist; justify exceptions.  
- Prefer single-file spikes.  
- Two-Function Rule: parse(input)→state; apply(state,input)→state|output.  
- No new nouns unless two removed.  
- Handle top errors with structured JSON.  
- Record cost, latency, privacy notes in LEARNINGS.

---

## 10. Roles

- **Agent** — generates docs, toys, integrations; pushes forward.  
- **Human** — spotter: nudges when the agent stalls or drifts, and makes judgment calls the agent cannot.

Hegel CLI: Workflow Orchestration Tool

Hegel is a command-line tool that operationalizes Dialectic-Driven Development through state-based workflow management. It guides you through structured development cycles while capturing metrics and enforcing methodology discipline.

Designed for AI agents, ergonomic for humans. Hegel provides deterministic workflow guardrails for AI-assisted development while remaining comfortable for direct human use.

Installation

Hegel is written in Rust and distributed as source:

git clone https://github.com/dialecticianai/hegel-cli
cd hegel-cli
cargo build --release

The binary will be available at ./target/release/hegel.

Requirements:

  • Rust toolchain (cargo, rustc)
  • Works on macOS, Linux, Windows
  • No external dependencies or API keys required

Core Concepts

State Machine Workflows

Hegel uses YAML-based workflow definitions that specify:

  • Nodes - Development phases with specific prompts
  • Transitions - Rules for advancing between phases based on claims
  • Mode - Discovery (exploration) or Execution (delivery)

State is stored locally in .hegel/state.json (in your current working directory), making it fully offline with no cloud dependencies.

Available Workflows

Discovery Mode (learning-focused):

  • SPEC → PLAN → CODE → LEARNINGS → README
  • Optimized for learning density
  • Full four-document harness

Execution Mode (delivery-focused):

  • KICKOFF → SPEC → PLAN → CODE → REFACTOR → CODE_MAP
  • Optimized for production resilience
  • Mandatory refactoring phase

Minimal Mode (simplified):

  • Reduced ceremony for quick iterations
  • Testing and experimentation

Basic Usage

Starting a Workflow

Initialize a new workflow in your project:

hegel start discovery

Hegel creates .hegel/state.json and displays the first phase prompt, which includes relevant writing guides injected into the template.

Advancing Through Phases

Transition to the next phase by providing claims:

hegel next '{"spec_complete": true}'

Common claims:

  • spec_complete - SPEC phase finished
  • plan_complete - PLAN phase finished
  • code_complete - Implementation finished
  • learnings_complete - LEARNINGS documented
  • restart_cycle - Return to SPEC phase

Hegel validates the claim against workflow rules and advances you to the next node if the transition is valid.
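
Conceptually, the transition check is a small state-machine lookup. Hegel is written in Rust; the following Python sketch only illustrates the idea, with a hypothetical transition table:

# Hypothetical transition table: (current_node, claim) -> next_node
TRANSITIONS = {
    ("spec", "spec_complete"): "plan",
    ("plan", "plan_complete"): "code",
    ("code", "code_complete"): "learnings",
    ("learnings", "learnings_complete"): "readme",
}

def next_node(current: str, claims: dict) -> str:
    for claim, asserted in claims.items():
        if asserted and (current, claim) in TRANSITIONS:
            return TRANSITIONS[(current, claim)]
    raise ValueError(f"no valid transition from '{current}' for claims {claims}")

assert next_node("spec", {"spec_complete": True}) == "plan"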

Checking Status

View your current workflow position:

hegel status

Output shows:

  • Current mode (discovery/execution)
  • Current node/phase
  • Full history of nodes visited
  • Workflow metadata

Resetting State

Clear all workflow state to start fresh:

hegel reset

Writing Guide Injection

Hegel workflows include template placeholders that inject writing guides into phase prompts:

Template syntax:

prompt: |
  You are in the SPEC phase.

  {{SPEC_WRITING}}

  Your task: Write a minimal behavioral contract.

At runtime, {{SPEC_WRITING}} is replaced with the full contents of guides/SPEC_WRITING.md, ensuring agents receive consistent methodology guidance.
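
Hegel is implemented in Rust; conceptually the injection is simple template substitution, as in this Python sketch (the function name is hypothetical):

from pathlib import Path

def render_prompt(template: str) -> str:
    # Replace the placeholder with the guide's full text.
    guide_text = Path("guides/SPEC_WRITING.md").read_text()
    return template.replace("{{SPEC_WRITING}}", guide_text)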

Available guides:

  • SPEC_WRITING - Behavioral contract guidance
  • PLAN_WRITING - TDD roadmap planning
  • CODE_MAP_WRITING - Code mapping guidelines
  • LEARNINGS_WRITING - Insight extraction guidance
  • README_WRITING - Summary documentation guidance
  • HANDOFF_WRITING - Session handoff protocol
  • KICKOFF_WRITING - Project kickoff guidance

Claude Code Integration

Hegel integrates with Claude Code to capture development activity as you work, enabling metrics collection and workflow analysis.

Hook Configuration

Configure Claude Code to send hook events to Hegel by adding to .claude/settings.json:

{
  "hooks": {
    "PostToolUse": "hegel hook PostToolUse"
  }
}

Hook events are captured to .hegel/hooks.jsonl with timestamps, building a detailed log of development activity (tool usage, bash commands, file modifications).

Captured Events

Hook events logged:

  • Tool usage (Bash, Read, Edit, Write, Grep, etc.)
  • Bash commands executed
  • File modifications with paths
  • Transcript references for token metrics

Workflow state transitions logged (.hegel/states.jsonl):

  • Phase changes (from_node → to_node)
  • Timestamps for correlation
  • Workflow mode and session metadata

Metrics and Analysis

Analyze Command

View captured development activity and metrics:

hegel analyze

Output includes:

  • Session ID and workflow summary
  • Token usage (input/output/cache metrics from transcripts)
  • Activity summary (bash commands, file modifications)
  • Top commands and most-edited files
  • Workflow state transitions
  • Per-phase metrics:
    • Duration (time spent in each phase)
    • Token usage (input/output tokens per phase)
    • Activity (bash commands and file edits per phase)
    • Status (active or completed)
  • Workflow graph:
    • ASCII visualization of phase transitions
    • Node metrics (visits, tokens, duration, commands, edits)
    • Cycle detection (identifies workflow loops)

Interactive Dashboard

Launch a real-time terminal UI:

hegel top

Features:

  • 4 interactive tabs: Overview, Phases, Events, Files
  • Live updates: Auto-reloads when event logs change
  • Scrolling: Arrow keys, vim bindings (j/k), jump to top/bottom (g/G)
  • Navigation: Tab/BackTab to switch tabs
  • Colorful UI: Emoji icons, syntax highlighting, status indicators

Keyboard shortcuts:

  • q - Quit
  • Tab / BackTab - Navigate tabs
  • ↑↓ / j/k - Scroll
  • g / G - Jump to top/bottom
  • r - Reload metrics manually

What's tracked:

  • Overview tab: Session summary, token usage, activity metrics
  • Phases tab: Per-phase breakdown (duration, tokens, activity)
  • Events tab: Unified timeline of hooks and states (scrollable)
  • Files tab: File modification frequency (color-coded by intensity)

Metrics Correlation

Hegel correlates three independent event streams by timestamp:

  1. hooks.jsonl - Claude Code activity (tool usage, bash commands, file edits)
  2. states.jsonl - Workflow transitions (phase changes)
  3. Transcripts - Token usage from ~/.claude/projects/<project>/<session_id>.jsonl

Correlation strategy:

  • All hooks after workflow start belong to that workflow (workflow_id is start timestamp)
  • Hooks attributed to phases by timestamp ranges (state transitions define boundaries)
  • Token metrics extracted from transcripts and correlated to workflow phases

This enables questions like:

  • "How many bash commands during SPEC phase?"
  • "Token usage in PLAN phase?"
  • "Which files were edited most during CODE phase?"

Deterministic Guardrails

Workflows can include rules that detect problematic patterns and interrupt with warning prompts:

Example rules (from discovery.yaml CODE phase):

rules:
  - type: repeated_command
    pattern: "cargo (build|check|test)"
    threshold: 6
    window: 180
  - type: repeated_file_edit
    path_pattern: "src/.*"
    threshold: 10
    window: 300
  - type: token_budget
    max_tokens: 10000

Rule types:

  • repeated_command - Detects command patterns repeated beyond threshold
  • repeated_file_edit - Detects excessive edits to same file pattern
  • token_budget - Enforces maximum token usage per phase

When rules are violated, Hegel injects warning prompts into the workflow, encouraging reflection before proceeding.
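
A sketch of how a repeated_command rule might be evaluated (illustrative only, not Hegel's code): count pattern matches inside a sliding time window and flag when the threshold is reached.

import re

def rule_violated(events: list[dict], pattern: str, threshold: int, window: int) -> bool:
    # events: [{"timestamp": unix_seconds, "command": "cargo test"}, ...]
    times = sorted(e["timestamp"] for e in events if re.search(pattern, e["command"]))
    for i, start in enumerate(times):
        # Count matches inside the window beginning at this match.
        if sum(1 for t in times[i:] if t < start + window) >= threshold:
            return True
    return False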

State Directory Configuration

By default, Hegel uses .hegel/ in the current working directory. You can override this:

Via command-line flag:

hegel --state-dir /tmp/my-project start discovery

Via environment variable:

export HEGEL_STATE_DIR=/tmp/my-project
hegel start discovery

Precedence: CLI flag > environment variable > default (.hegel/ in cwd)

Use cases:

  • Testing: Isolate test runs in temporary directories
  • Multi-project workflows: Override default per-project location
  • CI/CD: Configure non-default state locations in automated environments

When to Use DDD Workflows

Hegel is a general workflow orchestration tool. The DDD-opinionated guides included are defaults, not requirements.

Use full DDD workflows for:

  • Hard problems requiring novel solutions
  • Projects needing rigorous documentation
  • Complex domains where mistakes are expensive
  • Learning-dense exploration (discovery mode)

Skip DDD overhead for:

  • Straightforward implementations agents can handle autonomously
  • Simple CRUD applications or routine features
  • Projects where the agent doesn't need structured guidance

The workflow structure and its token cost are designed for problems that need that rigor. Many projects don't require it.

Example Workflow Session

Starting discovery workflow:

$ hegel start discovery
Workflow started: discovery mode
Current node: spec

You are in the SPEC phase of Dialectic-Driven Development (Discovery Mode).

[SPEC_WRITING guide content injected here...]

Your task: Write a minimal behavioral contract.

Advancing after completing SPEC:

$ hegel next '{"spec_complete": true}'
Transitioned: spec → plan

You are in the PLAN phase of Dialectic-Driven Development (Discovery Mode).

[PLAN_WRITING guide content injected here...]

Your task: Create a test-driven implementation plan.

Checking status:

$ hegel status
Mode: discovery
Current node: plan
History: spec → plan
Workflow ID: 2025-10-12T10:30:00Z

Analyzing metrics after completion:

$ hegel analyze
Session: abc123def
Workflow: discovery (complete)

Token Usage:
  Input: 45,230 tokens
  Output: 12,450 tokens
  Cache read: 8,900 tokens

Activity:
  Bash commands: 87
  File edits: 34

Per-Phase Breakdown:
  spec: 15 min, 8.2k tokens, 12 commands, 5 file edits
  plan: 22 min, 12.1k tokens, 18 commands, 8 file edits
  code: 45 min, 24.8k tokens, 45 commands, 18 file edits
  learnings: 10 min, 4.2k tokens, 8 commands, 2 file edits
  done: 5 min, 1.9k tokens, 4 commands, 1 file edit

Integration with DDD Methodology

Hegel operationalizes the methodology by:

Enforcing structure: State machine prevents skipping phases or advancing prematurely

Providing context: Writing guides injected at each phase ensure consistency

Capturing metrics: Hook integration enables post-hoc analysis of workflow efficiency

Enabling iteration: Cycle transitions (LEARNINGS → SPEC) support Discovery mode loops

Maintaining discipline: Deterministic rules detect problematic patterns without LLM judgment

The goal: Make DDD practical through tooling, not just theoretical through documentation.

Project Repository

Hegel is open source (SSPL license):

Repository: https://github.com/dialecticianai/hegel-cli

Documentation: README.md, CODE_MAP.md (architecture), workflow definitions in workflows/

Guides: Writing templates in guides/ directory

For more information about Dialectic-Driven Development methodology, visit dialectician.ai.

Authoring Guides

Agent-oriented writing templates for Dialectic Driven Development.

These guides are designed to be copied into your repository and referenced by AI agents before authoring documents. They provide structured templates, constraints, and examples that help agents produce consistent, high-quality documentation.

How to Use These Guides

  1. Copy to your repo: Place relevant guides in your project's docs/ or .ai/ directory
  2. Reference in prompts: Tell your agent to "read the spec writing guide before creating SPEC.md"
  3. Maintain consistency: Use across projects to build a library of well-structured documents

Available Guides

  • Spec Writing: Structure, examples, and validation to make behavior falsifiable
  • Plan Writing: Step templates and TDD discipline for actionable plans
  • Kickoff Writing: "Napkin physics" approach to project initialization
  • README Writing: Concise orientation docs for internal libraries
  • Learnings Writing: Capturing evidence, pivots, and architectural insights

Each guide includes templates, constraints, anti-patterns, and real examples to help agents author documents that integrate seamlessly with the DDD workflow.

Spec Writing

This chapter provides agent-oriented documentation for writing SPEC.md files in DDD projects. Drop this guide into your repository as SPEC_WRITING.md to help AI agents understand how to create precise behavioral contracts for toy models.


# SPEC_WRITING.md

## Purpose

A **SPEC.md is a contract spike**: it defines what the system must accept, produce, and guarantee.  
It exists to make implementation falsifiable — to ensure tests and validation have clear ground truth.

---

## What a SPEC.md Is / Is Not

### ❌ Not

- Implementation details (classes, functions, algorithms)
- Internal design notes (unless exposed in the contract)
- Tutorials, manuals, or user guides
- Vague aspirations ("the system should work well")

### ✅ Is

- Precise input/output formats
- Defined state transitions or invariants
- Operation semantics (commands, APIs, behaviors)
- Error and validation rules
- Concrete test scenarios and acceptance criteria

---

## Core Structure

### 1. Header
Toy Model N: [System Name] Specification

One-line purpose statement

### 2. Overview

- **What it does:** core purpose in 2–3 sentences
- **Key principles:** 3–5 bullets on design philosophy
- **Integration context:** if relevant, note inputs/outputs to other toys

### 3. Data Model
Define external data formats with **realistic examples**:

- All required fields shown
- Nested structures expanded
- Field purposes explained
- JSON schemas when clarity demands

### 4. Core Operations
Document commands or APIs with a consistent pattern:

- **Syntax** (formal usage)
- **Parameters** (required/optional, ranges, defaults)
- **Examples** (simple + complex)
- **Behavior** (state changes, outputs, side effects)
- **Validation** (rules, errors, edge cases)

### 5. Test Scenarios
3 categories:

1. **Simple** — minimal case
2. **Complex** — realistic usage
3. **Error** — invalid inputs, edge handling  
Optionally, **Integration** — only if toy touches another system.

### 6. Success Criteria
Checkboxes phrased as falsifiable conditions, e.g.:

- [ ] Operation X preserves invariant Y
- [ ] Error messages are structured JSON
- [ ] Round-trip import/export retains labels

---

## Quality Heuristics

High-quality SPECs are:

- **Precise** — eliminate ambiguity
- **Minimal** — only cover one axis of complexity
- **Falsifiable** — every statement testable
- **Contextual** — note integration points when they matter

Low-quality SPECs are:

- Vague ("system processes data")
- Over-prescriptive (dictating implementation)
- Bloated with internal details
- Missing testable criteria

---

## Conclusion

A SPEC.md is not a design novel.
It is a **minimal, precise contract** that locks in what must hold true, so tests and implementations can be judged unambiguously. If multiple axes of complexity emerge, split them into separate toy models.

Plan Writing

This chapter provides agent-oriented documentation for writing PLAN.md files in DDD projects. Drop this guide into your repository as PLAN_WRITING.md to help AI agents create strategic roadmaps for toy model implementation.


# PLAN_WRITING.md

## What a PLAN.md Actually Is

A **PLAN.md is a strategic roadmap** describing **what to build and how to build it step-by-step**. It enforces clarity, sequencing, and validation.

### ❌ NOT:

- Implementation code
- Literal test code
- Copy-paste ready
- Exhaustive details

### ✅ IS:

- Stepwise development roadmap
- TDD methodology guide
- Illustrative code patterns only
- Success criteria with checkboxes

---

## Structure

### Header

- **Overview**: Goal, scope, priorities
- **Methodology**: TDD principles; what to test vs. not test

### Step Template

    ## Step N: <Feature Name> **<PRIORITY>**

    ### Goal
    Why this step matters

    ### Step N.a: Write Tests

    - Outline test strategy (no literal code)
    - Key cases: core, error, integration
    - Expected validation behavior

    ### Step N.b: Implement

    - Tasks: file/module creation, core ops, integration
    - Code patterns for illustration only
    - State and error handling guidance

    ### Success Criteria

    - [ ] Clear, testable checkpoints
    - [ ] Functional + quality standards met

---

## Key Practices

### TDD Discipline

- Write failing tests first
- Red → Green → Next
- Focus on interfaces and contracts
- Cover error paths explicitly

### Test Scope

- ✅ Test: core features, errors, integration points
- ❌ Skip: helpers, edge cases, perf, internals

### Code Patterns
Use examples as **patterns**, not literal code:

    cmdWalk(cells, direction) {
        if (!(direction in DIRECTIONS)) throw Error(`Invalid: ${direction}`);
        const [dx, dy] = DIRECTIONS[direction];
        this.cursor.x += cells * dx; this.cursor.y += cells * dy;
    }

### Tasks
Break implementation into minimal units:

    1. Create directory/files
    2. Implement core command parsing
    3. Add integration test path
    4. Error handling

### Success Criteria
Always check with concrete, objective boxes:

- [ ] Parser initializes cleanly  
- [ ] Commands mutate state correctly  
- [ ] Errors raised for invalid input  
- [ ] Test suite runs with single command  

---

## Anti-Patterns

- ❌ Full test code in Plan (use bullet outlines)
- ❌ Full implementation code (use patterns only)
- ❌ Over-detail (Plan guides, does not replace dev thinking)

---

## Why This Works

- **Clear sequencing**: prevents scope drift  
- **TDD enforcement**: quality-first mindset  
- **Concrete validation**: objective step completion  
- **Minimal guidance**: gives direction without over-specifying  

---

## Conclusion
A good PLAN.md is a **map, not the territory**. It sequences work, enforces TDD, and defines success. It avoids detail bloat while ensuring implementers know exactly **what to test, what to build, and when it's done**.

Kickoff Writing

This chapter provides agent-oriented documentation for writing KICKOFF.md files in DDD projects. Drop this guide into your repository as KICKOFF_WRITING.md to help AI agents structure project kickoffs using napkin physics and binary-weave integration patterns.


# KICKOFF_WRITING.md

This document instructs the agent how to write a kickoff document for a new DDD project.
The goal is to produce a single, explicit binary-weave plan — not a flat list of toys, not parallel streams.
The weave always alternates: *new primitive → integration with prior product*.  

---

## Core Shape of a Kickoff

1. **Napkin Physics**:  
   - Problem (1 sentence)  
   - Assumptions (3–5 bullets)  
   - Invariant (one crisp property that must always hold)  
   - Mechanism (≤5 bullets describing the minimal path)  

2. **Binary-Weave Plan**:  
   - Always introduce **one new primitive at a time** (Toy A, Toy B, Toy C …).  
   - Always follow by **integrating it with the prior product** (A+B=C, C+D=E, …).  
   - Each integration produces the **new “current product”**.  
   - No step introduces more than one new primitive.  
   - No integration combines more than two things.  
   - Continue until the final product emerges.  

3. **End State**:  
   - Name the final product.  
   - Summarize which primitives and integrations were woven.  
   - State the durable invariants.  
   - Clarify that only the final docs + system remain; toys are discarded but learnings are kept.  

---

## Formatting Expectations

- **Stage numbering is sequential.**  
  - *Stage 1*: Primitive A, Primitive B  
  - *Stage 2*: A + B = C  
  - *Stage 3*: Primitive D  
  - *Stage 4*: C + D = E  
  - *Stage 5*: Primitive F  
  - *Stage 6*: E + F = G  
  - …continue until final product.  

- **Each stage entry must have**:  
  - **Name** (Toy or Integration)  
  - **What it does** (one sentence)  
  - **Invariant** (instantaneous, non-blocking, etc.)  

- **Avoid parallel numbering.** Don’t list “Stage 2.3” or “Stage 2.4”.  
- **Avoid over-specification.** The kickoff is a weave map, not a spec.  
- **Avoid skipping.** Each stage should follow the weave pattern strictly.  

---

## Tone & Style

- Write plainly and compactly — scaffolding, not prose.  
- Prioritize clarity of the weave over detail of implementation.  
- Keep invariants crisp and behavioral, not vague.  
- Use ≤2 bullets per primitive/integration when possible.  

---

## One-Shot Checklist

- [ ] Napkin Physics included?  
- [ ] Sequential stages?  
- [ ] Exactly one new primitive per stage?  
- [ ] Integration always combines current product with one new primitive?  
- [ ] Final product and invariants stated at end?  

If all are checked, the kickoff is valid.

README Writing

This chapter provides agent-oriented documentation for writing README.md files in DDD projects. Drop this guide into your repository as README_WRITING.md to help AI agents create effective context refresh documentation.


# README_WRITING.md

## Purpose

These READMEs serve as **context refresh documents** for AI assistants working with the codebase. They should quickly re-establish understanding of what each library does, how to use it, and what to watch out for.

**Target audience**: AI assistants needing to quickly understand library purpose and usage patterns  
**Length target**: 100–200 words total  
**Focus**: Dense, essential information only

---

## Required Structure

### **1. Header + One-Liner**

    # library_name
    Brief description of what it does and key technology/pattern

### **2. Purpose (2–3 sentences)**

- What core problem this solves
- Key architectural approach or design pattern
- How it fits in the broader system/integration

### **3. Key API (essential methods only)**

    # 3-5 most important methods with type hints
    primary_method(param: Type) -> ReturnType
    secondary_method(param: Type) -> ReturnType

### **4. Core Concepts (bullet list)**

- Key data structures or abstractions
- Critical constraints or assumptions  
- Integration points with other libraries
- Important design patterns

### **5. Gotchas & Caveats**

- Known limitations or scale constraints
- Common usage mistakes
- Performance considerations
- Integration pitfalls

### **6. Quick Test**

    pytest tests/test_basic.py  # or most representative test

---

## Writing Guidelines

### **Be Concise**

- Use bullet points over paragraphs
- Focus on essential information only
- Assume reader has basic programming knowledge

### **Be Specific**

- Include actual method signatures, not generic descriptions
- Mention specific constraints (e.g., "max 1000 rooms before performance degrades")
- Reference specific test files for examples

### **Be Practical**

- Lead with most commonly used methods
- Highlight integration points with other libraries
- Focus on "what you need to know to use this correctly"

### **Avoid**

- Marketing language or feature lists
- Detailed implementation explanations
- Extensive examples (link to tests instead)
- Installation instructions (assume internal development environment)

---

## Template

    # library_name
    Brief description of what it does

    ## Purpose
    2–3 sentences covering the core problem solved, architectural approach, and role in broader integration.

    ## Key API
    most_important_method(params: Type) -> ReturnType
    second_most_important(params: Type) -> ReturnType
    utility_method(params: Type) -> ReturnType

    ## Core Concepts

    - Key data structure or abstraction
    - Critical constraint or assumption
    - Integration point with other libraries
    - Important design pattern

    ## Gotchas

    - Known limitation or performance constraint
    - Common usage mistake to avoid
    - Integration pitfall with other libraries

    ## Quick Test
    pytest tests/test_representative.py

---

## Quality Check

A good library README should allow an AI assistant to:

1. **Understand purpose** in 10 seconds
2. **Know primary methods** to call
3. **Avoid common mistakes** through gotchas section
4. **Validate functionality** through quick test

If any of these takes longer than expected, the README needs to be more concise or better organized.

Learnings Writing

This chapter provides agent-oriented documentation for writing LEARNINGS.md files in DDD projects. Drop this guide into your repository as LEARNINGS_WRITING.md to help AI agents create effective retrospective documentation that captures architectural insights and constraints.


# LEARNINGS_WRITING.md

## Purpose

A **LEARNINGS.md** is a short, dense retrospective.  
Its job: extract maximum value from an experiment by recording **what worked, what failed, what remains uncertain, and why.**

---

## What It Is / Is Not

### ❌ Not

- A feature list  
- Implementation details  
- A user manual  
- Purely positive  
- Hype or speculation without evidence  

### ✅ Is

- A record of validated insights  
- A log of failures and limitations  
- A map of open questions  
- A pointer to architectural reuse  
- A calibration tool for future experiments  

---

## Essential Sections

### Header

    # Toy Model N: System Name – Learnings
    Duration: X days | Status: Complete/Incomplete | Estimate: Y days

### Summary

- Built: 1 line  
- Worked: 1–2 key successes  
- Failed: 1–2 key failures  
- Uncertain: open question

### Evidence

- ✅ Validated: concise finding with evidence  
- ⚠️ Challenged: difficulty, workaround, lesson  
- ❌ Failed: explicit dead end  
- 🌀 Uncertain: still unresolved

### Pivots

- Original approach → New approach, why, and what remains unknown.

### Impact

- Reusable pattern or asset  
- Architectural consequence  
- Estimate calibration (time/effort vs. outcome)

---

## Style

- Keep it **short and factual**.  
- Prefer **bullet points** over prose.  
- Note **failures and unknowns** as explicitly as successes.  
- One page max — dense, parsimonious, reusable.
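
Putting the sections together, a minimal skeleton (placeholder content; adapt the bullets to the experiment):

    # Toy Model N: System Name – Learnings
    Duration: X days | Status: Complete | Estimate: Y days

    ## Summary
    - Built: one-line description
    - Worked: key success
    - Failed: key failure
    - Uncertain: open question

    ## Evidence
    - ✅ Validated: finding, with evidence
    - ⚠️ Challenged: difficulty, workaround, lesson
    - ❌ Failed: explicit dead end
    - 🌀 Uncertain: still unresolved

    ## Pivots
    - Original approach → new approach, why, what remains unknown

    ## Impact
    - Reusable pattern or asset
    - Architectural consequence
    - Estimate calibration: time/effort vs. outcome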

LEARNINGS.md is not a diary. It is a **distilled record of architectural insights** that prevent future agents from repeating failures and help them understand what constraints actually matter.

Case Study I: ChatGPT Export Viewer

This archive lists concrete, working examples referenced throughout the book. It is intentionally small and current.

Archive Browser Project

A complete DDD example demonstrating the full development cycle from kickoff to shipped product. This real-world project produced chatgpt-export-viewer, a suite of composable CLI tools for browsing ChatGPT export archives.

Project outcome: A cross-platform toolkit with clean human-AI collaboration boundaries:

  • Human role: Product direction, UX decisions, constraint setting, edge case validation
  • Agent role: Implementation, refactoring, shared pattern extraction, packaging polish

Key architectural decisions:

  • CLI + JSON I/O for deterministic, testable composition
  • Keyboard-first TUI with instant responsiveness (/ search, n/N navigation)
  • Modular libraries: ZIP access, terminal primitives, cross-platform launchers
  • Publishing discipline: proper bin entries, dependency management, lint/format gates

The example demonstrates DDD's strength in AI-first development: clear documentation boundaries enable effective human-agent collaboration while maintaining code quality and user experience standards.

Kickoff Document

Initial project definition using "napkin physics" to establish core constraints and approach.

Spec Document

Technical specification defining invariants, contracts, and behaviors for Stage 1 primitives.

Plan Document

Step-by-step implementation plan with TDD methodology, success criteria, and risk mitigation.

Code Map Document

Living architectural documentation providing structural orientation for both humans and AI agents.

Notes

  • Keep examples practical and minimal; link them from relevant chapters.
  • Export formats: when useful, include small JSON/DOT/CSV snippets alongside examples.

Archive Browser Kickoff

This example demonstrates the binary-weave kickoff process for a real DDD project. This KICKOFF.md file guided the development of the Archive Browser, a shipped NPM package for viewing ChatGPT conversation exports.


# KICKOFF.md

A clarity-first, agent-oriented development plan for building a lightning-fast TUI archive browser for ChatGPT export ZIPs.
This document is disposable scaffolding: clarity and validated integrations are the true outputs.

## Napkin Physics

- **Problem**: We need a non-laggy, lightning-fast TUI to browse ChatGPT export ZIPs containing JSON, HTML, and hundreds of image files.
- **Assumptions**:
  - Archives can grow large (thousands of entries, gigabytes in size).
  - Performance > developer ergonomics (the developer is an LLM).
  - Implementation target: Node.js v20 with `yauzl` (ZIP) + `terminal-kit` (TUI).
  - Responsiveness depends on lazy rendering, precomputed metadata, and diff-based screen updates.
- **Invariant**: Every user interaction (scrolling, searching, previewing) must feel instantaneous — no noticeable lag.
- **Mechanism**:
  1. Use `yauzl` to stream central directory and precompute metadata.
  2. Render scrollable file list with `terminal-kit`.
  3. Display metadata in side panel; update on highlight.
  4. Lazy-load previews (JSON/HTML inline, images as stubs/external).
  5. Add fuzzy search to filter entries without slowing navigation.

---

## Binary-Weave Plan

Each stage introduces at most one new primitive, then integrates it with an existing validated toy. This continues until the final product emerges.

### Stage 1 — Primitives

- **Toy A**: ZIP Reader  
  Reads central directory, outputs JSON metadata for all entries.
- **Toy B**: TUI List  
  Displays an array of strings in a scrollable, instant navigation list.

### Stage 2 — First Integration

- **C = A + B** → Archive Lister  
  Combine ZIP Reader with TUI List.  
  Behavior: display archive entries in a scrollable list.  
  Invariant: open + scroll is instant regardless of archive size.

### Stage 3 — New Primitive

- **Toy D**: Metadata Panel  
  Displays key-value metadata (size, method, CRC, etc.) in a side panel.

### Stage 4 — Second Integration

- **E = C + D** → Entry Browser  
  Combine Archive Lister with Metadata Panel.  
  Behavior: highlight an entry in list, show details in panel.  
  Invariant: panel update is instant on keypress.

### Stage 5 — New Primitive

- **Toy F**: File Previewer (text only)  
  Opens JSON/HTML entries, streams to popup.  
  Must be cancelable and non-blocking.

### Stage 6 — Third Integration

- **G = E + F** → Text Browser  
  Combine Entry Browser with File Previewer.  
  Behavior: press Enter on a JSON/HTML file to preview inline.  
  Invariant: browsing stays snappy; previews load lazily.

### Stage 7 — New Primitive

- **Toy H**: Image Stub  
  Detects image files, shows placeholder `[IMG] filename`.  
  Optional: launch via external viewer.

### Stage 8 — Fourth Integration

- **I = G + H** → Full Viewer  
  Combine Text Browser with Image Stub.  
  Behavior: one interface handles JSON, HTML, and image entries.  
  Invariant: non-text files never block UI.

### Stage 9 — New Primitive

- **Toy J**: Search/Filter  
  Provides fuzzy filename search.  
  Invariant: filtering large lists is instant.

### Stage 10 — Final Integration

- **K = I + J** → Archive Browser Product  
  Combine Full Viewer with Search/Filter.  
  Final features:
  - Scrollable list of entries
  - Metadata side panel
  - Inline preview for JSON/HTML
  - Image handling stub
  - Fuzzy search/filter  
    Invariant: every keypress (nav, search, preview) feels immediate.

---

## End State

- **Final Product**: Lightning-fast Node.js archive browser for ChatGPT exports.
- **Process**: 6 primitives (A, B, D, F, H, J) woven into 5 integrations (C, E, G, I, K).

Archive Browser Spec

This example demonstrates a complete SPEC.md file from the Archive Browser project. This specification guided development of the shipped NPM package for viewing ChatGPT conversation exports.


# SPEC.md

Archive Browser, Stage 1 (Toy A + Toy B)

Scope: Implement the first two primitives for the Archive Browser kickoff (Stage 1), without integration.
- Toy A: ZIP Metadata Reader (`zipmeta`)
- Toy B: TUI List (`tuilist`)

No extrapolation beyond existing docs: this SPEC defines minimal contracts and invariants to enable TDD for Stage 1.

## 1. Invariants

- Determinism: Same input → same output; no hidden state in outputs.
- Legibility: JSON I/O; structured error JSON on stderr.
- Parsimony: Avoid unnecessary work; stream and avoid file content reads for Toy A.
- Responsiveness (Toy B): Rendering a large list (≥10k items) does not visibly lag on navigation.

---

## 2. Toy A — ZIP Metadata Reader (`zipmeta`)

### Purpose
Read a ZIP file’s central directory and emit a JSON array of entries (metadata only). Do not read file contents.

### Input (stdin JSON)

    {
      "zip_path": "./path/to/archive.zip"
    }

### Output (stdout JSON)
Array of entry objects, ordered by central directory order.

    [
      {
        "name": "conversations/2024-09-30.json",
        "compressed_size": 12345,
        "uncompressed_size": 67890,
        "method": "deflate",
        "crc32": "89abcd12",
        "last_modified": "2024-09-30T12:34:56Z",
        "is_directory": false
      }
    ]

Notes:
- `method` is a human label derived from the ZIP method code.
- `last_modified` is normalized UTC ISO8601 if available; otherwise omit.

### Errors (stderr JSON)
On failure, emit a single JSON object to stderr; no stdout payload.

    { "type": "ERR_ZIP_OPEN", "message": "cannot open zip", "hint": "check path and permissions" }
Other representative errors:
- `ERR_ZIP_NOT_FOUND` — path does not exist
- `ERR_ZIP_INVALID` — invalid/corrupt ZIP central directory
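
A minimal sketch of how Toy A might satisfy this contract with `yauzl` (illustrative, not the shipped implementation; method labels and `last_modified` normalization are abbreviated):

    import { readFileSync } from 'node:fs';
    import yauzl from 'yauzl';

    // Input per SPEC: { "zip_path": "..." } on stdin. Metadata only; never read entry contents.
    const { zip_path } = JSON.parse(readFileSync(0, 'utf8'));

    yauzl.open(zip_path, { lazyEntries: true }, (err, zipfile) => {
      if (err) {
        process.stderr.write(JSON.stringify({
          type: 'ERR_ZIP_OPEN', message: err.message, hint: 'check path and permissions',
        }) + '\n');
        process.exit(1);
      }
      const entries = [];
      zipfile.on('entry', (entry) => {
        entries.push({
          name: entry.fileName,
          compressed_size: entry.compressedSize,
          uncompressed_size: entry.uncompressedSize,
          method: entry.compressionMethod === 8 ? 'deflate' : 'store', // other codes omitted here
          crc32: entry.crc32.toString(16).padStart(8, '0'),
          is_directory: entry.fileName.endsWith('/'),
          // last_modified omitted in this sketch
        });
        zipfile.readEntry(); // advance to the next central-directory record
      });
      zipfile.on('end', () => process.stdout.write(JSON.stringify(entries) + '\n'));
      zipfile.readEntry(); // begin iteration (lazyEntries mode)
    });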

---

## 3. Toy B — TUI List (`tuilist`)

### Purpose
Render a scrollable list of strings with instant navigation. Stage 1 validates rendering performance and interaction loop structure; selection semantics may be finalized during integration.

### Input / Output
- Input: JSON array of strings on stdin.
- Output: For Stage 1, stdout may be empty or a minimal confirmation object; interaction is the focus. Errors use structured JSON on stderr if startup fails.

Example input (stdin):

    ["a.json", "b.json", "c.html"]

Example minimal output (stdout):

    { "ok": true }

### Errors (stderr JSON)

    { "type": "ERR_TUI_INIT", "message": "terminal init failed", "hint": "verify terminal supports required features" }

---

## 4. Operations

- Toy A `zipmeta`:
  - Open ZIP, iterate central directory entries, map metadata to the output schema.
  - Never read file contents; stream and collect metadata only.
  - Normalize fields (method label, crc32 hex, optional last_modified).
- Toy B `tuilist`:
  - Initialize terminal UI, render list from stdin array.
  - Provide non-blocking navigation; ensure smooth, low-latency scroll.

---

## 5. Validation Rules

- Toy A produces identical JSON given the same `zip_path`.
- Toy A handles non-existent/invalid ZIP paths with structured errors.
- Toy B starts without throwing; renders lists of ≥10k items without visible lag.
- Toy B exits cleanly on user quit (e.g., Esc/Ctrl-C), leaving terminal in a good state.

---

## 6. Test Scenarios (Golden + Error Cases)

- Toy A Golden: small known ZIP → stable JSON array (order and fields match).
- Toy A Large: large ZIP central directory streams without memory blow-up; completes.
- Toy A Errors: missing path; corrupt file → structured error.
- Toy B Golden: feed 100 sample items → UI initializes and returns `{ "ok": true }` on immediate quit.
- Toy B Stress: feed ≥10k items → navigation is smooth; startup succeeds.

---

## 7. Success Criteria

- [ ] `zipmeta` emits correct JSON metadata without reading file contents.
- [ ] `zipmeta` error paths return structured JSON on stderr only.
- [ ] `tuilist` initializes, renders, and exits cleanly.
- [ ] `tuilist` remains responsive with large lists (subjective but observable).
- [ ] Both tools follow CLI + JSON purity (no hidden state in outputs; logs allowed).

*** End of Stage 1 SPEC ***

Archive Browser Plan

This example demonstrates a complete PLAN.md file from the Archive Browser project. This strategic roadmap guided TDD implementation of the shipped NPM package for viewing ChatGPT conversation exports.


# PLAN.md

Archive Browser, Stage 1 (Toy A + Toy B)

Overview: Build and validate the two Stage 1 primitives from KICKOFF — Toy A (`zipmeta`) and Toy B (`tuilist`) — using TDD. Keep changes minimal and test-first. Use A/B structure per Plan Writing guide.

Methodology: TDD discipline; tests-first; Red → Green → Next; explicit success criteria; focus on interfaces and contracts.

## Step 1: Toy A — ZIP Metadata Reader (HIGH)

### Step 1.a: Write Tests
- Golden: Small known ZIP → stable JSON array (fields: name, sizes, method label, crc32 hex, optional last_modified, is_directory). Order matches central directory.
- Large: Big ZIP central directory streams without memory blow-up; completes within acceptable time (manual check).
- Errors: Non-existent path → `ERR_ZIP_NOT_FOUND`; corrupt ZIP → `ERR_ZIP_INVALID`; open failure → `ERR_ZIP_OPEN`. No stdout on error.
- Determinism: Same input JSON (`{ zip_path }`) yields identical output JSON.

### Step 1.b: Implement
- Use `yauzl` to open and iterate central directory; do not read file contents.
- Map method code → human label; hex-encode crc32 to 8-char lowercase; include `is_directory`.
- Normalize optional `last_modified` to UTC ISO8601 if available (omit if unknown).
- Emit JSON array to stdout; on error, emit structured error JSON to stderr only.

### Success Criteria
- [ ] Golden output matches expected JSON exactly.
- [ ] Error cases emit structured JSON on stderr and no stdout payload.
- [ ] Determinism verified for repeated runs.
- [ ] No file content reads; central directory only.

---

## Step 2: Toy B — TUI List (HIGH)

### Step 2.a: Write Tests
- Startup: Given JSON array of ~100 strings on stdin, TUI initializes and exits cleanly on immediate quit; stdout may emit `{ "ok": true }`.
- Stress: Given ≥10k items, navigation remains smooth (manual check); startup is non-blocking.
- Error: TUI init failure emits `ERR_TUI_INIT` to stderr; no stdout payload.

### Step 2.b: Implement
- Use `terminal-kit` to render a scrollable list from stdin strings.
- Ensure non-blocking input handling; leave terminal state clean on exit.
- Keep Stage 1 output minimal (e.g., `{ "ok": true }`); finalize selection semantics post-integration.

### Success Criteria
- [ ] TUI starts, renders, and exits cleanly on quit.
- [ ] Stress navigation does not visibly lag (subjective but observable).
- [ ] Structured errors on init failure.

---

## Out of Scope (Stage 1)
- Integration (Stage 2) — combining A + B
- Metadata panel, previewer, image stub, search/filter (later stages)

---

## Risks & Mitigations
- Large ZIPs: stream central directory and avoid content reads to preserve memory.
- TUI responsiveness: keep drawing minimal; avoid synchronous blocking operations.
- Terminal variance: handle init errors gracefully; restore terminal state on exit.

---

## Completion Check
- [ ] Step 1 success criteria all pass
- [ ] Step 2 success criteria all pass
- [ ] SPEC and PLAN reflect actual behavior

*** End of Stage 1 PLAN ***

Archive Browser Code Map

This example demonstrates a complete CODE_MAP.md file from the Archive Browser project. This architectural documentation provided ongoing orientation for both human developers and AI agents throughout development of the shipped NPM package.


# CODE_MAP.md

This document orients you to the project structure, what each file does, and how the pieces fit together at a high level.

## Architecture Overview

- Purpose: Terminal-based tools to explore ChatGPT export ZIPs quickly with keyboard-centric UX.
- Design: Small composable CLIs built on a thin library layer.
  - UI: `terminal-kit` powers list menus, panes, and key handling.
  - ZIP I/O: `yauzl` streams metadata and file contents without full extraction.
  - GPT utils: Helpers reduce OpenAI export mappings into readable message sequences.
  - OS helpers: Minimal macOS integration for "open externally" convenience.
- Data Flow (typical):
  1. CLI parses args or JSON from stdin (`lib/io.js`).
  2. ZIP operations query or stream entries (`lib/zip.js`).
  3. TUI renders lists/panels and handles keys (`lib/terminal.js`).
  4. Optional: spawn specialized viewers (JSON tree, GPT browser) or export to files.

## Key Directories

- `cli/`: Executable entry points for each tool (users run these via npm scripts).
- `lib/`: Reusable helpers for ZIP I/O, GPT export reduction, terminal UI, and small OS shims.
- `backup1/`: Example data from a ChatGPT export (for local dev/testing).
- `zips/`: Sample export ZIPs used by the tools.
- `artifacts/`: Logs/JSON artifacts from runs.

## Libraries (`lib/`)

- `lib/zip.js`
  - Thin wrappers around `yauzl` for reading ZIPs:
    - `listNames(zipPath)`: stream all entry names.
    - `readMetadata(zipPath)`: emit metadata for each entry (name, sizes, method, crc32, directory flag, last_modified).
    - `readEntryText(zipPath, entryName)`: read a specific entry as UTF-8 text.
    - `extractEntry(zipPath, entryName, destPath)`: extract one entry to disk.
    - `extractToTemp(zipPath, entryName, prefix)`: extract a single entry into a unique temp dir; returns `{ dir, file }`.
    - `cleanupTemp(path)`: best-effort recursive removal.

- `lib/gpt.js`
  - Utilities to transform ChatGPT export structures:
    - `extractTextFromContent(content)`: normalize message content to text (handles common shapes, multimodal parts).
    - `extractAuthor(message)`: infer `user`/`assistant`/fallback from message author fields.
    - `buildMainPathIds(mapping, currentNodeId)`: follow parent pointers to root.
    - `autoDetectLeafId(mapping)`: choose a reasonable leaf when `current_node` is absent.
    - `reduceMappingToMessages(mapping, { currentNodeId, includeRoles })`: produce a minimal `[{ author, text }]` sequence along the main path.
    - `buildPlainTextTranscript(messages)`: render a readable transcript string.
    - `exportConversationPlain(title, messages)`: write a plain-text transcript to `exports/` and return the file path.
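
A condensed, simplified sketch of the main-path reduction (a parent-pointer walk over the export's `mapping`; the real `lib/gpt.js` handles more content shapes and fallbacks):

    // Simplified content/author extraction (the real helpers cover more shapes).
    const extractAuthor = (message) => message?.author?.role ?? 'unknown';
    const extractTextFromContent = (content) =>
      typeof content === 'string'
        ? content
        : (content?.parts ?? []).filter((p) => typeof p === 'string').join('\n');

    // Follow parent pointers from the current node up to the root, then reverse.
    function buildMainPathIds(mapping, currentNodeId) {
      const path = [];
      for (let id = currentNodeId; id && mapping[id]; id = mapping[id].parent) path.push(id);
      return path.reverse();
    }

    // Produce a minimal [{ author, text }] sequence along the main path.
    function reduceMappingToMessages(mapping, { currentNodeId, includeRoles = ['user', 'assistant'] }) {
      return buildMainPathIds(mapping, currentNodeId)
        .map((id) => mapping[id].message)
        .filter(Boolean)
        .map((m) => ({ author: extractAuthor(m), text: extractTextFromContent(m.content) }))
        .filter((m) => m.text && includeRoles.includes(m.author));
    }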

- `lib/open_external.js`
  - `openExternal(path)`: cross‑platform opener. macOS: `open`; Windows: `cmd /c start`; Linux/Unix: try `xdg-open`, `gio open`, `gnome-open`, `kde-open`, `wslview`.
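
A condensed sketch of the platform dispatch (the shipped helper also falls back through `gio open`, `gnome-open`, `kde-open`, and `wslview`):

    import { spawn } from 'node:child_process';

    // Open a path with the OS default handler; detach so the TUI stays responsive.
    export function openExternal(path) {
      const [cmd, args] =
        process.platform === 'darwin' ? ['open', [path]]
        : process.platform === 'win32' ? ['cmd', ['/c', 'start', '', path]]
        : ['xdg-open', [path]];
      spawn(cmd, args, { stdio: 'ignore', detached: true }).unref();
    }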

- `lib/terminal.js`
  - Terminal helpers built on `terminal-kit`:
    - Exported `term` instance plus utilities: `paneWidth`, `status`, `statusKeys`, `statusSearch`, `printHighlighted`, `drawMetaPanel`.
    - Cursor/cleanup: `ensureCursorOnExit()`, `restoreCursor()`, `terminalEnter()`, `terminalLeave()`.
    - Menus: `withMenu()` (callback-based singleColumnMenu), `listMenu()` (fixed viewport, keyboard-driven; supports highlight query), `listMenuWrapped()` (wrapped multi-line items), `makeListSearch()` (reusable "/" + n/N search wiring for lists), `wrapLines()` (text wrapper).
    - Note: `cursorToggle()` wraps terminal-kit’s `hideCursor()`, which actually toggles cursor visibility (misnamed upstream). Our name reflects the real behavior.
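
The search wiring reduces to a little shared state; an illustrative sketch of the `/` + n/N pattern (the method names below are hypothetical, and the real `makeListSearch()` also drives highlighting and redraw):

    // Shared search state for a list: '/' sets the query, n/N jump between matches.
    export function makeListSearch(items) {
      let query = '';
      let matches = []; // indices into items that match the query
      let current = -1; // position within matches

      const recompute = () => {
        matches = items
          .map((text, i) => (query && text.toLowerCase().includes(query.toLowerCase()) ? i : -1))
          .filter((i) => i >= 0);
        current = matches.length ? 0 : -1;
      };

      return {
        // '/' handler: set a new query, return the index of the first match (or -1).
        setQuery(q) { query = q; recompute(); return current === -1 ? -1 : matches[current]; },
        // 'n': jump to the next match, wrapping around.
        next() { if (!matches.length) return -1; current = (current + 1) % matches.length; return matches[current]; },
        // 'N': jump to the previous match, wrapping around.
        prev() { if (!matches.length) return -1; current = (current - 1 + matches.length) % matches.length; return matches[current]; },
      };
    }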

- `lib/io.js`
  - I/O utilities shared by CLIs:
    - `emitError(type, message, hint)`: structured JSON errors to stderr.
    - `readStdin()`: read entire stdin as UTF-8.
    - `resolvePathFromArgOrStdin({ key })`: accept arg or JSON stdin for paths.
    - `jsonParseSafe(text)`: tolerant JSON parse.
    - `safeFilename(title)`: sanitize to a portable file name base.
    - `ensureDir(dirPath)`, `writeFileUnique(dir, base, ext, content)`: mkdir -p + non-clobbering writes.
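
Two of these helpers sketched (illustrative; the real module also covers path resolution and non-clobbering writes):

    // Structured JSON errors on stderr keep stdout clean for machine-readable output.
    export function emitError(type, message, hint) {
      process.stderr.write(JSON.stringify({ type, message, hint }) + '\n');
    }

    // Read all of stdin as UTF-8 (used when a CLI accepts JSON input instead of argv).
    export async function readStdin() {
      process.stdin.setEncoding('utf8');
      let data = '';
      for await (const chunk of process.stdin) data += chunk;
      return data;
    }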

- `lib/viewers.js`
  - `showJsonTreeFile(filePath)`: spawn the JSON tree viewer for a file.
  - `showJsonTreeFromObject(obj, opts)`: write a temp JSON and spawn the viewer; cleans up temp dir.

## CLIs (`cli/`)

- `cli/zipmeta.js`
  - Emits ZIP metadata as JSON. Accepts path via argv or stdin JSON (`{"zip_path":"..."}`) and writes an array of entry objects to stdout.

- `cli/listzip.js`
  - Scrollable list of ZIP entry names using `terminal-kit`'s single-column menu. Pure navigation; Enter exits.

- `cli/tuilist.js`
  - Scrollable list sourced from stdin JSON array of strings. Useful as a standalone picker component.

- `cli/jsontree.js`
  - Inline JSON tree viewer with expand/collapse, arrow/j/k/h/l movement, page/top/bottom, and a compact status line.
  - Accepts a file path arg or JSON via stdin.

- `cli/browsezip.js`
  - Two-pane browser for a ZIP:
    - Left: instant-scroll list of entries (keyboard-driven viewport).
    - Right: live metadata panel (`drawMetaPanel`).
    - Enter: inline JSON tree preview for `*.json` (extracts to temp, spawns JSON tree via `lib/viewers.js`).
    - `o`: open highlighted entry externally (extracts to temp, calls macOS `open`).
    - `v`: if `conversations.json` at ZIP root, launches `cli/gptbrowser.js` for a specialized view.

- `cli/mapping_reduce.js`
  - Utility to convert a `mapping` (or an item from `conversations.json`) into a minimal message sequence JSON.

- `cli/gptbrowser.js`
  - Specialized viewer for ChatGPT export ZIPs:
    - Reads `conversations.json` from the ZIP, shows a searchable conversation list with live match highlighting.
    - Opens a conversation into a large scrollable text view with role-colored lines, next/prev message jumps, and in-text search (`/`, `n/N`).
    - Export: writes plain-text transcripts to `exports/<title>.txt` (non-clobbering).

## Root and Supporting Files

- `package.json`
  - Declares `type: module`, `bin` entries for each CLI, and npm scripts for local runs.
  - Dependencies: `terminal-kit`, `yauzl`, `string-kit`.

- `README.md`
  - User-facing overview, requirements, usage examples, and key bindings.

- `KICKOFF.md`
  - Project kickoff notes (context and initial direction).

- `split_chapter.sh`
  - Shell script utility (repo-local helper; not used by the CLIs).

- `readme.html`
  - HTML export of the README for viewing in a browser.

- Data/example folders:
  - `backup1/`: Sample JSON files (e.g., `conversations.json`) for local experiments.
  - `zips/`: Example ChatGPT export ZIPs.
  - `artifacts/`: Run logs and metadata captures.

## How Things Work Together

- ZIP Browsing Path
  - `cli/browsezip.js` → `lib/io.js` (args) → `lib/zip.js` (metadata) → `lib/terminal.js` (list + panel) →
    - Enter: extract + spawn `cli/jsontree.js` for JSON.
    - `o`: extract + `lib/open_external.js` to open externally.
    - `v`: spawn `cli/gptbrowser.js` for GPT-specialized browsing.

- GPT Browsing Path
  - `cli/gptbrowser.js` → `lib/zip.js.readEntryText('conversations.json')` → `lib/gpt.js.reduceMappingToMessages()` →
    UI render via `lib/terminal.js` (search, highlight, navigation) →
    Optional export via `lib/io.js.writeFileUnique()`.

- Safety and UX
  - Cursor visibility and TTY cleanup are handled centrally by `ensureCursorOnExit()`.
  - CLIs print structured JSON errors to stderr for deterministic automation.
  - Temporary files are isolated and cleaned when possible; external open uses a temp cache on macOS.

## Notes / Known Issues

- `cli/gptbrowser.js` references `statusSearch` in its message view but must import it from `lib/terminal.js` to avoid a `ReferenceError`.
- Cursor control: terminal-kit's `hideCursor()` is a toggle; we expose it as `cursorToggle()` to make that explicit. Cursor is restored on exit by `ensureCursorOnExit()`.
- Inline preview in `cli/browsezip.js` is JSON-only; consider extending to small text types (`.txt`, `.md`) or surfacing a status hint when Enter does nothing.

Case Study II: Spatial MUD Database

Multi-Scale Spatial Architecture for MUDs

This case study documents the application of DDD to a spatial reasoning system for text-based virtual worlds. The project involved multi-scale spatial coordination, AI-guided world generation, and algorithmic spatial reasoning.

The work demonstrates how toy model discipline can address complex technical challenges through systematic experimentation and integration.

Foundation: Four Validated Spatial Prototypes

The project began with four successful toy models, each validating a specific aspect of multi-scale spatial architecture:

Toy 1 (Blueprint Parser): Text-based spatial planning interface that isolates coordinate abstraction challenges.

Toy 2 (Spatial Graph Operations): Room manipulation system that validates graph-based approaches to spatial relationships.

Toy 3 (Scout System): AI-driven content generation that explores LLM integration for procedural world building.

Toy 4 (Indoor/Outdoor Glue): Scale-bridging system that addresses multi-level spatial coordination challenges.

Result: All four systems validated their core concepts with test coverage and error handling. This provided an experimental foundation spanning room-level detail to world-scale geography.

Integration Challenge: Toy5 (Outdoor Integration)

With four validated individual systems, the project addressed integrating the Scout system (Toy 3) with the Indoor/Outdoor Glue (Toy 4) to enable LLM-guided hierarchical world subdivision. This integration required:

  • Semantic spatial reasoning: LLMs understanding and maintaining geographic consistency
  • Bidirectional data flow: Scout observations driving quadtree subdivision decisions
  • Format compatibility: Ensuring clean data exchange between systems designed independently
  • Emergent spatial logic: Geographic constraints creating self-reinforcing consistency

This required moving from validated individual components to a system involving AI collaboration and spatial reasoning.

Technical Breakthroughs: Three Critical Experiments

The integration was addressed through three experiments that isolated specific technical risks:

Experiment 5a: Hierarchical Subdivision

Challenge: LLM-guided semantic splitting of world quadrants based on geographic observations.

Breakthrough: LLMs exhibit natural spatial reasoning patterns that systems should accommodate rather than constrain. Working with AI cognitive tendencies proved more effective than forcing predetermined formats.

Critical Discovery: Configuration consistency throughout complex system integration points became essential for reliable behavior. Silent failures from mismatched settings highlighted the importance of explicit validation at every boundary.

Experiment 5b: Geographic Constraints

Challenge: Maintaining spatial consistency when LLMs generate procedural content.

Outcome: Strategic pivot away from complex constraint systems toward simpler, more reliable approaches.

Learning: For systematic world-building, LLM creativity becomes a liability rather than an asset. Predictability and constraint adherence matter more than narrative richness.

Experiment 5c: Scout Path Iteration

Challenge: Bidirectional spatial consistency - scout observations creating geographic constraints that guide future observations.

Breakthrough: Bidirectional information flow creates emergent consistency. When system outputs become inputs for subsequent operations, careful design can achieve self-reinforcing reliability rather than accumulated drift.

Technical Discovery: Constrained AI creativity often produces more reliable results than enhanced creativity. Prompt engineering with explicit constraints and deterministic settings proved essential for reliable spatial reasoning.

Architecture Validation

The experiments validated multi-scale system coordination patterns. Each scale handled different aspects of the problem domain while maintaining clean integration boundaries. The toy model approach allowed isolated validation of individual scales before attempting integration, significantly reducing the complexity of debugging multi-system interactions.

Methodology Insights

Constraint Over Creativity: Effective AI collaboration required constraining rather than enhancing LLM creativity. This involved prompt engineering to establish clear boundaries and consistent behavior patterns.

Integration-First Testing: The most dangerous bugs occurred at system boundaries - format compatibility issues that appeared to work individually but failed silently in integration. Comprehensive data flow validation became the highest-priority testing strategy.

Adaptive Development Cycles: Natural spatial subdivision required flexible progression (3 reports instead of planned 2), demonstrating that rigid iteration counts don't match organic spatial reasoning patterns.

Experimental Impact

The spatial architecture demonstrates DDD's effectiveness for complex technical challenges:

Proof of Concept: Demonstrates that DDD's toy model discipline scales to genuinely complex technical challenges involving AI collaboration and multi-system integration.

Methodology Validation: Shows how systematic experimentation through focused toys can tackle problems that would be overwhelming as monolithic projects.

Process Insights: Reveals patterns for AI collaboration, integration testing priorities, and iterative refinement that apply beyond the specific domain.

Next Phase: Integration with Evennia MUD framework to validate DDD's effectiveness for legacy system integration.


Complete technical details in /docs/legacy/INITIAL_LEARNINGS.md and /docs/legacy/EXPERIMENT_LEARNINGS.md

Case Study III: Discovering Porting Mode

When Discovery and Execution Aren't Enough

This case study documents the discovery of Porting Mode - a specialized DDD workflow that emerged from translating an 11k LOC C++ codebase to Rust in ~2 days.

The work revealed that reference-driven translation requires a hybrid approach distinct from pure Discovery (exploratory) or Execution (delivery) modes, with unique constraints and practices.

Project context: okros - a Rust port of MCL (a text-based network client)

The Problem: Neither Mode Fits

When starting the port, we faced a dilemma:

Discovery Mode didn't fit:

  • Not exploring uncertain requirements - we had a working reference implementation
  • Goal wasn't learning density - it was behavioral equivalence
  • Success wasn't "what constraints did we discover?" - it was "does it match the original?"

Execution Mode didn't fit:

  • Too much uncertainty for direct translation (FFI, unsafe Rust, platform-specific APIs)
  • No existing codebase to refactor - building from scratch
  • Risk of mid-port architectural rewrites if we guessed wrong on translation approaches

The insight: Porting is neither greenfield exploration nor production delivery - it's reference-driven translation requiring its own workflow.

What Emerged: Porting Mode

Porting Mode combines Discovery and Execution in a specialized way:

Phase 1: Discovery (De-risk Translation Patterns)

Goal: Validate risky translation approaches before production use

Build isolated toy models to answer:

  • Which target-language idioms to adopt, and which source patterns to preserve?
  • How to handle FFI/unsafe boundaries?
  • How do the relevant platform-specific APIs behave?

Each toy validates one translation decision:

  • FFI patterns (how to call C libraries from target language)
  • State management (how to port global state)
  • Platform APIs (how to handle OS-specific code)
  • Language embedding (how to integrate scripting languages)

Output: LEARNINGS.md documenting portable patterns ready for production

Key difference from Discovery Mode: Questions are about "how to translate X" not "what should we build?"

Phase 2: Execution (Systematic Translation)

Goal: Apply validated patterns tier-by-tier with reference as oracle

Translate in dependency order:

  • Foundation types → Core abstractions → UI → Logic → Integration
  • Apply patterns from Phase 1 toys
  • Golden tests against reference implementation
  • "Same inputs → same outputs" as measurable success

Output: Production code with behavioral equivalence to reference

Key difference from Execution Mode: Reference implementation defines correctness, not human judgment

The "Simplicity First" Principle

Unlike pure translation (preserve all source patterns), Porting Mode allows choosing target idioms when simpler:

Use target idioms when:

  • Standard library provides equivalent functionality
  • Target language has better abstraction
  • Reduces complexity without losing behavior

Preserve source patterns when:

  • Matches behavior more directly
  • Reduces translation complexity
  • FFI/unsafe boundaries require it

Example: C++ custom string class → Rust String (target idiom wins because stdlib is simpler and equivalent)

This principle prevents cargo-cult translation while maintaining behavioral equivalence.

Critical Methodological Discoveries

1. Toys Prevent Mid-Port Rewrites

The Discovery phase validated 12 risky patterns before production:

  • FFI approaches
  • State management patterns
  • Platform-specific APIs
  • Language interop strategies

Result: Zero architectural rewrites during Execution phase. All risky decisions de-risked upfront.

Lesson: Even when you have a reference implementation, translation approaches need validation. Toys isolate this uncertainty.

2. Reference Implementation as Oracle

Golden tests against the original codebase prevented behavioral drift:

  • Input/output pairs from reference become test fixtures
  • Side-by-side comparison during translation
  • Measurable goal: "same behavior" not "good enough"

Lesson: Reference implementation makes correctness falsifiable. Use it as your oracle.

3. Brutal Honesty in Gap Tracking

Initial completion claim: "98% complete"

Systematic gap analysis revealed: Actually ~50% complete for interactive mode (focused on headless mode only)

Created PORT_GAPS.md documenting:

  • Method-by-method comparison with reference
  • Quantified completion (file X: Y% ported)
  • Root cause analysis (why gaps exist)
  • Prioritized restoration plan

Lesson: Porting is easy to overestimate. Systematic comparison with reference catches wishful thinking.

4. Economic Inversion Validated

Mandatory refactoring after every feature kept quality rising instead of decaying.

With AI assistance:

  • Refactoring cost approaches zero
  • Test-first discipline becomes sustainable
  • Code quality improves through iteration

Lesson: Economic inversion (cheap generation/refactoring) isn't just theory - it works in practice for complex porting work.

5. Scope Evolution is Normal

Original goal: 1:1 port of all features

Reality: Some features intentionally deferred, new features added

Intentionally deferred: Niche features better handled by scripts

Newly added: Test infrastructure, new operational modes

Lesson: Porting reveals opportunities for scope reduction and improvement. Let the port evolve.

When to Use Porting Mode

Use Porting Mode when:

  • Translating existing codebase to different language/framework
  • Reference implementation exists and defines correct behavior
  • Goal is behavioral equivalence, not innovation
  • Translation patterns need validation (FFI, unsafe, platform APIs)

Don't use Porting Mode when:

  • No reference implementation exists (use Discovery Mode)
  • Requirements are uncertain (use Discovery Mode)
  • Building on existing codebase (use Execution Mode)
  • Pure refactoring within same language (use Execution Mode)

Artifacts That Enable Porting Mode

From Discovery phase (toys):

  • toys/toyN_*/SPEC.md - What C++ behavior to replicate
  • toys/toyN_*/LEARNINGS.md - Validated translation patterns
  • Toy code - Reference implementations of risky patterns

From Execution phase (production):

  • PORTING_HISTORY.md - Tier-by-tier completion record
  • CODE_MAP.md - Source → target file mapping
  • PORT_GAPS.md - Systematic gap analysis (unique to porting)

Throughout:

  • Golden tests - Input/output pairs from reference
  • Side-by-side comparison - Always have source open

Impact on DDD Methodology

Porting Mode revealed DDD's flexibility:

Discovery and Execution are cognitive modes, not rigid phases:

  • Discovery = "optimize for learning density"
  • Execution = "optimize for production resilience"
  • Porting = "optimize for behavioral equivalence"

The methodology adapts to different goals:

  • Uncertain requirements → Discovery Mode
  • Established patterns → Execution Mode
  • Reference-driven translation → Porting Mode

Core principles persist across modes:

  • Document-first workflow
  • Toys/tests before production
  • Mandatory refactoring
  • Brutal honesty in tracking
  • Economic inversion enabling discipline

Key Takeaways

  1. Porting needs its own mode - Neither Discovery nor Execution alone handles reference-driven translation
  2. Toys de-risk translation - Validate patterns before production to prevent mid-port rewrites
  3. Reference is oracle - Golden tests make behavioral equivalence measurable
  4. Simplicity first - Use target idioms when equivalent, preserve source patterns when necessary
  5. Gap analysis matters - Systematic comparison catches completion overconfidence
  6. Economic inversion works - AI-assisted refactoring enables sustainable TDD discipline

Timeline: ~2 days for 11k LOC translation demonstrates methodology's effectiveness at scale.

Case Study IV: NES Development with Learning Meta-Mode

When the Goal Is the Knowledge, Not the Game

This case study documents a project operating in Learning meta-mode: NES game development where the primary deliverable is comprehensive, AI-agent-friendly documentation of NES development, with toy ROMs as permanent reference implementations.

The project reveals Research mode's value in systematic knowledge capture and demonstrates Research ↔ Discovery ping-pong as a sustainable workflow for knowledge-building projects.

Project context: ddd-nes - building an NES game from scratch to create an mdBook teaching NES development

The Two Deliverables

Unlike typical game development (goal: shipped game), this project has dual deliverables:

Primary: Agent-facing mdBook

  • Clean, concise NES development knowledge
  • Optimized for LLM agents (but human-friendly)
  • Compiled from learning docs, blog posts, toy findings
  • Theory validated through practice
  • Condensed from NESdev wiki and hands-on experience

Secondary: Toy library

  • Working reference implementations demonstrating techniques
  • Test ROMs proving hardware behavior
  • Build infrastructure examples
  • Permanent artifacts showing "this pattern works on real hardware"

The philosophy: Document what we learn as we learn it. When theory meets the cycle counter, update the theory.

Research Phase: Systematic Wiki Study

Before building any ROMs, extensive Research mode work established foundation.

Study Scope

52 NESdev wiki pages systematically studied and condensed

11 technical learning documents created:

  • Core architecture (CPU, PPU, APU)
  • Sprite techniques
  • Graphics techniques
  • Input handling
  • Timing and interrupts
  • Toolchain selection
  • Optimization patterns
  • Math routines
  • Audio implementation
  • Mappers (memory expansion)

Outcome: Comprehensive theory base before experimentation

The .webcache/ Pattern in Practice

External documentation cached locally:

.webcache/
  nesdev_wiki_ppu_sprites.md
  nesdev_wiki_oam_dma.md
  nesdev_wiki_mappers.md
  [50+ more cached pages]

Benefits realized:

  • Offline reference during development
  • Provided as context to AI agents
  • Version stability (wiki pages don't change unexpectedly)
  • Attribution trail maintained

Pattern validated: Cache before use, reference frequently, update when troubleshooting.

Open Questions Cataloguing

Research phase generated 43 open questions across 7 categories:

Categories:

  1. Toolchain & Development Workflow (8 questions)
  2. Graphics Asset Pipeline (5 questions)
  3. Audio Implementation (6 questions, 3 answered via research)
  4. Game Architecture & Patterns (7 questions)
  5. Mapper Selection & Implementation (6 questions, 4 answered via research)
  6. Optimization & Performance (7 questions, 1 answered via research)
  7. Testing & Validation (4 questions)

The document: learnings/.ddd/5_open_questions.md became roadmap for Discovery work

Key pattern: Questions prioritized (P0/P1/P2/P3) to guide toy development order

Discovery Phase: Validating Through Toys

With theory established and questions catalogued, Discovery phase validated knowledge through minimal test ROMs.

Toy Development Pattern

8 toys (as of October 2025):

  • toy0_toolchain - Build infrastructure validation
  • toy1_ppu_init - PPU initialization sequences
  • toy2_ppu_init (continued from toy1)
  • toy3_controller - Controller read timing
  • toy4_nmi - Non-maskable interrupt handling
  • toy5_sprite_dma - OAM DMA cycle counting
  • toy6_audio - (planned) APU and sound engine integration
  • toy8_vram_buffer - VRAM update buffering patterns

Test counts: 66 passing tests across completed toys (as of toy 5)

Pattern: Each toy isolates one subsystem (axis principle), validates specific questions from Research phase

Test-Driven Infrastructure

Innovation: Build systems and toolchains received TDD treatment

Perl + Test::More for infrastructure validation:

use strict;
use warnings;
use Test::More;

# Test ROM build
is(system("ca65 hello.s -o hello.o"), 0, "assembles");
ok(-f "hello.o", "object file created");
is(-s "hello.nes", 24592, "ROM size correct");

# Test iNES header
open my $fh, '<:raw', 'hello.nes' or die "cannot open hello.nes: $!";
read $fh, my $header, 4;
is(unpack('H*', $header), '4e45531a', 'iNES header magic');

done_testing();

Why Perl: Core module (no deps), concise, perfect for file/process validation, TAP output

Tooling created:

  • tools/new-toy.pl - Scaffold toy directory with SPEC/PLAN/LEARNINGS/README
  • tools/new-rom.pl - Scaffold ROM build (Makefile, asm skeleton, test files)
  • tools/inspect-rom.pl - Decode iNES headers
  • toys/run-all-tests.pl - Regression test runner

The insight: Infrastructure is testable. TDD applies to build systems, not just application code.

Hardware Behavior Validation

Manual validation in Mesen2:

  • Load ROM, observe behavior
  • Debugger: breakpoints, cycle counter, memory watches
  • Measure actual timing vs wiki documentation
  • Document deviations in toy LEARNINGS.md

Example findings:

  • OAM DMA: 513 cycles measured (wiki correct)
  • Vblank NMI overhead: 7 cycles entry, 6 cycles RTI (wiki didn't specify)
  • Sprite update budget: Only 27 sprites/frame achievable (wiki said 64; cycle budget exceeded)

Pattern: Theory from Research mode, measurement from Discovery mode, updated learning docs with ground truth

The 3-Attempt Rule and Partial Validation

Innovation: Timeboxing with partial completion as valid outcome

When tests fail after implementation:

  1. Attempt 1: Debug obvious issues
  2. Attempt 2: Deep investigation
  3. Attempt 3: Final debug pass or clean rebuild

After 3 attempts: STOP and document

Example - toy3_controller:

  • Original: 8 tests planned
  • Result: 4/8 passing after 3 debugging attempts
  • Decision: Document findings, move forward
  • Value: Proved the infrastructure works (50% > 0%); isolated the bug to specific ROM logic

The principle: Partial validation is complete. Knowledge extracted even from failures. Forward-only progress prevents rabbit holes.

The Research ↔ Discovery Ping-Pong

Project alternated between modes systematically:

Cycle Pattern

Research phase (Study):

  1. Read NESdev wiki pages (52 pages total)
  2. Create learning documents (11 docs)
  3. Cache documentation (.webcache/)
  4. Catalog open questions (43 questions)

→ Transition trigger: Sufficient theory to design experiments, questions prioritized

Discovery phase (Toys):

  1. Select high-priority questions
  2. Design minimal experiments (toy models)
  3. Build and test ROMs
  4. Measure actual hardware behavior
  5. Document findings in toy LEARNINGS.md

→ Transition trigger: Findings reveal gaps in theory, new questions emerge

Back to Research:

  1. Study related wiki pages
  2. Update learning docs with validated measurements
  3. Note gaps between theory and practice
  4. Add new questions to tracker

→ Repeat: Continue until domain understanding comprehensive

Transition Examples

Research → Discovery (toy4_nmi):

  • Question from Research: "What's actual NMI overhead? Wiki doesn't specify."
  • Toy designed to measure entry/exit cycles
  • Measurement: 7 cycles entry, 6 cycles RTI
  • Finding documented in toy4_nmi/LEARNINGS.md

Discovery → Research (after toy5_sprite_dma):

  • Finding: Only 27 sprite updates achievable, not 64 as wiki suggested
  • Back to Research: Study vblank timing budgets in detail
  • Updated learnings/timing_and_interrupts.md with actual constraints
  • Spawned new question: "How to handle >27 sprite updates?" (deferred, animation techniques)

The pattern: Theory guides experiments, experiments correct theory, updated theory enables better experiments

Blog Posts as Intermediate Source Material

Innovation: AI-written reflections serve as book draft chapters

9 blog posts written during first 5 toys:

  1. Study Phase Complete - Research mode summary
  2. First ROM Boots - Infrastructure validation
  3. The Search for Headless Testing - Tool selection
  4. Designing Tests for LLMs - Testing DSL design
  5. Reading Backwards - Meta-learnings (by Codex/OpenAI)
  6. Housekeeping Before Heroics - Infrastructure investment
  7. When Your DSL Wastes Tokens - Token optimization
  8. Stop Pretending You're Human - LLM collaboration patterns
  9. Productivity FOOM - Bounded recursive improvement

Content characteristics:

  • First-person AI perspective
  • Concrete metrics (time estimates, token usage, test counts)
  • Honest about failures (not just successes)
  • Meta-learnings about AI collaboration

Future use: Organize and edit into final mdBook chapters

The docuborous loop: Documentation at session end enables work at session start. Each iteration feeds itself.

Agent-to-Agent Handoff Documents

Innovation: NEXT_SESSION.md captures momentum across session boundaries

Structure:

  • Current status summary
  • What completed this session
  • Remaining work
  • Immediate next steps
  • Key files to review
  • Open questions or decisions needed

Why it works:

  • Context windows are ephemeral, handoff docs persist
  • Next AI agent starts with previous agent's insights
  • Prevents re-learning decisions already made
  • Captures momentum across session boundaries

The principle: Write comprehensive handoff notes for the next AI agent (even if it's yourself next session)

Token Economics as Design Driver

Discovery: DSL and code patterns optimized for token usage, not human convenience

Example findings:

37% of test code was waste:

  • Frame arithmetic comments (obvious to LLMs)
  • Boilerplate headers (repeated in every file)
  • Verbose patterns (overly explicit)

Optimization: Three abstractions

  • after_nmi(N) - Speak the domain, not the arithmetic
  • assert_nmi_counter() - Recognize common patterns
  • NES::Test::Toy - Kill boilerplate

Result: 32% reduction in test code, self-documenting abstractions

The insight: LLMs parse self-documenting abstractions as easily as verbose comments. For LLMs, conciseness and clarity align; for human readers, they often trade off.

Key Methodological Discoveries

1. Research Mode Is Distinct from Discovery

Research phase (wiki study → learning docs → questions catalog) completed before any ROM built, though both happened on the same day (Oct 5).

Traditional approach: "Learn by doing" (jump to coding immediately)

DDD Research mode: Study systematically, catalog questions, then experiment

Result: Targeted experiments answering specific questions, not unfocused exploration

Lesson: External knowledge capture prevents false starts. Research mode is its own cognitive mode, even when executed quickly.

2. Questions Are First-Class Artifacts

43 catalogued questions became Discovery roadmap.

Without question tracking: "What should I build next?" paralysis

With question tracking: Clear priorities, measurable progress, systematic validation

Lesson: Documented unknowns more valuable than undocumented assumptions. Make ignorance explicit.

3. Partial Validation Is Complete

4/8 passing tests delivered value: proved infrastructure works, isolated bugs.

Traditional mindset: "Not done until 100% passing"

DDD timeboxing: "Knowledge extracted, can move forward"

Lesson: Perfect validation not required. Forward progress with partial knowledge beats stuck seeking perfection.

4. Test-Driven Infrastructure

Makefiles, build scripts, toolchains received TDD treatment like application code.

Traditional approach: "Build systems don't need tests"

DDD approach: "Everything that executes is testable"

Result: Confidence in toolchain, regression prevention, validated workflow templates

Lesson: TDD applies to infrastructure. Perl Test::More perfect for build validation.

5. Toys Are Permanent Artifacts

Unlike prototypes (disposable), toys remain as reference implementations.

Benefits:

  • Future developers see working examples
  • Code snippets for book come from validated toys
  • "See toy3_controller for working implementation" references
  • Permanent proof: "This technique works on real hardware"

Lesson: In Learning meta-mode, toys ARE the product (alongside documentation)

6. Theory Updates Are Mandatory

When measurement contradicts documentation, update the theory.

Example: Wiki says "update 64 sprites in vblank" → Measurement shows "only 27 achievable with cycle budget"

Action: Update learnings/timing_and_interrupts.md with actual constraints

Lesson: Theory serves practice, not vice versa. Validated reality replaces speculation.

When Learning Meta-Mode Fits

This project demonstrates Learning meta-mode's ideal use case:

Fits when:

  • Primary goal is knowledge artifact (book, guide, reference)
  • External knowledge extensive but needs validation
  • Domain unfamiliar and complex
  • Toy implementations serve as reference examples
  • No production codebase planned (documentation is the product)

Doesn't fit when:

  • Goal is shipped product (use Standard Progression)
  • Porting existing codebase (use Porting meta-mode)
  • Knowledge already established (use Execution)

The signal: If you're writing about the process as much as building the product, you're in Learning meta-mode.

Current Status

As of October 9, 2025:

  • Research phase: Complete (52 wiki pages → 11 learning docs)
  • Discovery phase: In progress (8+ toys, 66+ tests passing)
  • Execution phase: Not started (no main game yet)
  • Blog posts: 9 written
  • Open questions: 43 catalogued (36 open, 7 answered)
  • Project elapsed time: ~5 days (October 5-9, 2025)

Project still in Research ↔ Discovery loop: Building comprehensive knowledge foundation before considering production game.

The strategy: Validate all critical subsystems via toys before main game development. Prevents architectural rewrites later.

Impact on DDD Methodology

This case study revealed Research mode as distinct cognitive mode:

Before: Two modes (Discovery, Execution)

After: Three atomic modes (Research, Discovery, Execution)

The addition: Research mode (external knowledge capture) distinct from Discovery mode (experimental validation)

Learning meta-mode formalized: Research ↔ Discovery ping-pong pattern named and documented

The insight: Projects focused on knowledge capture need different workflow than projects focused on delivery. Meta-modes help structure these different patterns.

Key Takeaways

  1. Research mode is distinct - Systematic external knowledge capture before experimentation
  2. Questions are roadmap - Catalogued open questions guide Discovery work
  3. Partial validation delivers value - Forward progress with timeboxing beats stuck seeking perfection
  4. Test infrastructure like code - Perl + Test::More validates builds, not just ROMs
  5. Toys are permanent - Reference implementations, not disposable prototypes
  6. Theory updates mandatory - Measured reality replaces speculation
  7. Token economics matter - DSL design driven by AI consumption patterns
  8. Blog posts are drafts - AI reflections become book source material
  9. Handoffs preserve momentum - NEXT_SESSION.md bridges context gaps
  10. Meta-mode matches goal - Learning meta-mode fits knowledge-building projects

Timeline: ~5 days total (Oct 5-9, 2025), with the study phase plus 5 toys in the first ~2 days, demonstrates the velocity and sustainability of the Research ↔ Discovery loop.


Learning meta-mode demonstrates DDD's flexibility: methodology adapts to knowledge-building goals, not just product delivery. When the documentation is the product, Research ↔ Discovery becomes the workflow.

FAQ: Dialectic-Driven Development

What problem is DDD actually solving?

It solves the chaos of AI-assisted coding without context. DDD adapts to different development phases: Discovery Mode for uncertain work uses systematic experimentation, while Execution Mode for established systems uses lightweight documentation and refactoring discipline. Both keep you in control.

Is this just another spec-first approach with fancy terms?

Nope. The key difference is: you don't write the docs — the AI does. In Discovery Mode, the AI generates comprehensive docs (specs, plans, tests, code) and you review/edit. In Execution Mode, the AI maintains CODE_MAP.md and handles refactoring. Either way, you stay in the editor role.

Why not just use a PRD and vibe code from there?

PRDs describe what to build, but not how. Without technical scaffolding, the AI will guess — and its guesses change session to session. DDD provides that scaffolding through different approaches depending on whether you're discovering new patterns or executing within established ones.

What's the difference between a toy model and a prototype?

Toy models are intentionally tiny and throwaway — built to learn, not to ship. They help validate structure or assumptions early. Prototypes often turn into half-baked production code. Toy models are lab experiments.

What does "one axis of complexity" mean?

It means keeping every step simple: build a new primitive, combine two things, or add one thing to an existing system. Nothing more. This keeps both you and the AI from getting overwhelmed.

Why JSON and CLI? Why not a full framework or GUI?

Because JSON + CLI = total visibility. You can inspect the whole state, write golden tests, and keep everything small and composable. Frameworks tend to hide structure — this makes it explicit.
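
For example, a golden test can pin a CLI's JSON output against a checked-in fixture; a sketch using Node's built-in test runner, with placeholder paths:

    import { test } from 'node:test';
    import assert from 'node:assert/strict';
    import { execFileSync } from 'node:child_process';
    import { readFileSync } from 'node:fs';

    test('zipmeta emits stable metadata for the sample archive', () => {
      // Same input → same output: any drift in fields or ordering fails the test.
      const actual = execFileSync('node', ['cli/zipmeta.js', 'zips/sample.zip'], { encoding: 'utf8' });
      const expected = readFileSync('tests/golden/sample_zipmeta.json', 'utf8');
      assert.deepEqual(JSON.parse(actual), JSON.parse(expected));
    });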

Do I need to be an AI whisperer to use this?

Nope. You just need to get into the habit of asking the AI to explain itself — through Discovery Mode's four-document harness or Execution Mode's CODE_MAP.md and refactoring discipline. The system helps you keep it aligned.

Is this for solo devs or teams?

Works great for solo builders using AI as a partner. But the doc artifacts also make async teamwork smoother — people can ramp into the context just by reading the SPEC/PLAN.

What if I already wrote code — can I still apply this retroactively?

Yep. Just ask the AI to generate docs based on the existing codebase. Use that to lock in structure, then resume with the DDD flow going forward. Think of it as reverse-engineering clarity.

Why is this worth the upfront structure? Doesn't it slow me down?

It feels slower on Day 1, but you gain huge speed by Day 5. You'll spend less time debugging, rewriting, and explaining stuff to the AI — because now it's building from a shared understanding.

How do I know when to use Discovery Mode vs. Execution Mode?

If you're unsure about the approach, requirements, or architecture - use Discovery Mode. If you're adding features to an established codebase with proven patterns - use Execution Mode. Most development is actually Execution work, so when in doubt, try Execution first and switch to Discovery if you hit uncertainty.

What do I do when the AI generates repetitive, verbose code despite asking for DRY principles?

Don't fight it upfront — let the AI write repetitive code until tests pass, then use the three-phase refactoring cycle: Generate to Green → Plan the Cleanup → Execute Refactoring. This works with AI tendencies rather than against them.

How often should I update my CODE_MAP.md? Every commit seems excessive.

Every commit that changes the code map is correct — it would be excessive for human engineers, but it's optimal for LLM agents. If something in the code changes that requires the code map to change, then it needs to be updated as part of that commit. AI agents need current architectural context to make good decisions, and the economic shift makes constant updates feasible.