Case Study IV: NES Development with Learning Meta-Mode
When the Goal Is the Knowledge, Not the Game
This case study documents a project operating in Learning meta-mode: NES game development where the primary deliverable is comprehensive, AI-agent-friendly documentation of the platform, with toy ROMs kept as permanent reference implementations.
The project reveals Research mode's value in systematic knowledge capture and demonstrates Research ↔ Discovery ping-pong as a sustainable workflow for knowledge-building projects.
Project context: ddd-nes - building an NES game from scratch to create an mdBook teaching NES development
The Two Deliverables
Unlike typical game development (goal: shipped game), this project has dual deliverables:
Primary: Agent-facing mdBook
- Clean, concise NES development knowledge
- Optimized for LLM agents (but human-friendly)
- Compiled from learning docs, blog posts, toy findings
- Theory validated through practice
- Condensed from NESdev wiki and hands-on experience
Secondary: Toy library
- Working reference implementations demonstrating techniques
- Test ROMs proving hardware behavior
- Build infrastructure examples
- Permanent artifacts showing "this pattern works on real hardware"
The philosophy: Document what we learn as we learn it. When theory meets the cycle counter, update the theory.
Research Phase: Systematic Wiki Study
Before building any ROMs, extensive Research mode work established the theoretical foundation.
Study Scope
52 NESdev wiki pages systematically studied and condensed
11 technical learning documents created:
- Core architecture (CPU, PPU, APU)
- Sprite techniques
- Graphics techniques
- Input handling
- Timing and interrupts
- Toolchain selection
- Optimization patterns
- Math routines
- Audio implementation
- Mappers (memory expansion)
Outcome: Comprehensive theory base before experimentation
The .webcache/ Pattern in Practice
External documentation cached locally:
.webcache/
  nesdev_wiki_ppu_sprites.md
  nesdev_wiki_oam_dma.md
  nesdev_wiki_mappers.md
  [50+ more cached pages]
Benefits realized:
- Offline reference during development
- Provided as context to AI agents
- Version stability (wiki pages don't change unexpectedly)
- Attribution trail maintained
Pattern validated: Cache before use, reference frequently, update when troubleshooting.
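A minimal sketch of the cache step using only core Perl modules (the URL, the filename scheme, and the skipped HTML-to-markdown conversion are assumptions; the project's actual tooling may differ):

```perl
#!/usr/bin/env perl
# Hypothetical cache-before-use sketch. HTTPS fetches additionally require
# IO::Socket::SSL to be installed; HTML-to-markdown conversion is omitted.
use strict;
use warnings;
use HTTP::Tiny;
use File::Path qw(make_path);

my $url = 'https://www.nesdev.org/wiki/OAM_DMA';    # example page to cache
(my $slug = $url) =~ s{^.*/wiki/}{};                # "OAM_DMA"
my $path = ".webcache/nesdev_wiki_\L$slug\E.md";    # .webcache/nesdev_wiki_oam_dma.md

make_path('.webcache');
if (!-e $path) {                                    # fetch only on cache miss
    my $res = HTTP::Tiny->new->get($url);
    die "fetch failed: $res->{status} $res->{reason}\n" unless $res->{success};
    open my $out, '>', $path or die "write $path: $!";
    print {$out} "<!-- source: $url -->\n",          # attribution trail
                 $res->{content};
    close $out;
}
```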
Open Questions Cataloguing
Research phase generated 43 open questions across 7 categories:
Categories:
- Toolchain & Development Workflow (8 questions)
- Graphics Asset Pipeline (5 questions)
- Audio Implementation (6 questions, 3 answered via research)
- Game Architecture & Patterns (7 questions)
- Mapper Selection & Implementation (6 questions, 4 answered via research)
- Optimization & Performance (7 questions, 1 answered via research)
- Testing & Validation (4 questions)
The document learnings/.ddd/5_open_questions.md became the roadmap for Discovery work
Key pattern: Questions prioritized (P0/P1/P2/P3) to guide toy development order
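A hypothetical slice of that tracker (the real file's layout may differ) shows how priority labels turn unknowns into an ordered backlog:

```markdown
## Graphics Asset Pipeline
- [ ] P0: How do source images become CHR tile data at build time?
- [ ] P2: Can palette assignments be validated automatically?
- [x] P1: Which CHR bank layout does the toolchain default to? (answered via research)
```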
Discovery Phase: Validating Through Toys
With theory established and questions catalogued, Discovery phase validated knowledge through minimal test ROMs.
Toy Development Pattern
8 toys (7 complete, 1 planned) as of October 2025:
- toy0_toolchain - Build infrastructure validation
- toy1_ppu_init - PPU initialization sequences
- toy2_ppu_init - Continuation of toy1's PPU initialization work
- toy3_controller - Controller read timing
- toy4_nmi - Non-maskable interrupt handling
- toy5_sprite_dma - OAM DMA cycle counting
- toy6_audio - APU and sound engine integration (planned)
- toy8_vram_buffer - VRAM update buffering patterns
Test counts: 66 passing tests across completed toys (as of toy 5)
Pattern: Each toy isolates one subsystem (axis principle), validates specific questions from Research phase
Test-Driven Infrastructure
Innovation: Build systems and toolchains received TDD treatment
Perl + Test::More for infrastructure validation:
# Test ROM build
use strict;
use warnings;
use Test::More;

is(system("ca65 hello.s -o hello.o"), 0, "assembles");
ok(-f "hello.o", "object file created");
is(system("ld65 hello.o -C nes.cfg -o hello.nes"), 0, "links");  # linker config name illustrative
is(-s "hello.nes", 24592, "ROM size correct");  # 16-byte header + 16 KiB PRG + 8 KiB CHR

# Test iNES header
open my $fh, '<:raw', 'hello.nes' or die "open hello.nes: $!";
read $fh, my $header, 4;
is(unpack('H*', $header), '4e45531a', 'iNES header magic');  # "NES\x1a"

done_testing;
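Because Test::More emits TAP, these checks run under Perl's standard harness, e.g. prove -v build.t (the test filename here is illustrative).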
Why Perl: Test::More is a core module (no external dependencies), it's concise, it suits file and process validation, and it emits TAP output
Tooling created:
- tools/new-toy.pl - Scaffold toy directory with SPEC/PLAN/LEARNINGS/README
- tools/new-rom.pl - Scaffold ROM build (Makefile, asm skeleton, test files)
- tools/inspect-rom.pl - Decode iNES headers (a minimal sketch of the decode follows this list)
- toys/run-all-tests.pl - Regression test runner
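The sketch below decodes the header fields per the standard iNES format; the real tools/inspect-rom.pl may differ:

```perl
#!/usr/bin/env perl
# Decode the 16-byte iNES header of a ROM image.
use strict;
use warnings;

my $rom = shift or die "usage: $0 rom.nes\n";
open my $fh, '<:raw', $rom or die "open $rom: $!";
read($fh, my $header, 16) == 16 or die "short read\n";

my ($magic, $prg, $chr, $flags6) = unpack 'a4 C C C', $header;
die "not an iNES ROM\n" unless $magic eq "NES\x1a";

printf "PRG ROM:   %d x 16 KiB\n", $prg;
printf "CHR ROM:   %d x 8 KiB\n",  $chr;
printf "Mapper:    %d (low nibble)\n", $flags6 >> 4;
printf "Mirroring: %s\n", ($flags6 & 1) ? 'vertical' : 'horizontal';
```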
The insight: Infrastructure is testable. TDD applies to build systems, not just application code.
Hardware Behavior Validation
Manual validation in Mesen2:
- Load ROM, observe behavior
- Debugger: breakpoints, cycle counter, memory watches
- Measure actual timing vs wiki documentation
- Document deviations in toy LEARNINGS.md
Example findings:
- OAM DMA: 513 cycles measured (wiki correct)
- Vblank NMI overhead: 7 cycles entry, 6 cycles RTI (wiki didn't specify)
- Sprite update budget: only 27 sprite updates per frame achievable (wiki suggests 64; the vblank cycle budget runs out first)
Pattern: Theory from Research mode, measurement from Discovery mode, updated learning docs with ground truth
The 3-Attempt Rule and Partial Validation
Innovation: Timeboxing with partial completion as valid outcome
When tests fail after implementation:
- Attempt 1: Debug obvious issues
- Attempt 2: Deep investigation
- Attempt 3: Final debug pass or clean rebuild
After 3 attempts: STOP and document
Example - toy3_controller:
- Original: 8 tests planned
- Result: 4/8 passing after 3 debugging attempts
- Decision: Document findings, move forward
- Value: Validated infrastructure works (50% > 0%), isolated bug to specific ROM logic
The principle: Partial validation is complete. Knowledge extracted even from failures. Forward-only progress prevents rabbit holes.
The Research ↔ Discovery Ping-Pong
Project alternated between modes systematically:
Cycle Pattern
Research phase (Study):
- Read NESdev wiki pages (52 pages total)
- Create learning documents (11 docs)
- Cache documentation (.webcache/)
- Catalog open questions (43 questions)
→ Transition trigger: Sufficient theory to design experiments, questions prioritized
Discovery phase (Toys):
- Select high-priority questions
- Design minimal experiments (toy models)
- Build and test ROMs
- Measure actual hardware behavior
- Document findings in toy LEARNINGS.md
→ Transition trigger: Findings reveal gaps in theory, new questions emerge
Back to Research:
- Study related wiki pages
- Update learning docs with validated measurements
- Note gaps between theory and practice
- Add new questions to tracker
→ Repeat: Continue until domain understanding comprehensive
Transition Examples
Research → Discovery (toy4_nmi):
- Question from Research: "What's actual NMI overhead? Wiki doesn't specify."
- Toy designed to measure entry/exit cycles
- Measurement: 7 cycles entry, 6 cycles RTI
- Finding documented in toy4_nmi/LEARNINGS.md
Discovery → Research (after toy5_sprite_dma):
- Finding: Only 27 sprite updates achievable, not 64 as wiki suggested
- Back to Research: Study vblank timing budgets in detail
- Updated learnings/timing_and_interrupts.md with actual constraints
- Spawned new question: "How to handle >27 sprite updates?" (deferred until animation techniques are studied)
The pattern: Theory guides experiments, experiments correct theory, updated theory enables better experiments
Blog Posts as Intermediate Source Material
Innovation: AI-written reflections serve as book draft chapters
9 blog posts written during the first 5 toys:
- Study Phase Complete - Research mode summary
- First ROM Boots - Infrastructure validation
- The Search for Headless Testing - Tool selection
- Designing Tests for LLMs - Testing DSL design
- Reading Backwards - Meta-learnings (by Codex/OpenAI)
- Housekeeping Before Heroics - Infrastructure investment
- When Your DSL Wastes Tokens - Token optimization
- Stop Pretending You're Human - LLM collaboration patterns
- Productivity FOOM - Bounded recursive improvement
Content characteristics:
- First-person AI perspective
- Concrete metrics (time estimates, token usage, test counts)
- Honest about failures (not just successes)
- Meta-learnings about AI collaboration
Future use: Organize and edit into final mdBook chapters
The docuborous loop: Documentation at session end enables work at session start. Each iteration feeds itself.
Agent-to-Agent Handoff Documents
Innovation: NEXT_SESSION.md captures momentum across session boundaries
Structure:
- Current status summary
- What completed this session
- Remaining work
- Immediate next steps
- Key files to review
- Open questions or decisions needed
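A hypothetical skeleton following that structure (the specifics below are drawn from earlier in this case study, not from the real file):

```markdown
# NEXT_SESSION

Status: toy5_sprite_dma complete; 66 tests passing across toys.
Done this session: measured OAM DMA at 513 cycles; updated timing learnings.
Remaining: toy6_audio SPEC not yet written.
Next steps: draft toy6 SPEC; pull APU questions from the open-questions tracker.
Key files: toys/toy5_sprite_dma/LEARNINGS.md, learnings/timing_and_interrupts.md
Open decisions: defer >27-sprite handling until animation techniques?
```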
Why it works:
- Context windows are ephemeral, handoff docs persist
- Next AI agent starts with previous agent's insights
- Prevents re-learning decisions already made
The principle: Write comprehensive handoff notes for the next AI agent (even if it's yourself next session)
Token Economics as Design Driver
Discovery: DSL and code patterns optimized for token usage, not human convenience
Example findings:
37% of test code was waste:
- Frame arithmetic comments (obvious to LLMs)
- Boilerplate headers (repeated in every file)
- Verbose patterns (overly explicit)
Optimization: Three abstractions
- after_nmi(N) - Speak the domain, not the arithmetic
- assert_nmi_counter() - Recognize common patterns
- NES::Test::Toy - Kill boilerplate
Result: 32% reduction in test code, self-documenting abstractions
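A hypothetical before/after sketch using those three names (only the abstraction names come from the project; the DSL's exact signatures are assumptions):

```perl
use strict;
use warnings;
use NES::Test::Toy 'nmi';      # assumed import style: loads the toy ROM, kills boilerplate

# Before (paraphrased verbose style): frame arithmetic spelled out in every test
#   run_frames(3 + 2);          # 3 warm-up frames + 2 NMIs
#   assert_ram(0x10 => 2);      # $10 holds the NMI counter

# After: speak the domain, not the arithmetic
after_nmi(2);                   # advance exactly two NMIs
assert_nmi_counter(2);          # the common pattern, recognized by name
```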
The insight: LLMs parse self-documenting abstractions as easily as verbose comments. For LLMs, unlike for humans, conciseness and clarity align.
Key Methodological Discoveries
1. Research Mode Is Distinct from Discovery
Research phase (wiki study → learning docs → questions catalog) completed before any ROM built, though both happened on the same day (Oct 5).
Traditional approach: "Learn by doing" (jump to coding immediately)
DDD Research mode: Study systematically, catalog questions, then experiment
Result: Targeted experiments answering specific questions, not unfocused exploration
Lesson: External knowledge capture prevents false starts. Research mode is its own cognitive mode, even when executed quickly.
2. Questions Are First-Class Artifacts
43 catalogued questions became Discovery roadmap.
Without question tracking: "What should I build next?" paralysis
With question tracking: Clear priorities, measurable progress, systematic validation
Lesson: Documented unknowns more valuable than undocumented assumptions. Make ignorance explicit.
3. Partial Validation Is Complete
4/8 passing tests delivered value: proved infrastructure works, isolated bugs.
Traditional mindset: "Not done until 100% passing"
DDD timeboxing: "Knowledge extracted, can move forward"
Lesson: Perfect validation is not required. Forward progress with partial knowledge beats getting stuck chasing perfection.
4. Test-Driven Infrastructure
Makefiles, build scripts, toolchains received TDD treatment like application code.
Traditional approach: "Build systems don't need tests"
DDD approach: "Everything that executes is testable"
Result: Confidence in toolchain, regression prevention, validated workflow templates
Lesson: TDD applies to infrastructure. Perl's Test::More is well suited to build validation.
5. Toys Are Permanent Artifacts
Unlike prototypes (disposable), toys remain as reference implementations.
Benefits:
- Future developers see working examples
- Code snippets for book come from validated toys
- "See toy3_controller for working implementation" references
- Permanent proof: "This technique works on real hardware"
Lesson: In Learning meta-mode, toys ARE the product (alongside documentation)
6. Theory Updates Are Mandatory
When measurement contradicts documentation, update the theory.
Example: Wiki says "update 64 sprites in vblank" → Measurement shows "only 27 achievable with cycle budget"
Action: Update learnings/timing_and_interrupts.md with actual constraints
Lesson: Theory serves practice, not vice versa. Validated reality replaces speculation.
When Learning Meta-Mode Fits
This project demonstrates Learning meta-mode's ideal use case:
Fits when:
- Primary goal is knowledge artifact (book, guide, reference)
- External knowledge extensive but needs validation
- Domain unfamiliar and complex
- Toy implementations serve as reference examples
- No production codebase planned (documentation is the product)
Doesn't fit when:
- Goal is shipped product (use Standard Progression)
- Porting existing codebase (use Porting meta-mode)
- Knowledge already established (use Execution)
The signal: If you're writing about the process as much as building the product, you're in Learning meta-mode.
Current Status
As of October 9, 2025:
- Research phase: Complete (52 wiki pages → 11 learning docs)
- Discovery phase: In progress (8+ toys, 66+ tests passing)
- Execution phase: Not started (no main game yet)
- Blog posts: 9 written
- Open questions: 43 catalogued (36 open, 7 answered)
- Project elapsed time: ~5 days (October 5-9, 2025)
The project remains in the Research ↔ Discovery loop, building a comprehensive knowledge foundation before considering a production game.
The strategy: Validate all critical subsystems via toys before main game development. Prevents architectural rewrites later.
Impact on DDD Methodology
This case study revealed Research mode as distinct cognitive mode:
Before: Two modes (Discovery, Execution)
After: Three atomic modes (Research, Discovery, Execution)
The addition: Research mode (external knowledge capture) distinct from Discovery mode (experimental validation)
Learning meta-mode formalized: Research ↔ Discovery ping-pong pattern named and documented
The insight: Projects focused on knowledge capture need different workflow than projects focused on delivery. Meta-modes help structure these different patterns.
Key Takeaways
- Research mode is distinct - Systematic external knowledge capture before experimentation
- Questions are roadmap - Catalogued open questions guide Discovery work
- Partial validation delivers value - Timeboxed forward progress beats getting stuck chasing perfection
- Test infrastructure like code - Perl + Test::More validates builds, not just ROMs
- Toys are permanent - Reference implementations, not disposable prototypes
- Theory updates mandatory - Measured reality replaces speculation
- Token economics matter - DSL design driven by AI consumption patterns
- Blog posts are drafts - AI reflections become book source material
- Handoffs preserve momentum - NEXT_SESSION.md bridges context gaps
- Meta-mode matches goal - Learning meta-mode fits knowledge-building projects
Timeline: ~5 days total (Oct 5-9, 2025), with the study phase plus the first 5 toys landing in roughly 2 days, demonstrates the velocity and sustainability of the Research ↔ Discovery loop.
Learning meta-mode demonstrates DDD's flexibility: methodology adapts to knowledge-building goals, not just product delivery. When the documentation is the product, Research ↔ Discovery becomes the workflow.