As of May 19, 2026, the implementation of Andrej Karpathy’s LLM Wiki has bifurcated into two primary workflows: code-based automation and manual markdown synthesis. The core mechanism remains a directory-based knowledge structure where an LLM acts as both archivist and researcher, separating immutable raw data from the living wiki.
Core Insight: The Wiki serves as an index-linked vault where the LLM performs periodic 'health checks' to ensure consistency across entries, rather than generating responses from scratch on every prompt.
Structural Breakdown
The system relies on a tripartite file architecture to maintain information integrity:
raw/: Holds the immutable original sources (PDFs, URLs, notes).
wiki/: Houses the AI-synthesized markdown files, often categorized as entity pages or concepts.
CLAUDE.md / AGENTS.md: Acts as the governing schema or system prompt that guides the AI in cross-referencing and updating the knowledge base.
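As a rough illustration of this layout, the sketch below creates the three pieces on disk. The function name scaffold_vault and the placeholder schema text are assumptions made for the example, not part of any published tooling.

```python
from pathlib import Path


def scaffold_vault(root: Path, schema_name: str = "CLAUDE.md") -> dict[str, Path]:
    """Create raw/, wiki/, and a schema file under root if they are missing.

    Hypothetical helper: the directory names come from the article,
    everything else is an illustrative assumption.
    """
    raw = root / "raw"
    wiki = root / "wiki"
    schema = root / schema_name

    # parents=True also creates root itself when it does not exist yet.
    raw.mkdir(parents=True, exist_ok=True)
    wiki.mkdir(parents=True, exist_ok=True)

    # Only write a starter schema if the user has not provided one.
    if not schema.exists():
        schema.write_text(
            "# Wiki schema (placeholder)\n\n"
            "- Treat everything under raw/ as read-only.\n"
            "- Write synthesized pages under wiki/.\n",
            encoding="utf-8",
        )
    return {"raw": raw, "wiki": wiki, "schema": schema}
```

Calling scaffold_vault(Path("my-vault")) a second time leaves existing files untouched, which suits a vault that is expected to grow over months.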
| Feature | Code-Driven Implementation | Manual/Markdown-Heavy |
|---|---|---|
| Consistency | High (Hash-based integrity) | Variable (User-reliant) |
| Scalability | Stronger, but hits index bottlenecks | Limited by manual synthesis |
| Flexibility | Rigid, requires prompt engineering | Highly adaptive to nuance |
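The table mentions hash-based integrity without saying how it is done. One plausible reading, sketched here under that assumption, is a manifest of SHA-256 digests for everything in raw/, compared between runs to spot sources that were added, removed, or altered. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(raw_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to raw_dir) to its digest."""
    return {
        p.relative_to(raw_dir).as_posix(): hash_file(p)
        for p in sorted(raw_dir.rglob("*"))
        if p.is_file()
    }


def diff_manifest(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two manifests and report what changed between them."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        # A modified entry here means a supposedly immutable source changed.
        "modified": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

Since raw/ is meant to be immutable, any entry under "modified" is a signal worth investigating before the next ingestion run.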
The Mechanics of Compounding Knowledge
Practitioners are increasingly finding that the effectiveness of the wiki hinges on the index file. By querying wiki/index.md rather than the raw/ archive, the model functions within a curated "second brain" that grows in complexity as more files are ingested.
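To make the index-first lookup concrete, here is a minimal sketch that parses an index file and ranks entries by keyword overlap with a query. It assumes, purely for illustration, that wiki/index.md lists pages as markdown bullets of the form "- [Title](path.md): summary"; the article does not specify the index format, and in practice the LLM itself would usually do the matching.

```python
import re

# Assumed entry format: "- [Title](relative/path.md): one-line summary"
ENTRY = re.compile(
    r"^\s*[-*]\s*\[(?P<title>[^\]]+)\]\((?P<path>[^)]+)\)\s*:?\s*(?P<summary>.*)$"
)


def parse_index(text: str) -> list[dict[str, str]]:
    """Extract title, path, and summary from each bullet that matches ENTRY."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            entries.append(match.groupdict())
    return entries


def rank_entries(entries: list[dict[str, str]], query: str, limit: int = 5) -> list[dict[str, str]]:
    """Return up to `limit` entries sharing at least one word with the query."""
    terms = set(re.findall(r"\w+", query.lower()))
    scored = []
    for entry in entries:
        words = set(re.findall(r"\w+", f"{entry['title']} {entry['summary']}".lower()))
        score = len(terms & words)
        if score:
            scored.append((score, entry))
    # Sort by overlap, highest first; ties keep their index order.
    scored.sort(key=lambda pair: -pair[0])
    return [entry for _, entry in scored[:limit]]
```

Only the pages returned by rank_entries would then be opened and passed to the model, which keeps the working context small even as raw/ grows.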
Ingestion Protocol: New files in raw/ trigger a scan of existing wiki/ pages to identify related context before generating new entries.
Maintenance: Periodic manual intervention or AI health checks prevent the "unwieldy index" problem often seen after a few hundred articles.
Version Control: Git tracks changes in both the raw/ and wiki/ directories, creating a searchable timeline of how information has been synthesized over time.
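The ingestion step can be sketched as a planning pass that decides whether a new source should update existing pages or start a new one. This is a simplified stand-in: it assumes plain-text sources and uses substring matching on page filenames, whereas the workflow described here leaves that judgment to the LLM. The names find_related_pages and plan_ingestion are hypothetical.

```python
from pathlib import Path


def page_title(page: Path) -> str:
    """Derive a lowercase title from a filename such as 'vector-databases.md'."""
    return page.stem.replace("-", " ").replace("_", " ").lower()


def find_related_pages(raw_text: str, wiki_dir: Path) -> list[Path]:
    """Return wiki pages whose title appears verbatim in the new source text."""
    haystack = raw_text.lower()
    return sorted(
        p
        for p in wiki_dir.rglob("*.md")
        if p.name != "index.md" and page_title(p) in haystack
    )


def plan_ingestion(raw_file: Path, wiki_dir: Path) -> dict:
    """Decide which existing pages to update, or whether a new entry is needed."""
    text = raw_file.read_text(encoding="utf-8", errors="ignore")
    related = find_related_pages(text, wiki_dir)
    return {
        "source": raw_file.name,
        "update": [p.name for p in related],
        # No overlap with existing pages suggests a new entry.
        "create_new": not related,
    }
```

The returned plan could be handed to the model alongside the schema file, so that it extends existing pages instead of producing near-duplicates.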
Contextual Evolution
The concept emerged in early April 2026 and gained rapid adoption as users looked for alternatives to monolithic vector databases. While earlier methods relied on automated agents to handle the ingestion loop, recent discourse emphasizes a "hybrid" approach in which the human operator retains authority over the research direction.
Observers note that as the wiki scales, the risk is not just the volume of data but the "semantic decay" of the index file. Without a rigorous schema, the relationships between entities risk becoming circular or redundant. Today, the method is moving away from experimental toy projects toward long-term personal infrastructure, where the value lies in the compounding synthesis of sources rather than in mere information storage.