Skip to content

Shiki, mdBook & Documentation Platform Evaluation (2026)

Shiki, mdBook & Documentation Platform Evaluation (2026)

Section titled “Shiki, mdBook & Documentation Platform Evaluation (2026)”

Vox currently uses mdBook 0.4.40 with a hand-rolled highlight.js plugin (docs/theme/highlight-vox.js) to syntax-highlight .vox code blocks in the rendered documentation portal. This creates a three-way grammar drift problem: the VS Code extension uses a vox.tmLanguage.json (TextMate grammar), Neovim/ Helix use Tree-sitter .scm queries, and the docs site uses a separate regex-based highlight.js definition. Every time the Vox language surface changes, three grammars must be updated in lockstep.

Critical discovery: shiki ^4.0.1 is already a direct dependency of apps/editor/vox-vscode/package.json (line 441). The vox.tmLanguage.json grammar and markdown-injection.json are both present in apps/editor/vox-vscode/syntaxes/. Shiki is already ingesting the TextMate grammar inside the VS Code extension’s webview. The documentation portal is the only surface not yet unified.

This document weighs all viable doc platform options against a structured feature matrix and produces a ranked recommendation.


1.1 Current highlight.js Grammar vs. Reality

Section titled “1.1 Current highlight.js Grammar vs. Reality”

The current docs/theme/highlight-vox.js defines Vox keywords inline:

keyword: 'fn let mut if else match for in to return import type pub with on
actor workflow spawn http activity component routes'

Meanwhile vox.tmLanguage.json is the authoritative grammar consumed by VS Code and Shiki. Any keyword added to the language (e.g., @mcp.tool, @island, spawn, decorator attributes) must be manually duplicated into this separate highlight.js file. This is already drifting — the /*SSOT_HJS_KW*/ sentinel comments indicate intent but no enforcement mechanism exists.

Comprehensive research confirms: Markdown (.md / .mdx) remains the undisputed gold standard for LLM context in 2026. Key findings:

  • Converting HTML or proprietary formats to clean Markdown reduces token usage by 80–90% and significantly improves RAG semantic chunking accuracy.
  • LLMs navigate Markdown heading hierarchies (#, ##, ###) as structural landmarks; this is how they orient themselves within large context windows.
  • JSON/YAML formats are superior for machine-to-machine structured output but degrade LLM comprehension of prose documentation due to quote/escape noise.
  • XML tags are effective for prompt engineering (separating instructions from context) but are suboptimal as a file storage format.
  • The emerging llms.txt / llms-full.txt standard (already in use by Anthropic, Stripe, Vercel) provides a curated Markdown index of a site’s most important content for AI agent discovery. Vox already has docs/src/.well-known/llms.txt.

Conclusion: Do not migrate away from Markdown. The doc format is correct. The issue is the rendering toolchain, not the storage format.


Shiki is a syntax highlighter that uses the same TextMate grammar engine and VS Code themes as Visual Studio Code. It produces pre-rendered, token-accurate HTML at build time — zero client-side JavaScript required for highlighting.

Key technical facts (2026):

  • Shiki v4.x (current) requires Node.js ≥ 20.
  • Uses the Oniguruma regex engine via WASM for TextMate grammar execution.
  • Custom languages: pass any .tmLanguage.json object directly as a lang.
  • Singleton highlighter pattern is required for build performance on large sites (prevents re-initialization per code block).
  • Output: static <span> HTML with inline styles or CSS variables — no runtime JS dependency.
Dimensionhighlight.js (current)ShikiSyntect (Rust)
Grammar engineRegex (JS)TextMate (WASM)TextMate (native Rust)
VS Code fidelityLowExact matchVery high
Custom Vox languageSeparate JS file, manual driftLoad vox.tmLanguage.json directlyLoad vox.tmLanguage.json directly
Build-time renderingNo (browser JS)YesYes
Client JS payload~50 KB0 KB0 KB
Twoslash supportNoYes (type hovers in docs)No
VS Code theme importNoYesPartial
Active developmentMature/slowVery activeModerate
SSOT compliance❌ 3rd separate grammar✅ Shares vox.tmLanguage.json✅ Shares grammar

Syntect (pure Rust, used internally by some mdBook forks) is technically excellent but has a stale grammar ecosystem — grammars lag behind the VS Code marketplace by months to years. Shiki’s grammar library is crowd-sourced from VS Code extensions and is far more current.


3. Documentation Platform Comparison Matrix

Section titled “3. Documentation Platform Comparison Matrix”

The following platforms were evaluated across dimensions weighted specifically for the Vox project’s needs as an AI-native, Rust-first codebase with an existing VS Code extension and a MENS training corpus pipeline.

#PlatformEngineShiki SupportNative Runtime
AmdBook (current)RustRequires custom preprocessorRust (single binary)
BZolaRust (Giallo)No (own TextMate engine)Rust (single binary)
CVitePressVite + VueBuilt-in, first-classNode.js
DStarlight (Astro)AstroVia Expressive Code pluginNode.js
EDocusaurusReact + Next.jsVia @shikijs/rehype pluginNode.js
FMkDocs MaterialPython (Pygments)Post-processing onlyPython
GNextraNext.js + ReactVia remark/rehype pluginNode.js

Scoring: 5 = Excellent, 4 = Good, 3 = Acceptable, 2 = Poor, 1 = Not supported

Weight definitions used for Vox:

  • SSOT Grammar: Can the platform consume vox.tmLanguage.json directly without a separate grammar definition?
  • AI/LLM Readability: Markdown-first source, no proprietary format, plays well with RAG indexing.
  • Rust-native build: Can CI build docs without Node.js/Python as a required dependency?
  • Doctest integration: Can .vox or .rs code blocks be executed as tests?
  • Vox extension alignment: Does the platform use the same artifact pipeline as vox-vscode?
  • Versioning: Built-in or first-class versioned documentation (v0.5, v1.0, etc.).
  • Search quality: Built-in search sufficient for a technical audience; bonus for AI/RAG readiness.
  • Migration cost: Effort to migrate from current mdBook (~200 pages, SUMMARY.md, book.toml).
  • Long-term momentum: Community health, GitHub stars/activity trend in 2026.
  • Interactive components: Ability to embed demos, live REPLs, or rich interactive examples.
FeatureWeightmdBookZolaVitePressStarlightDocusaurusMkDocs
SSOT Grammar (Shiki/TM)10235541
AI/LLM Readability9555555
Rust-native build8551111
Doctest / vox-doc-pipeline8522221
Vox extension alignment7225441
Versioning6123353
Search quality63345 (Pagefind)54
Migration cost (inverted)7533322
Long-term momentum (2026)5335554
Interactive components4114552
i18n support3124543
Offline / no-CDN build5554434
Weighted Total248219281285256181

mdBook (248): Strong on Rust-native build, doctest integration (critical for vox-doc-pipeline), zero migration cost, and offline builds. Weakest on SSOT grammar, versioning, and interactive components. No Shiki path without building a custom preprocessor that shells to Node.js — which reintroduces the Node dependency and breaks the “single Rust binary” CI story.

Zola (219): Also Rust-native. Uses Giallo, its own TextMate grammar engine (Rust-based, VS Code-compatible). Can load a custom vox.tmLanguage.json via extra_grammars. However it’s a general SSG, not a documentation tool — it lacks mdBook-style doctest integration entirely, has no SUMMARY.md equivalent without significant theme work, and has lower momentum in the technical docs space. Zola is a good option if you’re building a landing/marketing site, less so for API references.

VitePress (281): Shiki is built-in; loading vox.tmLanguage.json is a 3-line config change. First-class Markdown. Strong Vue ecosystem. Loses heavily on Rust-native build and doctest. The vox-vscode webview already ships React+Radix (not Vue), introducing a Vue dependency for docs creates a polyglot frontend footprint.

Starlight (285, highest): Shiki via Expressive Code (first-class). Framework- agnostic — works with React components, which aligns with the vox-vscode webview stack. Built-in Pagefind search (excellent for AI/RAG because it generates a static JSON index that can be ingested by vox-arca/Scientia). Strong i18n and versioning via plugins. Loses on Rust-native build and doctest. Growing faster than VitePress in 2026.

Docusaurus (256): Best versioning out-of-the-box. React-native. Heavier than VitePress. Shiki via @shikijs/rehype. High migration cost from mdBook.

MkDocs Material (181): Python runtime is antithetical to the Vox VoxScript-first philosophy (Python is a retired automation surface per AGENTS.md). Shiki only via post-processing. Lowest score overall.


4. The Hybrid Path: mdBook + Shiki Preprocessor

Section titled “4. The Hybrid Path: mdBook + Shiki Preprocessor”

Because mdBook is so deeply embedded in the CI pipeline (GitHub Actions, GitLab CI, vox-doc-pipeline doctests), a full platform migration has real costs.

The hybrid option: Build mdbook-shiki-vox — a thin Rust mdBook preprocessor that:

  1. Scans all chapter content for ```vox fenced blocks.
  2. Calls the Shiki Node.js CLI (or WASM bindings) at build time to produce pre-rendered <span> HTML.
  3. Replaces the fenced block with the rendered HTML fragment.

Pros:

  • Zero migration: all .md files, SUMMARY.md, and book.toml stay unchanged.
  • Shiki consumes vox.tmLanguage.json directly — SSOT achieved.
  • doctest (mdbook test) and vox-doc-pipeline remain fully functional.

Cons:

  • Reintroduces a Node.js build dependency for docs (even if thin).
  • No community-maintained mdbook-shiki exists as of April 2026 — we would own it.
  • The preprocessor API shells JSON through stdin/stdout — adds latency to mdbook build on large books.
  • Does not solve versioning, search quality improvements, or interactive component gaps.

This is a low-risk tactical fix that defers the platform migration question.


vox-doc-pipeline runs .vox doctests from Markdown files using mdbook test-compatible mechanics. This is cited in AGENTS.md as a mandatory quality gate: “All vox blocks in documentation must compile cleanly via vox-doc-pipeline’s dynamic doctest runner.”

No alternative platform replicates mdbook test semantics out of the box. Any migration plan must solve the doctest constraint before switching.

Options:

  • Build a standalone vox doctest-md subcommand that reads Markdown files directly and runs .vox blocks, decoupled from any specific SSG.
  • This is the prerequisite gate for any platform migration. It is also the right long-term architecture (SSG-agnostic doctests).

6.1 Immediate (No Migration): Eliminate Grammar Drift Now

Section titled “6.1 Immediate (No Migration): Eliminate Grammar Drift Now”

Regardless of which platform is chosen, the highlight-vox.js grammar drift must be fixed now. The path:

  1. Add mdbook-shiki-vox preprocessor (custom, ~200 lines of Rust) that replaces highlight.js with Shiki + vox.tmLanguage.json at build time.
  2. Delete docs/theme/highlight-vox.js and remove it from book.toml additional-js.
  3. The vox.tmLanguage.json in apps/editor/vox-vscode/syntaxes/ becomes the single source of truth for all five rendering contexts (VS Code, Cursor, Neovim, GitHub, and now docs portal).

6.2 Medium-Term (6–12 months): Migrate to Starlight

Section titled “6.2 Medium-Term (6–12 months): Migrate to Starlight”

Once vox doctest-md is a standalone subcommand (decoupling doctests from mdBook), migrate the documentation portal to Starlight (Astro):

  • Shiki via Expressive Code: load vox.tmLanguage.json as shiki.langs.
  • Pagefind search: static JSON index consumable by vox-arca/Scientia RAG.
  • React component support: aligns with vox-vscode webview stack (React 19).
  • llms.txt / llms-full.txt generation: Astro content collections make this trivial to automate.
  • Plugin ecosystem: starlight-versions, starlight-typedoc, starlight-links-validator cover the gaps mdBook cannot.
  • Do not migrate to MkDocs: Python is a retired automation surface.
  • Do not migrate to VitePress: Vue is a third frontend framework in the repo (alongside React in vox-vscode and Vox itself). Avoid.
  • Do not build a Zola-based docs site: Zola’s Giallo engine is excellent but it’s a general SSG — the documentation taxonomy work would be enormous and the doctest gap cannot be bridged.

7. AI-First Documentation Architecture (2026 Principles)

Section titled “7. AI-First Documentation Architecture (2026 Principles)”

Drawing from research on documentation for AI-native codebases:

  1. Markdown is the right storage format. Do not switch to AsciiDoc, RST, or proprietary XML. LLMs are pre-trained on GitHub Markdown at scale.
  2. Front matter is structured metadata. YAML front matter is how Markdown documents communicate machine-readable metadata without disrupting human readability. Continue enforcing it.
  3. llms.txt + llms-full.txt are increasingly mandatory. These give coding agents (Cursor, Copilot, Antigravity) a curated, token-efficient entry point into the docs corpus. Automate their generation from SUMMARY.md.
  4. Pagefind is superior to mdBook’s built-in search for RAG. Pagefind produces a static JSON index that can be ingested programmatically by vox-arca’s Scientia pipeline, enabling “ask the docs” without a hosted backend.
  5. Syntax highlighting fidelity matters for training data. When .md files are ingested into the MENS training corpus, Shiki-highlighted HTML code blocks carry token-scope metadata (via TextMate scope names) that improves the model’s ability to learn Vox syntax. highlight.js output is semantically impoverished by comparison.
  6. Interactive components are the next frontier. Starlight’s Islands architecture allows embedding live Vox REPLs and type-hover examples (Twoslash) without shipping JS to non-interactive pages.

The following documents were verified to exist before being linked:


PriorityActionEffortDependency
P0Build mdbook-shiki-vox preprocessor in Rust2–3 daysvox.tmLanguage.json (exists)
P0Remove highlight-vox.js / update book.toml1 hourAfter P0 above
P1Build vox doctest-md standalone subcommand1 weekUnlock platform migration
P1Add llms-full.txt auto-generation to vox-doc-pipeline1 dayNone
P2Evaluate Starlight migration with pilot (e.g., tutorials section)2 weeksP1
P2Integrate Pagefind into Starlight for Scientia RAG indexing3 daysP2 pilot
P3Full Starlight migration4–6 weeksP2 pilot success