Comprehensive Vox audit and improvement plan (April 2026, v2)
Comprehensive Vox audit and improvement plan (April 2026, v2)
Section titled “Comprehensive Vox audit and improvement plan (April 2026, v2)”Bert — full audit triangulating crates/, the 280 docs in docs/src/architecture/, ADRs 001–023, the CHANGELOG, parser/build artifacts, and the 43 golden examples. Goal: name the real gaps blocking Vox is the premier programming language for LLMs to write web apps and give a sequenced improvement list you can actually execute.
This v2 supersedes the prior file; it corrects two count errors (crates=63, ADRs=23) and notes the vox-doc-pipeline build break is fixed as of today.
Part I — Executive diagnosis
Section titled “Part I — Executive diagnosis”What is real and good
Section titled “What is real and good”- A real Rust compiler (lexer/parser/HIR/IR/typeck + dual Rust+TS codegen) in
crates/vox-compiler/. - A real ~17 kLOC orchestrator with A2A messaging, observability, daemon parity (ADR 022), trust telemetry.
- A real Populi mesh control plane with NVML probes and HTTP scope tenancy (ADR 008/017/018).
- A real native MENS training pipeline (Candle QLoRA, Qwen 3.5 4B on 16 GB) with deterministic corpus and OpenAI-compatible serve.
- A real RAG stack (Scientia/
vox-search) with BM25 + FTS5 + vector + RRF + CRAG + Socrates gating, plus the brand-new symbol-proximity split-brain detector. - Serious governance: 23 ADRs,
.voxignoreSSoT, Clavis secret resolution, TOESTUB detectors, completion-policy gates,vox ci ssot-drift. - A working
@table→ SQL → typed client → typed server collapse on the golden corpus. This is the moat. - 102 MCP tools + VS Code extension + OpenCode bridge + dashboard.
The five biggest structural problems
Section titled “The five biggest structural problems”- The parser still rejects top-level statements.
parse_top_level_itemaccepts onlyfn/import/type.RunMode::Scriptexists at the CLI but no implicitmain()wrapping or script grammar exists. AGENTS.md mandatesvox run scripts/foo.voxas the only project glue, yetexamples/actor.voxandexamples/mcp_tool.voxwere archived as “non-parseable on current grammar” andVOX_EXAMPLES_STRICT_PARSEis opt-in. You are simultaneously banning Python/PowerShell glue and shipping a parser that can’t run the replacement. - The web-UI pillar embeds React + JSX directly inside
.voxsource. Authors write<div className=…>,use_state,onClick. The “Path C / Vox-native reactivity” recommended inweb-architecture-analysis-2026.mdis documented but optional. This contaminates the MENS corpus with React idioms and undermines the K-complexity thesis. - LLM-alignment scaffolding is 60–70 % unshipped. GBNF export is ~30 lines (expressions only) with CVE-2026-2069 in downstream llama.cpp; XGrammar-2 not adopted.
correction_hintis populated in 1 of 31 HIR validation sites.@llmdecorator parses but emits no runtime dispatch. AST-token weighted loss is checkboxes-only. There is no public HumanEval-Vox or SWE-bench-Vox showing that Vox-generating models hallucinate less than TS/Python-generating models at matched compute. - Doc/code divergence is becoming a governance crisis. 280 architecture docs, 43 currently passing goldens, ~6 parallel SSoT generators (operations / MCP / CLI / capability / language-surface / completion-policy), retired surfaces (HTMX, Pico, TanStack virtual routes) still ghost-referenced in many pages, research → blueprint pipeline unclear. The doc tree is growing faster than the code tree is shipping.
- Scope is too wide for the evidence base. 63 workspace crates, 280+ CLI subcommands, 23 ADRs, telemetry subsystem, cryptography SSoT, gamification, speech-to-code, vision lane, plugins marketplace, MENS, mesh, RAG, publication pipeline, workflow runtime, package manager. There is no plausible world in which all of these are simultaneously stable.
feature-growth-boundaries.mdandvox-bell-curve-strategy.mdare trying to restore discipline; the horse left the barn months ago.
Single-sentence verdict
Section titled “Single-sentence verdict”Vox has the best architectural governance of any v0.5 language I’ve seen, the only real compiler+training+orchestrator stack at this tier, and zero empirical evidence for its headline LLM-native claim — while being too sprawled to ship that evidence at current scope.
Part V — Reshaped plan, given the answers
Section titled “Part V — Reshaped plan, given the answers”Bert answered on 2026-04-18:
| Question | Answer |
|---|---|
| Audience | Solo devs shipping web apps |
| Web stack | React-embedded forever (status quo) |
| Team | Solo + AI agents |
| v1.0 gate | Real production users (N external apps deployed) |
This is a coherent posture. The plan collapses significantly.
What those four answers together mean
Section titled “What those four answers together mean”- “Real production users” + “solo + AI agents” + “solo devs shipping web apps” means the top-line KPI is now time-from-
vox newto a public URL for a solo dev. Everything that doesn’t move that number is secondary. - Choosing “React-embedded forever” kills a dozen items that presuppose a Vox-native component model. No Vox-native reactivity DSL. No multi-backend codegen. Web-stack work is narrow: make @island + TanStack Start boring and reliable, and stop the raw JSX leakage into
.voxsource. - “Solo + AI agents” means the 63-crate → 10-crate compression (item 93) is no longer a suggestion, it’s the governing constraint. Every non-core crate needs a dated freeze/retire decision.
- “v1.0 = production users” means the runtime, deploy, and support story outrank everything else — including MENS and Scientia — because a user’s app going down on Saturday is the only thing that matters at that point.
Items promoted (now P0 or P1 regardless of original tag)
Section titled “Items promoted (now P0 or P1 regardless of original tag)”- 1, 2, 3, 4 — parser statement-position regression,
vox-doc-pipelinestale artifact, examples v0.5.0 migration, golden corpus. Still first. Nothing matters without a parser. - 28 —
vox new webscaffold →vox deploy→ public URL in one command. Now the single most important feature in the repo. Promote from P0 to “this is the product.” - 33 — TanStack Start integration hardening. Promoted to P0.
- 16 — reduce raw JSX leakage from
.voxsource into compiler-owned surface. Promoted to P1 (was P2). Even though we’re React-embedded forever, LLM-authoring suffers when components are 50% JSX strings. - 84, 85, 86 — one marquee app in production, public deploy template, written case study. Promoted to P0/P1. These are the v1.0 gate.
- 93, 94 — 10-crate freeze + public v1.0 criteria. Promoted to Week 1.
- 49, 50, 51 — runtime crash recovery, supervisor trees, graceful degradation. Promoted to P1 (from P2). “Production users” means a running app that doesn’t die.
- 62, 63, 64 — observability basics: structured logs, error reporting, uptime ping. Promoted to P1. You cannot support users blind.
- 40 —
vox fmt+ linter stability. P1. Required for AI authoring to be clean.
Items demoted or shelved
Section titled “Items demoted or shelved”- 15 — Vox-native reactivity primitives. Shelved. User explicitly chose status quo.
- 23, 26, 35 — Solid / Svelte / Next adapters. Shelved indefinitely (multi-backend was the rejected option).
- 17 (Vox-native component DSL R&D) — Shelved.
- 18, 19, 20 — GBNF/XGrammar-2, AST-token alignment, correction-hint expansion. Demoted to P2. Still valuable but not on the solo-dev-to-production critical path. These become compelling after someone is shipping an app.
- 30, 31 — HumanEval-Vox, SWE-bench-Vox. Demoted to P2. Labs aren’t the audience.
- 60, 61 — MENS training pipeline hardening. Demoted to P2. MENS is a research pillar but no solo dev cares that their Vox code is training data on day one.
- 41 — Scientia/RAG hardening beyond current state. Demoted to P2.
- 77, 78 — SSoT generator consolidation. Demoted to P2 (was P1). Still needed for contributor sanity, but 1 contributor ≈ you, so the pain is smaller than the audit assumed.
- 96, 97 — crate ledger, resurrecting retired orchestrator codenames. P1 → P2; the freeze decision (93) supersedes the ledger.
Items deleted from scope entirely
Section titled “Items deleted from scope entirely”- Foundation governance (92), weekly status post (90), OSS onboarding funnel (91), Linguist submission (88), public MCP endpoint (89) — these were community-scale moves premised on either OSS contributors or labs as audience. With solo-dev-as-audience, they become post-v1.0.
- Everything downstream of a Foundation or funded-team shape.
The reshaped week-by-week
Section titled “The reshaped week-by-week”- Week 1 — bleed stops, scope honest. 1, 2, 3, 4, 93, 94, 69 (archive stale research). Publish a
crates/_frozen.mdlist. Publish three-criterion v1.0 gate: (a) 3 external solo-dev apps in production for 30 days, (b)vox new web→ public URL in under 10 minutes, (c) zero P0 open onvox-compiler/vox-actor-runtime. - Weeks 2–3 — the product is
vox new web. 28, 33, 40. One command from empty dir to a deployed TanStack Start app on a free tier (Fly.io / Vercel / Cloudflare). Golden path in docs. Record the video. - Weeks 4–6 — the first real user. 84, 85, 86. Find one solo dev externally (could be a friend, could be you in a second persona) and walk them through shipping a real app. File every paper-cut as an issue. Fix it or freeze it within the week.
- Months 2–3 — keep the app up. 49, 50, 51, 62, 63, 64. Runtime recovery, observability, error reporting. Users on production = support channel.
- Months 4–6 — second and third users. Repeat the loop. Count: production apps deployed, days-without-runtime-incident, time-from-clone-to-URL. Publish the number weekly to yourself (private dashboard, not public).
- After the third production user, then and only then — reopen the P2 shelf: grammar-constrained decoding, HumanEval-Vox, MENS hardening, SSoT consolidation. The research pillars become v1.x work.
The one-sentence product posture
Section titled “The one-sentence product posture”Vox v1.0 is the fastest way for a single human-plus-AI team to ship and operate a real web app, and we will not add any feature that does not measurably shorten that loop.
Everything in the audit is still valid as a long-term backlog. But the active working set, given these four answers, is ~25 items, not 100.