Compiler Architecture
The Vox compiler follows a modular pipeline architecture with conceptual stages. The current implementation is consolidated under crates/vox-compiler/src/, where each stage is represented by explicit modules.
Current implementation note: the practical pipeline is consolidated under crates/vox-compiler/src/ for the lexer, parser, AST, HIR, typecheck, and emitters. This document keeps the conceptual stage boundaries even where the implementing modules live in one crate.
Pipeline Overview
    Source Code (.vox)
           │
           ▼
    ┌────────────────┐
    │     Lexer      │  Tokenization (logos)
    └──────┬─────────┘
           │ Vec<Token>
           ▼
    ┌────────────────┐
    │     Parser     │  Recursive descent parser → AST Module
    └──────┬─────────┘
           │ Module (AST root)
           ▼
    ┌────────────────┐
    │      AST       │  Strongly-typed AST wrappers
    └──────┬─────────┘
           │ Module (Decl, Expr, Stmt, Pattern)
           ▼
    ┌────────────────┐
    │      HIR       │  Desugaring + name resolution + dead code detection
    └──────┬─────────┘
           │ HirModule
           ▼
    ┌────────────────┐
    │     Typeck     │  Bidirectional type checking + HM inference
    └──────┬─────────┘
           │ Typed HIR + Vec<Diagnostic>
           ▼
    ┌────────────────┐
    │     Web IR     │  HIR→WebIR lower + validate
    └──────┬─────────┘
           │ WebIrModule
           ▼
    ┌────────────────┐
    │  App Contract  │  HIR→AppContract (HTTP/RPC/server config)
    └──────┬─────────┘
           │ AppContractModule
           ▼
    ┌────────────────┐
    │  Runtime Proj  │  HIR→RuntimeProjection (DB/task capability hints)
    └──────┬─────────┘
           │ RuntimeProjectionModule
           ▼
    ┌──────────────────┬─────────────────────┐
    │ vox-codegen-rust │ vox-codegen-ts      │
    │ (quote! → .rs)   │ (string → .ts/tsx)  │
    └──────────────────┴─────────────────────┘

Current path note:
- codegen_ts is still the production TS emitter path.
- VOX_WEBIR_VALIDATE defaults on (WebIR lower/validate gate); set it to 0, false, no, or off to skip.
- app_contract::project_app_contract is the single source of truth (SSOT) for route/RPC/server-config codegen inputs (via projection_bundle in emit paths).
- runtime_projection::project_runtime_from_hir is the SSOT for orchestration-facing DB capability projection (also bundled).
- Reactive view: uses the Web IR TSX bridge when validation is clean. VOX_WEBIR_EMIT_REACTIVE_VIEWS was removed, so there is no legacy-only emit path (see reactive.rs).
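The VOX_WEBIR_VALIDATE behavior noted above (on by default, skipped for a small set of "off" values) can be sketched as a small helper. This is only an illustration: the function names are hypothetical, and the case-insensitive, whitespace-trimming comparison is an assumption rather than confirmed compiler behavior.

```rust
use std::env;

/// Hypothetical helper: decide whether the WebIR lower/validate gate runs.
/// `raw` is the value of VOX_WEBIR_VALIDATE, or None when it is unset.
fn webir_validate_enabled(raw: Option<&str>) -> bool {
    match raw {
        // Unset means the gate defaults on.
        None => true,
        Some(value) => {
            // Assumption: matching ignores case and surrounding whitespace.
            let normalized = value.trim().to_ascii_lowercase();
            !matches!(normalized.as_str(), "0" | "false" | "no" | "off")
        }
    }
}

/// Reads the process environment and applies the same rule.
fn webir_validate_enabled_from_env() -> bool {
    webir_validate_enabled(env::var("VOX_WEBIR_VALIDATE").ok().as_deref())
}
```

Keeping the decision in a pure function that takes the raw value makes the rule testable without touching the process environment.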
ML Training Pipeline
Vox has a native ML training loop powered by Burn (a pure-Rust deep learning framework):

    docs/src/*.md + examples/*.vox
            │
            ▼
    vox mens corpus extract    # produces validated.jsonl
            │
            ▼
    vox mens corpus pairs      # produces train.jsonl (instruction-response pairs)
            │
            ▼
    vox mens train             # native Burn / HF path (default CLI features)
            │
            ▼
    mens/runs/v1/model_final.bin

The training loop is defined in crates/vox-cli/src/training/native.rs.
Stage Details
1. Lexer (vox-compiler::lexer)
Purpose: Converts source text into a flat stream of tokens.
Implementation: Uses the logos crate for high-performance, zero-copy tokenization.
Output: Vec<Token> — each token carries its kind and span.
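To make the "kind plus span" token shape concrete, here is a minimal sketch. It deliberately uses a hand-written scanner instead of logos so that it stays dependency-free, and the `TokenKind` variants, `Span` fields, and function names are hypothetical, not the compiler's actual definitions.

```rust
/// Byte range of a token in the source text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: usize,
    end: usize,
}

/// Hypothetical token kinds; the real set lives in the lexer module.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Ident,
    Number,
    Punct,
    Error,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Token {
    kind: TokenKind,
    span: Span,
}

/// Scans `src` into a flat token list. Tokens store only offsets,
/// so their text is recovered by slicing the original source.
fn lex(src: &str) -> Vec<Token> {
    let bytes = src.as_bytes();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        if b.is_ascii_whitespace() {
            i += 1;
            continue;
        }
        let start = i;
        let kind = if b.is_ascii_alphabetic() || b == b'_' {
            while i < bytes.len() && (bytes[i].is_ascii_alphanumeric() || bytes[i] == b'_') {
                i += 1;
            }
            TokenKind::Ident
        } else if b.is_ascii_digit() {
            while i < bytes.len() && bytes[i].is_ascii_digit() {
                i += 1;
            }
            TokenKind::Number
        } else if b.is_ascii_punctuation() {
            i += 1;
            TokenKind::Punct
        } else {
            // Unknown input: consume one whole UTF-8 character.
            i += src[i..].chars().next().map_or(1, |c| c.len_utf8());
            TokenKind::Error
        };
        tokens.push(Token { kind, span: Span { start, end: i } });
    }
    tokens
}

/// Recovers a token's text from the source without allocating.
fn text<'a>(src: &'a str, token: &Token) -> &'a str {
    &src[token.span.start..token.span.end]
}
```

Storing spans rather than owned strings is what keeps this style of lexer zero-copy: later stages slice the source only when they need the text, for example when reporting a diagnostic.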
2. Parser (vox-compiler::parser)
Purpose: Transforms a token stream into an AST module.
Implementation: A hand-written recursive descent parser producing ast::decl::Module. The parser is resilient to errors, meaning it continues parsing after encountering invalid syntax — this is critical for LSP support, where the user is actively typing.
Key features:
- Error recovery with synchronization points
- Trailing comma support in parameter lists
- Duplicate parameter name detection
- Indentation-aware formatting (indent.rs)
See crates/vox-compiler/src/parser/descent/mod.rs for the implementation entrypoint.
Output: Module (AST root) with source spans on declarations and expressions.
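Three of the features listed above (error recovery with synchronization points, trailing commas, and duplicate parameter detection) can be shown together in a small sketch. It parses only a parenthesized parameter list from pre-split string tokens. The `ParamList` type, the function names, and the choice of `,` and `)` as synchronization points are assumptions made for illustration, not the actual parser design.

```rust
use std::collections::HashSet;

/// Result of parsing one parameter list: the names kept plus every diagnostic.
#[derive(Debug, PartialEq)]
struct ParamList {
    names: Vec<String>,
    errors: Vec<String>,
}

fn is_ident(tok: &str) -> bool {
    let mut chars = tok.chars();
    matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_')
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Advances `pos` to the next synchronization point (`,` or `)`).
fn synchronize(tokens: &[&str], mut pos: usize) -> usize {
    while pos < tokens.len() && tokens[pos] != "," && tokens[pos] != ")" {
        pos += 1;
    }
    pos
}

/// Parses `( name , name , ... )`, collecting errors instead of stopping.
fn parse_params(tokens: &[&str]) -> ParamList {
    let mut out = ParamList { names: Vec::new(), errors: Vec::new() };
    if tokens.first() != Some(&"(") {
        out.errors.push("expected '('".to_string());
        return out;
    }
    let mut seen = HashSet::new();
    let mut pos = 1;
    loop {
        match tokens.get(pos).copied() {
            None => {
                out.errors.push("expected ')' before end of input".to_string());
                break;
            }
            // Reached directly after a comma, this accepts `(a, b,)`.
            Some(")") => break,
            Some(tok) if is_ident(tok) => {
                if seen.insert(tok) {
                    out.names.push(tok.to_string());
                } else {
                    out.errors.push(format!("duplicate parameter '{}'", tok));
                }
                pos += 1;
            }
            Some(tok) => {
                // Record the problem, then skip to a point where parsing can resume.
                out.errors.push(format!("unexpected token '{}'", tok));
                pos = synchronize(tokens, pos);
            }
        }
        match tokens.get(pos).copied() {
            Some(",") => pos += 1,
            // `)` and end of input are handled at the top of the loop.
            Some(")") | None => {}
            Some(tok) => {
                out.errors.push(format!("expected ',' or ')', found '{}'", tok));
                pos = synchronize(tokens, pos);
                if tokens.get(pos) == Some(&",") {
                    pos += 1;
                }
            }
        }
    }
    out
}
```

For the input `(a, + 1, b)` this sketch reports one error for `+`, skips to the next comma, and still returns both `a` and `b`. That is the property an editor integration relies on while the user is mid-edit.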
3. AST (vox-compiler::ast)
Purpose: Strongly-typed wrappers around the untyped CST nodes.
See crates/vox-compiler/src/ast/ for the node hierarchy.
6. Code Generation
Rust Codegen (vox-compiler::codegen_rust)
Emits Rust source using the quote! macro. Each decorator maps to specific Rust constructs:
| Vox | Generated Rust |
|---|---|
| @endpoint fn | Axum handler + route registration |
| @table type | Struct + SQLite schema |
| @test fn | #[test] function |
| @deprecated | #[deprecated] attribute |
| actor | Tokio task + mpsc mailbox |
| workflow | Plain async function today; interpreted runtime provides partial durable step recording |
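The "task + mpsc mailbox" shape in the actor row can be illustrated with a small sketch. To stay dependency-free it swaps in `std::thread` and `std::sync::mpsc` for the Tokio task and Tokio channel named in the table, and the counter actor, its message type, and the function names are hypothetical; this is not the code the emitter generates.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Messages the hypothetical counter actor understands.
enum CounterMsg {
    Add(i64),
    /// Request the current total; the reply travels over its own channel.
    Get(Sender<i64>),
}

/// Spawns the actor and returns its mailbox plus a handle for joining it.
fn spawn_counter() -> (Sender<CounterMsg>, JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<CounterMsg>();
    let handle = thread::spawn(move || {
        // State is owned by the actor alone, so no locks are needed.
        let mut total = 0i64;
        // The loop ends once every sender has been dropped.
        for msg in rx {
            match msg {
                CounterMsg::Add(n) => total += n,
                CounterMsg::Get(reply) => {
                    // Ignore the error if the requester has gone away.
                    let _ = reply.send(total);
                }
            }
        }
    });
    (tx, handle)
}

/// Asks the actor for its total and waits for the answer.
fn get_total(mailbox: &Sender<CounterMsg>) -> Option<i64> {
    let (reply_tx, reply_rx) = mpsc::channel();
    mailbox.send(CounterMsg::Get(reply_tx)).ok()?;
    reply_rx.recv().ok()
}
```

Messages from a single sender are processed in order, so a `Get` sent after two `Add` messages observes both additions. Dropping the last mailbox handle shuts the actor down.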
TypeScript Codegen (vox-compiler::codegen_ts)
Emits TypeScript/TSX in modular files:
| Module | Output |
|---|---|
| jsx.rs | React JSX components |
| component.rs | Component declarations and hooks |
| activity.rs | Activity/workflow client wrappers |
| emitter.rs | TanStack Router trees, optional server fns, islands metadata |
| adt.rs | TypeScript discriminated union types |
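As a rough picture of string-based emission for the adt.rs row, the sketch below renders a TypeScript discriminated union from a simplified variant description. The `Variant` type, the `emit_ts_union` function, and the use of `kind` as the discriminant field are assumptions for illustration; the real emitter's inputs and output layout may differ.

```rust
/// Hypothetical, simplified description of one ADT variant.
struct Variant {
    name: &'static str,
    /// (field name, TypeScript type) pairs.
    fields: Vec<(&'static str, &'static str)>,
}

/// Renders a TypeScript discriminated union by plain string building.
/// Assumption: the discriminant field is called `kind`.
fn emit_ts_union(type_name: &str, variants: &[Variant]) -> String {
    if variants.is_empty() {
        // A union with no members has no inhabitants.
        return format!("export type {} = never;\n", type_name);
    }
    let arms: Vec<String> = variants
        .iter()
        .map(|v| {
            let mut arm = format!("  | {{ kind: \"{}\"", v.name);
            for (field, ty) in &v.fields {
                arm.push_str(&format!("; {}: {}", field, ty));
            }
            arm.push_str(" }");
            arm
        })
        .collect();
    format!("export type {} =\n{};\n", type_name, arms.join("\n"))
}
```

Given variants `Circle { radius: number }` and `Square { side: number }`, this yields a `Shape` type whose members TypeScript can narrow by checking `kind`.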
Related Web IR documents:
- Normative strategy for reducing frontend emitter complexity while preserving React interop: ADR 012 — Internal web IR strategy.
- Detailed implementation sequencing and weighted task quotas: Internal Web IR implementation blueprint.
- Ordered file-by-file execution map: WebIR operations catalog.
- Canonical current-vs-target representation mapping: Internal Web IR side-by-side schema.
- Quantified K-complexity delta for the canonical worked app: WebIR K-complexity quantification.
- Reproducible per-token-class computation: WebIR K-metric appendix.
Supporting Crates
| Crate | Purpose |
|---|---|
| vox-cli | vox command-line entry point — see ref-cli.md for the implemented subcommand set |
| vox-lsp | Language Server Protocol implementation |
| vox-actor-runtime | Tokio/Axum runtime: actors, scheduler, subscriptions, storage |
| vox-package | Package manager: CAS store, dependency resolution, caching |
| vox-db | Database abstraction layer |
| vox-gamify | Gamification system |
| vox-orchestrator | Multi-agent orchestration |
| vox-code-audit | AI anti-pattern detector |
| vox-tensor | Native ML tensors via Burn 0.19 (Wgpu/NdArray backends) |
| vox-eval | Automated evaluation of training data quality |
| vox-doc-pipeline | Rust-native doc extraction + SUMMARY.md generation |
| vox-integration-tests | End-to-end pipeline tests |
Adding a Language Feature
The full checklist for adding a new language construct:
- Lexer — Add tokens to crates/vox-compiler/src/lexer/token.rs
- Parser — Add grammar rules in crates/vox-compiler/src/parser/descent/
- AST — Add node types in crates/vox-compiler/src/ast/
- HIR — Map AST → HIR in crates/vox-compiler/src/hir/lower/
- Type Check — Add inference rules in crates/vox-compiler/src/typeck/
- WebIR — Add/update lowering + validation semantics in crates/vox-codegen/src/web_ir/ when the feature affects web-facing behavior
- Codegen — Emit code in both crates/vox-compiler/src/codegen_rust/ and crates/vox-codegen/src/codegen_ts/
- Test — Add integration coverage in vox-integration-tests/tests/ and WebIR/parity coverage where applicable
- Docs — Add frontmatter + code example in docs/src/
- Training — Run vox mens corpus extract to include the new construct in ML data
Next Steps
- Language Reference — Full syntax and feature reference
- Actors & Workflows — Workflow durability and actor persistence
- Ecosystem & Tooling — CLI commands, package manager, LSP
- Web IR operations catalog — numbered compiler/emitter tasks OP-0001–OP-0320 + supplemental OP-S049–OP-S220 batch map
- Web IR acceptance gates G1–G6 — parser, K-metric, parity, and rollout thresholds