Deep research (working definition for Vox) is an orchestrated pipeline that:
- Plans: decomposes a user topic into sub-queries (and optionally iterative refinements).
- Retrieves: searches the open web and/or local corpora with policy-gated backends (SearXNG → DuckDuckGo → optional Tavily; optional HTML extraction via web-scrape).
- Iterates: when evidence is weak, expands queries (CRAG-style) up to a bounded hop count.
- Grounds: extracts claims, optionally verifies them against sources (when wired).
- Synthesizes: produces a cited answer and structured metadata (routing tier, diagnostics, judge score).
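
The stage list above implies a simple control flow: retrieval loops back through iteration while evidence is weak and hops remain, then moves on to grounding and synthesis. A minimal sketch of that ordering follows. It is illustrative only: Stage and next_stage are hypothetical names, not types from the Vox orchestrator, and the weak-evidence flag stands in for whatever signal the pipeline actually uses.

```rust
/// Hypothetical stage labels mirroring the list above; not the real Vox types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Plan,
    Retrieve,
    Iterate,
    Ground,
    Synthesize,
}

/// Returns the stage that follows `current`, or None once synthesis is done.
/// Assumption: iteration is only entered after a retrieval pass that produced
/// weak evidence, and only while the bounded hop budget is not exhausted.
pub fn next_stage(current: Stage, evidence_is_weak: bool, hops_left: u32) -> Option<Stage> {
    match current {
        Stage::Plan => Some(Stage::Retrieve),
        Stage::Retrieve if evidence_is_weak && hops_left > 0 => Some(Stage::Iterate),
        Stage::Retrieve => Some(Stage::Ground),
        // Expanded queries go back through retrieval.
        Stage::Iterate => Some(Stage::Retrieve),
        Stage::Ground => Some(Stage::Synthesize),
        Stage::Synthesize => None,
    }
}
```

With weak evidence and no hops left, next_stage(Stage::Retrieve, true, 0) moves on to Stage::Ground, which is the bounded-hop behavior the list describes.
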
Optional dimensions aligned with commercial products: human checkpoints, async/long-running jobs, and mesh-durable execution. For Vox, mesh-durable execution is a forward hook only: @durable / workflow / activity are parsed and lowered per AGENTS.md §Grammar Unification, but durable replay/cron semantics are not production-complete — see durability-runtime-audit-2026.md and ADR-028 proposal.
Strategic anchor: The SCIENTIA self-publication program targets longitudinal provider observability and publication-quality outputs (scientia-self-publication-finalization-plan-2026.md). The deep-research pipeline is the substrate that can feed evidence bundles into that loop when paired with scientia-mesh-integration-research-2026.md signal families (DiscoverySignalFamily, FindingCandidateClass).
Non-duplication: Tavily endpoint shapes, secrets lifecycle, pricing, fail-open rules, and Firecrawl comparison live in docs/src/reference/tavily-integration-ssot.md. This document links there instead of copying tables.
Voice transcription often yields “Claw” without specifying the product. Three distinct references appear in industry and this repo:
| Row | What it is | Typical UX | Vox relevance |
|---|---|---|---|
| A. OpenClaw Deep Research Agent | Skill/agent pattern in the OpenClaw ecosystem (multi-round web search, structured report, configurable iterations). | Async-ish batch runs (minutes), markdown/HTML output. | Closest analog: vox-search::crag::CragRouter + WebSearchDispatcher — shipped in vox-search, must be driven from orchestrator research (§6). |
| B. SearchClaw / ScienceClaw / ClawHub “Academic Deep Research” | Research harnesses with explicit quality gates (citation counts, source diversity), many literature APIs, checkpointed workflows. | Long runs, academic citation styles. | Partial: diagnostics exist in run_research (RetrievalDiagnostics); not built: minimum citation diversity enforcement, APA tooling, 77-database integrations. |
| C. Anthropic Claude Research mode | Hosted Claude capability for web research with inline citations (consumer + API surfaces). | Sync/async report with citations. | Not orchestrated by Vox: we may call Anthropic as an LLM backend for synthesis/judge (stages.rs) but do not invoke Claude’s hosted “Research” product as a black box. |
OpenClaw docs refresh for operators already exists on the CLI path (db_research/refresh.rs — OPENCLAW_REFRESH_URLS).
Columns: triggering UX, planning, retrieval tools, memory/session, citations, cost/latency, access, limitations, Vox analog (file or verdict).
| Dimension | Notes (Gemini Deep Research) |
|---|---|
| UX | Consumer app + API (Interactions / agent surfaces); “Max” variant emphasizes higher search/token budgets and async completion. |
| Planning | Iterative plan → search/read → gap fill → report. |
| Tools | Web search/browse; MCP connectors; Workspace connectors in consumer SKU. |
| Memory | Session-bound; export report artifacts. |
| Citations | Report-style citations (implementation details are vendor-side). |
| Cost/latency | High token + many search steps on “Max”; vendor-metered. |
| Access | Google AI / Gemini API; Google account / Cloud billing. |
| Limits | Vendor lock-in; enterprise data residency policies; eval claims require primary citations. |
| Vox analog | planner.rs::decompose_query_with_config — STUB (passthrough). Retrieval: web_gather.rs — now delegates to vox-search web tier (Phase 1 shipped in this workstream). |
Primary sources are listed in appendix §10.
| Dimension | Notes (OpenClaw Deep Research Agent) |
|---|---|
| UX | Skill/config driven; multi-round search (often ~5), cross-source validation narrative. |
| Planning | Prompt/scaffolding defines rounds and output shape. |
| Tools | Gateway-discovered tools + HTTP skills; web search depends on deployment. |
| Memory | Skill/session dependent. |
| Citations | Markdown reports with links. |
| Cost/latency | Token-heavy; operator-hosted gateway. |
| Access | OpenClaw gateway + skills marketplace/docs. |
| Limits | Ecosystem fragmentation; skill quality varies by publisher. |
| Vox analog | CragRouter + policy web_search_max_hops — reuse from orchestrator after initial web gather (Phase 2 in this workstream). |
| Dimension | Notes (SearchClaw / ScienceClaw) |
|---|---|
| UX | Benchmark-oriented harnesses (e.g. BrowseComp claims for SearchClaw); academic checkpoints. |
| Planning | Decomposition + structured evidence trails. |
| Tools | Many APIs (Semantic Scholar, arXiv, news, …). |
| Memory | Persistent harness state across sessions (paper narrative). |
| Citations | Minimum counts / diversity constraints (SearchClaw “harness engineering”). |
| Cost/latency | API-rate-limit sensitive. |
| Access | GitHub / skill hubs. |
| Limits | Ops burden to keep API keys and rate limits healthy. |
| Vox analog | Not built as a dedicated harness; closest telemetry is RetrievalDiagnostics + future citation-diversity gate (Phase 2 backlog). |
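
Because the citation-diversity gate is still backlog, the following is only one way such a check could look, not a description of existing Vox code. It assumes the gate counts total citations and distinct hosts. The names host_of and passes_diversity_gate and both thresholds are hypothetical, and the host parsing is deliberately naive (no port or userinfo handling).

```rust
use std::collections::HashSet;

/// Naive host extraction (hypothetical helper): drops the scheme, then cuts at
/// the first '/', '?' or '#'. Ports and userinfo are NOT stripped.
pub fn host_of(url: &str) -> Option<&str> {
    let rest = url.split_once("://").map(|(_, r)| r).unwrap_or(url);
    let host = rest.split(|c: char| matches!(c, '/' | '?' | '#')).next()?;
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

/// Passes only when there are enough citations overall AND they come from
/// enough distinct hosts. Both thresholds are illustrative parameters.
pub fn passes_diversity_gate(
    citation_urls: &[&str],
    min_citations: usize,
    min_distinct_hosts: usize,
) -> bool {
    let hosts: HashSet<&str> = citation_urls.iter().copied().filter_map(host_of).collect();
    citation_urls.len() >= min_citations && hosts.len() >= min_distinct_hosts
}
```

A report citing three pages from two hosts would pass a (3, 2) gate and fail a (3, 3) gate.
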
| Dimension | Notes (Anthropic Claude Research mode) |
|---|---|
| UX | Hosted research reports from Claude apps/API. |
| Planning | Closed-source agent loop. |
| Tools | Web search / browsing (vendor-side). |
| Citations | Inline citations in output. |
| Limits | Not portable across providers; policy constraints. |
| Vox analog | Not integrated — Vox keeps retrieval in vox-search and uses LLM endpoints for synthesis/judge only. |
All endpoint and pricing tables: tavily-integration-ssot.md.
| Endpoint | In Vox today |
|---|---|
| /search | Yes: TavilySearchClient::search via WebSearchDispatcher Tier 4, when policy enables it and prior tiers return no hits. |
| /extract | Not wired in orchestrator research (future: weak-snippet uplift). |
| /research | Not wired (would collapse multi-hop into one vendor call; evaluate cost/benefit vs native CRAG). |
| /crawl | Not wired into research pipeline (doc ingestion uses other paths). |
| Product | Role | Vox stance |
|---|---|---|
| Perplexity Pro / ChatGPT Deep Research / You.com | Closed UX + vendor search stacks | Benchmark UX only; no dependency for core pipeline. |
| Exa / Bright Data SERP | Alternative search/extract vendors | Policy comparison only; Tavily SSOT already notes SERP patterns. |
Canonical retrieval policy and corpus matrix: search-retrieval-ssot-2026.md.
| Source | In repo today | Slot | Secrets / env |
|---|---|---|---|
| SearXNG | Tier 2 in web_dispatcher.rs; sidecar via vox research up (research/infra.rs) | Primary self-hosted web tier | VOX_SEARCH_SEARXNG_URL etc. via SearchPolicy::from_env |
| DuckDuckGo | Tier 3 fallback | Free fallback when SearXNG empty/fails | Policy toggle duckduckgo_fallback_enabled |
| Tavily | Tier 4 when configured | Low-friction ranked snippets | TavilyApiKey + VOX_SEARCH_TAVILY_* — see Tavily SSOT |
| Wikipedia / Wikidata | Not wired | Tier 1.5 high-trust factual blurbs | Future: register read-only HTTP (likely no secret); add env registry row in contracts/config/env-vars.v1.yaml if introducing VOX_SEARCH_WIKI_* toggles |
| arXiv API | Not wired | STEM literature slice | Future SecretId only if using authenticated tier |
| Crossref REST | Not wired | DOI metadata | Polite pool + optional mailto — register env var if adding |
| Semantic Scholar Graph API | Not wired | Citation expansion | Plan mentions SecretId::SemanticScholarApiKey — not implemented |
| Internet Archive Wayback | Not wired | Dead-link recovery | Respect IA terms; throttle |
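
The wired tiers in the table (SearXNG, then DuckDuckGo, then Tavily) amount to a first-non-empty fallback chain. The sketch below illustrates that control flow under stated assumptions. It is not the WebSearchDispatcher implementation: WebBackend, TierPolicy, FixedBackend, and search_tiers are hypothetical names, and tavily_enabled is an illustrative stand-in for "when configured". Only duckduckgo_fallback_enabled appears in the table.

```rust
/// Hypothetical stand-in for a web search backend; the real vox-search types may differ.
pub trait WebBackend {
    fn name(&self) -> &'static str;
    /// Err models a failed backend; Ok with an empty Vec models "no hits".
    fn search(&self, query: &str) -> Result<Vec<String>, String>;
}

/// Hypothetical policy subset used only by this sketch.
pub struct TierPolicy {
    pub duckduckgo_fallback_enabled: bool,
    pub tavily_enabled: bool,
}

/// Canned backend used only to exercise the sketch.
pub struct FixedBackend {
    pub label: &'static str,
    pub response: Result<Vec<String>, String>,
}

impl WebBackend for FixedBackend {
    fn name(&self) -> &'static str {
        self.label
    }
    fn search(&self, _query: &str) -> Result<Vec<String>, String> {
        self.response.clone()
    }
}

/// Tries each enabled tier in order and returns the first non-empty result,
/// together with the name of the tier that produced it.
pub fn search_tiers(
    query: &str,
    policy: &TierPolicy,
    searxng: &dyn WebBackend,
    duckduckgo: &dyn WebBackend,
    tavily: &dyn WebBackend,
) -> (Option<&'static str>, Vec<String>) {
    let mut tiers: Vec<&dyn WebBackend> = vec![searxng];
    if policy.duckduckgo_fallback_enabled {
        tiers.push(duckduckgo);
    }
    if policy.tavily_enabled {
        tiers.push(tavily);
    }
    for backend in tiers {
        // A failed tier is treated like an empty one, so the next tier still runs.
        if let Ok(hits) = backend.search(query) {
            if !hits.is_empty() {
                return (Some(backend.name()), hits);
            }
        }
    }
    (None, Vec::new())
}
```

Returning the tier name alongside the hits is a design choice of the sketch; it makes it easy to record which backend answered, in the spirit of the routing-tier metadata the pipeline reports.
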
| Module | Role today | Replacement / delegation target |
|---|---|---|
| planner.rs | Passthrough single subquery | Future: LLM/Mens decomposition (not replaced in this workstream) |
| provider.rs | Empty search/map_site | Future: unify with vox-search providers / mesh ProviderObservation (not replaced here) |
| web_gather.rs | Was empty | WebSearchDispatcher::search + CRAG refinements (web_dispatcher.rs, crag.rs); implemented |
| claims.rs | Empty claims | Future vox-claim-extractor per module header (stub) |
| verifier.rs | Empty verdicts | Future verifier wiring (stub) |
| model_select.rs | Static fallbacks | Future registry merge (stub) |
| pipeline_cache.rs | Always miss | Future list_memories_by_type (stub) |
| pipeline.rs | Session id 0, persistence comments | Future vox-db methods (stub) |
| Capability | Status |
|---|---|
| MCP vox_memory_search | Shipped (handlers_memory.rs) |
| MCP vox_research_run | Shipped (this workstream) |
| CLI vox research run | Shipped (this workstream) |
| CLI vox research eval | Shipped (eval.rs); golden queries extended |
run_search_with_verification performs a full corpus retrieval pass (memory, chunks, repo, web…) and requires a SearchRuntimeContext. The orchestrator web_gather path intentionally uses WebSearchDispatcher for bounded web retrieval without requiring DB/memory paths on every research caller. A future bridge can attach SearchRuntimeContext from MCP ServerState when research is invoked server-side.
- Implement gather_web_hits_for_plan using SearchPolicy::from_env() + WebSearchDispatcher::search.
- Respect ResearchScope::Local (skip web) and ResearchQuery::site_scope (post-filter host).
- Map HybridSearchHit → ResearchHit.
- After initial subqueries, while hops remain, call CragRouter::expand_queries_from_partial_evidence / should_continue against average score vs target 0.75, capped by policy.web_search_max_hops.
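
The refinement loop in the last item can be sketched as follows. This is a minimal illustration, not the CragRouter implementation. Hit, gather_with_refinement, and the two closure parameters are hypothetical, and the local should_continue only borrows its name from the text. The sketch assumes the initial subqueries do not count as a hop and that the continue check compares the mean hit score against the 0.75 target.

```rust
/// Hypothetical hit shape; the real ResearchHit carries more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub url: String,
    pub score: f32,
}

/// Target average score named in the plan above.
pub const TARGET_AVG_SCORE: f32 = 0.75;

/// Mean score of the hits; an empty set counts as 0.0 (weakest evidence).
pub fn average_score(hits: &[Hit]) -> f32 {
    if hits.is_empty() {
        return 0.0;
    }
    hits.iter().map(|h| h.score).sum::<f32>() / hits.len() as f32
}

/// Continue only while hops remain and the evidence is below target.
pub fn should_continue(hits: &[Hit], hops_used: u32, max_hops: u32) -> bool {
    hops_used < max_hops && average_score(hits) < TARGET_AVG_SCORE
}

/// Runs the initial subqueries, then refines until the evidence is strong
/// enough, the hop cap is reached, or no refined queries are produced.
/// Returns the accumulated hits and the number of refinement hops used.
pub fn gather_with_refinement<S, E>(
    initial_queries: Vec<String>,
    max_hops: u32,
    mut search: S,
    mut expand: E,
) -> (Vec<Hit>, u32)
where
    S: FnMut(&str) -> Vec<Hit>,
    E: FnMut(&[Hit]) -> Vec<String>,
{
    let mut hits: Vec<Hit> = Vec::new();
    for query in &initial_queries {
        hits.extend(search(query.as_str()));
    }
    let mut hops_used = 0;
    while should_continue(&hits, hops_used, max_hops) {
        let refined = expand(&hits);
        if refined.is_empty() {
            break; // nothing new to try, so stop early
        }
        for query in &refined {
            hits.extend(search(query.as_str()));
        }
        hops_used += 1;
    }
    (hits, hops_used)
}
```

Passing search and expand as closures keeps the loop testable offline. In the pipeline described above, those roles would be played by WebSearchDispatcher::search and CragRouter::expand_queries_from_partial_evidence, with max_hops taken from policy.web_search_max_hops.
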
- vox research run <query> [--json] [--scope ...]
- vox_research_run MCP tool returning JSON ResearchResult.
- Operations catalog + MCP registry rows regenerated via vox ci operations-sync --target cli --write.
| Risk | Mitigation |
|---|---|
| Web ToS / robots | Honor scraper_robots_txt_respect; prefer APIs for Wikipedia/arXiv when added |
| Tavily spend | Session budget + fail-open behavior — Tavily SSOT |
| Secret leakage | Never std::env::var("TAVILY_API_KEY") in consumers — secrets policy (.cursor/rules/secrets-policy.mdc) |
| Prompt injection from pages | Treat snippets as untrusted; truncate per policy |
| Non-deterministic CI | Smoke tests allow empty web hits offline; live test #[ignore] |
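
For the prompt-injection row, truncation has to respect UTF-8 boundaries, since slicing a Rust string at an arbitrary byte offset can panic. A minimal sketch follows. The function name truncate_snippet and the max_chars parameter are hypothetical, standing in for whatever limit the policy actually defines.

```rust
/// Returns at most `max_chars` Unicode scalar values from an untrusted snippet.
/// The cut always lands on a char boundary, so it never panics on multi-byte
/// text. The result borrows from the input; nothing is allocated.
pub fn truncate_snippet(snippet: &str, max_chars: usize) -> &str {
    match snippet.char_indices().nth(max_chars) {
        // byte_idx is the start of the first char that should be dropped.
        Some((byte_idx, _)) => &snippet[..byte_idx],
        // Fewer than max_chars + 1 chars: the snippet already fits.
        None => snippet,
    }
}
```

Truncation only bounds how much untrusted text reaches the model; the snippet content should still be handled as data, not as instructions.
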
| Command | Purpose |
|---|---|
| cargo test -p vox-orchestrator | Unit + integration smoke |
| cargo test -p vox-search | Retrieval regression |
| vox research run "..." --json | Manual end-to-end (needs network / keys) |
| vox research eval | Harness writes metrics rows; extend golden queries in eval.rs |
| vox ci command-compliance / vox ci operations-verify | Contract hygiene after catalog edits |
Captured 2026-05-11 (verify periodically; vendor URLs drift).
| Topic | URL |
|---|---|
| Gemini Deep Research (developers blog) | https://blog.google/innovation-and-ai/technology/developers-tools/deep-research-agent-gemini-api/ |
| Gemini Deep Research Max announcement | https://blog.google/innovation-and-ai/models-and-research/gemini-models/next-generation-gemini-deep-research/ |
| Gemini consumer overview | https://gemini.google/overview/deep-research/ |
| OpenClaw docs (gateway — refresh list in CLI) | https://openclawlab.com/en/docs/gateway/protocol/ |
| OpenClaw Deep Research skill (tutorial mirror) | https://openclaw.com/en/skills/deepresearchagent.html |
| SearchClaw repository | https://github.com/RUC-NLPIR/SearchClaw |
| ScienceClaw repository | https://github.com/Zaoqu-Liu/ScienceClaw |
| Anthropic news index (search “Research”) | https://www.anthropic.com/news |
| Tavily docs | https://docs.tavily.com/ |
| SearXNG project | https://github.com/searxng/searxng |
| DuckDuckGo | https://duckduckgo.com |
| Wikipedia API | https://www.mediawiki.org/wiki/API:Main_page |
| Wikidata API | https://www.wikidata.org/wiki/Wikidata:Data_access |
| arXiv API | https://info.arxiv.org/help/api/index.html |
| Crossref REST API | https://github.com/CrossRef/rest-api-doc |
| Semantic Scholar API | https://api.semanticscholar.org/ |
| Internet Archive Wayback | https://archive.org/help/wayback_api.php |