Your codebase,
40–70× smaller
for AI agents
NeuralMind turns any repository into a queryable neural index. AI coding agents answer code questions in ~800 tokens instead of loading 50,000+ tokens of raw source.
LLMs are flying blind
on large codebases
Without NeuralMind, every code question forces an AI agent to load raw source files — burning tokens and budget on irrelevant context.
Without NeuralMind
Raw file loading on every query
With NeuralMind
Smart semantic context retrieval
Two ways to know
you need this
Start with what's annoying you (symptoms), or start with what you're trying to achieve (goals). Every row maps to the specific command that fixes it.
Symptoms — "This is happening to me"
Goals — "What am I solving for?"
Built for developers
paying per token
If one of these fits, NeuralMind was built for you.
Claude Code user
Watching your token bill climb. PostToolUse hooks compress every Read/Bash/Grep on top of ~60× query savings.
Cursor user
Want semantic retrieval outside Cursor too. CLI + MCP work in any agent with the same index.
Cline / Continue user
No built-in codebase index. Drop-in MCP tools — query, skeleton, wakeup.
GPT / Gemini / local models
Pipe wakeup or query output into any chat — model-agnostic, works anywhere.
Solo dev, growing monorepo
Incremental rebuilds + learning that adapts to your real query patterns.
Team lead tracking LLM spend
Measurable per-query reduction with neuralmind benchmark.
Security-conscious / regulated
100% local, offline, no outbound calls. Nothing leaves the machine — ever.
Researcher / hobbyist
Open-source reference implementation of two-phase token optimization. MIT.
4-layer progressive
disclosure
NeuralMind loads only what's relevant to each query. Static orientation layers always load; dynamic layers respond to your specific question.
Identity — Always Loaded
Project name, description, graph size, entry points, main patterns
Architecture Summary — Always Loaded
Module overview, key components, dependencies, data flow, top clusters
Relevant Modules — Query-Specific
Code clusters most semantically similar to your question, identified via community detection
Semantic Search — Query-Specific
Direct vector similarity hits, reranked by learned cooccurrence patterns
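The four layers above amount to a fixed assembly order under a token budget: static layers always load, dynamic layers fill in per-query until the budget runs out. A toy sketch of that idea — purely illustrative, not NeuralMind's actual internals, with a naive whitespace tokenizer standing in for a real one like tiktoken:

```python
# Toy sketch of 4-layer progressive disclosure: static layers (L0, L1)
# always load; dynamic layers (L2, L3) are added per-query until a
# token budget is exhausted. Illustrative only -- not NeuralMind's code.

def assemble_context(static_layers, dynamic_layers, budget, count_tokens):
    context, used = [], 0
    for layer in static_layers:      # L0 identity, L1 architecture summary
        context.append(layer)
        used += count_tokens(layer)
    for layer in dynamic_layers:     # L2 relevant clusters, L3 search hits
        cost = count_tokens(layer)
        if used + cost > budget:
            break                    # stay inside the token budget
        context.append(layer)
        used += cost
    return "\n\n".join(context), used

# Naive whitespace "tokenizer" stands in for a real one (e.g. tiktoken).
toks = lambda s: len(s.split())
ctx, used = assemble_context(
    ["project: demo"], ["cluster: auth", "hit: login()"],
    budget=10, count_tokens=toks)
```

The key property is that the static layers are unconditional while the dynamic layers degrade gracefully under a tight budget.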
Cut tokens at the
source and the output
Most tools optimize only retrieval. NeuralMind compresses both what agents fetch and what they consume from tool outputs.
What to fetch
What agents see
See exactly what
agents receive
Every response includes a token footer showing real-time savings. No guesswork — you always know exactly how efficient your context is.
Automatic on session start
Run neuralmind wakeup . once. The agent orients itself without reading a single source file.
Query-aware context
Different questions get different context. Asking about auth returns auth clusters. Asking about payments returns payment logic.
Gets smarter over time
The cooccurrence reranker learns which modules appear together in your queries and boosts their relevance automatically.
What this means for
your API bill
Based on 100 queries/day. NeuralMind runs entirely offline — no additional API costs beyond your model provider.
| Model | Without NeuralMind | With NeuralMind | Monthly Savings |
|---|---|---|---|
| Claude 3.5 Sonnet | $450 / mo | $7 / mo | $443 saved |
| GPT-4o | $750 / mo | $12 / mo | $738 saved |
| Claude Opus | $2,250 / mo | $36 / mo | $2,214 saved |
| GPT-4.5 | $11,250 / mo | $180 / mo | $11,070 saved |
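The table's arithmetic is easy to reproduce. A minimal cost model — the $3 per 1M input tokens rate for Claude 3.5 Sonnet is an assumption based on published list pricing; substitute your own model's rate:

```python
# Rough monthly-cost model behind the table above.
# Assumptions: 100 queries/day, 30 days/month, 50,000 tokens/query
# without NeuralMind vs ~800 with it, $3 per 1M input tokens
# (approximate Claude 3.5 Sonnet list price -- swap in your rate).

QUERIES_PER_MONTH = 100 * 30
PRICE_PER_MILLION = 3.00  # USD per 1M input tokens

def monthly_cost(tokens_per_query: int) -> float:
    total_tokens = tokens_per_query * QUERIES_PER_MONTH
    return total_tokens / 1_000_000 * PRICE_PER_MILLION

without_nm = monthly_cost(50_000)  # ~450
with_nm = monthly_cost(800)        # ~7.2
print(f"${without_nm:.0f}/mo -> ${with_nm:.0f}/mo, "
      f"saving ${without_nm - with_nm:.0f}")
# prints: $450/mo -> $7/mo, saving $443
```

The same two-line change to `tokens_per_query` and `PRICE_PER_MILLION` reproduces every row of the table.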
Every claim measured
in CI on every PR
NeuralMind benchmarks itself automatically — a real fixture codebase, real tokenizer calls (tiktoken), real retrieval. The numbers below are never hardcoded. CI fails if reduction drops below the floor.
Community benchmarks — real repos, zero telemetry
Your code never leaves your machine. You submit only the numbers.
| Project | Language | Nodes | Avg query tokens | Reduction | Model |
|---|---|---|---|---|---|
| cmmc20 | JavaScript | 241 | 739 | 65.6× | Claude 3.5 Sonnet |
| mempalace | Python | 1,626 | 891 | 46.0× | Claude 3.5 Sonnet |
Works directly in
Claude Desktop & Cursor
Native Model Context Protocol server. Call NeuralMind tools directly from your AI agent session — no wrappers, no middleware.
neuralmind_wakeup
Session-start orientation. Returns project context in ~365–600 tokens without reading any source files.
~400 tokens
neuralmind_query
Answer any code question. Returns L0–L3 structured context with token count and reduction ratio.
~800–1100 tokens
neuralmind_skeleton
Explore a file's functions, call graph, and cross-file dependencies without loading full source.
5–15× cheaper
neuralmind_search
Semantic entity search. Finds functions, classes, and routes by concept — ranked by similarity.
ranked results
neuralmind_build
Incremental index update. Only re-embeds changed nodes — fast after small code changes.
incremental
neuralmind_benchmark
Measure per-query token counts and reduction ratios on your actual codebase.
metrics
Step-by-step walkthroughs
for your workflow
Command-driven guides matched to how people actually use NeuralMind. Copy, run, done.
Claude Code user
Full two-phase setup, daily workflow, and a before/after table for every tool call.
Read walkthrough →
Cost optimization
Baseline → measure → stakeholder-ready savings report. Built for tracking LLM spend.
Read walkthrough →
Any LLM (ChatGPT / Gemini / local)
Copy-paste and CLI-piped context for non-MCP chats and mixed-model workflows.
Read walkthrough →
Offline / regulated work
Air-gapped install, compliance properties table, audit trail — nothing leaves the box.
Read walkthrough →
Growing monorepo
Three freshness strategies + large-repo tuning for codebases that change daily.
Read walkthrough →
NeuralMind vs.
Heuristic-only retrieval
Both approaches reduce context. The tradeoff is retrieval quality vs. zero dependencies. NeuralMind runs fully offline — no API calls, no cloud services, no data leaves your machine.
| Feature | Heuristic-only | 🧠 NeuralMind |
|---|---|---|
| Token reduction | ~33× (97% fewer tokens) | 40–70× |
| Retrieval accuracy | 70–80% top-5 | Higher (semantic) |
| External dependencies | ✓ None | ChromaDB (local) |
| Runs offline | ✓ Yes | ✓ Yes |
| Learns from usage | ✗ No | ✓ Cooccurrence reranking |
| MCP server | ✗ No | ✓ Native |
| PostToolUse compression | ✗ No | ✓ Phase 2 hooks |
| File skeleton view | ✗ No | ✓ Call graph + deps |
NeuralMind vs. everything else
Honest one-liners on every tool developers evaluate alongside NeuralMind. Each links to a full comparison page with feature matrix and trade-offs.
vs. Cursor @codebase
Works only in Cursor. NeuralMind works in any agent and adds tool-output compression on top.
Read full comparison →
vs. GitHub Copilot
Copilot is hosted completions tied to your GitHub account. NeuralMind is local context for any agent, any model.
Read full comparison →
vs. Aider repo-map
Aider's repo-map is syntactic only (tree-sitter + PageRank). NeuralMind adds semantic retrieval and compression.
Read full comparison →
vs. Sourcegraph Cody
Cody is server-hosted and org-wide. NeuralMind is local and per-project — different scale, different deployment model.
Read full comparison →
vs. Continue / Cline
Those are agent runtimes. NeuralMind is the context and compression layer underneath — they compose.
Read full comparison →
vs. Windsurf / Codeium
Vertically integrated IDE with server-side indexing. NeuralMind is editor- and model-agnostic, fully local.
Read full comparison →
vs. Claude Projects
Projects reloads all attached files every turn. NeuralMind retrieves only what the query needs — ~800 tokens vs tens of thousands.
Read full comparison →
vs. Prompt caching
Caching amortizes a big prompt. NeuralMind makes the prompt small in the first place — combine both for the cheapest workload.
Read full comparison →
vs. LangChain / LlamaIndex
Frameworks you assemble yourself. NeuralMind is the assembled, opinionated default for code agents.
Read full comparison →
vs. Long context (1M / 2M)
Possible ≠ cheap. A 50K-token repo at Claude Sonnet rates costs $0.15 every turn; NeuralMind drops that to $0.002.
Read full comparison →
vs. Generic RAG
Text chunking loses structure. NeuralMind keeps the call graph, clusters, and cross-file edges intact.
Read full comparison →
vs. tree-sitter / ctags / grep
Deterministic but syntactic. Use alongside NeuralMind for exact-name lookups — not instead of it for natural-language questions.
Read full comparison →
Honest answers
to common questions
The questions we get asked most. Click any question to expand.
How much does NeuralMind actually reduce Claude / GPT token costs?
Measured on real repos: 40–70× reduction per query. For a team running 100 queries/day on Claude Sonnet, that is roughly $450/month → $7/month. With PostToolUse hooks layered on top, combined savings are typically 5–10× beyond the query reduction. Exact numbers depend on codebase size and model pricing — run neuralmind benchmark . --json on your project for a concrete figure.
Does NeuralMind work outside Claude Code?
Yes. The CLI runs anywhere Python runs. The MCP server integrates with Cursor, Cline, Continue, Claude Desktop, and any MCP-compatible agent. For non-MCP tools like ChatGPT or Gemini, pipe neuralmind wakeup . | pbcopy into a regular chat window.
Only the PostToolUse compression hooks are Claude-Code-specific — everything else is model- and agent-agnostic.
Does my code leave my machine?
No. NeuralMind is fully offline — no API calls, no cloud services, no telemetry. Embeddings run locally via ChromaDB, the knowledge graph is stored in graphify-out/ inside your project, and query memory (optional, opt-in) is written to .neuralmind/ on disk.
Is this just RAG? How is it different from LangChain or LlamaIndex?
It is a form of RAG, but specialized for code. Instead of chunking text, NeuralMind retrieves over a knowledge graph of code entities (functions, classes, clusters) with a fixed 4-layer structure. The call graph stays intact, and the output is a token-budgeted context — not a flat list of chunks.
See the full vs. LangChain/LlamaIndex comparison.
I have a 1M context window now — do I still need this?
Long context makes it possible to stuff a whole repo in; it does not make it cheap. A 50K-token repo at Claude Sonnet rates costs ~$0.15 every turn. NeuralMind drops that to ~$0.002.
See vs. long context for the full math.
What languages does it support?
Any language graphify supports (Python, JavaScript/TypeScript, and others via tree-sitter). NeuralMind consumes graphify-out/graph.json — if graphify can index it, NeuralMind can query it.
What is the difference between wakeup, query, and skeleton?
wakeup — ~400 tokens of project orientation (L0 + L1). Run it at session start.
query — ~800–1,100 tokens for a specific natural-language question (L0–L3).
skeleton — compact view of a single file (functions + call graph + cross-file edges). Use before Read.
How does the PostToolUse compression work?
When neuralmind install-hooks . has been run, Claude Code invokes NeuralMind after every Read/Bash/Grep tool call but before the agent sees the output. Read becomes a skeleton (~88% smaller). Bash keeps errors + the last 3 lines (~91% smaller). Grep caps at 25 matches.
Set NEURALMIND_BYPASS=1 on any command to opt out temporarily.
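The Bash rule above — keep error-looking lines plus the last few lines — is simple enough to sketch. This is NOT NeuralMind's actual implementation, just a toy version of the stated heuristic:

```python
# Toy sketch of the described Bash-output compression: keep any line
# that looks like an error, plus the final `tail` lines. Illustrative
# only -- not NeuralMind's actual hook code.

ERROR_MARKERS = ("error", "fail", "exception", "traceback")

def compress_bash_output(output: str, tail: int = 3) -> str:
    lines = output.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if any(m in line.lower() for m in ERROR_MARKERS):
            keep.add(i)                      # preserve error context
    keep.update(range(max(0, len(lines) - tail), len(lines)))
    return "\n".join(lines[i] for i in sorted(keep))

# 52 lines of output collapse to just the error and the tail.
log = "\n".join([f"step {i} ok" for i in range(50)]
                + ["Error: disk full", "done"])
result = compress_bash_output(log)
```

On a long successful run the agent sees only the tail; on a failing run the error lines survive wherever they occur, which is the property that makes the compression safe.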
Does it auto-update when I change code?
Only if you install the git post-commit hook (neuralmind init-hook .). Otherwise run neuralmind build . manually — it's incremental and only re-embeds changed nodes.
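Conceptually, such a hook is just a one-line script. The sketch below is illustrative — the exact hook that neuralmind init-hook . installs may differ:

```shell
#!/bin/sh
# .git/hooks/post-commit -- illustrative sketch, not necessarily the
# exact hook that `neuralmind init-hook .` installs.
# Re-index after each commit; the build is incremental, so only
# changed nodes are re-embedded. Backgrounded to keep commits fast.
neuralmind build . >/dev/null 2>&1 &
```

Because the build is incremental, running it on every commit stays cheap even on large repos.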
What if retrieval quality is poor on my repo?
First, check that neuralmind stats . reports all your nodes indexed. Then run neuralmind benchmark . to see reduction ratios on real queries. Enable query memory (it prompts on first TTY run) and periodically run neuralmind learn . — cooccurrence-based reranking improves relevance on your actual patterns.
If it still feels off, open an issue with the query and expected result. Retrieval quality is the thing we most want to improve.