v0.3.4 · pip install neuralmind

Your codebase,
40–70× smaller
for AI agents

NeuralMind turns any repository into a queryable neural index. AI coding agents answer code questions in ~800 tokens instead of loading 50,000+ tokens of raw source.

40–70×
Token Reduction
~800
Tokens per Query
97%
Cost Savings

LLMs are flying blind
on large codebases

Without NeuralMind, every code question forces an AI agent to load raw source files — burning tokens and budget on irrelevant context.

Without NeuralMind

Raw file loading on every query

Tokens per query
50,000+
Cost (Claude Sonnet)
$0.15–$3.75
Monthly (100 queries/day)
~$450

With NeuralMind

Smart semantic context retrieval

Tokens per query
~800
Cost (Claude Sonnet)
$0.002–$0.06
Monthly (100 queries/day)
~$7

Two ways to know
you need this

Start with what's annoying you (symptoms), or start with what you're trying to achieve (goals). Every row maps to the specific command that fixes it.

Symptoms — "This is happening to me"

Claude Code hits context limits mid-task → install-hooks
My monthly LLM bill keeps climbing → query + hooks
I re-paste project context every session → wakeup
Agent reads a 2,000-line file to answer one question → skeleton
Grep floods the agent with 100+ matches → install-hooks
I want to query my code from ChatGPT / Gemini → wakeup | pbcopy
Retrieval feels random across similar questions → learn

Goals — "What am I solving for?"

Cut LLM spend on code Q&A → 5–10× cheaper
Faster, more grounded agent responses → fewer hallucinations
Keep all code local — no SaaS, no telemetry → 100% offline
Work across Claude + GPT + Gemini with one index → model-agnostic
Make retrieval adapt to our actual queries → neuralmind learn
Measure and report savings to stakeholders → benchmark --json
Auto-refresh the index on every commit → init-hook

Built for developers
paying per token

If one of these fits, NeuralMind was built for you.

Claude Code user

Watching your token bill climb. PostToolUse hooks compress every Read/Bash/Grep on top of ~60× query savings.

Cursor user

Want semantic retrieval outside Cursor too. CLI + MCP work in any agent with the same index.

Cline / Continue user

No built-in codebase index. Drop-in MCP tools — query, skeleton, wakeup.

GPT / Gemini / local models

Pipe wakeup or query output into any chat — model-agnostic, works anywhere.

Solo dev, growing monorepo

Incremental rebuilds + learning that adapts to your real query patterns.

Team lead tracking LLM spend

Measurable per-query reduction with neuralmind benchmark.

Security-conscious / regulated

100% local, offline, no outbound calls. Nothing leaves the machine — ever.

Researcher / hobbyist

Open-source reference implementation of two-phase token optimization. MIT.

Not a fit if: you need cross-repo search across a whole organization (try Sourcegraph Cody), or you only want inline completions (try GitHub Copilot).

4-layer progressive
disclosure

NeuralMind loads only what's relevant to each query. Static orientation layers always load; dynamic layers respond to your specific question.

L0

Identity — Always Loaded

Project name, description, graph size, entry points, main patterns

~100 tokens
L1

Architecture Summary — Always Loaded

Module overview, key components, dependencies, data flow, top clusters

~300 tokens
L2

Relevant Modules — Query-Specific

Code clusters (found via community detection) most semantically similar to your question

~300 tokens
L3

Semantic Search — Query-Specific

Direct vector similarity hits, reranked by learned cooccurrence patterns

~300 tokens
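Taken together, the four layers behave like a budgeted assembly: the static layers always ship, while each dynamic layer fills a fixed token budget with ranked, query-specific hits. A minimal Python sketch — the budgets, layer contents, and 4-chars-per-token estimator are illustrative assumptions, not NeuralMind's actual API:

```python
# Illustrative sketch of 4-layer progressive disclosure.
# Layer names follow the docs; everything else here is assumed.

BUDGETS = {"L2": 300, "L3": 300}  # approx. per-layer token budgets

def estimate_tokens(text):
    return max(1, len(text) // 4)  # rough 4-chars-per-token heuristic

def assemble_context(query, static_layers, dynamic_layers):
    """Always include L0/L1; fill L2/L3 with ranked, query-specific
    items until each layer's token budget is spent."""
    parts = list(static_layers)            # L0 identity + L1 architecture
    for name, retrieve in dynamic_layers.items():
        spent = 0
        for item in retrieve(query):       # items arrive ranked by relevance
            cost = estimate_tokens(item)
            if spent + cost > BUDGETS[name]:
                break                      # budget exhausted for this layer
            parts.append(item)
            spent += cost
    return "\n\n".join(parts)
```

The point of the fixed budgets is predictability: a query's context cost stays near ~1,000 tokens no matter how large the repository grows.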

Cut tokens at the
source and the output

Most tools optimize only retrieval. NeuralMind compresses both what agents fetch and what they consume from tool outputs.

Phase 1 — Retrieval

What to fetch

neuralmind wakeup . → ~365 tokens
neuralmind query "?" → ~800 tokens
neuralmind skeleton <file> → 5–15× cheaper
Phase 2 — Compression

What agents see

Read (file) → ~88% savings
Bash (output) → ~91% savings
Grep (matches) → capped at 25

See exactly what
agents receive

Every response includes a token footer showing real-time savings. No guesswork — you always know how efficient your context is.

Automatic on session start

Run neuralmind wakeup . once. The agent orients itself without reading a single source file.

Query-aware context

Different questions get different context. Asking about auth returns auth clusters. Asking about payments returns payment logic.

Gets smarter over time

The cooccurrence reranker learns which modules appear together in your queries and boosts their relevance automatically.
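The idea is sketchable in a few lines. This toy reranker — an illustration of cooccurrence boosting, not NeuralMind's implementation, with an arbitrary boost weight — counts module pairs that answered past queries together and lifts their scores:

```python
from collections import Counter
from itertools import combinations

class CooccurrenceReranker:
    """Toy cooccurrence reranker: modules retrieved together in past
    queries boost each other's relevance. Illustrative only."""

    def __init__(self, boost=0.1):
        self.pair_counts = Counter()
        self.boost = boost

    def learn(self, retrieved_modules):
        # Record every pair of modules that answered one query together.
        for pair in combinations(sorted(set(retrieved_modules)), 2):
            self.pair_counts[pair] += 1

    def rerank(self, scored):
        # scored: {module: similarity}. Modules that historically
        # co-occur with the current top hit get a boost.
        top = max(scored, key=scored.get)
        adjusted = {
            m: s + self.boost * self.pair_counts[tuple(sorted((top, m)))]
            for m, s in scored.items()
        }
        return sorted(adjusted, key=adjusted.get, reverse=True)
```

After a few sessions, modules you habitually query together — say, auth middleware and session storage — start surfacing together even when only one of them matches the embedding.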

What this means for
your API bill

Based on 100 queries/day. NeuralMind runs entirely offline — no additional API costs beyond your model provider.

Model | Without NeuralMind | With NeuralMind | Monthly Savings
Claude 3.5 Sonnet | $450 / mo | $7 / mo | $443 saved
GPT-4o | $750 / mo | $12 / mo | $738 saved
Claude Opus | $2,250 / mo | $36 / mo | $2,214 saved
GPT-4.5 | $11,250 / mo | $180 / mo | $11,070 saved
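These rows follow from straightforward arithmetic: input price per million tokens × tokens per query × 3,000 queries per month. A minimal sketch that reproduces the figures, assuming the per-million-token input rates the table implies (verify against your provider's current pricing):

```python
# Reproduce the monthly-cost table: 100 queries/day x 30 days.
# Prices are USD per 1M input tokens -- assumed rates, not quoted from any doc.
PRICE_PER_M = {
    "Claude 3.5 Sonnet": 3.0,
    "GPT-4o": 5.0,
    "Claude Opus": 15.0,
    "GPT-4.5": 75.0,
}
QUERIES_PER_MONTH = 100 * 30

def monthly_cost(tokens_per_query, price_per_m_tokens):
    return tokens_per_query / 1_000_000 * price_per_m_tokens * QUERIES_PER_MONTH

for model, price in PRICE_PER_M.items():
    without = monthly_cost(50_000, price)  # raw file loading per query
    with_nm = monthly_cost(800, price)     # ~800-token NeuralMind query
    print(f"{model}: ${without:,.0f}/mo -> ${with_nm:,.0f}/mo "
          f"(${without - with_nm:,.0f} saved)")
```

Swap in your own tokens-per-query figure from neuralmind benchmark to get numbers for your repo rather than the fixture averages.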

Every claim measured
in CI on every PR

NeuralMind benchmarks itself automatically — a real fixture codebase, real tokenizer calls (tiktoken), real retrieval. The numbers below are never hardcoded. CI fails if reduction drops below the floor.

 CI passing — measured with tiktoken o200k_base + cl100k_base on every pull request
NeuralMind token reduction by tokenizer — bar chart auto-generated in CI

Community benchmarks — real repos, zero telemetry

Your code never leaves your machine. You submit only the numbers.

Project | Language | Nodes | Avg query tokens | Reduction | Model
cmmc20 | JavaScript | 241 | 739 | 65.6× | Claude 3.5 Sonnet
mempalace | Python | 1,626 | 891 | 46.0× | Claude 3.5 Sonnet

Works directly in
Claude Desktop & Cursor

Native Model Context Protocol server. Call NeuralMind tools directly from your AI agent session — no wrappers, no middleware.

neuralmind_wakeup

Session-start orientation. Returns project context in ~365–600 tokens without reading any source files.

~400 tokens

neuralmind_query

Answer any code question. Returns L0–L3 structured context with token count and reduction ratio.

~800–1100 tokens

neuralmind_skeleton

Explore a file's functions, call graph, and cross-file dependencies without loading full source.

5–15× cheaper

neuralmind_search

Semantic entity search. Finds functions, classes, and routes by concept — ranked by similarity.

ranked results

neuralmind_build

Incremental index update. Only re-embeds changed nodes — fast after small code changes.

incremental

neuralmind_benchmark

Measure per-query token counts and reduction ratios on your actual codebase.

metrics

NeuralMind vs.
Heuristic-only retrieval

Both approaches reduce context. The tradeoff is retrieval quality vs. zero dependencies. NeuralMind runs fully offline — no API calls, no cloud services, no data leaves your machine.

Feature | Heuristic-only | 🧠 NeuralMind
Token reduction | ~33× (97% fewer tokens) | 40–70×
Retrieval accuracy | 70–80% top-5 | Higher (semantic)
External dependencies | None | ChromaDB (local)
Runs offline | Yes | Yes
Learns from usage | No | Cooccurrence reranking
MCP server | No | Native
PostToolUse compression | No | Phase 2 hooks
File skeleton view | No | Call graph + deps

NeuralMind vs. everything else

Honest one-liners on every tool developers evaluate alongside NeuralMind. Each links to a full comparison page with feature matrix and trade-offs.

vs. Cursor @codebase

Works only in Cursor. NeuralMind works in any agent and adds tool-output compression on top.

Read full comparison →

vs. GitHub Copilot

Copilot is hosted completions tied to your GitHub account. NeuralMind is local context for any agent, any model.

Read full comparison →

vs. Aider repo-map

Aider's repo-map is syntactic only (tree-sitter + PageRank). NeuralMind adds semantic retrieval and compression.

Read full comparison →

vs. Sourcegraph Cody

Cody is server-hosted and org-wide. NeuralMind is local and per-project — different scale, different deployment model.

Read full comparison →

vs. Continue / Cline

Those are agent runtimes. NeuralMind is the context and compression layer underneath — they compose.

Read full comparison →

vs. Windsurf / Codeium

Vertically integrated IDE with server-side indexing. NeuralMind is editor- and model-agnostic, fully local.

Read full comparison →

vs. Claude Projects

Projects reloads all attached files every turn. NeuralMind retrieves only what the query needs — ~800 tokens vs tens of thousands.

Read full comparison →

vs. Prompt caching

Caching amortizes a big prompt. NeuralMind makes the prompt small in the first place — combine both for the cheapest workload.

Read full comparison →

vs. LangChain / LlamaIndex

Frameworks you assemble yourself. NeuralMind is the assembled, opinionated default for code agents.

Read full comparison →

vs. Long context (1M / 2M)

Possible ≠ cheap. A 50K-token repo at Claude Sonnet rates costs $0.15 every turn; NeuralMind drops that to $0.002.

Read full comparison →

vs. Generic RAG

Text chunking loses structure. NeuralMind keeps the call graph, clusters, and cross-file edges intact.

Read full comparison →

vs. tree-sitter / ctags / grep

Deterministic but syntactic. Use alongside NeuralMind for exact-name lookups — not instead of it for natural-language questions.

Read full comparison →

Honest answers
to common questions

The questions we get asked most. Click any question to expand.

How much does NeuralMind actually reduce Claude / GPT token costs?

Measured on real repos: 40–70× reduction per query. For a team running 100 queries/day on Claude Sonnet, that is roughly $450/month → $7/month. With PostToolUse hooks layered on top, combined savings are typically 5–10× beyond the query reduction. Exact numbers depend on codebase size and model pricing — run neuralmind benchmark . --json on your project for a concrete figure.

Does NeuralMind work outside Claude Code?

Yes. The CLI runs anywhere Python runs. The MCP server integrates with Cursor, Cline, Continue, Claude Desktop, and any MCP-compatible agent. For non-MCP tools like ChatGPT or Gemini, pipe neuralmind wakeup . | pbcopy into a regular chat window.

Only the PostToolUse compression hooks are Claude-Code-specific — everything else is model- and agent-agnostic.

Does my code leave my machine?

No. NeuralMind is fully offline — no API calls, no cloud services, no telemetry. Embeddings run locally via ChromaDB, the knowledge graph is stored in graphify-out/ inside your project, and query memory (optional, opt-in) is written to .neuralmind/ on disk.

Is this just RAG? How is it different from LangChain or LlamaIndex?

It is a form of RAG, but specialized for code. Instead of chunking text, NeuralMind retrieves over a knowledge graph of code entities (functions, classes, clusters) with a fixed 4-layer structure. The call graph stays intact, and the output is a token-budgeted context — not a flat list of chunks.

See the full vs. LangChain/LlamaIndex comparison.

I have a 1M context window now — do I still need this?

Long context makes it possible to stuff a whole repo in; it does not make it cheap. A 50K-token repo at Claude Sonnet rates costs ~$0.15 every turn. NeuralMind drops that to ~$0.002.

See vs. long context for the full math.

What languages does it support?

Any language graphify supports (Python, JavaScript/TypeScript, and others via tree-sitter). NeuralMind consumes graphify-out/graph.json — if graphify can index it, NeuralMind can query it.

What is the difference between wakeup, query, and skeleton?

wakeup — ~400 tokens of project orientation (L0 + L1). Run it at session start.

query — ~800–1,100 tokens for a specific natural-language question (L0–L3).

skeleton — compact view of a single file (functions + call graph + cross-file edges). Use before Read.

How does the PostToolUse compression work?

When neuralmind install-hooks . has been run, Claude Code invokes NeuralMind after every Read/Bash/Grep tool call but before the agent sees the output. Read becomes a skeleton (~88% smaller). Bash keeps errors + the last 3 lines (~91% smaller). Grep caps at 25 matches.

Set NEURALMIND_BYPASS=1 on any command to opt out temporarily.
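The rules above are simple enough to sketch. This toy Python version mimics the Bash and Grep behavior described — it is not NeuralMind's actual hook code, and the error-detection heuristic is an assumption:

```python
# Toy sketch of Phase-2 compression rules (illustrative only):
# Bash output keeps error lines plus the last 3 lines; Grep caps at 25.

def compress_bash(output, tail=3):
    lines = output.splitlines()
    # Assumed heuristic: a line mentioning "error" or "traceback" is an error.
    errors = [l for l in lines if "error" in l.lower() or "traceback" in l.lower()]
    kept = errors + lines[-tail:]   # errors first, then the tail
    seen, result = set(), []
    for l in kept:                  # drop duplicate lines, keep first occurrence
        if l not in seen:
            seen.add(l)
            result.append(l)
    return "\n".join(result)

def compress_grep(matches, cap=25):
    if len(matches) <= cap:
        return matches
    return matches[:cap] + [f"... {len(matches) - cap} more matches truncated"]
```

A failing test run, for example, collapses from hundreds of lines to the traceback plus the final summary — exactly the part an agent needs to decide its next step.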

Does it auto-update when I change code?

Only if you install the git post-commit hook (neuralmind init-hook .). Otherwise run neuralmind build . manually — it's incremental and re-embeds only changed nodes.

What if retrieval quality is poor on my repo?

First, check that neuralmind stats . reports all your nodes indexed. Then run neuralmind benchmark . to see reduction ratios on real queries. Enable query memory (it prompts on first TTY run) and periodically run neuralmind learn . — cooccurrence-based reranking improves relevance on your actual patterns.

If it still feels off, open an issue with the query and expected result. Retrieval quality is the thing we most want to improve.

One install.
Dramatically less context.