neuralmind

Honest assessment

The skeptic’s companion to BUSINESS-CASE.md. The business case makes the compelling, fact-based argument for NeuralMind. This page is the counterpart: when NeuralMind isn’t worth installing, what the headline numbers don’t measure, and where the evidence is still thin. Read both before deciding.

This document represents the project’s stance, drafted with AI assistance and reviewed by the maintainer. Pull requests that sharpen the honesty are welcome.


TL;DR

NeuralMind is most useful if you are a Claude Code or Cursor user with a codebase larger than ~10K lines who is feeling token cost or context-limit pressure today. It is not very useful if your codebase fits in a single context window, you don’t pay for inference, or you’ve already invested in prompt caching plus a long-context model.

The headline “40–70×” reduction is real, but it is a reduction in retrieval input tokens, not in your total LLM bill. What you actually save depends on how much of your spend goes to retrieval versus generation, which varies widely by workload. For a typical Claude Code session, the realistic end-to-end saving is a 3–10× reduction in total cost, not 40–70×.
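The gap between the retrieval-stage number and the end-to-end number follows from simple arithmetic: only the retrieval share of spend shrinks, so the untouched share caps the overall gain. The sketch below is an illustration under stated assumptions, not NeuralMind code. The function name and the example retrieval shares are hypothetical, and the share should be measured on your own billing data.

```python
def end_to_end_savings(retrieval_share: float, retrieval_reduction: float) -> float:
    """Overall cost reduction factor when only the retrieval share of spend shrinks.

    retrieval_share: fraction of total spend that goes to retrieval input
        tokens (0.0 to 1.0). Hypothetical input; measure it for your workload.
    retrieval_reduction: reduction factor on retrieval tokens, e.g. 50 for 50x.
    """
    if not 0.0 <= retrieval_share <= 1.0:
        raise ValueError("retrieval_share must be between 0 and 1")
    if retrieval_reduction <= 0:
        raise ValueError("retrieval_reduction must be positive")
    # Cost that remains: the untouched share plus the shrunken retrieval share.
    remaining = (1.0 - retrieval_share) + retrieval_share / retrieval_reduction
    return 1.0 / remaining


if __name__ == "__main__":
    # Illustrative shares only; these are not measured figures.
    for share in (0.5, 0.7, 0.9):
        for reduction in (40, 70):
            factor = end_to_end_savings(share, reduction)
            print(f"retrieval share {share:.0%}, {reduction}x retrieval cut "
                  f"-> {factor:.1f}x total")
```

With a retrieval share somewhere between 70% and 90%, a 40–70× retrieval cut lands roughly in the 3–10× total range quoted above. A workload dominated by generation sees far less.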

The community-benchmark table is currently two entries from the maintainer’s own projects. Numbers from outside contributors are the single most valuable thing you can give back if NeuralMind ends up working for you.


When NeuralMind is worth setting up

You’ll likely see real benefit if all of these are true:

If you check 3 of 5, the benefit is marginal. If you check 4–5, run bash scripts/demo.sh and then neuralmind benchmark . on your repo.

When NeuralMind is not worth it

What “40–70× reduction” actually means

The number is honest for what it measures:

Retrieval-stage input tokens vs. a “load every code file” baseline, on the same query, measured with tiktoken.

What it does not mean:

A realistic mental model: NeuralMind shrinks the “what context to load” decision from O(repo) to O(query). If your agent makes 100 context-loading calls a day on a 50K-token repo, that compounds. If it makes 5 calls a day on a 5K-token repo, it doesn’t.
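The two scenarios above can be put into numbers with a small sketch. It assumes the baseline loads the whole repo on every call, and that retrieval loads a fixed per-query budget instead. The 1,000-token budget and the function name are hypothetical values chosen for illustration, not NeuralMind defaults.

```python
QUERY_BUDGET_TOKENS = 1_000  # hypothetical per-query context budget


def tokens_saved_per_day(calls_per_day: int, repo_tokens: int,
                         query_budget: int = QUERY_BUDGET_TOKENS) -> int:
    """Daily input tokens avoided versus a 'load every code file' baseline."""
    baseline = calls_per_day * repo_tokens  # O(repo) per call
    # O(query) per call; a tiny repo can never need more than its own size.
    retrieved = calls_per_day * min(query_budget, repo_tokens)
    return baseline - retrieved


if __name__ == "__main__":
    scenarios = {
        "100 calls/day on a 50K-token repo": (100, 50_000),
        "5 calls/day on a 5K-token repo": (5, 5_000),
    }
    for label, (calls, repo_tokens) in scenarios.items():
        saved = tokens_saved_per_day(calls, repo_tokens)
        print(f"{label}: {saved:,} tokens avoided per day")
```

Under these assumptions the first scenario avoids millions of input tokens a day, while the second avoids a few thousand, which is why the same tool compounds in one case and barely registers in the other.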

The community benchmark caveat

The table in README.md currently has two entries, both from repositories owned by the project maintainer. This is honest disclosure, not a flaw — the project is new and outside benchmarks take time to accumulate. But it means:

If you run the benchmark, please contribute your numbers, even disappointing ones. An “I tried NeuralMind on my Rust monorepo and got 8×, not 50×” entry is more valuable to the next visitor than a “55× on my hand-picked Python repo” entry.

neuralmind benchmark . --contribute

What we haven’t measured well yet

The current benchmark suite covers token reduction rigorously (self-benchmark in CI, regression-gated). It covers retrieval quality weakly (top-k hit rate on a 10-query fixture). It does not yet cover:

These are tracked on ROADMAP.md under “Next” and are open contribution targets.
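For readers unfamiliar with the metric, a top-k hit rate can be sketched as follows. This is a generic illustration, not the project's benchmark code. The function name, the fixture layout, and the file paths are all hypothetical.

```python
from typing import Mapping, Sequence, Set


def top_k_hit_rate(results: Mapping[str, Sequence[str]],
                   expected: Mapping[str, Set[str]],
                   k: int) -> float:
    """Fraction of queries with at least one expected item in the top k results.

    results: query -> ranked list of retrieved items (best first).
    expected: query -> set of items considered relevant for that query.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not expected:
        raise ValueError("expected must contain at least one query")
    hits = 0
    for query, relevant in expected.items():
        top_k = results.get(query, [])[:k]  # a query with no results counts as a miss
        if any(item in relevant for item in top_k):
            hits += 1
    return hits / len(expected)


if __name__ == "__main__":
    # Hypothetical two-query fixture with made-up file paths.
    expected = {
        "where is auth handled?": {"auth/session.py"},
        "how are jobs retried?": {"jobs/retry.py"},
    }
    results = {
        "where is auth handled?": ["auth/session.py", "auth/tokens.py"],
        "how are jobs retried?": ["jobs/queue.py", "jobs/worker.py"],
    }
    print(top_k_hit_rate(results, expected, k=2))
```

On a 10-query fixture, each query moves the score by 10 percentage points, which is one reason the coverage is described here as weak.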

Setup cost (realistic)

| Step | First time | Re-run / re-build |
|---|---|---|
| pip install neuralmind graphifyy | ~30s | n/a |
| graphify update . (knowledge graph) | 10s–2min depending on repo size | seconds, incremental |
| neuralmind build . (vector index) | 30s–5min depending on graph size | seconds, incremental |
| Editor / agent integration | 5–10min | n/a |
| Total to first query | ~10–20 min for a 50K-line repo | seconds |

Re-runs after code changes are fast (incremental). First-time setup is the friction point. If your monthly LLM bill is under $50, that ~15 min may not pay back; if it’s over $500, it almost certainly will.
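A rough payback estimate makes the trade-off concrete. The sketch below assumes the setup time is valued at an hourly rate and that the savings apply to the whole monthly bill at a single end-to-end factor. The function name, the hourly rate, and the example factors are hypothetical placeholders, so substitute your own.

```python
def payback_months(monthly_bill: float,
                   total_savings_factor: float,
                   setup_minutes: float = 15.0,
                   hourly_rate: float = 100.0) -> float:
    """Months until the one-time setup effort is repaid by a lower bill.

    total_savings_factor: end-to-end cost reduction, e.g. 3.0 for 3x.
    hourly_rate: hypothetical value of the time spent on setup.
    """
    if total_savings_factor <= 1.0:
        return float("inf")  # no saving, so setup never pays back
    monthly_saving = monthly_bill * (1.0 - 1.0 / total_savings_factor)
    setup_cost = (setup_minutes / 60.0) * hourly_rate
    return setup_cost / monthly_saving


if __name__ == "__main__":
    # Illustrative inputs only: a small bill with a weak saving,
    # and a large bill with a moderate saving.
    print(f"$50/month at 1.5x: {payback_months(50, 1.5):.2f} months")
    print(f"$500/month at 3x:  {payback_months(500, 3.0):.3f} months")
```

The estimate ignores ongoing maintenance and any quality effects from retrieval, so treat it as a lower bound on the payback time.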

Versus the obvious alternatives, honestly

See docs/comparisons/ for longer side-by-sides on each.

What would change our minds

We’d downgrade our own claims if:

We’d upgrade them if:

Decision in three lines