Your team or product spends measurable money on LLM API calls for code questions. You need to reduce spend without moving to a cheaper model or cutting features, and report the savings to a stakeholder.
Before installing NeuralMind, capture what you’re spending. Pick a representative workday:
# Count tokens per query today by logging your agent's input/output
# (most agents have a debug mode or you can estimate with tiktoken)
Compute: avg_tokens_per_query × queries_per_day × 30 ÷ 1,000,000 × $_per_MTok. Prices are quoted per million tokens, hence the division. That’s your monthly baseline.
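As a minimal sketch of that arithmetic, the helper below turns a per-query token average into a monthly dollar figure. The function name and the sample workload (5,000 tokens per query, 200 queries per day, $3 per MTok) are illustrative assumptions, not measurements from NeuralMind or any provider's price list.

```python
def monthly_cost_usd(avg_tokens_per_query: float,
                     queries_per_day: float,
                     usd_per_mtok: float,
                     days: int = 30) -> float:
    """Monthly input-token spend in US dollars."""
    monthly_tokens = avg_tokens_per_query * queries_per_day * days
    # Prices are per million tokens, so convert tokens to MTok first.
    return monthly_tokens / 1_000_000 * usd_per_mtok


# Hypothetical workload, for illustration only.
baseline = monthly_cost_usd(5_000, 200, 3.00)
print(f"${baseline:,.2f} per month")
```

Swap in the token average from your own logs and your model's actual input price.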
pip install neuralmind
neuralmind build .
neuralmind install-hooks . # Claude Code users only
neuralmind benchmark . --json
Returns:
{
"wakeup_tokens": 341,
"avg_query_tokens": 739,
"avg_reduction_ratio": 65.6,
"results": [...]
}
Compare avg_query_tokens to your pre-install baseline. This is the retrieval-side savings.
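The comparison can be scripted. The sketch below parses the sample output shown above (with `results` left empty) and compares `avg_query_tokens` against a baseline. The 5,000-token baseline and the `retrieval_savings` helper are hypothetical, and it assumes your baseline was measured on the same kind of queries the benchmark runs.

```python
import json

# Sample benchmark output from above; "results" is left empty in this sketch.
benchmark_json = """
{
  "wakeup_tokens": 341,
  "avg_query_tokens": 739,
  "avg_reduction_ratio": 65.6,
  "results": []
}
"""


def retrieval_savings(report: dict, baseline_tokens_per_query: float) -> dict:
    """Compare the benchmark's per-query average with a measured baseline."""
    after = report["avg_query_tokens"]
    return {
        "after_tokens": after,
        "ratio": baseline_tokens_per_query / after,
        "percent_saved": 100 * (1 - after / baseline_tokens_per_query),
    }


report = json.loads(benchmark_json)
# 5,000 is a hypothetical pre-install baseline; use your own measurement.
savings = retrieval_savings(report, baseline_tokens_per_query=5_000)
print(f"{savings['ratio']:.1f}x fewer tokens per query, "
      f"{savings['percent_saved']:.1f}% saved")
```

In practice you would load the JSON from a file or from the command's output instead of a string literal.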
PostToolUse hooks compress Read/Bash/Grep output. Rough numbers:
| Tool | Typical reduction |
|---|---|
| Read | ~88% (file → skeleton) |
| Bash | ~91% (errors + tail) |
| Grep | Capped at 25 matches |
Combined retrieval + consumption is typically 5–10× total reduction vs baseline.
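To see how the table's percentages translate into token counts, here is a rough estimate for a hypothetical session. The per-tool raw token counts are made up for illustration. Grep is left out because its limit is a match cap, not a percentage.

```python
# Typical reductions from the table above, as fractions of tokens removed.
TYPICAL_REDUCTION = {"Read": 0.88, "Bash": 0.91}


def compressed_tokens(raw_tokens_by_tool: dict[str, int]) -> float:
    """Estimate tokens left after hook compression.

    Tools without a listed reduction pass through unchanged.
    """
    return sum(
        raw * (1 - TYPICAL_REDUCTION.get(tool, 0.0))
        for tool, raw in raw_tokens_by_tool.items()
    )


# Hypothetical session: raw tool-output tokens before compression.
raw = {"Read": 40_000, "Bash": 10_000}
after = compressed_tokens(raw)
total = sum(raw.values())
print(f"{total} -> {after:.0f} tokens ({total / after:.1f}x reduction)")
```

The result depends heavily on the mix of tools in a session, so treat it as an order-of-magnitude check, not a forecast.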
A one-page summary template:
NeuralMind rollout — token cost impact
- Baseline: {avg_tokens} × {queries/day} × 30 ÷ 1,000,000 × ${price}/MTok = ${monthly}
- After NeuralMind: {new_tokens} × {queries/day} × 30 ÷ 1,000,000 × ${price}/MTok = ${new_monthly}
- Reduction: {ratio}× on retrieval, {total_ratio}× combined with PostToolUse hooks
- Setup cost: one-time `neuralmind build` (~minutes)
- Ongoing cost: incremental rebuild on git commit (seconds)
- Risk: fully local, no new SaaS dependency, MIT-licensed
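If you produce this summary regularly, the cost lines can be generated. The sketch below is one possible way to do it. The `render_summary` helper and the input numbers are hypothetical, and it covers only the computed lines of the template.

```python
def render_summary(avg_tokens: float, new_tokens: float,
                   queries_per_day: float, usd_per_mtok: float) -> str:
    """Fill the computed lines of the one-page summary."""

    def monthly(tokens_per_query: float) -> float:
        # tokens/query x queries/day x 30 days, converted to MTok, times price
        return tokens_per_query * queries_per_day * 30 / 1_000_000 * usd_per_mtok

    lines = [
        "NeuralMind rollout - token cost impact",
        f"- Baseline: ${monthly(avg_tokens):,.2f}/month",
        f"- After NeuralMind: ${monthly(new_tokens):,.2f}/month",
        f"- Reduction: {avg_tokens / new_tokens:.1f}x on retrieval",
    ]
    return "\n".join(lines)


# Hypothetical inputs: 5,000 -> 739 tokens/query, 200 queries/day, $3 per MTok.
text = render_summary(5_000, 739, 200, 3.00)
print(text)
```

The combined figure with PostToolUse hooks is left for you to fill in from your own measurements.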
- `neuralmind init-hook .` auto-rebuilds on every commit.
- Install the Claude Code hooks (`neuralmind install-hooks .`) so the synapse layer learns from your actual usage automatically: no manual step, and stale associations decay instead of lingering.
- Run `neuralmind benchmark` again when you switch models; absolute dollar savings scale with input price.

If a query returns 5K tokens when you’d expect 800, you used to debug it by reading log files. v0.6.0 makes it visual.
Run `neuralmind serve .` in a separate terminal.

The graph view highlights the L3 hits the agent received. The
diagnosis is usually obvious from the pulse pattern — a stale
cluster boundary that grabbed too many nodes, a missing structural
edge that forced the retriever wider, or an unexpected hub node
pulling in unrelated context. Fix the underlying issue (rebuild the
index, update CLAUDE.md, or tune the cluster boundary) and the
next replay shows a tighter result.
The pulse-rings live feed is also useful during normal use: if you notice the canvas going quiet during sessions you’d expect to be busy, that’s a signal the agent isn’t actually using NeuralMind retrieval (maybe the MCP server isn’t wired up, or the hooks didn’t install). A visual heartbeat is faster than checking a log.