The standard RAG recipe: chunk files by lines or tokens, embed the chunks, store them in a vector DB, retrieve the top-k by cosine similarity, and concatenate them into the prompt. This works well for documentation and long-form text.
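Here is a minimal, dependency-free Python sketch of that recipe, for illustration only. It makes several assumptions: `embed` is a bag-of-words stand-in for a real embedding model, the "vector DB" is an in-memory list, and every function name is hypothetical.

```python
import math
from collections import Counter


def chunk_by_lines(text: str, lines_per_chunk: int = 2) -> list[str]:
    """Split text into fixed-size windows of lines, ignoring any structure."""
    lines = text.splitlines()
    return [
        "\n".join(lines[i : i + lines_per_chunk])
        for i in range(0, len(lines), lines_per_chunk)
    ]


def embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words count vector.

    A real pipeline would call an embedding model here.
    """
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    # The "vector DB" is just an in-memory list of (chunk, vector) pairs.
    index = [(chunk, embed(chunk)) for chunk in chunks]
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, hits: list[str]) -> str:
    """Concatenate retrieved chunks into the prompt, in rank order."""
    context = "\n---\n".join(hits)
    return f"Context:\n{context}\n\nQuestion: {query}"


docs = (
    "The cache stores results.\n"
    "Entries expire after ten minutes.\n"
    "The logger writes to stdout.\n"
    "Set LOG_LEVEL to change verbosity."
)
chunks = chunk_by_lines(docs, lines_per_chunk=2)
query = "when do cache entries expire"
print(build_prompt(query, retrieve(query, chunks, k=1)))
```

Each chunk is scored in isolation. Nothing in this pipeline knows which lines belong to the same definition or which chunks refer to each other.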
Code is not prose. It has structure (call graphs, imports, class hierarchies) that text chunking destroys. NeuralMind keeps that structure and uses it when assembling context.
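The following hypothetical Python sketch makes the difference concrete. It is not NeuralMind's implementation. It chunks the same source two ways: as fixed-size line windows, and as one node per function or class using the standard-library `ast` module. For each node, it records the names called inside it as call edges. Only plain calls by name are captured. Method calls and cross-file imports are out of scope.

```python
import ast

SOURCE = '''\
def load(path):
    with open(path) as f:
        return f.read()


def parse(path):
    text = load(path)
    return text.split(",")


class Report:
    def render(self, path):
        return ", ".join(parse(path))
'''


def line_chunks(source: str, size: int) -> list[str]:
    """Generic-RAG style: fixed windows of `size` lines, blind to syntax."""
    lines = source.splitlines()
    return ["\n".join(lines[i : i + size]) for i in range(0, len(lines), size)]


def graph_nodes(source: str) -> dict[str, dict]:
    """One node per function/class, with its line span and the names it calls."""
    tree = ast.parse(source)
    nodes = {}
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        # Calls like `load(path)` anywhere inside the definition; attribute
        # calls such as `f.read()` are skipped in this sketch.
        calls = sorted(
            {
                call.func.id
                for call in ast.walk(node)
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            }
        )
        nodes[node.name] = {
            "kind": "class" if isinstance(node, ast.ClassDef) else "function",
            "lines": (node.lineno, node.end_lineno),
            "calls": calls,
            "source": ast.get_source_segment(source, node),
        }
    return nodes


for i, chunk in enumerate(line_chunks(SOURCE, size=4)):
    print(f"chunk {i}: {chunk!r}")
for name, info in graph_nodes(SOURCE).items():
    print(name, info["kind"], info["lines"], "calls:", info["calls"])
```

With a 4-line window, the body of `Report.render` lands in a different chunk from its signature. `graph_nodes` keeps each definition whole and records, for example, that `parse` calls `load`.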
| Dimension | Generic RAG | NeuralMind |
|---|---|---|
| Unit of retrieval | Text chunk (e.g. 500 chars) | Graph node (function, class) with metadata |
| Context shape | Flat list of chunks | Progressive: identity → summary → clusters → hits |
| Call graph | Lost at chunking | Preserved, used in skeletons |
| Community/cluster awareness | None | First-class (top clusters by relevance in L2) |
| Cross-file edges | Not encoded | Explicit (imports_from, shares_data_with) |
| Token budget | You enforce it | Built-in, reported per query |
| Consumption-side savings | None | PostToolUse compression of Read/Bash/Grep output |
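The "progressive context" and "built-in token budget" rows can be illustrated with a small Python sketch. It rests on three assumptions: earlier layers take priority, an item that would overflow the budget is skipped rather than truncated, and whitespace-separated words approximate tokens. NeuralMind's actual ordering, cluster ranking, and tokenizer are not described here, and every name in the block (`assemble`, `ContextReport`, `count_tokens`, the example layers) is hypothetical.

```python
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: count whitespace-separated words.
    return len(text.split())


@dataclass
class ContextReport:
    used: int = 0
    per_layer: dict[str, int] = field(default_factory=dict)
    dropped: list[str] = field(default_factory=list)


def assemble(layers: list[tuple[str, list[str]]], budget: int) -> tuple[str, ContextReport]:
    """Fill the context layer by layer, skipping any item that would exceed the budget."""
    report = ContextReport()
    parts: list[str] = []
    for name, items in layers:
        report.per_layer[name] = 0
        for item in items:
            cost = count_tokens(item)
            if report.used + cost > budget:
                report.dropped.append(item)  # skipped, not truncated
                continue
            parts.append(item)
            report.used += cost
            report.per_layer[name] += cost
    return "\n\n".join(parts), report


# Hypothetical layers for one query, coarsest first.
LAYERS = [
    ("identity", ["Project: billing service in Python"]),
    ("summary", ["Handles invoices, payments and refunds via a REST API"]),
    ("clusters", [
        "payments: charge() refund() PaymentGateway",
        "invoices: Invoice render_pdf() send_reminder() void() totals()",
    ]),
    ("hits", ["def refund(payment_id, amount): ..."]),
]

context, report = assemble(LAYERS, budget=22)
print(report)
```

With a budget of 22, the second cluster (6 words) would overflow, so it is skipped. The 4-word hit still fits, so the per-query report shows 22 words used. A real system might reserve room for hits or rank clusters by query relevance before filling. This sketch does neither; it takes clusters in the order given.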
If you already have a generic RAG pipeline and only want the compression half, NeuralMind’s PostToolUse hooks can run standalone without the retrieval layer.