Retrieval-Augmented Generation — RAG — is the most widely adopted technique for giving AI systems access to external knowledge. The idea is simple: instead of relying on what the model memorized during training, retrieve relevant documents at query time and inject them into the context. It works well for documents. For code, it has a fundamental structural mismatch that creates subtle but serious failure modes.
GraphRAG is a different approach. Instead of retrieving flat text chunks, it queries a graph of entities and relationships. Applied to codebases, GraphRAG builds a knowledge graph of your source code — functions, classes, modules, and the relationships between them — and lets AI query this graph before generating anything.
Here's why the difference matters, and how it works in practice.
Why Traditional RAG Fails for Code
Standard RAG works by embedding documents as vectors, storing them in a vector database, and retrieving the most semantically similar documents when a query comes in. The retrieved chunks are then injected into the LLM's context alongside the user's prompt.
For code, this approach has three structural problems:
Code is not semantically flat. When you split a codebase into chunks and embed them, you lose the structural relationships between those chunks. moduleA.py imports from utils.py. classX inherits from baseY. functionZ is called from 23 other places. These relationships are not recoverable from vector similarity — they require understanding the structure of the code, not just its content.
Semantic similarity ≠ architectural relevance. A function that appears semantically similar to your query might be architecturally irrelevant. Conversely, a function that's critical to understand for your change might use completely different vocabulary and score low on semantic similarity. Retrieval based purely on text similarity frequently returns the wrong context for architectural reasoning.
Chunks lose their context. When you chunk a source file and embed the chunks independently, each chunk loses its position in the larger structure. A 100-line class method doesn't tell you which class it belongs to, what that class's responsibilities are, or what other methods in the class need to stay consistent with any change you make.
These aren't edge cases — they're the core of what makes coding hard. The architectural structure of a codebase is precisely what flat text retrieval discards.
What is GraphRAG?
GraphRAG is a retrieval approach where the knowledge store is a graph rather than a flat collection of embedded documents. Instead of storing text chunks as vectors, you store entities (nodes) and relationships (edges) in a graph database.
Retrieval in a graph system works differently from vector search. Instead of "find the N most similar chunks to this query," you execute graph traversals: "find all functions that call this function," "find all modules that depend on this class," "find all components that would be affected by a change to this interface." These structural queries return exactly the contextually relevant information — not the most semantically similar text, but the architecturally connected entities.
The term GraphRAG was popularized by Microsoft Research in the context of document knowledge graphs for general QA. Applied to codebases, the concept translates naturally and gains additional precision because code has explicit, machine-parsable structure that natural language documents don't.
GraphRAG Applied to Code: Nodes and Relationships
A code knowledge graph represents your codebase as a network of entities and relationships. The core entities are:
Modules — the top-level structural units. In Python, these are files and packages. In JavaScript/TypeScript, these are files and ES modules. Modules have names, paths, and export lists.
Classes — type definitions that encapsulate state and behavior. They have attributes, methods, and inheritance relationships to other classes.
Functions — the atomic units of executable logic. They have signatures, return types, and call relationships to other functions.
Dependencies — explicit import relationships between modules. These form the backbone of architectural understanding: what depends on what, and how changes propagate.
The relationships between these entities are where the structural intelligence lives:
- Module → IMPORTS → Module
- Module → CONTAINS → Class
- Module → CONTAINS → Function
- Class → INHERITS_FROM → Class
- Class → HAS_METHOD → Function
- Function → CALLS → Function
- Function → USES → Class
With this graph, you can answer questions that flat RAG cannot:
- What would break if I change the signature of this function?
- What classes depend on this interface?
- What is the full dependency chain between moduleA and moduleZ?
- What functions in this module are called from outside the module?
These are the questions that matter for software development. They're architectural questions, and they require structural answers.
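To make this concrete, here is a minimal in-memory sketch of that kind of structural query. The module and function names (`billing.charge`, `payments.authorize`, etc.) are hypothetical, and plain Python dicts stand in for the graph store; the point is only that "what would break?" is a reverse traversal of CALLS edges, not a similarity search.

```python
from collections import defaultdict

# Hypothetical edges for a tiny codebase: (source, relation, target).
EDGES = [
    ("billing.charge", "CALLS", "payments.authorize"),
    ("billing.refund", "CALLS", "payments.authorize"),
    ("api.checkout", "CALLS", "billing.charge"),
    ("api", "IMPORTS", "billing"),
    ("billing", "IMPORTS", "payments"),
]

# Index edges by (relation, target) so "who points at X?" is a lookup.
reverse = defaultdict(list)
for src, rel, dst in EDGES:
    reverse[(rel, dst)].append(src)

def callers_of(fn, seen=None):
    """Transitively collect every function whose call chain reaches `fn`."""
    seen = set() if seen is None else seen
    for caller in reverse[("CALLS", fn)]:
        if caller not in seen:
            seen.add(caller)
            callers_of(caller, seen)
    return seen

# "What would break if I change payments.authorize?"
print(sorted(callers_of("payments.authorize")))
# → ['api.checkout', 'billing.charge', 'billing.refund']
```

A real code graph holds millions of edges and lives in a database, but the query shape is the same: start at an entity, walk typed edges in a chosen direction, return everything reached.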
RAG vs GraphRAG for Code: A Comparison
| Dimension | Traditional RAG | GraphRAG (Code Knowledge Graph) |
|---|---|---|
| Knowledge representation | Flat text chunks + embeddings | Entity graph with typed relationships |
| Retrieval mechanism | Vector similarity search | Graph traversal + semantic search |
| Structural relationships | Lost during chunking | Explicitly modeled and queryable |
| Impact analysis | Not possible | Traversal of CALLS / IMPORTS graphs |
| Dependency resolution | Not possible | Direct graph query |
| Architectural coherence | Not preserved | Core property of the representation |
| Context quality | Semantically similar, architecturally noisy | Architecturally precise, structurally complete |
The Role of Neo4j, Qdrant, and LangGraph
Building a GraphRAG system for code requires three distinct infrastructure components, each serving a different part of the retrieval pipeline.
Neo4j — the graph database. Neo4j is the industry standard for graph databases, and it's the natural choice for storing a code knowledge graph. It stores your codebase as a property graph: nodes with labels (Module, Class, Function) and properties (name, path, language), connected by typed, directed relationships. The query language (Cypher) is purpose-built for graph traversal and makes structural queries natural to express.
When you ask "what modules import from this module?" or "what is the full call chain from entrypoint to this function?", you're running Cypher queries against the Neo4j graph. The structural intelligence lives here.
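As a sketch, those two questions might look like the following Cypher, held as parameterized strings on the Python side. The node labels, relationship types, and property names here are assumptions about one possible schema, not a fixed API:

```python
# Hypothetical Cypher against a schema with Module/Function labels and
# IMPORTS/CALLS relationship types (all names are assumptions).

IMPORTERS = """
MATCH (m:Module)-[:IMPORTS]->(target:Module {name: $name})
RETURN m.name
"""

CALL_CHAIN = """
MATCH path = (entry:Function {name: $entry})-[:CALLS*1..{depth}]->(f:Function {name: $fn})
RETURN path
"""

def call_chain_query(depth: int) -> str:
    """Bound traversal depth so the query stays cheap on large graphs.

    Plain .replace() is used instead of an f-string because the Cypher
    text itself contains curly braces for node property maps.
    """
    return CALL_CHAIN.replace("{depth}", str(depth))

print("CALLS*1..5" in call_chain_query(5))  # → True
```

With the official `neo4j` Python driver, a query like `IMPORTERS` would typically be executed via a session, passing `name` as a query parameter; the exact call site depends on how the driver session is managed in your pipeline.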
Qdrant — the vector database for semantic search. Neo4j gives you structural queries. But sometimes you don't know the exact entity you're looking for — you know what it does, not what it's called. For semantic search ("find all functions that handle authentication"), you need vector embeddings.
Qdrant stores vector embeddings of code entities — functions, classes, documentation strings — and supports fast approximate nearest-neighbor search. When a query is semantic rather than structural, you search Qdrant first, find the relevant entities, then use those entities as starting points for graph traversal in Neo4j.
The combination of Neo4j and Qdrant gives you both structural precision and semantic flexibility. You can traverse the graph from a semantically retrieved starting point — the best of both retrieval approaches.
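The hybrid flow can be sketched in a few lines. Both stores are stubbed in memory here — a dict of vectors stands in for Qdrant and a dict of CALLS edges stands in for Neo4j — and the entity names are hypothetical:

```python
def semantic_search(query_vector, index, top_k=2):
    """Stand-in for a Qdrant nearest-neighbour search (dot-product scoring)."""
    scored = sorted(
        index.items(),
        key=lambda kv: -sum(a * b for a, b in zip(query_vector, kv[1])),
    )
    return [name for name, _ in scored[:top_k]]

def expand(entities, calls):
    """Stand-in for a Neo4j traversal: pull in each entity's direct callees."""
    out = set(entities)
    for e in entities:
        out.update(calls.get(e, []))
    return out

index = {"auth.login": [1.0, 0.0], "cart.add": [0.0, 1.0]}
calls = {"auth.login": ["auth.verify_token"]}

# Step 1: semantic entry point. Step 2: structural expansion.
hits = semantic_search([0.9, 0.1], index, top_k=1)   # → ['auth.login']
context = expand(hits, calls)
print(sorted(context))  # → ['auth.login', 'auth.verify_token']
```

The design choice is the order of operations: semantic search only picks the entry points, and the graph decides what else must come along with them.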
LangGraph — the orchestration pipeline. Having the right knowledge stores isn't enough on its own. You need an orchestration layer that decides when to query the graph, which structural queries to run, and how to compose the retrieved context before handing it to the LLM.
LangGraph is a framework for building stateful, deterministic AI pipelines. It models the AI workflow as a state machine — explicit states, explicit transitions, explicit nodes that execute specific operations. This determinism is critical: it ensures the AI cannot shortcut the context retrieval step. Every generation goes through the same pipeline: parse the request → identify relevant entities → query the graph → retrieve semantic context → compose structured context → generate.
The determinism also makes the system debuggable and auditable. When something goes wrong, you can inspect exactly what context was retrieved and how it was composed — something that's impossible with unstructured prompt engineering.
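The pipeline shape can be sketched without any framework at all. This is a conceptual stand-in for the LangGraph state machine — plain functions in a fixed order, with hypothetical step outputs hard-coded so the flow is visible:

```python
# Each step reads and extends a shared state dict, mirroring how a
# LangGraph node transforms pipeline state. Step outputs are hypothetical.

def parse_request(state):
    state["entities"] = ["billing.charge"]  # entities mentioned in the request
    return state

def query_graph(state):
    # In the real pipeline this is a Neo4j traversal; stubbed here.
    state["structural_context"] = {"billing.charge": ["api.checkout"]}
    return state

def compose_context(state):
    state["context"] = f"callers: {state['structural_context']}"
    return state

PIPELINE = [parse_request, query_graph, compose_context]  # fixed order

def run(request):
    """Every generation passes through every step; none can be skipped."""
    state = {"request": request}
    for step in PIPELINE:
        state = step(state)
    return state

result = run("refactor billing.charge")
print("api.checkout" in result["context"])  # → True
```

Because the sequence is fixed and every intermediate state is inspectable, you can always answer "what context did the model actually see?" for any given generation.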
What Changes When AI Has a Graph of Your Codebase
The practical difference between flat RAG and GraphRAG for code shows up most clearly in the quality of AI-generated changes on non-trivial tasks.
With flat RAG, the AI generates code based on semantically similar examples. It might produce code that works in isolation but inadvertently violates a convention, ignores a dependency constraint, or makes a change that breaks callers the AI didn't know existed. These failures are hard to detect in code review because the generated code is locally correct — the problem is architectural, not syntactic.
With a code knowledge graph, the AI has access to structural context before generation:
- It knows what modules import the function it's about to change
- It knows the full inheritance chain of the class it's modifying
- It knows what architectural decisions (ADRs) are relevant to the component
- It knows which other functions call the function being refactored
This structural awareness produces architecturally coherent changes, not just syntactically correct ones. The AI generates code that fits the system, not just code that compiles.
It also enables impact analysis before generation: before making any change, the system can compute what would be affected. This changes the workflow from "generate and debug" to "understand, then generate" — a fundamental shift in how AI-assisted development works at scale.
Building a Code Knowledge Graph: The Pipeline
Constructing a code knowledge graph from a real codebase is an engineering problem with several distinct stages.
Parsing. The first step is parsing source files into Abstract Syntax Trees (ASTs). AST parsing gives you the full syntactic structure of the code — every function definition, class declaration, import statement, and call expression — in a machine-readable form. This is the raw material for the graph.
Different languages require different parsers. Python has libraries like LibCST that provide robust, round-trip-safe AST parsing. JavaScript and TypeScript have Babel and TypeScript's own compiler API. The parser outputs a structural representation of each file that can be mapped to graph nodes and edges.
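A minimal extraction pass can be shown with Python's standard-library `ast` module (the production parsers mentioned above, like LibCST, are more robust; `ast` keeps the sketch dependency-free). The source snippet is invented for illustration:

```python
import ast

# Hypothetical source file to extract entities from.
source = """
import utils

class PaymentService:
    def charge(self, amount):
        return utils.validate(amount)
"""

tree = ast.parse(source)
entities = {"imports": [], "classes": [], "functions": [], "calls": []}

for node in ast.walk(tree):
    if isinstance(node, ast.Import):
        entities["imports"] += [alias.name for alias in node.names]
    elif isinstance(node, ast.ClassDef):
        entities["classes"].append(node.name)
    elif isinstance(node, ast.FunctionDef):
        entities["functions"].append(node.name)
    elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
        entities["calls"].append(ast.unparse(node.func))

print(entities)
# → {'imports': ['utils'], 'classes': ['PaymentService'],
#    'functions': ['charge'], 'calls': ['utils.validate']}
```

Each entry maps directly onto the graph: `imports` become IMPORTS edges, `classes` and `functions` become nodes, and `calls` become CALLS edges once their targets are resolved.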
Entity extraction and normalization. From the AST, you extract entities (modules, classes, functions) with their properties (names, signatures, docstrings) and relationships (imports, calls, inheritance). Normalization is important: you need to resolve references across files — when moduleA calls a function from moduleB, you need to recognize the call target as the same entity as the function definition in moduleB's graph.
Graph construction. The normalized entities and relationships are written to Neo4j as nodes and edges. The graph is built incrementally and updated as the codebase changes. This incremental maintenance is what keeps the graph accurate over time — it's not a one-time snapshot but a live representation.
Embedding generation. In parallel, meaningful text representations of code entities (function signatures + docstrings, class descriptions, module summaries) are embedded using a code-optimized embedding model and stored in Qdrant. These embeddings enable semantic search alongside structural traversal.
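One small but consequential detail is what text actually gets embedded per entity. A sketch of one plausible format — the function name, signature, and docstring below are all invented:

```python
def embedding_text(kind, name, signature, docstring):
    """Build the text representation handed to the embedding model.

    Signature first (names and types carry most of the signal),
    docstring second when one exists.
    """
    parts = [f"{kind} {name}{signature}"]
    if docstring:
        parts.append(docstring.strip())
    return "\n".join(parts)

text = embedding_text(
    "function",
    "authorize",
    "(card: Card, amount: int) -> AuthResult",
    "Authorize a card payment and return the gateway result.",
)
print(text.splitlines()[0])
# → function authorize(card: Card, amount: int) -> AuthResult
```

The resulting strings are what get embedded and upserted into Qdrant, with the entity's graph ID carried along as payload so a semantic hit can be joined back to its node.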
Query interface. The query interface wraps the Neo4j and Qdrant stores and exposes them to the LangGraph orchestration pipeline. Queries are parameterized by the current task context — what files are being edited, what change is being requested, what entities are mentioned.
The Limits of GraphRAG for Code
GraphRAG is a significant improvement over flat RAG for code, but it's not a complete solution to every problem in AI-assisted development.
A code knowledge graph represents the static structure of the codebase at a point in time. Dynamic behavior — how the system behaves at runtime, what data flows through it, how it responds to different inputs — is not captured by the graph. Understanding runtime behavior requires different approaches (traces, logs, runtime analysis).
The graph also doesn't automatically capture semantic constraints — invariants the codebase maintains that aren't explicitly represented in the structure. These need to be captured as documentation artifacts (Architecture Decision Records, conventions, design notes) linked to the relevant graph nodes. This is a human-in-the-loop element of a full context engineering system.
Finally, graph quality depends on parser quality. Dynamically typed languages and heavy use of metaprogramming or dynamic dispatch can make precise static analysis difficult. A well-built system handles these cases gracefully with partial information rather than failing silently, but it's important to understand that a code graph is a best-effort structural approximation, not a perfect model.
Summary
Traditional RAG treats source code as flat text and loses structural information during chunking. GraphRAG builds a knowledge graph of your codebase — modules, classes, functions, and their relationships — and retrieves structurally precise context via graph traversal.
Key takeaways:
- Flat RAG fails for code because code is inherently structural, not flat text
- GraphRAG stores entities and relationships explicitly, enabling structural queries that vector search cannot answer
- The three infrastructure components are: Neo4j (graph store), Qdrant (vector embeddings), LangGraph (deterministic orchestration)
- A code knowledge graph enables impact analysis, dependency resolution, and architecturally aware generation
- The practical result is AI that generates code that fits the system — not just code that compiles
We're building Cerebro around this architecture. It parses your codebase, builds the graph, maintains it as the code changes, and forces the AI to query it before every generation. Read more about why we built it this way in the philosophy section and in our companion article: What is Context Engineering for AI-Assisted Development?
If you're building something in this space or just want to follow the progress, join the waitlist. We're building in public and we share real technical decisions in the engineering section.