Sources

Grounding, citations, and further reading for GraphRAG: When the Index Is a Graph.

All of this is optional. The article itself is the tutorial. This page exists for readers who want to follow the citation trail back to the primary sources, see the original wording of the published claims, and read deeper into the survey literature.

The numbered references in the article hyperlink to the corresponding entries here, so you can jump in at the point of interest and follow the back-to-article link to return.

About the Sources

Edge et al.: From Local to Global (anchor paper)

Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody, A., Truitt, S., Metropolitansky, D., Ness, R. O., & Larson, J. (2024). arXiv:2404.16130.

The Microsoft Research paper that defined the modern GraphRAG architecture. Names the global-vs-local query distinction, proposes the two-stage indexing pipeline (entity-graph extraction plus community summarization), and reports the 281-minute indexing benchmark that has framed every cost conversation since. Available at arxiv.org/abs/2404.16130.

Microsoft GraphRAG documentation and repository

Microsoft. Official docs site and MIT-licensed reference implementation.

The canonical implementation. Documentation at microsoft.github.io/graphrag enumerates the indexing pipeline stages and the four query modes (Global, Local, DRIFT, Basic). Repository at github.com/microsoft/graphrag; the entity-extraction and community-clustering modules are cited by file path in the article.

Peng et al.: GraphRAG survey

Peng, B., Zhu, Y., Liu, Y., Bo, X., Shi, H., Hong, C., Zhang, Y., & Tang, S. (2024). arXiv:2408.08921.

Formalizes the three-stage GraphRAG workflow (graph-based indexing, graph-guided retrieval, graph-enhanced generation). Useful as a structural map of the field. Available at arxiv.org/abs/2408.08921.

Han et al.: Retrieval-Augmented Generation with Graphs

Han, H., Wang, Y., Shomer, H., et al. (2024). arXiv:2501.00309.

Decomposes a GraphRAG system into five components: query processor, retriever, organizer, generator, data source. Names the graph itself as the data source rather than as a retriever choice, which sharpens the architectural framing. Available at arxiv.org/abs/2501.00309.

Zhang et al.: customized GraphRAG survey

Zhang, Q., Chen, S., Bei, Y., et al. (2025). arXiv:2501.13958.

Surveys GraphRAG variants for customized LLM applications. Names three specific failure modes of flat text retrieval that motivate the graph alternative: complex query understanding in professional contexts, knowledge integration across distributed sources, and system efficiency at scale. Available at arxiv.org/abs/2501.13958.

Microsoft Research blog posts (2024 launch coverage)

Larson, J., Truitt, S., Edge, D., & Trinh, H. (2024). Microsoft Research Blog.

Two posts accompanying the public launch and GitHub release of GraphRAG. The first (February 2024) frames GraphRAG against baseline RAG defined as vector-similarity search. The second (July 2024) frames whole-dataset questions as the place where top-k retrieval is the wrong primitive. Useful as a vendor-side rhetorical record alongside the academic paper.

Microsoft RAI transparency document

Microsoft. RAI_TRANSPARENCY.md in the GraphRAG repository.

The project's Responsible AI transparency document. Enumerates the indexing-cost concern, the extraction-prompt-quality dependency, and the model-level risk surface that the article calls out as load-bearing operational facts. Linked at github.com/microsoft/graphrag/blob/main/RAI_TRANSPARENCY.md.

Bratanic / LangChain knowledge-graph writeup

Bratanic, T. (2024, March 15). LangChain blog.

Reference pattern for combining graph and vector retrieval in a LangChain / Neo4j pipeline using LLMGraphTransformer. Articulates the motivation for hybrid graph-plus-vector retrieval in plain practitioner terms. Available at blog.langchain.com.

HippoRAG paper

Gutierrez, B. J., Shu, Y., Gu, Y., Yasunaga, M., & Su, Y. (2024). NeurIPS 2024. arXiv:2405.14831.

Replaces community summarization with Personalized PageRank over an LLM-extracted graph. Reports gains of up to 20% on multi-hop question answering and 10 to 30 times cheaper retrieval than iterative methods. Available at arxiv.org/abs/2405.14831.
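For readers who have not met Personalized PageRank, the idea can be sketched in a few lines. What follows is a minimal illustration on a toy graph, not the HippoRAG implementation: the entity names, the damping factor of 0.85, the iteration count, and the function name are all assumptions made for the example.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power-iteration Personalized PageRank over an adjacency dict.

    graph maps each node to a list of neighbors; seeds is the set of
    nodes matched by the query (hypothetical names in this sketch).
    """
    nodes = list(graph)
    # Restart distribution: uniform over the seed nodes, zero elsewhere.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        # Every node first receives its share of the restart mass.
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            neighbors = graph[n]
            if not neighbors:
                # Dangling node: hand its mass back to the seeds.
                for m in nodes:
                    nxt[m] += damping * rank[n] * restart[m]
                continue
            share = damping * rank[n] / len(neighbors)
            for m in neighbors:
                nxt[m] += share
        rank = nxt
    return rank


# Toy entity graph; all node names are invented for illustration.
graph = {
    "alzheimers": ["prof_a"],
    "stanford": ["prof_a", "prof_b"],
    "prof_a": ["alzheimers", "stanford"],
    "prof_b": ["stanford", "physics"],
    "physics": ["prof_b"],
}
scores = personalized_pagerank(graph, seeds={"alzheimers", "stanford"})
```

With both query entities as seeds, `prof_a` (linked to both) ends up scoring higher than `prof_b` (linked to only one), which is the multi-hop behavior the paper builds on. How HippoRAG maps node scores back to passages is not reproduced here.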

LightRAG paper

Guo, Z., Xia, L., Yu, Y., Ao, T., & Huang, C. (2024). EMNLP 2025. arXiv:2410.05779.

Dual-level retrieval (low-level entity, high-level theme) plus incremental indexing. Positions explicitly against Microsoft GraphRAG and reports dramatic cost reductions on retrieval and on corpus updates. Available at arxiv.org/abs/2410.05779.

Leiden algorithm paper (Traag, Waltman, van Eck)

Traag, V. A., Waltman, L., & van Eck, N. J. (2019). Scientific Reports 9:5233.

The community-detection algorithm Microsoft GraphRAG uses. Proves that Leiden guarantees connected communities, which is the property that justifies treating each community summary as a coherent thematic region. Available at arxiv.org/abs/1810.08473.
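The connectivity property is easy to state in code. The sketch below is not the Leiden algorithm; it is only a breadth-first check, under an assumed adjacency-dict representation with invented node names, of whether a given community induces a connected subgraph.

```python
from collections import deque


def is_connected_community(graph, community):
    """Return True if the community's members are mutually reachable
    using only edges between members (hypothetical helper)."""
    members = set(community)
    if not members:
        return True
    start = next(iter(members))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            # Only walk edges that stay inside the community.
            if neighbor in members and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == members


# Toy path graph a - b - c - d.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
connected = is_connected_community(graph, ["a", "b", "c"])
broken = is_connected_community(graph, ["a", "c"])  # bridge node b is missing
```

The second call shows the kind of community that would make a poor summary unit: `a` and `c` are grouped together although nothing inside the group links them.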

Neo4j GraphRAG Python documentation

Neo4j. Official documentation site.

Official Neo4j-maintained Python entry point for building GraphRAG systems on Neo4j. The user guide documents nine retriever classes, illustrating the pluralism of retrieval primitives that the same graph index can support. Available at neo4j.com/docs/neo4j-graphrag-python/current/.

LlamaIndex PropertyGraphIndex documentation

LlamaIndex. Framework documentation.

Documents pluggable extractors (SimpleLLMPathExtractor, ImplicitPathExtractor, DynamicLLMPathExtractor, SchemaLLMPathExtractor) and four parallel sub-retrievers. The framework-level abstraction that makes the schema-free / schema-driven tradeoff explicit as an API choice. Available at developers.llamaindex.ai.

Comparative-benchmark papers (Xiang, Han, Zeng, da Cruz)

Four 2025 evaluations that contextualize the original Microsoft results.

Xiang et al. (ICLR 2026) introduces the four-level task taxonomy used to predict which workloads favor graph retrieval. Han et al. supplies a unified evaluation protocol. Zeng et al. critiques the evaluation methodology and finds reported GraphRAG gains overstated. Da Cruz et al. compares Microsoft GraphRAG against ontology-driven graphs and against vector RAG. Each is cited individually below.

HybridRAG, ORAN, and Towards Practical GraphRAG

Three papers grounding the hybrid graph-plus-vector pattern.

Sarmah et al. (HybridRAG, 2024) evaluates the hybrid pattern on financial earnings-call transcripts. Ahmad et al. (ORAN, 2025) replicates the finding on telecom specifications. Min et al. (Towards Practical GraphRAG, 2025) proposes dependency-parsing extraction plus Reciprocal Rank Fusion as a lower-cost alternative pipeline.
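Reciprocal Rank Fusion is simple enough to show directly. The sketch below uses the standard formulation, in which each document scores the sum of 1 / (k + rank) across the input rankings. The constant k = 60 is the conventional default rather than a setting taken from Min et al., and the chunk identifiers and function name are invented for the example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list.

    Each document's score is the sum of 1 / (k + rank) over the
    rankings it appears in, with ranks starting at 1.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a graph retriever and a vector retriever.
graph_hits = ["c3", "c1", "c7"]
vector_hits = ["c1", "c4", "c3"]
fused = reciprocal_rank_fusion([graph_hits, vector_hits])
# fused == ["c1", "c3", "c4", "c7"]
```

Because only ranks are used, the two retrievers never need comparable score scales, which is one reason the method suits graph-plus-vector hybrids.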

What "Graph" Means Here

2. Peng et al. survey definition of GraphRAG

Peng et al. open their survey with a definition that captures the load-bearing intuition behind the architecture: "Graph, by its intrinsic 'nodes connected by edges' nature, encodes massive heterogeneous and relational information, making it a golden resource for RAG in tremendous real-world applications." The survey formalizes the three-stage GraphRAG workflow: graph-based indexing, graph-guided retrieval, and graph-enhanced generation.

Peng et al. (2024), Graph Retrieval-Augmented Generation: A Survey. arXiv:2408.08921
