← All Articles

When Unstructured Search Isn't Enough

Vector search finds text that sounds like the user's question. Some questions need an arithmetic answer or a join across structured fields that no embedding can produce. The query-router layer recognizes which kind of question is being asked and dispatches it to the right backend: vector retrievers for conceptual lookup, SQL for aggregation and precise filtering, graph traversal for relationship questions, all glued by a single provenance schema.

The Happy Path and the Unhappy Path

Ask a RAG system8 "What is our refund policy?" and it will perform admirably.9 The question maps neatly to a chunk of text in a knowledge base. The embedding captures the semantic intent, the retriever finds the relevant passage, and the language model synthesizes a clear answer. This is the happy path, and it works because the answer lives in a paragraph somewhere, waiting to be found.

Now ask it "What were total Q3 sales in the Northeast region?" The answer does not live in any paragraph. It lives in rows of a database, scattered across thousands of transaction records that must be filtered by date and region, then summed. No chunk of text contains the number. It must be computed.10

Or ask "Who reports to the VP of Engineering?" The answer is a set of relationships in an organizational hierarchy. It is not a passage to retrieve but a graph to traverse. The VP node connects to her direct reports via "manages" edges, and those edges are the answer.

Three question shapes, three architectures. The topology is identical; the data store is what changes.

These are not edge cases. In enterprise environments, the majority of high-value questions require structured data: databases, knowledge graphs, APIs that return precise records. A RAG system that can only search unstructured text is solving half the problem. This article covers the other half.

The Limits of Vector Search

Vector search works by encoding text into high-dimensional embeddings and finding passages whose embeddings are close to the query embedding.11 This captures semantic similarity remarkably well. Semantic similarity is not the only kind of relevance, and for several categories of questions, it is not even the right kind.12

Aggregation questions require computation across many records. "How many support tickets were opened last month?" cannot be answered by finding a similar-sounding paragraph. The answer requires counting rows in a table.13 Even if your knowledge base contains a monthly report with last month's number, that report goes stale the moment a new ticket is opened.

Precise filtering questions need exact matches on structured fields. "Show me all orders over $10,000 from customers in California" requires filtering on amount, state, and entity type. Vector search will find text about large orders and California customers, but it cannot guarantee the precision of a SQL WHERE clause.14

Relational questions involve traversing connections between entities. "Which products were purchased by customers who also bought Product X?" requires following edges in a graph of purchase relationships. Embedding similarity cannot represent multi-hop relational reasoning.15

Real-time questions need current data. "What is the current inventory level for SKU-4892?" requires querying a live database, not retrieving a text chunk that may have been embedded days or weeks ago.

QUESTION TYPE EXAMPLE VECTOR SEARCH CATEGORY 1 Aggregation Count, sum, average across rows "How many tickets last month?" No paragraph contains the count FAILS no chunk has the answer CATEGORY 2 Precise filtering Exact match on fields "Orders over $10K in California" Similarity is not WHERE FAILS no exact-match guarantee CATEGORY 3 Relational Traverse entity connections "Who reports to the VP?" Edges, not embeddings FAILS no multi-hop reasoning CATEGORY 4 Real-time Current state of live data "Current inventory for SKU-4892?" Embeddings are stale FAILS indexed days ago
Four question types vector search cannot serve.

The common thread is that these questions require structure: schemas, types, relationships, and operations that go beyond "find me text that means something similar." Recognizing this boundary is the first step toward building RAG systems that actually work in production environments where structured data dominates.16

Text-to-SQL: Querying Databases with Natural Language

The most direct approach to structured data in RAG is Text-to-SQL, where a language model translates a natural language question into a SQL query, executes it against a database, and returns the result. The idea is older than the current generation of LLMs. Researchers have been working on natural language interfaces to databases since the 1970s, with systems like LUNAR5 and CHAT-80.6 What has changed is that modern LLMs make it practical.17

The Mechanics

A Text-to-SQL pipeline has four stages. First, the system describes the database schema to the LLM, enabling schema linking. Second, the LLM generates a SQL query based on the user's question and the schema. Third, the system executes the query against the database. Fourth, the results are returned to the user, optionally passed through the LLM again for natural language formatting.

The schema description is where most of the engineering effort goes. The LLM needs to understand what tables exist, what columns they contain, what the data types are, and how tables relate to each other. The most effective representation is the one developers already use: CREATE TABLE statements.

# Schema representation for a sales database
SCHEMA = """
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    region TEXT NOT NULL,        -- e.g., 'Northeast', 'West', 'Southeast'
    segment TEXT NOT NULL,       -- 'Enterprise', 'SMB', 'Consumer'
    created_at DATE NOT NULL
);

CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    order_date DATE NOT NULL,
    total_amount DECIMAL(10,2) NOT NULL,
    status TEXT NOT NULL         -- 'completed', 'pending', 'cancelled'
);

CREATE TABLE order_items (
    item_id INTEGER PRIMARY KEY,
    order_id INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES products(product_id),
    quantity INTEGER NOT NULL,
    unit_price DECIMAL(10,2) NOT NULL
);

CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    category TEXT NOT NULL,      -- 'Software', 'Hardware', 'Services'
    list_price DECIMAL(10,2) NOT NULL
);
"""

Notice the inline comments. These are critical. Column names like region and segment are ambiguous without context. The comment -- e.g., 'Northeast', 'West', 'Southeast' tells the LLM exactly what values to expect, which means the difference between a working query and a broken one. Empirical evaluation has shown that schema descriptions enriched with column descriptions and sample values significantly improve Text-to-SQL accuracy across multiple LLMs.119

Some teams go further and include a few sample rows alongside the schema, giving the LLM concrete examples of what the data looks like. This is especially helpful when column naming conventions are non-obvious or when the data contains domain-specific codes.18

A Complete Text-to-SQL Pipeline

↗ docs
import sqlite3
import json
from openai import OpenAI

class TextToSQL:
    """Translates natural language questions into SQL queries."""

    def __init__(self, db_path, schema_description):
        self.db_path = db_path
        self.schema = schema_description
        self.client = OpenAI()

    def generate_sql(self, question):
        """Ask the LLM to produce a SQL query for the question."""
        prompt = f"""You are a SQL expert. Given the following database schema,
write a SQL query that answers the user's question.

Schema:
{self.schema}

Rules:
- Return ONLY the SQL query, no explanation
- Use standard SQL syntax
- Always use explicit JOIN syntax (not implicit joins)
- Include column aliases for clarity
- Use ISO date format (YYYY-MM-DD) for date comparisons

Question: {question}

SQL:"""

        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.0,
            max_tokens=500
        )
        sql = response.choices[0].message.content.strip()

        # Strip markdown code fences if present
        if sql.startswith("```"):
            sql = sql.split("\n", 1)[1]
            sql = sql.rsplit("```", 1)[0]

        return sql.strip()

    def validate_sql(self, sql):
        """Basic safety and syntax checks before execution."""
        sql_upper = sql.upper().strip()

        # Only allow SELECT statements
        if not sql_upper.startswith("SELECT"):
            raise ValueError("Only SELECT queries are allowed")

        # Block dangerous operations
        forbidden = ["DROP", "DELETE", "INSERT", "UPDATE",
                     "ALTER", "CREATE", "EXEC", "GRANT"]
        for keyword in forbidden:
            if keyword in sql_upper:
                raise ValueError(
                    f"Forbidden SQL keyword detected: {keyword}"
                )

        # Syntax check via EXPLAIN
        conn = sqlite3.connect(self.db_path)
        try:
            conn.execute(f"EXPLAIN {sql}")
        except sqlite3.OperationalError as e:
            raise ValueError(f"SQL syntax error: {e}")
        finally:
            conn.close()

        return True

    def execute(self, sql):
        """Execute the query and return results."""
        conn = sqlite3.connect(self.db_path)
        conn.row_factory = sqlite3.Row
        try:
            cursor = conn.execute(sql)
            columns = [desc[0] for desc in cursor.description]
            rows = [dict(row) for row in cursor.fetchall()]
            return {"columns": columns, "rows": rows}
        finally:
            conn.close()

    def ask(self, question):
        """Full pipeline: question -> SQL -> execution -> answer."""
        # Step 1: Generate SQL
        sql = self.generate_sql(question)
        print(f"Generated SQL:\n{sql}\n")

        # Step 2: Validate
        self.validate_sql(sql)

        # Step 3: Execute
        results = self.execute(sql)

        # Step 4: Format results with LLM
        format_prompt = f"""The user asked: "{question}"

The SQL query returned these results:
{json.dumps(results['rows'], indent=2)}

Provide a clear, natural language answer based on these results.
Include specific numbers. If the results are empty, say so."""

        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": format_prompt}],
            temperature=0.0
        )

        return {
            "question": question,
            "sql": sql,
            "raw_results": results,
            "answer": response.choices[0].message.content
        }

Using this pipeline, the question "What were total Q3 sales in the Northeast region?" becomes:

# Generated SQL:
SELECT SUM(o.total_amount) AS total_q3_sales
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE c.region = 'Northeast'
  AND o.order_date >= '2024-07-01'
  AND o.order_date < '2024-10-01'
  AND o.status = 'completed';

# Result: [{"total_q3_sales": 2847391.50}]
# Answer: "Total Q3 2024 sales in the Northeast region
#          were $2,847,391.50."

No amount of vector search could have produced that number. It did not exist as text anywhere in the system. It was computed on demand from structured records.20

Security: The SQL Injection Problem

Text-to-SQL introduces a security surface that does not exist in traditional RAG. The language model is performing semantic parsing: generating executable code from natural language. If a user asks "Show me all customers; DROP TABLE orders;", a naive system might generate and execute that SQL. This is prompt injection meets SQL injection, and the consequences are obvious.

The defenses are layered. First, the database connection should use a read-only user with no permissions to modify data. This is the most important safeguard because it works regardless of what SQL the model generates.21 Second, validate the generated SQL before execution: reject anything that is not a SELECT statement. Third, use query parameterization where possible, though this is harder when the entire query is dynamically generated. Fourth, impose resource limits: set query timeouts and row count caps to prevent denial-of-service through expensive queries like unrestricted cross joins.

Even with these safeguards, Text-to-SQL should not be deployed against databases containing sensitive data without additional access controls. The LLM might generate a valid SELECT that returns data the user is not authorized to see. Row-level security in the database and result filtering in the application layer are both necessary for production deployments.

Limitations of Text-to-SQL

Text-to-SQL accuracy is impressive but not perfect. On the Spider benchmark, a widely used evaluation dataset for Text-to-SQL, the best models achieve around 80-85% exact-match accuracy.2 That sounds good until you consider that one in five queries is wrong, and wrong SQL does not produce a graceful fallback. It produces a wrong answer with the confidence of a correct one.22

Complex queries are where accuracy drops. Multi-table joins with nested subqueries, window functions, and CTEs challenge even the strongest models. Ambiguous column names are another failure point: if two tables both have a name column, the model may pick the wrong one. Domain-specific terminology adds yet another layer. "Churn rate" means something precise to a business analyst but requires mapping to specific SQL operations (counting customers who cancelled divided by total customers over a time period) that the model may not infer correctly.23

The practical response to these limitations is verification. Log every generated query. Compare results against known answers for common questions. Implement a feedback loop where incorrect queries are added to a few-shot prompt, teaching the model from its own mistakes. Some teams maintain a library of validated query templates and use the LLM only to fill in parameters, reducing the surface area for errors while still supporting natural language input.

Knowledge Graphs: When Relationships Are the Answer

Some questions are not about retrieving text or computing aggregates. They are about navigating relationships. "Which suppliers provide components for products in our automotive line?" "What research papers cite the original transformer paper and were published after 2021?" "Which team members have collaborated on more than three projects together?"

These questions involve entities connected by typed relationships, and answering them requires traversing those connections. This is the domain of knowledge graphs.24

What a Knowledge Graph Is

A knowledge graph represents information as a network of entities (nodes) and relationships (edges).7 Each node has a type and properties. Each edge has a type and optionally properties of its own. The power of this representation is that it makes relationships first-class citizens, not implicit patterns buried in text but explicit, queryable structures.25

Consider an organizational knowledge graph. An employee node might have properties like name, title, department, and hire date. A "REPORTS_TO" edge connects the employee to their manager. A "WORKS_ON" edge connects the employee to project nodes. A "SKILLED_IN" edge connects the employee to technology nodes. The same employee participates in multiple relationships, each of which is independently queryable.

# A simple knowledge graph represented as triples
# (subject, predicate, object)

triples = [
    # Organizational structure
    ("Alice Chen",    "HAS_TITLE",    "VP of Engineering"),
    ("Bob Martinez",  "REPORTS_TO",   "Alice Chen"),
    ("Bob Martinez",  "HAS_TITLE",    "Senior Engineer"),
    ("Carol Davis",   "REPORTS_TO",   "Alice Chen"),
    ("Carol Davis",   "HAS_TITLE",    "Engineering Manager"),
    ("Dan Wilson",    "REPORTS_TO",   "Carol Davis"),

    # Project assignments
    ("Bob Martinez",  "WORKS_ON",    "Project Atlas"),
    ("Carol Davis",   "WORKS_ON",    "Project Atlas"),
    ("Dan Wilson",    "WORKS_ON",    "Project Beacon"),

    # Skills and technologies
    ("Bob Martinez",  "SKILLED_IN",  "Python"),
    ("Bob Martinez",  "SKILLED_IN",  "Kubernetes"),
    ("Project Atlas", "USES_TECH",   "Kubernetes"),
    ("Project Atlas", "USES_TECH",   "PostgreSQL"),
]

Now the question "Who reports to the VP of Engineering?" becomes a graph traversal: find the node with title "VP of Engineering," follow all incoming "REPORTS_TO" edges, and return the source nodes. The question "Who on Project Atlas has Kubernetes experience?" becomes: find all nodes connected to "Project Atlas" via "WORKS_ON," intersect with nodes connected to "Kubernetes" via "SKILLED_IN."

Try answering either of those questions with vector search. You would need a text passage that explicitly states who reports to whom, and that passage would need to be kept current as the org chart changes. The knowledge graph makes the relationships queryable without requiring anyone to write them out as prose.

Graph Databases and Cypher

While knowledge graphs can be represented as simple triple stores, production systems typically use graph databases like Neo4j, Amazon Neptune, or TigerGraph. These databases are optimized for relationship traversal, indexing edges so that following connections is fast even in graphs with millions of nodes.

Neo4j uses a query language called Cypher, which reads like ASCII art for graph patterns. Where SQL describes tables and joins, Cypher describes nodes and relationships directly:

↗ docs
// Find all direct reports of the VP of Engineering
MATCH (manager {title: 'VP of Engineering'})
      <-[:REPORTS_TO]-(report)
RETURN report.name, report.title

// Find 2-hop connections: who reports to people
// who report to Alice Chen?
MATCH (alice {name: 'Alice Chen'})
      <-[:REPORTS_TO]-(mid)
      <-[:REPORTS_TO]-(indirect)
RETURN indirect.name, mid.name AS through_manager

// Find team members with specific skills on a project
MATCH (p:Project {name: 'Project Atlas'})
      <-[:WORKS_ON]-(person)
      -[:SKILLED_IN]->(skill)
WHERE skill.name = 'Kubernetes'
RETURN person.name, person.title

The arrow notation (<-[:REPORTS_TO]-) makes the direction and type of each relationship explicit. Multi-hop queries, which would require nested subqueries or recursive CTEs in SQL, are expressed naturally as longer paths in the pattern. This is the fundamental advantage of a graph query language: relationship traversal is a first-class operation, not an afterthought built on top of table joins.

LLM-Powered Graph Querying

Just as LLMs can translate natural language to SQL, they can translate natural language to Cypher. The approach is structurally identical: describe the graph schema to the LLM, provide the user's question, and ask for a Cypher query.

↗ docs
from neo4j import GraphDatabase
from openai import OpenAI

class TextToCypher:
    """Translates natural language to Cypher graph queries."""

    def __init__(self, neo4j_uri, neo4j_auth):
        self.driver = GraphDatabase.driver(neo4j_uri, auth=neo4j_auth)
        self.client = OpenAI()
        self.graph_schema = self._extract_schema()

    def _extract_schema(self):
        """Pull the graph schema from Neo4j."""
        with self.driver.session() as session:
            # Get node labels and their properties
            nodes = session.run(
                "CALL db.schema.nodeTypeProperties()"
            ).data()
            # Get relationship types
            rels = session.run(
                "CALL db.schema.relTypeProperties()"
            ).data()
            return self._format_schema(nodes, rels)

    def _format_schema(self, nodes, rels):
        """Format schema as a readable description for the LLM."""
        schema_parts = ["Node types and properties:"]
        for n in nodes:
            schema_parts.append(
                f"  (:{n['nodeType']}) - {n['propertyName']}: "
                f"{n['propertyTypes']}"
            )
        schema_parts.append("\nRelationship types:")
        for r in rels:
            schema_parts.append(
                f"  [:{r['relType']}] - {r.get('propertyName', '')}"
            )
        return "\n".join(schema_parts)

    def generate_cypher(self, question):
        """Generate a Cypher query from natural language."""
        prompt = f"""You are a Neo4j Cypher expert. Given the following
graph schema, write a Cypher query that answers the question.

Graph Schema:
{self.graph_schema}

Rules:
- Return ONLY the Cypher query
- Use parameterized queries where possible
- Only use MATCH and RETURN (no CREATE, DELETE, SET)
- Use explicit relationship directions

Question: {question}

Cypher:"""

        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.0
        )
        return response.choices[0].message.content.strip()

    def ask(self, question):
        """Full pipeline: question -> Cypher -> results -> answer."""
        cypher = self.generate_cypher(question)
        print(f"Generated Cypher:\n{cypher}\n")

        # Execute against Neo4j
        with self.driver.session() as session:
            results = session.run(cypher).data()

        # Format with LLM
        format_prompt = f"""The user asked: "{question}"

The graph query returned:
{json.dumps(results, indent=2, default=str)}

Provide a clear, natural language answer."""

        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": format_prompt}],
            temperature=0.0
        )

        return {
            "question": question,
            "cypher": cypher,
            "raw_results": results,
            "answer": response.choices[0].message.content
        }

The pattern mirrors Text-to-SQL in structure. The difference is the query language and the types of questions it handles well. Cypher excels at relationship traversal. SQL excels at aggregation and filtering over tabular data. Neither replaces the other.

Building Knowledge Graphs from Text

One of the most powerful applications of LLMs in the knowledge graph space is automated construction. Given unstructured text, an LLM can extract entities and relationships, transforming prose into structured graph data.26

Consider a paragraph from a company wiki: "Sarah Kim joined the ML Platform team in March 2024. She reports to James Liu, who leads the AI Infrastructure division. The team is currently building a feature store using Apache Spark and Delta Lake."

From this single paragraph, an extraction pipeline can produce:

# Entities extracted
entities = [
    {"name": "Sarah Kim", "type": "Person",
     "properties": {"join_date": "2024-03"}},
    {"name": "James Liu", "type": "Person",
     "properties": {"role": "Division Lead"}},
    {"name": "ML Platform", "type": "Team"},
    {"name": "AI Infrastructure", "type": "Division"},
    {"name": "Feature Store", "type": "Project"},
    {"name": "Apache Spark", "type": "Technology"},
    {"name": "Delta Lake", "type": "Technology"},
]

# Relationships extracted
relationships = [
    ("Sarah Kim", "MEMBER_OF", "ML Platform"),
    ("Sarah Kim", "REPORTS_TO", "James Liu"),
    ("James Liu", "LEADS", "AI Infrastructure"),
    ("ML Platform", "PART_OF", "AI Infrastructure"),
    ("ML Platform", "BUILDS", "Feature Store"),
    ("Feature Store", "USES", "Apache Spark"),
    ("Feature Store", "USES", "Delta Lake"),
]

Seven entities and seven relationships from three sentences. This is information that was implicit in the text, made explicit and queryable through extraction. Once in a graph database, you can traverse these relationships programmatically: "What technologies does the AI Infrastructure division use?" becomes a two-hop query through division, team, project, and technology nodes.

The challenge is accuracy. Entity extraction is imperfect. The LLM might miss entities, hallucinate relationships, or create duplicate nodes for the same entity described differently ("James Liu" vs. "J. Liu" vs. "the division lead").27 Entity resolution, the process of merging duplicate references to the same real-world entity, is an active area of research and a significant engineering challenge in production knowledge graph construction.3

Hybrid Architectures: Routing Queries to the Right System

A production system rarely needs just one retrieval method.28 The typical enterprise has unstructured documents (policies, reports, wiki pages), structured databases (CRM, ERP, financial systems), and relational data that could benefit from a knowledge graph. The question is how to combine them.

The Router Pattern

The simplest and most effective architecture is a router: a component that classifies each incoming query and dispatches it to the appropriate retrieval system.29 The classification can be done by an LLM, a fine-tuned classifier, or even a rule-based system. The key insight is that different types of questions need fundamentally different retrieval strategies.

INPUT User question natural language ROUTER Classify the question CONCEPTUAL Vector search embeddings over document chunks QUANTITATIVE Text-to-SQL generate SQL, execute on database RELATIONAL Text-to-Cypher traverse graph of entities Synthesized answer to user
Router dispatches each question to its native retriever.
↗ docs
import json
from openai import OpenAI
from enum import Enum

class QueryType(Enum):
    VECTOR_SEARCH = "vector_search"
    SQL_QUERY = "sql_query"
    GRAPH_QUERY = "graph_query"
    HYBRID = "hybrid"

class QueryRouter:
    """Routes natural language queries to the appropriate
    retrieval system based on query classification."""

    def __init__(self, vector_retriever, sql_engine, graph_engine):
        self.vector = vector_retriever
        self.sql = sql_engine
        self.graph = graph_engine
        self.client = OpenAI()

    def classify(self, question):
        """Determine which retrieval system to use."""
        prompt = f"""Classify this question into one of four categories.
Return ONLY a JSON object with "type" and "reasoning" fields.

Categories:
- "vector_search": Questions about concepts, policies, procedures,
  explanations, or how-to guides. The answer is in a document.
- "sql_query": Questions requiring computation, aggregation, counting,
  filtering, or comparison across structured records. The answer
  needs to be calculated from data.
- "graph_query": Questions about relationships between entities,
  organizational hierarchies, dependencies, or connections.
  The answer requires traversing relationships.
- "hybrid": Questions that need multiple sources. E.g., "Summarize
  our refund policy and show how many refunds we processed last
  quarter" needs both a document and a database query.

Question: {question}

JSON:"""

        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.0,
            max_tokens=200
        )
        result = json.loads(
            response.choices[0].message.content.strip()
        )
        return QueryType(result["type"]), result["reasoning"]

    def retrieve(self, question):
        """Route the question and retrieve from the right system."""
        query_type, reasoning = self.classify(question)
        print(f"Classified as: {query_type.value}")
        print(f"Reasoning: {reasoning}\n")

        if query_type == QueryType.VECTOR_SEARCH:
            return self._vector_retrieve(question)

        elif query_type == QueryType.SQL_QUERY:
            return self.sql.ask(question)

        elif query_type == QueryType.GRAPH_QUERY:
            return self.graph.ask(question)

        elif query_type == QueryType.HYBRID:
            return self._hybrid_retrieve(question)

    def _vector_retrieve(self, question):
        """Standard RAG: vector search + LLM generation."""
        chunks = self.vector.search(question, top_k=5)
        context = "\n\n".join(
            [chunk["text"] for chunk in chunks]
        )

        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system",
                 "content": "Answer based on the provided context."},
                {"role": "user",
                 "content": f"Context:\n{context}\n\nQuestion: {question}"}
            ]
        )
        return {
            "type": "vector_search",
            "chunks": chunks,
            "answer": response.choices[0].message.content
        }

    def _hybrid_retrieve(self, question):
        """Combine results from multiple retrieval systems."""
        # Run all retrievers in parallel (simplified here)
        vector_results = self._vector_retrieve(question)
        sql_results = self.sql.ask(question)
        graph_results = self.graph.ask(question)

        # Merge all context for final synthesis
        combined_context = f"""Document search results:
{vector_results['answer']}

Database query results:
{sql_results['answer']}

Knowledge graph results:
{graph_results['answer']}"""

        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system",
                 "content": "Synthesize a complete answer from all sources."},
                {"role": "user",
                 "content": f"Sources:\n{combined_context}\n\nQuestion: {question}"}
            ]
        )
        return {
            "type": "hybrid",
            "sources": {
                "vector": vector_results,
                "sql": sql_results,
                "graph": graph_results
            },
            "answer": response.choices[0].message.content
        }

The router pattern is powerful because it is modular. Each retrieval system operates independently, with its own data sources, query languages, and optimization strategies. The router's only job is classification, which is a simpler problem than building a single system that handles all query types natively.

A Decision Framework

When should you use which approach? The decision depends on the nature of the question and the available data sources.

QUESTION SHAPE Vector SQL Graph SHAPE 1 Conceptual "What is our retention policy?" USE no no SHAPE 2 Aggregation "Average order value by region?" no USE no SHAPE 3 Precise lookup "Price of SKU-7821?" no USE no SHAPE 4 Relationship traversal "Indirect reports of the CTO?" no no USE SHAPE 5 Multi-source "Summarize policy and show trends" USE USE MAYBE if entities
Question shape determines the retriever.

This matrix is a starting point, not a rulebook. Real queries are often ambiguous. "Tell me about our top customers" could mean "describe our customer success stories" (vector search) or "rank customers by revenue" (SQL). The router's classification will sometimes be wrong, and the system needs graceful fallbacks. Logging the router's decisions and the user's satisfaction with the results is essential for iterative improvement.

A Real-World Example: Enterprise Knowledge System

Consider a mid-size technology company building an internal assistant for its sales team. The company has three data sources:

A sales rep asks: "I have a meeting with Globex Corp next week. Give me a briefing."

The router classifies this as a hybrid query. The system fires three retrievals in parallel. Vector search pulls the most recent case study involving Globex and any competitive analysis relevant to their industry. The SQL engine queries the CRM: current opportunity value, last five interactions, deal stage, and historical purchase volume. The graph engine finds: who at the company has prior relationships with Globex contacts, what products Globex currently uses, and what competitor products they have evaluated.

The LLM synthesizes all three sources into a briefing document. The rep walks into the meeting knowing the account history, the relationship map, the relevant case studies, and the competitive landscape. No single retrieval method could have produced that briefing. Together, they create something genuinely useful.

Practical Considerations

Building a multi-modal retrieval system is architecturally elegant. Maintaining one is where the real work begins.

Schema Drift

Database schemas change. New columns are added, old ones are renamed, tables are split or merged. When the schema changes but the schema description in your Text-to-SQL prompt does not, the LLM will generate queries against a schema that no longer exists. The queries will fail, or worse, they will succeed by matching a column with the old name that now means something different.

The solution is automation. Extract the schema description from the database directly at startup or on a schedule, rather than maintaining it as a static string. The TextToCypher example above does this for Neo4j. The same principle applies to SQL schemas. A stale schema description is a silent bug that will erode trust in the system over time.

Consistency Across Sources

When your system queries multiple data sources, the results may contradict each other. The document says the product supports 10,000 concurrent users. The database shows the actual tested maximum was 7,500. The knowledge graph indicates the performance limit depends on which deployment configuration the customer uses.

These contradictions are not bugs in the retrieval system. They reflect genuine inconsistencies in the underlying data. The system should surface these contradictions rather than silently choosing one source over another. Including provenance information ("According to the product spec..." vs. "Based on load testing data...") lets the user make an informed judgment.30

Cost and Complexity

Each retrieval system adds operational cost. Vector search requires an embedding model and a vector database. Text-to-SQL requires a maintained database connection and schema descriptions. Knowledge graphs require entity extraction, graph storage, and ongoing curation. The router adds another LLM call per query.

Start with the retrieval method that covers the most common query type in your use case. If 80% of questions are answered by document search, start with vector RAG and add structured data retrieval only when users consistently ask questions that documents cannot answer. The query logs will tell you when that threshold has been crossed. Premature architecture is as wasteful in retrieval systems as it is anywhere else in software engineering.

Evaluation Across Modalities

Evaluating a multi-modal retrieval system is harder than evaluating a single-mode one. You need to test not only whether each retrieval system returns correct results, but also whether the router classifies queries correctly. A wrong classification sends the query to a system that cannot answer it, regardless of how well that system performs on its own domain.

Build evaluation sets for each component separately: vector search accuracy, Text-to-SQL correctness, graph query accuracy, and router classification accuracy. Then build an end-to-end evaluation set that tests the full pipeline from question to answer.31 The end-to-end set catches integration failures that component-level tests miss, like correct routing followed by incorrect query generation, or correct results from multiple sources synthesized into a contradictory answer.

Where This Is Heading

The boundaries between structured and unstructured retrieval are blurring. Recent work on table-augmented generation (TAGe) explores how LLMs can reason over both textual and tabular data simultaneously, without the explicit routing step.4 Multi-modal embedding models that can encode tables, text, and graph structures into a shared vector space are an active research area.

Tool-using agents represent another convergence point. Instead of a static router that classifies queries up front, an agent can iteratively decide which data source to query, examine the results, and decide whether additional queries are needed.32 This mirrors how a human analyst works: start with a document search, notice a gap, query the database, spot an inconsistency, check the knowledge graph. The agentic approach is more flexible but also harder to control and evaluate.

For now, the practical architecture remains straightforward. Classify the query, route it to the right retrieval system, execute the appropriate query language, and synthesize the results. Each component is well-understood, independently testable, and incrementally deployable. The elegance is not in the individual pieces but in their composition, and in the recognition that not every question is a text retrieval problem.

. . .

References

Textbook grounding, chapter-level citations, and further reading for each numbered reference in this article live on the companion sources page.

  1. Rajkumar, N., Li, R., & Bahdanau, D. (2022). "Evaluating the Text-to-SQL Capabilities of Large Language Models." arXiv:2204.00498.
  2. Yu, T., Zhang, R., Yang, K., Yasunaga, M., Wang, D., Li, Z., ... & Radev, D. (2018). "Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task." EMNLP 2018.
  3. Pan, S., Luo, L., Wang, Y., Chen, C., Wang, J., & Wu, X. (2024). "Unifying Large Language Models and Knowledge Graphs: A Roadmap." IEEE Transactions on Knowledge and Data Engineering.
  4. Li, Z., Zhang, W., Zhang, C., & Song, D. (2024). "TAGe: Table-Augmented Generation." arXiv:2408.14717.
  5. Woods, W. A. (1973). "Progress in Natural Language Understanding: An Application to Lunar Geology." AFIPS Conference Proceedings, Vol. 42.
  6. Warren, D. H. D. & Pereira, F. C. N. (1982). "An Efficient Easily Adaptable System for Interpreting Natural Language Queries." Computational Linguistics, 8(3-4).
  7. Hogan, A., Blomqvist, E., Cochez, M., d'Amato, C., De Melo, G., Gutierrez, C., ... & Zimmermann, A. (2021). "Knowledge Graphs." ACM Computing Surveys, 54(4).
  8. Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." NeurIPS 2020.
ML 101