DSPy Part 3: AI Librarian – DSPy + Databases = Hallucination Killer
Or: How to Turn a "Know-It-All" Into a "Know-It-From-Books"
The Problem: AI’s Greatest Flaw (It Thinks It’s Aristotle)
Our fact-checking intern, Chatty, has a fatal flaw: it never checks the bookshelf. Ask it “Did Napoleon own a pet kangaroo?” and it’ll confidently answer:
“No, kangaroos are native to Australia, not Europe. Napoleon preferred poodles.”
Spoiler: Napoleon did have a pet kangaroo. (True story. Look it up.)
Why this happens:
Language models are overconfident guessers. They answer from memory, not research.
Without retrieval (looking things up), they’re like students who skip the library and bluff essays.
The Fix: Give Chatty a Library Card
To stop hallucinations, we need to teach Chatty to retrieve facts before answering—like a librarian who cross-references books.
Manual Approach: Beg the model to “search the database first.”
Result: Chatty writes a poem about databases and ignores them.
DSPy Approach: Program retrieval into the system. No begging.
Step 1: Build the AI Librarian
Let’s upgrade our FactCheck module with a retrieval step. Think of it as forcing Chatty to visit the library before answering.
import dspy

class FactCheckWithRetrieval(dspy.Module):
    def __init__(self):
        super().__init__()
        # Step 1: Retrieve 3 relevant facts (like grabbing 3 books)
        self.retrieve = dspy.Retrieve(k=3)
        # Step 2: Chain of Thought: "Hmm, let me check the books..."
        self.generate_answer = dspy.ChainOfThought("claim, context -> is_correct, explanation")

    def forward(self, claim):
        # Fetch context (books)
        context = self.retrieve(claim).passages  # "Go find 3 relevant facts!"
        # Generate answer using context
        return self.generate_answer(claim=claim, context=context)
What’s happening:
dspy.Retrieve(k=3): “Chatty, fetch 3 relevant passages for this claim.” Works with any configured retrieval backend (Wikipedia, your internal docs, etc.); see the setup sketch below.
dspy.ChainOfThought: Forces the model to reason over the retrieved context: “First, I’ll check the retrieved facts. Then, I’ll conclude…”
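One thing the snippet above quietly assumes: DSPy needs a language model and a retrieval model configured before dspy.Retrieve can fetch anything. Here’s a minimal setup sketch using the classic DSPy API; the ColBERTv2 Wikipedia endpoint is the public demo index from the DSPy tutorials, so point rm at your own database in practice.

import dspy

# Configure the LM that answers and the retrieval model that fetches passages.
lm = dspy.OpenAI(model="gpt-3.5-turbo")
rm = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")
dspy.settings.configure(lm=lm, rm=rm)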
Step 2: Train the Librarian
Let’s compile the module with a few labeled examples. During compilation, DSPy bootstraps demonstrations that show the model how to use the retrieved context.
trainset = [
    dspy.Example(
        claim="Napoleon had a pet kangaroo",
        context=["Napoleon's menagerie included a kangaroo gifted by Australia in 1803."],
        is_correct=True,
        explanation="Historical records confirm the kangaroo was part of his collection."
    ).with_inputs("claim"),  # mark 'claim' as the input; the other fields are gold labels
    # More examples...
]
teleprompter = dspy.teleprompt.BootstrapFewShot()
compiled_factcheck = teleprompter.compile(FactCheckWithRetrieval(), trainset=trainset)
What DSPy does during compilation:
Runs the module on the training examples and keeps the traces where retrieval plus reasoning reached the right verdict.
Bakes those successful traces into the prompt as few-shot demonstrations, so the compiled module internalizes the pattern: “After retrieving, compare the claim to the context before answering.” (An optional metric, sketched below, decides which traces count as successes.)
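By default (no metric), BootstrapFewShot treats any trace that runs to completion as a success. Here’s a minimal metric sketch, assuming our is_correct field can be compared as a string; swap this in for the call above if you want stricter demos:

def factcheck_metric(example, pred, trace=None):
    # Keep a bootstrapped demo only if the predicted verdict
    # matches the gold label from the training example.
    return str(pred.is_correct).strip().lower() == str(example.is_correct).strip().lower()

teleprompter = dspy.teleprompt.BootstrapFewShot(metric=factcheck_metric)
compiled_factcheck = teleprompter.compile(FactCheckWithRetrieval(), trainset=trainset)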
Step 3: Test the Librarian
Let’s ask about Napoleon’s kangaroo:
response = compiled_factcheck(claim="Napoleon had a pet kangaroo")
print(response.explanation)
Output:
“True. Napoleon received a kangaroo as a gift from Australia in 1803, documented in historical records.”
No more bluffing! Chatty now:
Retrieves facts.
Reasons over them.
Answers grounded in evidence.
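Curious what the compiled prompt actually looks like? In the classic DSPy API, LM clients keep a call history you can print; assuming the lm client configured earlier:

# Show the most recent LM call: the compiled prompt with the retrieved
# passages, the bootstrapped demos, and the model's completion.
lm.inspect_history(n=1)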
Why This Beats Manual Prompt Engineering
Manual Approach:
prompt = """
Verify this claim: {claim}.
Steps:
1. Search the database for relevant facts.
2. Compare the claim to the facts.
3. If unsure, say 'Unknown'.
"""
# Chatty’s response: "Step 1: I love databases! Let me write a song about them instead."
DSPy Approach:
Retrieval is programmatic: no need to beg; the Retrieve module forces the model to fetch data.
Reasoning is structured: ChainOfThought breaks the task into steps, like a teacher guiding a student.
The Bigger Picture: DSPy Is a LEGO Master
DSPy modules (Predict, Retrieve, ChainOfThought) snap together like LEGO bricks. Want to add a source-validation step? Just plug in another module:
self.validate_sources = dspy.Predict("context -> is_reliable")
No more fragile pipelines. DSPy compiles the entire workflow into a robust system.
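To make the LEGO point concrete, here’s a hypothetical FactCheckWithValidation sketch; the is_reliable gate and the “Unknown” fallback are illustrative choices, not DSPy built-ins:

class FactCheckWithValidation(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=3)
        self.validate_sources = dspy.Predict("context -> is_reliable")
        self.generate_answer = dspy.ChainOfThought("claim, context -> is_correct, explanation")

    def forward(self, claim):
        context = self.retrieve(claim).passages
        verdict = self.validate_sources(context=context)
        # Refuse to answer from sources the model itself flagged as shaky.
        if str(verdict.is_reliable).strip().lower() in ("false", "no"):
            return dspy.Prediction(
                is_correct="Unknown",
                explanation="Retrieved sources look unreliable; refusing to answer.",
            )
        return self.generate_answer(claim=claim, context=context)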
What’s Next?
Our librarian is now reliable… until it meets ambiguous claims (“Some say Earth is flat”) or trolls (“Bananas are a government spy tool”).
In Part 4, we’ll arm Chatty with assertions and safeguards to handle chaos. Sneak peek:
dspy.Assert(
    len(context) > 0,
    "Don’t answer if you have no sources!"
)
TL;DR: DSPy doesn’t just ask models to be truthful—it programs them to be. Retrieval + reasoning = hallucinations evicted.
Stay tuned for Part 4, where we’ll teach Chatty to say “I don’t know” (and shut down conspiracy theories).
Homework: Try the code above with a claim like “Einstein invented the internet.” Watch DSPy retrieve facts about Einstein (no, he didn’t) and Tim Berners-Lee (he invented the Web, which is only sort of the internet). Magic? No, just good engineering. One-liner below if you want to jump straight in.
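Assuming the compiled_factcheck module from Step 2 is still in scope:

# Run the homework claim through the compiled pipeline.
response = compiled_factcheck(claim="Einstein invented the internet")
print(response.is_correct, response.explanation)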