The Simplest RAG Stack That Actually Works (Complete Guide)

A comprehensive tutorial on building a hybrid RAG agent in Python using MongoDB, Pydantic AI, and Docling that combines semantic and keyword search capabilities.

Published Dec 10, 2025 by Cole Medin

Key Insights

  • Hybrid RAG combines both semantic and keyword search to get the best of both worlds: semantic search for understanding concepts and keyword search for finding exact terms.

  • The tech stack uses MongoDB as both a database and vector database, Pydantic AI as the agent framework, and Docling for file processing and hybrid chunking.

  • Keyword search guarantees finding exact terms with high accuracy but may miss conceptual relationships, while semantic search excels at finding related concepts but might miss specific terms.

  • MongoDB offers features aimed at hybrid search, including native Reciprocal Rank Fusion (RRF) through the $rankFusion feature, currently in preview, which combines and ranks results from different search types.

  • The hybrid search flow runs each query through both the semantic and the keyword search pipeline, then uses RRF to merge the results by rank position rather than by raw score.

  • Hybrid search is a form of agentic RAG, giving the AI agent flexibility to choose the appropriate search strategy for different types of queries.

  • Despite running two separate search pipelines and merging the results, the hybrid search process is fast: in the demo, the database query including the merge takes less than a second.

0:00

Introducing Hybrid RAG

“Today I'm going to show you how to implement a hybrid search RAG agent. This is one of my favorite RAG strategies. It's really powerful and I'm going to break it down for you very simply.”

Cole Medin introduces hybrid search RAG (Retrieval-Augmented Generation), which combines semantic search with keyword search. This gives AI agents the ability both to understand how concepts relate semantically and to find specific information accurately by keyword.

The video sets up the premise that hybrid search is a powerful, yet simple strategy that works well across various use cases. Cole mentions that as he's evolved his own RAG strategies, he's found that simplifying and focusing on approaches like hybrid search yields consistently good results regardless of the specific application.

Takeaways

  • Hybrid RAG combines semantic search (understanding concepts) with keyword search (finding exact information)

  • This combined approach provides the best of both worlds while still maintaining speed

  • The strategy is effective and consistently works well across different use cases

  • Cole has simplified his RAG approaches over time to focus on strategies that consistently perform well

0:52

The Complete Agent Template for the Video

“I also built a complete AI agent for this that demonstrates hybrid search. So, I'll use this to explain all the concepts and then this is also a template that you can feel free to use for yourself.”

Cole introduces the AI agent template he created specifically to demonstrate hybrid search. The template serves two purposes: it helps explain the concepts covered in the video, and it is available as a starting point for viewers to implement their own hybrid search systems.

The template is designed to showcase different types of queries that can be processed using the hybrid approach. Cole mentions that while the architecture might seem complex, he'll break it down in a simple and understandable way, making it accessible for viewers to understand how the agent handles large amounts of data efficiently.

Takeaways

  • A complete, ready-to-use AI agent template is available for viewers to adapt for their own projects

  • The template demonstrates various query types to show hybrid search capabilities in action

  • Though the architecture may appear complex, it will be explained simply throughout the video

  • The template is designed to efficiently handle large amounts of data

1:44

Our Tech Stack - MongoDB + Pydantic AI + Docling

“The first big decision we have to make for any RAG agent is what is our database going to be? MongoDB is a platform that I have never covered on my channel, but I've used it a lot in the past and there are some components built into it that will specifically help us with hybrid search.”

This section details the three key components of the tech stack for the hybrid RAG agent. MongoDB serves as both the primary database and vector database, offering specific features that facilitate hybrid search capabilities. Cole notes that MongoDB's NoSQL structure allows for efficient storage of document records and chunks with embeddings, while providing fast text and semantic search functionality.

Pydantic AI is highlighted as Cole's favorite agent framework, praised for its continuous updates, improving documentation, and growing integrations. The third component, Docling, handles file processing in the RAG pipeline, extracting text from various document formats (PDFs, Word documents, markdown, audio) and implementing hybrid chunking to properly segment larger documents for storage in the vector database.

Takeaways

  • MongoDB serves as both the NoSQL database and vector database, with built-in features that specifically support hybrid search

  • Pydantic AI provides a flexible and continuously improving framework for building the agent

  • Docling handles text extraction from multiple file formats and implements hybrid chunking for optimal document segmentation

  • The three components work together to create an efficient and powerful RAG pipeline

  • Hybrid chunking ensures that document segments have clean starts and ends, making retrieval more effective
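
The idea of chunks with clean starts and ends can be illustrated with a simplified sketch. This is not Docling's hybrid chunker, which works from document structure and a real tokenizer; it is a stand-in that assumes paragraphs are separated by blank lines and uses a word count in place of a token count. The function name `chunk_paragraphs` and the `max_words` parameter are hypothetical.

```python
def chunk_paragraphs(text: str, max_words: int = 50) -> list[str]:
    """Group whole paragraphs into chunks of at most max_words words.

    Simplified illustration only: paragraphs are never cut in the middle,
    so every chunk starts and ends on a paragraph boundary. A single
    paragraph longer than max_words is kept whole as its own chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_words = 0

    for para in paragraphs:
        n_words = len(para.split())
        # Close the current chunk if this paragraph would push it past the cap.
        if current and current_words + n_words > max_words:
            chunks.append("\n\n".join(current))
            current, current_words = [], 0
        current.append(para)
        current_words += n_words

    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each returned chunk would then be embedded and stored alongside a reference to its source document.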

4:45

Pros and Cons of Semantic and Keyword Search

“With keyword search, the big benefit here is pretty obvious. We're able to find exact terms with very high accuracy because if I search for a certain term and that exact word or phrase appears in my knowledge base, I am guaranteed to find that chunk.”

Cole explains the fundamental differences between keyword and semantic search approaches. Keyword search excels at finding exact terms with guaranteed accuracy when those specific words exist in the database. This makes it ideal for locating precise information like character names, legal statutes, or specific terminology.

Semantic search, on the other hand, operates as a conceptual search using embedding models to find related ideas and concepts, even when exact terminology differs. However, it's not guaranteed to find exact terms. The hybrid approach combines both strategies to overcome their respective limitations - finding both concepts and exact terms for more comprehensive results.

Takeaways

  • Keyword search guarantees finding exact terms but may miss conceptual relationships and synonyms

  • Semantic search excels at finding concepts and related ideas but isn't guaranteed to locate exact terminology

  • Combining both approaches allows the agent to find both concepts and exact terms simultaneously

  • The hybrid strategy requires a method to merge results from both search types, which MongoDB facilitates

6:41

Live Demo of Our Hybrid RAG AI Agent

“Let's go ahead and ask our first question. What is the revenue breakdown by service line? And you can see that it uses the search knowledgebase tool. The agent defines a query and then for the search type it is specifying hybrid because we are combining keyword and semantic search for this single tool call.”

Cole demonstrates the hybrid RAG agent in action through a series of practical examples. The agent is shown answering queries like "What is the revenue breakdown by service line?" and "What is the Neuroflow revenue for 2025?" by using the hybrid search approach. For each query, the agent defines a search strategy and uses the hybrid method to combine results from both keyword and semantic searches.

The demo also highlights how different types of queries benefit from different search approaches. For example, when asked about a "timeline for Converse Pro launch prep," the semantic search component was crucial because the term "timeline" wasn't explicitly mentioned in the document. Instead, the agent needed to conceptually understand that launch plans corresponded to a timeline. Cole also explains how the agent can cite sources based on metadata retrieved from the original documents.

Takeaways

  • The agent successfully handles various query types by combining both search approaches

  • The search knowledge base tool can specify which search type to use: semantic, keyword, or hybrid

  • Temporal queries (like finding data from specific years) often benefit more from keyword search

  • Conceptual queries where exact terminology differs benefit more from semantic search

  • The agent can cite sources by retrieving metadata from the original documents

10:43

Hybrid RAG is a Form of Agentic RAG

“I would actually consider hybrid search a form of agentic RAG. It's just generally the idea of you give your agent the ability to choose how it explores your knowledge base.”

Cole positions hybrid RAG as a subset of agentic RAG, explaining that it represents an approach where the AI agent has flexibility in how it explores the knowledge base. In this implementation, the agent can decide between using keyword search, semantic search, or combining both approaches based on the nature of the query.

While the demo system prompt instructs the agent to always use hybrid search for demonstration purposes, Cole notes that in real-world applications, the agent could be given the autonomy to choose the most appropriate search strategy for specific types of queries. This flexibility could optimize for speed or token efficiency in cases where one search method is clearly superior for certain query types.

Takeaways

  • Hybrid search is considered a form of agentic RAG, giving the agent choices in how it retrieves information

  • The agent can be configured to select between semantic, keyword, or hybrid search strategies

  • For demonstration purposes, the system prompt instructs the agent to always use hybrid search

  • In practical applications, allowing the agent to choose the search strategy can optimize for speed and efficiency
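
A minimal sketch of a search tool that exposes this choice, assuming the three search types named in the demo. The function and parameter names are hypothetical, and the retrieval and merge functions are passed in as plain callables so the sketch does not depend on any particular agent framework or database client.

```python
from typing import Callable, Literal

SearchType = Literal["semantic", "keyword", "hybrid"]
Searcher = Callable[[str, int], list[str]]          # (query, limit) -> ranked chunk IDs
Merger = Callable[[list[str], list[str]], list[str]]  # two ranked lists -> one ranked list


def search_knowledge_base(
    query: str,
    search_type: SearchType,
    *,
    semantic: Searcher,
    keyword: Searcher,
    merge: Merger,
    limit: int = 5,
) -> list[str]:
    """Dispatch to the search strategy the agent selected for this query."""
    if search_type == "semantic":
        return semantic(query, limit)
    if search_type == "keyword":
        return keyword(query, limit)
    if search_type == "hybrid":
        # Run both searches, then let the merge function produce one ranking.
        return merge(semantic(query, limit), keyword(query, limit))[:limit]
    raise ValueError(f"Unknown search type: {search_type!r}")
```

In an agent framework, a function like this would be registered as a tool, and the system prompt would either pin `search_type` to "hybrid" (as in the demo) or leave the choice to the agent.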

11:51

When to Use Semantic vs. Keyword Search

“When does vector search or classic RAG search do well? Well, it's when we want to connect concepts together. Like, if we search for king, we will find records that mention queens as well.”

Cole provides detailed examples of when each search strategy excels. Semantic (vector) search shines when connecting related concepts: searching for "king" finds mentions of "queens," "Han Solo" retrieves information about "Chewbacca," "Berlin" connects to "Germany," and "microservices" links to "architecture." It can even find conceptual opposites, such as searching for "slow PC" and finding articles about making PCs run quickly.

Keyword search demonstrates superiority when dealing with highly specific information: finding exact error codes (like "409 error"), specific products in databases, stock symbols ("AAPL" finding the stock not the fruit), legal statutes, or precise geographic information. Cole also highlights that keyword search includes fuzzy matching capabilities to accommodate typos and spelling variations, making it more flexible than pure exact matching.

Takeaways

  • Semantic search excels at finding conceptually related information, even when terminology differs

  • Semantic search can connect opposites (e.g., "slow PC" finding information about making PCs faster)

  • Keyword search is superior for finding specific error codes, product names, stock symbols, and legal references

  • Fuzzy matching in keyword search allows for flexibility with typos and spelling variations

  • Understanding these strengths helps determine when to use each approach or combine them
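
The difference between fuzzy keyword matching and conceptual matching can be seen in a small sketch. It uses Python's standard-library `difflib`, which scores string similarity as a ratio; this is an assumption made for illustration and is not how MongoDB implements fuzzy search (Atlas Search expresses fuzziness as a maximum number of character edits). The function name and the sample vocabulary are hypothetical.

```python
import difflib


def fuzzy_keyword_hits(term: str, vocabulary: list[str], cutoff: float = 0.8) -> list[str]:
    """Return vocabulary words whose spelling is close to the search term.

    Matching is purely on characters, so typos are tolerated but
    related concepts with different spellings are not found.
    """
    lowered = [word.lower() for word in vocabulary]
    return difflib.get_close_matches(term.lower(), lowered, n=3, cutoff=cutoff)


# A typo still finds the intended keyword.
typo_hits = fuzzy_keyword_hits("revenu", ["revenue", "timeline", "launch"])

# A conceptually related word is not found: "king" and "queen" share almost no characters.
concept_hits = fuzzy_keyword_hits("king", ["queen", "castle"])
```

Here `typo_hits` contains "revenue" while `concept_hits` is empty, which is the gap semantic search fills.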

14:42

Deep Dive: How Hybrid RAG Works with MongoDB

“I want to dive a little bit deeper into the pipeline with you. So we're going to cover the pipeline specifically for the semantic search and then the keyword search pipeline as well because it is very similar.”

Cole takes a technical deep dive into how the hybrid search system works with MongoDB, focusing on the pipeline architecture. The semantic search pipeline has four stages. First, the query is converted to a vector and used to find the most similar chunks. Second, each chunk is joined with the documents collection so that it is associated with its source document. Third, an "unwind" operation turns the joined array into a single object per chunk. Finally, the similarity score is extracted so it can be used when merging results.

The explanation includes a look at the actual code implementation showing how the agent defines pipelines for both semantic and text searches separately, then executes them on the MongoDB database. Cole emphasizes that this architecture allows not just for retrieval of relevant information but also for transforming it into structures optimized for the AI agent's use, including metadata that enables the agent to cite sources accurately.

Takeaways

  • The MongoDB pipeline for semantic search has four stages: vector search for similar chunks, joining with the documents collection, unwinding the joined data, and score extraction

  • The keyword search pipeline follows a similar structure but uses fuzzy search instead of vector similarity

  • MongoDB allows for both data retrieval and transformation into structures optimized for the AI agent

  • The pipeline retrieves metadata from source documents, enabling the agent to accurately cite sources

  • The implementation keeps semantic and keyword search pipelines separate before merging results
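
The two pipelines described above might look roughly like the following when written as MongoDB aggregation stages in Python. This is a sketch under stated assumptions, not the template's actual code: the collection name `documents`, the index names `vector_index` and `text_index`, and the field names `embedding`, `content`, `document_id`, `title`, and `source` are all hypothetical. The stage names and score metadata keys (`$vectorSearch`, `$search`, `vectorSearchScore`, `searchScore`) are MongoDB's.

```python
from typing import Any

Pipeline = list[dict[str, Any]]

# Shared stages: attach the source document to each chunk, then flatten it.
JOIN_SOURCE_DOCUMENT: Pipeline = [
    {"$lookup": {
        "from": "documents",          # assumed name of the source-documents collection
        "localField": "document_id",  # assumed reference field stored on each chunk
        "foreignField": "_id",
        "as": "document",
    }},
    # $lookup yields an array; $unwind turns it into one embedded object.
    {"$unwind": "$document"},
]


def semantic_pipeline(query_vector: list[float], limit: int = 5) -> Pipeline:
    """Vector search, join, unwind, then project content, metadata, and score."""
    return [
        {"$vectorSearch": {
            "index": "vector_index",       # assumed index name
            "path": "embedding",           # assumed embedding field
            "queryVector": query_vector,
            "numCandidates": limit * 20,   # candidates considered before taking the top results
            "limit": limit,
        }},
        *JOIN_SOURCE_DOCUMENT,
        {"$project": {
            "content": 1,
            "title": "$document.title",
            "source": "$document.source",
            "score": {"$meta": "vectorSearchScore"},
        }},
    ]


def keyword_pipeline(query: str, limit: int = 5) -> Pipeline:
    """Fuzzy text search with the same join, unwind, and projection steps."""
    return [
        {"$search": {
            "index": "text_index",         # assumed index name
            "text": {
                "query": query,
                "path": "content",
                "fuzzy": {"maxEdits": 2},  # tolerate up to two character edits
            },
        }},
        {"$limit": limit},
        *JOIN_SOURCE_DOCUMENT,
        {"$project": {
            "content": 1,
            "title": "$document.title",
            "source": "$document.source",
            "score": {"$meta": "searchScore"},
        }},
    ]
```

Each pipeline could be passed to PyMongo's `collection.aggregate(...)` on the chunks collection, provided the corresponding search indexes exist. The projected `title` and `source` fields are what would let the agent cite where a chunk came from.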

20:40

Understanding Reciprocal Rank Fusion

“The reason that we need an algorithm in the first place is because the similarity scores from our two pipelines have a completely different scale. Your similarity score for a vector search is going to be between zero and one. But for a text search or our keyword search, it's going to be something different like 15 or 13 or 11.”

This section explores the critical challenge of merging results from different search types and introduces Reciprocal Rank Fusion (RRF) as the solution. Cole explains that semantic search produces similarity scores between 0 and 1, while keyword search generates entirely different scales (like 15, 13, or 11), making direct comparison impossible. RRF addresses this by using rank positions instead of raw scores to normalize results across different search methods.

Cole mentions that MongoDB is working on integrating RRF directly into their platform as a feature called $rankFusion, currently in preview. While this native implementation doesn't work with the free tier of MongoDB (which is why Cole implemented the algorithm manually for the demonstration), its development highlights MongoDB's commitment to supporting hybrid search capabilities. The RRF implementation allows for properly merging and ranking results from both search pipelines to create a unified, optimally ordered set of chunks to present to the agent.

Takeaways

  • Reciprocal Rank Fusion (RRF) solves the problem of comparing results across different scoring scales

  • RRF uses rank positions rather than raw scores to normalize results from different search methods

  • MongoDB is developing a native $rankFusion feature to integrate this capability directly into the platform

  • The algorithm creates a third, normalized score that properly ranks chunks from both semantic and keyword searches

  • This approach ensures the most relevant chunks are provided to the agent regardless of which search method found them
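
A minimal sketch of Reciprocal Rank Fusion, assuming each pipeline returns chunk IDs ordered from best to worst. Every chunk receives 1 / (k + rank) from each list it appears in, and the contributions are summed. The constant k = 60 is a commonly used default and is an assumption here; the video does not state which value the template uses. The function name and the sample IDs are hypothetical.

```python
def reciprocal_rank_fusion(
    result_lists: list[list[str]],
    k: int = 60,
) -> list[tuple[str, float]]:
    """Merge ranked lists of chunk IDs into one list sorted by fused score.

    Only rank positions are used, so the raw scores of each search
    (0 to 1 for vector search, values like 15 or 11 for text search)
    never need to be compared directly.
    """
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, chunk_id in enumerate(results, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


semantic_ranked = ["a", "b", "c"]   # best-first results from the semantic pipeline
keyword_ranked = ["b", "d", "a"]    # best-first results from the keyword pipeline
fused = reciprocal_rank_fusion([semantic_ranked, keyword_ranked])
fused_order = [chunk_id for chunk_id, _ in fused]
```

In this example `fused_order` is `["b", "a", "d", "c"]`: chunks found by both searches rise to the top, and chunks found by only one search are still kept.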

22:49

Final Overview of the RAG Flow (it's Fast)

“That is our complete hybrid search flow. We have the user query. The agent is going to send some query that it defines based on this into both of the pipelines. And then we use RRF to combine by rank and then send those to our agent to give us the final response.”

In this section, Cole summarizes the complete hybrid search flow and emphasizes its impressive speed despite its complexity. The process begins with a user query, which the agent interprets and formulates into appropriate search queries for both the semantic and keyword search pipelines. The results from both pipelines are then combined using Reciprocal Rank Fusion (RRF), with the most relevant chunks sent to the agent to generate the final response.

To demonstrate the system's efficiency, Cole performs a live timing test, showing that the entire process from query to response takes only a few seconds. He points out that the actual database query, including the merging operation, takes less than a second - the majority of the processing time is spent on the agent's reasoning rather than the search operation itself. This highlights how practical and efficient the hybrid search approach is, even with its sophisticated dual-pipeline architecture.

Takeaways

  • The complete hybrid search flow processes user queries through both search pipelines, then combines results using RRF

  • Despite the complexity of running dual search pipelines and merging results, the entire process is very fast

  • The actual database query including merging takes less than a second

  • Most of the processing time is spent on the agent's reasoning rather than the search operations

  • The speed and efficiency make hybrid search practical for real-world applications
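
To check where the time goes in your own setup, the retrieval step and the agent call can be timed separately. The sketch below uses only the standard library; the `timed` helper and the labels are hypothetical, and the timed body is a placeholder for a real database query.

```python
import time
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def timed(label: str, timings: dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


timings: dict[str, float] = {}

with timed("retrieval", timings):
    # Placeholder for running both search pipelines and merging the results.
    sum(range(1000))

# timings["retrieval"] now holds the elapsed time in seconds.
```

Wrapping the agent's model call in a second `timed(...)` block would show how much of the total is spent on reasoning rather than on search.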

23:44

Outro

“That is a wrap for our hybrid search agent and please use this as a template to get started or just take the concepts here if you want to apply it to a different tech stack.”

In the final section, Cole concludes the video by encouraging viewers to use his template as a starting point or to adapt the concepts to their preferred tech stack. He reiterates his appreciation for the tech stack components used in the demonstration (MongoDB, Pydantic AI, and Docling) and mentions that viewers can expect more content on these technologies in the future.

Cole acknowledges MongoDB's collaboration on the video, expressing his appreciation for working with teams behind products he genuinely uses. He closes by asking viewers who found value in the content to like and subscribe for more videos on building AI agents and leveraging AI coding assistants.

Takeaways

  • The template is available for viewers to use as a starting point for their own hybrid RAG implementations

  • The concepts can be applied to different tech stacks beyond the specific tools demonstrated

  • More content on MongoDB, Pydantic AI, and Docling will be coming in future videos

  • The video was created in collaboration with MongoDB, whose team helped ensure accurate representation of their technology

Conclusion

Hybrid RAG improves how AI agents work with knowledge bases by combining the strengths of semantic and keyword search. Agents can find both conceptual relationships and exact terms within the same system. This addresses the main limitation of each method used on its own: semantic search might miss specific terms, and keyword search might miss conceptual connections.

The implementation demonstrated in this video, using MongoDB, Pydantic AI, and Docling, provides a practical, efficient, and surprisingly fast solution that can be adapted to various use cases. The Reciprocal Rank Fusion algorithm addresses the core challenge of merging results from different search methodologies with different scoring systems, ensuring that the most relevant information is consistently surfaced regardless of which search method found it.

So what? For developers building AI systems that need to interact with knowledge bases, hybrid RAG offers a clear path to more robust and capable applications. Rather than choosing between semantic understanding and precise information retrieval, this approach delivers both in a single, efficient system. By adopting this strategy, either by using the provided template or by adapting the concepts to your preferred tech stack, you can improve how accurately and completely your RAG-powered AI agents retrieve information across a wide range of applications.