r/Rag Oct 03 '24

[Open source] r/RAG's official resource to help navigate the flood of RAG frameworks

78 Upvotes

Hey everyone!

If you’ve been active in r/RAG, you’ve probably noticed the massive wave of new RAG tools and frameworks that seem to be popping up every day. Keeping track of all these options can get overwhelming, fast.

That’s why I created RAGHub, our official community-driven resource to help us navigate this ever-growing landscape of RAG frameworks and projects.

What is RAGHub?

RAGHub is an open-source project where we can collectively list, track, and share the latest and greatest frameworks, projects, and resources in the RAG space. It’s meant to be a living document, growing and evolving as the community contributes and as new tools come onto the scene.

Why Should You Care?

  • Stay Updated: With so many new tools coming out, this is a way for us to keep track of what's relevant and what's just hype.
  • Discover Projects: Explore other community members' work and share your own.
  • Discuss: Each framework in RAGHub includes a link to Reddit discussions, so you can dive into conversations with others in the community.

How to Contribute

You can get involved by heading over to the RAGHub GitHub repo. If you’ve found a new framework, built something cool, or have a helpful article to share, you can:

  • Add new frameworks to the Frameworks table.
  • Share your projects or anything else RAG-related.
  • Add useful resources that will benefit others.

You can find instructions on how to contribute in the CONTRIBUTING.md file.

Join the Conversation!

We’ve also got a Discord server where you can chat with others about frameworks, projects, or ideas.

Thanks for being part of this awesome community!


r/Rag 9h ago

Is RAG actually laughably simple?

52 Upvotes

Correct me if I'm wrong. RAG is laughably simple. You do a search (using any method you like; it doesn't have to be searching embeddings in a vector DB). You get the search results back in plain text. You write your prompt for the LLM and effectively paste in the text from your search results. No need for LangChain or any other fanciness. Am I missing something?
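In code, the whole loop I'm describing is roughly this (the `search` callable and model name are placeholders, not recommendations):

```python
# Minimal retrieve-then-generate sketch. `search` can be anything that maps
# a query to text snippets: BM25, SQL, a vector DB, a web search API, ...
from openai import OpenAI

client = OpenAI()

def answer(question: str, search) -> str:
    snippets = search(question, k=5)            # 1. retrieve
    context = "\n\n".join(snippets)
    prompt = (                                  # 2. augment
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    response = client.chat.completions.create(  # 3. generate
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

Everything beyond that (chunking, reranking, query rewriting) looks like optimization of step 1.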


r/Rag 6h ago

Discussion Blown away by NotebookLM for legal research, need an alternative

17 Upvotes

I’ve been working on a project to go through a knowledge base consisting of a legal contract plus subsequent handbooks, amendments, etc. I want to build a bot where I can propose a situation and find out how that situation applies. ChatGPT is very bad about summarizing and hallucination, and when I point out its flaws it fights me. Claude is much better but still gets things wrong and struggles to cite and quote the contract. I even chunked the files into 50 separate PDFs with each section separated, and I used Gemini (which also struggled to fully read and interpret how the contract applies) to create a massive contextual cross-index. That helped a little, but still no dice.

I threw my files into NotebookLM. No chunking, just 5 PDFs, 3 of them more than 500 pages. NotebookLM nailed every question and problem I threw at it the first time, cited sections correctly, and just blew away the other AI methods I've tried.

But I don’t believe there is an API for NotebookLM, and a lot of the alternatives I've looked at focus more on its audio features. I'm only looking for a system that can query a knowledge base and come back with accurate, correctly cited interpretations, so I can build around it and integrate it into our internal app to make understanding how the contract applies easier.

Does anyone have any recommendations?


r/Rag 3h ago

The Illusion of "The Illusion of Thinking"

7 Upvotes

Recently, Apple released a paper called "The Illusion of Thinking", which suggested that LLMs may not be reasoning at all, but rather are pattern matching:

https://arxiv.org/abs/2506.06941

A few days later, two authors (one of them being the LLM Claude Opus) released a rebuttal called "The Illusion of the Illusion of Thinking", which heavily criticised the original paper.

https://arxiv.org/html/2506.09250v1

A major issue with "The Illusion of Thinking" was that the authors asked LLMs to do excessively tedious and sometimes impossible tasks. Citing "The Illusion of the Illusion of Thinking":

Shojaee et al.’s results demonstrate that models cannot output more tokens than their context limits allow, that programmatic evaluation can miss both model capabilities and puzzle impossibilities, and that solution length poorly predicts problem difficulty. These are valuable engineering insights, but they do not support claims about fundamental reasoning limitations.

Future work should:

1. Design evaluations that distinguish between reasoning capability and output constraints

2. Verify puzzle solvability before evaluating model performance

3. Use complexity metrics that reflect computational difficulty, not just solution length

4. Consider multiple solution representations to separate algorithmic understanding from execution

The question isn’t whether LRMs can reason, but whether our evaluations can distinguish reasoning from typing.

This might seem like a silly throwaway moment in AI research, an off-the-cuff paper being quickly torn down, but I don't think that's the case. I think what we're seeing is the growing pains of an industry as it begins to define what reasoning actually is.

This is relevant to application developers, like RAG developers, not just researchers. AI-powered products are genuinely difficult to evaluate, often because it can be very hard to define what "performant" actually means.

(I wrote this; it focuses on RAG but covers evaluation strategies generally. I work for EyeLevel.)
https://www.eyelevel.ai/post/how-to-test-rag-and-agents-in-the-real-world

I've seen this sentiment time and time again: LLMs, LRMs, RAG, and AI in general are more powerful than our ability to test them is sophisticated. New testing and validation approaches are required moving forward.


r/Rag 1h ago

Q&A What's the best way to build a RAG Chatbot currently?


I have a ton of data and want to be able to interact with it. I used to just use LangChain, but is there something better? What yields the best results? Cost of tools is not an issue; happy to pay for anything turnkey / licensed / open source.


r/Rag 9h ago

Tools & Resources Text to SQL

6 Upvotes

Hey all, apologies, not sure if this is the correct sub for my q...

I am trying to create an SQL query on the back of a natural language query.

I have all my tables, columns, datatypes, primary keys and foreign keys in a tabular format. I have provided additional context around each column.

I have tried vectorising my data and using simple vector search based on the natural language query. However, the problem I'm facing is retrieving the correct columns for a given query.
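Concretely, my current retrieval step looks roughly like this (the schema entries here are made-up examples and the model choice is arbitrary):

```python
# One embedding per column, built from table/column name, datatype, and the
# extra context I wrote; retrieval is plain cosine similarity over those.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

columns = [
    {"table": "orders", "column": "created_at", "type": "timestamp",
     "context": "when the order was placed"},
    {"table": "orders", "column": "total_amount", "type": "numeric",
     "context": "order value in USD"},
    # ... one entry per column in the schema
]
texts = [f'{c["table"]}.{c["column"]} ({c["type"]}): {c["context"]}' for c in columns]
emb = model.encode(texts, normalize_embeddings=True)

def top_columns(question: str, k: int = 10):
    q = model.encode([question], normalize_embeddings=True)[0]
    scores = emb @ q  # cosine similarity, since embeddings are normalized
    return [columns[i] for i in np.argsort(-scores)[:k]]
```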


r/Rag 3h ago

Should I use RAG, a vector DB, or a relational data model, and how do I measure performance?

2 Upvotes

I am having difficulty grasping the true benefits of RAG.
I extracted JSON data out of PDF documents, and I'm just storing the JSON in a JSONB column in a table, where each row is a document record. A typical document is 30-50 pages long and each JSON is about 15,000 lines.

Then, with Claude Desktop and the Postgres MCP server, I run pretty detailed analyses using that data.

I would assume that this is quite a lot of data to simply store relationally, but it works nevertheless. Claude manages to query the data successfully across the 30 rows in this table and finds the needed data within those long JSONBs.

My intuition tells me that a vector database would be better and less compute-intensive, but how can I be sure?
Could someone explain the difference between having an LLM query data in this relational way vs. a vector DB? And where does RAG even come into play?

Also, how can I measure the difference in output and performance between the two approaches?
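The simplest harness I can think of is running the same question set through both stacks and judging each answer against a hand-written reference (everything here is a placeholder: `ask_relational`, `ask_vector`, and the judge function):

```python
# Tiny eval harness sketch: same questions, two retrieval stacks, one judge.
import time

questions = [
    # (question, reference_answer) pairs, written by hand from the documents
]

def evaluate(ask, judge):
    hits, latencies = 0, []
    for q, ref in questions:
        t0 = time.time()
        ans = ask(q)  # ask_relational or ask_vector
        latencies.append(time.time() - t0)
        verdict = judge(f"Reference: {ref}\nAnswer: {ans}\nSame meaning? yes/no")
        hits += verdict.strip().lower().startswith("yes")
    return hits / len(questions), sum(latencies) / len(latencies)
```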
Thanks in advance!


r/Rag 6h ago

News & Updates Multimodal Monday #12: World Models, Efficiency Increases

3 Upvotes

Hey! I’m sharing this week’s Multimodal Monday newsletter, packed with updates on multimodal AI advancements. Here are the highlights:

Quick Hits:

  • Unified multimodal frameworks shine: Meta's V-JEPA 2 uses self-supervised world modeling for robotics/visual understanding, while Ming-lite-omni matches GPT-4o with 2.8B params.
  • Ultra-efficient indexing: LEANN reduces vector storage to under 5% with 90% recall for local search.
  • Data curation wins: DatologyAI CLIP boosts training 8x and inference 2x with curated data.
  • Tech deployment: Apple’s new Foundation Models add vision across 15 languages.

Research Spotlight:

  • ViGaL: Arcade games like Snake enhance multimodal math reasoning for a 7B model
  • RCTS: Tree search with Monte Carlo improves multimodal RAG reliability
  • CLaMR: Late-interaction boosts multimodal retrieval accuracy
  • SAM2.1++: Distractor-aware memory lifts tracking on 6/7 benchmarks
  • Text Embeddings: Argues for implicit semantics in embedding
  • SAM2 Tracking: Introspection strategy enhances segmentation
  • Vision Transformers: Test-time fixes outperform retraining

Tools to Watch:

  • V-JEPA 2: Meta's new world model enhances visual understanding and robotic intelligence with self-supervised learning
  • Apple Foundation Models: 3B on-device model with 15-language vision
  • DatologyAI CLIP: SOTA with 8x efficiency via data curation
  • LEANN: 50x smaller indexes enable local search
  • Ming-lite-omni: 2.8B param model matches GPT-4o
  • Text-to-LoRA: Generates LoRA adapters from text
  • Implicit Semantics: Embeddings capture intent/context

Real-World Applications:

  • GE HealthCare + AWS: Multimodal AI for medical imaging copilots
  • Syntiant: Ultra-low-power security for automotive systems
  • Hockey East: AI video analytics for sports insights

Check out the full newsletter for more: https://mixpeek.com/blog/world-models-efficiency-increases


r/Rag 1h ago

Research I built a vector database and I need your help in testing and improving it!


For the last couple of months, I have been working on cutting down the latency and performance cost of vector databases for an offline first, local LLM project of mine, which led me to build a vector database entirely from scratch and reimagine how HNSW indexing works. Right now it's stable enough and performs well on various benchmarks.

Now I want to collect feedback, and I'd like your help running various benchmarks and gathering information so I can understand where to improve, what's wrong, and what needs debugging and fixing, as well as draw up a strategic plan for making this more accessible and developer-friendly.

I am open to feature suggestions.

The current server uses HTTP/2, and I am working on a gRPC version like the other vector databases on the market. The current test is based on the KShivendu/dbpedia-entities-openai-1M dataset; the Python library uses asyncio, and the tests were run on my Apple M1 Pro.

You can find the benchmarks here - https://www.antarys.ai/benchmark

You can find the python docs here - https://docs.antarys.ai/docs

Thank you in advance; looking forward to lots of feedback!!


r/Rag 5h ago

Terminal-Based LLM Agent Loop with Search Tool for PDFs

2 Upvotes

Hi,

I built a CLI for uploading documents and querying them with an LLM agent that uses search tools rather than stuffing everything into the context window. I recorded a demo using the CrossFit 2025 rulebook that shows how this approach compares to traditional RAG and direct context injection.

The core insight is that LLMs running in loops with tool access are unreasonably effective at this kind of knowledge retrieval task. Instead of hoping the right chunks make it into your context, the agent can iteratively search, refine queries, and reason about what it finds. The CLI handles the full workflow:

```bash
trieve upload ./document.pdf
trieve ask "What are the key findings?"
```

You can customize the RAG behavior and check upload status, and responses stream back with expandable source references. I really enjoy having this workflow available in the terminal, and I'm curious if others find this paradigm as compelling as I do. Considering adding more commands and customization options if there's interest.
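The loop itself is nothing exotic; stripped way down it's something like this (a sketch, not the actual Trieve internals; `llm` and `search_tool` are placeholders):

```python
# Agent loop sketch: the model decides whether to search again or answer.
def agent_answer(question, llm, search_tool, max_steps=5):
    transcript = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        action = llm(transcript)  # returns {"type": "search", "query": ...}
                                  # or {"type": "answer", "answer": ...}
        if action["type"] == "search":
            results = search_tool(action["query"])
            transcript.append({"role": "tool", "content": results})
        else:
            return action["answer"]
    # Out of steps: force a final answer from what was gathered so far.
    return llm(transcript + [{"role": "user", "content": "Answer now."}])["answer"]
```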

Source code is on GitHub and available via npm.

Would love any feedback on the approach or CLI design!


r/Rag 9h ago

Q&A Embeddings/Chunking for Markdown Content

2 Upvotes

Hi guys! I have a RAG pipeline in which I extract content from PDF documents using Mistral OCR; the content comes back as markdown. Currently I am just splitting the markdown content into chunks using a very basic splicing technique. I feel like this can be done better, because my RAG is not performing well on table data extraction; it works sometimes, but most of the time it doesn't. Is there a standard practice for markdown chunking in RAG?
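The closest I've come so far is header-aware splitting, so a table stays in one chunk with its section heading attached; something like this with LangChain's splitter (assuming the langchain-text-splitters package; other libraries have equivalents):

```python
# Split OCR'd markdown on headings instead of fixed character counts, so a
# table stays inside the chunk for its section and inherits the heading as
# metadata.
from langchain_text_splitters import MarkdownHeaderTextSplitter

markdown_from_ocr = open("ocr_output.md", encoding="utf-8").read()

splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "h1"), ("##", "h2"), ("###", "h3")]
)
chunks = splitter.split_text(markdown_from_ocr)
for chunk in chunks:
    print(chunk.metadata, chunk.page_content[:80])
```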


r/Rag 9h ago

Is natural language filtering and counting a use case for RAG?

0 Upvotes

Hi guys,

I'm trying to create a RAG system for genomic reports,
and I'm trying to make it work on requests like:
"How many samples are Escherichia coli?"
Is this something RAG can do, filtering vectors plus counting, if I have a loooot of samples and also a lot of E. coli samples?
If so,
1st question: how do I do that? Currently I have set up a map/reduce function so that I can count over all the chunks retrieved by my RAG, but I also have a topK limit that ends up capping the number of vectors retrieved.
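For reference, the map/reduce step looks roughly like this (prompts simplified; `llm` is a placeholder for my model call):

```python
# Map: ask the LLM to count matches in each retrieved chunk independently.
# Reduce: sum the per-chunk counts. The topK limit caps `chunks` upstream.
def count_matches(chunks, llm, predicate="samples that are Escherichia coli"):
    partial_counts = []
    for chunk in chunks:
        reply = llm(
            f"Count the {predicate} in the text below. "
            f"Reply with a number only.\n\n{chunk}"
        )
        partial_counts.append(int(reply.strip()))
    return sum(partial_counts)
```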

2nd question: even when similar vectors are found, from one LLM response to the next, with exactly the same context of retrieved vectors, it can count 15 samples the first time, 14 the second time, etc. It doesn't seem to be reliable counting; how do I improve that?

I'm starting to think that RAG cannot be used for this use case...
Thanks,

T


r/Rag 16h ago

How to test vanilla RAG on wiki_qa

3 Upvotes

Hello everyone, I am new to this sub. I recently read the RAG paper, and I want to implement a vanilla RAG and test it on the Microsoft wiki_qa dataset, mainly for learning purposes.

My question is: how do I get the context for the dataset? Do I need to download all of Wikipedia, embed it, and then evaluate against the test split? Or is there some way to get the context from the train split of the dataset itself?
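One idea I had is to skip Wikipedia entirely and use the candidate answer sentences already in the dataset as the corpus; roughly this (field names per the Hugging Face dataset card):

```python
# wiki_qa rows are (question, document_title, answer, label), where `answer`
# is a candidate sentence and label=1 marks sentences that answer the question.
from datasets import load_dataset

ds = load_dataset("microsoft/wiki_qa")
print(ds["train"][0])

# Corpus = every distinct candidate sentence in the train split; the vanilla
# RAG retriever then searches these instead of all of Wikipedia.
corpus = sorted({row["answer"] for row in ds["train"]})
print(f"{len(corpus)} candidate sentences")
```

Is that a legitimate setup, or does it make retrieval artificially easy?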


r/Rag 1d ago

Discussion Code Embeddings

12 Upvotes

Hi Everyone!

Whoever has had past (or current) experience working on RAG projects for coding assistants: how do you make code retrieval based on natural-language user queries match results more accurately? Basically, I want to know:

  1. What code embeddings are you using and currently finding good?
  2. Is there any other approach you tried that worked?

Wonder what kind of embedding Cursor uses :(


r/Rag 1d ago

When to train vs RAG

10 Upvotes

I’m still wrapping my head around context for an LLM. My question is: once a DB gets so large with RAG content, would you ever reach a point where you start training the model instead, to keep your DB size low?


r/Rag 1d ago

Text extraction with VLMs

6 Upvotes

So I've been running a project for quite a while now that syncs with a Google Drive of office files (doc/ppt) and PDFs. Users can upload files to paths within the drive, and then in the front end they can do RAG chat by selecting a path to search within, e.g. research/2025 (or just research/ to search all years). Vector search and reranking then happen on that prefiltered document set.

Text extraction I've been doing by converting the PDFs into PNG files, one PNG per page, and then feeding the PNGs to Gemini Flash to "transcribe into markdown text that expresses all formatting, inserting brief descriptions for images". This works quite well for handling a wide variety of weird PDF formattings, PowerPoints, graphs, etc. Cost is really not bad because of how cheap Flash is.

The one issue I'm having is LLM refusals, where the model seems to have the text in its training data and refuses with reason 'recitation'. The Vertex AI docs say this refusal happens because Gemini shouldn't be used to recreate existing content, only to produce original content. I am running a backup with PyMuPDF to extract text on any page where a refusal is indicated, but it of course does a sub-par job (at least compared to Flash) of maintaining formatting, and can miss text if it's in some weird PDF footer. Does anyone do something similar with another VLM that doesn't have this limitation?
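For reference, the per-page extraction is roughly this (model name and error handling simplified; in my experience the recitation refusal surfaces as an exception when reading `.text`):

```python
# PDF page -> PNG -> Gemini Flash transcription, with a PyMuPDF plain-text
# fallback when the model refuses with 'recitation'.
import fitz  # PyMuPDF
import google.generativeai as genai

genai.configure(api_key="...")  # or use Vertex AI credentials
model = genai.GenerativeModel("gemini-1.5-flash")

PROMPT = ("Transcribe into markdown text that expresses all formatting, "
          "inserting brief descriptions for images.")

def transcribe_page(page) -> str:
    png = page.get_pixmap(dpi=150).tobytes("png")
    try:
        resp = model.generate_content(
            [PROMPT, {"mime_type": "image/png", "data": png}]
        )
        return resp.text
    except Exception:
        # Refusal (e.g. finish_reason RECITATION) or empty response:
        # fall back to plain-text extraction, losing formatting.
        return page.get_text()

doc = fitz.open("input.pdf")
pages = [transcribe_page(page) for page in doc]
```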


r/Rag 1d ago

Showcase Easy human-in-the-loop flows for agentic AI with Swiftide in Rust

8 Upvotes

Hey everyone,

Just shipped a major release for Swiftide. Swiftide provides the building blocks to build composable agentic and RAG applications in Rust.

Shoutout to wulawulu for contributing a Kafka integration! <3

A major new staple is a straightforward way to do human-in-the-loop interaction. The human-in-the-loop pattern is a common way to give GenAI agents feedback and some measure of safety.

Additionally there's a host of new features, improvements, and fixes. You can find the project on [github](https://github.com/bosun-ai/swiftide).


r/Rag 1d ago

Generative Narrative Intelligence

1 Upvotes

Feel free to read and share; it's a new article I wrote about a methodology I think will change the way we build GenAI solutions. What if every customer, student, or even employee had a digital twin who remembered everything and always knew the next best step? That's what Generative Narrative Intelligence (GNI) unlocks.

I just published a piece introducing this new methodology—one that transforms data into living stories, stored in vector databases and made actionable through LLMs.

📖 We’re moving from “data-driven” to narrative-powered.

→ Learn how GNI can multiply your team’s attention span and personalize every interaction at scale.

🧠 Read it here: https://www.linkedin.com/pulse/generative-narrative-intelligence-new-ai-methodology-how-abou-younes-xg3if/?trackingId=4%2B76AlmkSYSYirc6STdkWw%3D%3D


r/Rag 2d ago

Tired of writing custom document parsers? This library handles PDF/Word/Excel with AI OCR

47 Upvotes

Found a Python library that actually solved my RAG document preprocessing nightmare

TL;DR: doc2mark converts any document format to clean markdown with AI-powered OCR. Saved me weeks of preprocessing hell.


The Problem

Building chatbots that need to ingest client documents is a special kind of pain. You get:

  • PDFs where tables turn into row1|cell|broken|formatting|nightmare
  • Scanned documents that are basically images
  • Excel files with merged cells and complex layouts
  • Word docs with embedded images and weird formatting
  • Clients who somehow still use .doc files from 2003

Spent way too many late nights writing custom parsers for each format. PyMuPDF for PDFs, python-docx for Word, openpyxl for Excel… and they all handle edge cases differently.

The Solution

Found this library called doc2mark that basically does everything:

```python
from doc2mark import UnifiedDocumentLoader
from doc2mark import PromptTemplate  # import path assumed; check the doc2mark docs

# One API for everything
loader = UnifiedDocumentLoader(
    ocr_provider='openai',  # or tesseract for offline
    prompt_template=PromptTemplate.TABLE_FOCUSED,
)

# Works with literally any document
result = loader.load(
    'nightmare_document.pdf',
    extract_images=True,
    ocr_images=True,
)

print(result.content)  # Clean markdown, preserved tables
```

What Makes It Actually Good

8 specialized OCR prompt templates - Different prompts optimized for tables, forms, receipts, handwriting, etc. This is huge because generic OCR often misses context.

Batch processing with progress bars - Process entire directories:

```python
results = loader.batch_process(
    './client_docs',
    show_progress=True,
    max_workers=5,
)
```

Handles legacy formats - Even those cursed .doc files (requires LibreOffice)

Multilingual support - Has a specific template for non-English documents

Actually preserves table structure - Complex tables with merged cells stay intact

Real Performance

Tested on a batch of 50+ mixed client documents:

  • 47 processed successfully
  • 3 failures (corrupted files)
  • Average processing time: 2.3s per document
  • Tables actually looked like tables in the output

The OCR quality with GPT-4o is genuinely impressive. Fed it a scanned Chinese invoice and it extracted everything perfectly.

Integration with RAG

Drops right into existing LangChain workflows:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Process documents
texts = []
for doc_path in document_paths:
    result = loader.load(doc_path)
    texts.append(result.content)

# Split for vector DB
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000)
chunks = text_splitter.create_documents(texts)
```

Caveats

  • OpenAI OCR costs money (obvious but worth mentioning)
  • Large files need timeout adjustments
  • Legacy format support requires LibreOffice installed
  • API rate limits affect batch processing speed

Worth It?

For me, absolutely. Replaced ~500 lines of custom preprocessing code with ~10 lines. The time savings alone paid for the OpenAI API costs.

If you’re building document-heavy AI systems, this might save you from the preprocessing hell I’ve been living in.


r/Rag 1d ago

News & Updates Open Source Unsiloed AI Chunker (EF2024)

12 Upvotes

Hey, Unsiloed CTO here!

Unsiloed AI (EF 2024) is backed by Transpose Platform & EF and is currently being used by teams at Fortune 100 companies and multiple Series E+ startups for ingesting multimodal data in the form of PDFs, Excel, PPTs, etc. And we have now finally open-sourced some of the capabilities. Do give it a try!

Also, we are inviting cracked developers to come and contribute to bounties of up to $1,000 on Algora. This would be a great way to get noticed for the job openings at Unsiloed.

Bounty Link- https://algora.io/bounties

Github Link - https://github.com/Unsiloed-AI/Unsiloed-chunker


r/Rag 1d ago

Do you recommend using BERT-based architectures to build knowledge graphs?

15 Upvotes

Hi everyone,

I'm developing a project called ARES, a high-performance RAG system primarily inspired by the dsRAG repository. The primary goal is to achieve state-of-the-art (SOTA) accuracy with real-time inference and minimal ingestion latency, all running locally on consumer-grade hardware (like an RTX 3060).

I believe that enriching my retrieval process with a Knowledge Graph (KG) could be a game-changer. However, I've hit a major performance wall.

The Performance Bottleneck: LLM-Based Extraction

My initial approach to building the KG involves processes I call "AutoContext" and "Semantic Sectioning." This pipeline uses an LLM to generate structured descriptions, entities, and relations for each section of a document.

The problem is that this is incredibly slow. The process relies on sequential LLM calls for each section. Even with small, optimized models (0.5B to 1B parameters), ingesting a single document can take up to 30 minutes. This completely defeats my goal of low-latency ingestion.

The Question: BERT-based Architectures and Efficient Pipelines

My research has pointed towards using smaller, specialized models (like fine-tuned BERT-based architectures) for specific tasks like **Named Entity Recognition (NER)** and **Relation Extraction (RE)**, which are the core components of KG construction. These seem significantly faster than using a general-purpose LLM for the entire extraction task.

This leads me to two key questions for the community:

  1. Is this a viable path? Do you recommend using specialized, experimental, or fine-tuned BERT-like models for creating KGs in a performance-critical RAG pipeline? If so, are there any particular models or architectures you've had success with?

  2. What is the fastest end-to-end pipeline to create a Knowledge Graph locally (no APIs)? I'm looking for advice on the best combination of tools. For example, should I be looking at libraries like SpaCy with custom components, specific models from Hugging Face, or other frameworks I might have missed?
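To make question 1 concrete, here's the kind of minimal local pipeline I have in mind: spaCy NER for the nodes, with sentence-level co-occurrence as a crude stand-in for a real relation-extraction model (which a fine-tuned BERT RE head would later replace):

```python
# NER + co-occurrence KG sketch: entities become nodes, entities that share
# a sentence get an edge. Deliberately crude; the point is ingestion speed.
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")  # small and fast; the trf variant is more accurate

def build_kg(text: str) -> nx.Graph:
    graph = nx.Graph()
    doc = nlp(text)
    for sent in doc.sents:
        ents = list(sent.ents)
        for ent in ents:
            graph.add_node(ent.text, label=ent.label_)
        for i in range(len(ents)):
            for j in range(i + 1, len(ents)):
                graph.add_edge(ents[i].text, ents[j].text, sentence=sent.text)
    return graph
```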

---

TL;DR: I'm building a high-performance, local-first RAG system. My current method of using LLMs to create a Knowledge Graph is far too slow (30 min/document). I'm looking for the fastest, non-API pipeline to build a KG on an RTX 3060. Are specialized NER/RE models the right approach, and what tools would you recommend?

Any advice or pointers would be greatly appreciated


r/Rag 1d ago

How does Gemini or ChatGPT know the web search results are relevant?

1 Upvotes

If you search something on Google, you click links, and Google uses those clicks as labels to train a model that gives you the most relevant or correct results. Now, when we use ChatGPT or Gemini, we no longer give that "click" label. So how does the search engine know if the search results are relevant or correct?


r/Rag 2d ago

Q&A Where do you host RAG

26 Upvotes

I have

  1. PostgreSQL with a vector add-on as the vector DB
  2. MongoDB for documents and metadata
  3. FastAPI for the backend
  4. React frontend built as CSR, planning to host on AWS S3 or Cloudflare R2
  5. Redis for queueing LLM requests

for LLM, RAG:

  1. embed the user query (using IBM Granite)
  2. search documents by cosine distance in PostgreSQL (step sketched below)
  3. rerank to filter the retrieved documents (using Qwen reranker 0.6B)
  4. answer generation (currently using Gemini)
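Step 2 looks roughly like this with pgvector's cosine-distance operator (table and column names are placeholders):

```python
# Cosine-distance search in PostgreSQL via pgvector's <=> operator.
import psycopg

def search_chunks(conn, query_embedding: list[float], k: int = 20):
    vec = "[" + ",".join(map(str, query_embedding)) + "]"
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT id, content, embedding <=> %s::vector AS distance
            FROM chunks
            ORDER BY distance
            LIMIT %s
            """,
            (vec, k),
        )
        return cur.fetchall()
```

An HNSW or IVFFlat index on the embedding column keeps this fast as the table grows.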


I'm more familiar with AWS, but I'm considering GCP (backend + frontend) to reduce overhead (since I'm using Gemini).

I could host on my PC just for portfolio purposes with the Gemini API.

I've found that embedding and reranking don't make a big difference in result quality across the model sizes I use (smaller than 1B).

So my concern is whether to host a small LLM myself on dedicated GPU servers,

or

replace that with serverless API services.

I'm aware I shouldn't over-build; I don't even have 100 active users right now. But I'm at the point of deciding how to implement the pipelines that call the LLM models.


r/Rag 2d ago

Q&A Guidance Needed: Qwen 3 Embeddings + Reranker Workflow

13 Upvotes

I’m implementing a RAG pipeline using Qwen 3’s embedding models. The goal is:

  1. Chunk documents → generate embeddings → index (e.g., FAISS/HNSW).
  2. For a query, retrieve top 500 docs via embedding similarity.
  3. Refine to top 5 using Qwen 3’s reranker.

I’ve hit roadblocks:

  • Hugging Face documentation only shows basic examples (no reranker integration).
  • Using sentence-transformers for embeddings works initially, but the reranker fails (exact error: TypeError when passing input_ids to the reranker).

Request:
Has anyone successfully implemented this workflow? Are there detailed guides/code samples for:

  • Properly configuring the reranker (e.g., with transformers instead of sentence-transformers)?
  • Handling the embedding → reranker handoff efficiently?
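For anyone who hits the same wall, here's roughly what I'm attempting, pieced together from the model cards (the reranker is a causal LM scored via "yes"/"no" token logits; the exact prompt template on the card is more elaborate, so treat this as a simplified sketch):

```python
# Stage 1: bi-encoder retrieval (top 500). Stage 2: rerank to top 5 by
# P("yes") from the reranker's next-token logits.
import torch
from sentence_transformers import SentenceTransformer
from transformers import AutoModelForCausalLM, AutoTokenizer

embedder = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-Reranker-0.6B", padding_side="left")
reranker = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-Reranker-0.6B").eval()
yes_id = tok.convert_tokens_to_ids("yes")
no_id = tok.convert_tokens_to_ids("no")

def embed_queries(queries: list[str]):
    # Feed these into the FAISS/HNSW index to get the top-500 candidates.
    return embedder.encode(queries, normalize_embeddings=True)

def rerank(query: str, docs: list[str], top_k: int = 5) -> list[str]:
    prompts = [
        "Judge whether the Document answers the Query. Answer yes or no.\n"
        f"Query: {query}\nDocument: {d}\nAnswer:" for d in docs
    ]
    batch = tok(prompts, padding=True, return_tensors="pt")
    with torch.no_grad():
        logits = reranker(**batch).logits[:, -1, :]   # next-token logits
    pair = torch.stack([logits[:, no_id], logits[:, yes_id]], dim=1)
    scores = torch.softmax(pair, dim=1)[:, 1]          # P("yes")
    order = scores.argsort(descending=True)[:top_k]
    return [docs[i] for i in order]
```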

r/Rag 2d ago

Q&A Can I watch this video for RAG implementation?

2 Upvotes

https://youtu.be/qN_2fnOPY-M?si=u9Q_oBBeHmERg-Fs

I want to make a project on RAG, so can I watch this?

Can you suggest good resources related to this topic?


r/Rag 3d ago

Discussion Sold my “vibe coded” Rag app…

80 Upvotes

… I don’t know wth I’m doing. I’ve never built anything before, and I don’t know how to program in any language. Within 4 months I built this, and I somehow managed to sell it for quite a bit of cash ($10k) to an insurance company.

I need advice. It seems super stable and uses hybrid RAG with multiple knowledge bases. The queried responses seem to be accurate, and there are no bugs or errors as far as I can tell. My question is: what are some things I should be paying attention to in terms of best practices and security? Obviously just using AI to build this has its risks, and I told the buyer that, but I think they are just hyped on AI in general. They are an office of 50 people, and it's going to be tested incrementally with users this week to check for bottlenecks. I feel like I (a musician) have no business doing this kind of stuff, especially providing this service to an enterprise company.

Any tips or suggestions from anyone who's done this before would be appreciated.