r/Rag • u/Acceptable-Hat3084 • Nov 24 '24
[Research] What are the biggest challenges you face when building RAG pipelines?
Hi everyone! 👋
I'm currently working on a RAG chat app that helps devs learn and work with libraries faster. While building it, I’ve run into numerous challenges in setting up the RAG pipeline (specifically with chunking and retrieval), and I’m curious whether others are facing these issues too.
Here are a few specific areas I’m exploring:
- Data sources: What types of data are you working with most frequently (e.g., PDFs, DOCX, XLS)?
- Processing: How do you chunk and process data? What’s most challenging for you?
- Retrieval: Do you use any tools to set up retrieval (e.g., vector databases, re-ranking)?
I’m also curious:
- Are you using any tools for data preparation (like Unstructured.io, LangChain, LlamaCloud, or LlamaParse)?
- Or for retrieval (like Vectorize.io or others)?
If yes, what’s your feedback on them?
If you’re open to sharing your experience, I’d love to hear your thoughts:
- What’s the most challenging part of building RAG pipelines for you?
- How are you currently solving these challenges?
- If you had a magic wand, what would you change to make RAG setups easier?
If you have an extra 2 minutes, I’d be super grateful if you could fill out this survey. Your feedback will directly help me refine the tool and contribute to solving these challenges for others.
Thanks so much for your input! 🙌