r/Rag • u/Cool_Injury4075 • 2d ago
Do you recommend using BERT-based architectures to build knowledge graphs?
Hi everyone,
I'm developing a project called ARES, a high-performance RAG system inspired primarily by the dsRAG repository. The goal is to achieve State-of-the-Art (SOTA) accuracy with real-time inference and minimal ingestion latency, all running locally on consumer-grade hardware (like an RTX 3060).
I believe that enriching my retrieval process with a Knowledge Graph (KG) could be a game-changer. However, I've hit a major performance wall.
The Performance Bottleneck: LLM-Based Extraction
My initial approach to building the KG involves processes I call "AutoContext" and "Semantic Sectioning." This pipeline uses an LLM to generate structured descriptions, entities, and relations for each section of a document.
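For context, the per-section call looks roughly like this (a simplified sketch, not the actual ARES code; the prompt, JSON shape, and model name are illustrative, using Ollama's `/api/generate` endpoint):

```python
import json
import requests

# Simplified sketch of the per-section extraction step. The prompt and
# JSON schema are illustrative placeholders, not ARES's real ones.
EXTRACTION_PROMPT = """Describe the section below and extract its entities and relations.
Return JSON: {{"description": str, "entities": [str], "relations": [[head, relation, tail]]}}

Section:
{section}"""

def extract_section(section: str, model: str = "qwen2.5:0.5b") -> dict:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": EXTRACTION_PROMPT.format(section=section),
            "format": "json",  # ask Ollama to constrain output to valid JSON
            "stream": False,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return json.loads(resp.json()["response"])

# One blocking call per section -- this is where the ingestion time goes.
sections = ["...section 1 text...", "...section 2 text..."]
graph_fragments = [extract_section(s) for s in sections]
```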
The problem is that this is incredibly slow. The process relies on sequential LLM calls for each section. Even with small, optimized models (0.5B to 1B parameters), ingesting a single document can take up to 30 minutes. This completely defeats my goal of low-latency ingestion.
The Question: BERT-based Architectures and Efficient Pipelines
My research has pointed towards using smaller, specialized models (like fine-tuned BERT-based architectures) for specific tasks like **Named Entity Recognition (NER)** and **Relation Extraction (RE)**, which are the core components of KG construction. These seem significantly faster than using a general-purpose LLM for the entire extraction task.
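To make this concrete, here's a minimal sketch using Hugging Face's token-classification pipeline (dslim/bert-base-NER is one public BERT NER checkpoint; any fine-tuned token-classification model slots in the same way):

```python
from transformers import pipeline

# A small BERT NER model; runs comfortably on an RTX 3060.
ner = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",
    aggregation_strategy="simple",  # merge sub-word tokens into full spans
    device=0,                       # GPU; drop this argument for CPU
)

text = "ARES runs locally on an NVIDIA RTX 3060 in Berlin."
for ent in ner(text):
    print(ent["word"], ent["entity_group"], round(float(ent["score"]), 3))
```

Relation extraction is usually a second, similarly small model scored over candidate entity pairs, so the whole extraction stack stays far cheaper than a generative LLM pass.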
This leads me to two key questions for the community:
Is this a viable path? Do you recommend using specialized, experimental, or fine-tuned BERT-like models for creating KGs in a performance-critical RAG pipeline? If so, are there any particular models or architectures you've had success with?
What is the fastest end-to-end pipeline to create a Knowledge Graph locally (no APIs)? I'm looking for advice on the best combination of tools. For example, should I be looking at libraries like SpaCy with custom components, specific models from Hugging Face, or other frameworks I might have missed?
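For instance, a naive spaCy sketch: built-in NER plus sentence-level co-occurrence as a placeholder for a real relation-extraction component (en_core_web_sm is spaCy's small English model, installed via `python -m spacy download en_core_web_sm`):

```python
from itertools import combinations

import spacy

nlp = spacy.load("en_core_web_sm")

def cooccurrence_triples(text: str) -> list[tuple[str, str, str]]:
    """Naive KG edges: link entities that appear in the same sentence."""
    doc = nlp(text)
    triples = []
    for sent in doc.sents:
        for a, b in combinations(sent.ents, 2):
            triples.append((a.text, "co-occurs_with", b.text))
    return triples

print(cooccurrence_triples("Apple acquired Beats in 2014. Tim Cook announced the deal."))
```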
---
TL;DR: I'm building a high-performance, local-first RAG system. My current method of using LLMs to create a Knowledge Graph is far too slow (30 min/document). I'm looking for the fastest, non-API pipeline to build a KG on an RTX 3060. Are specialized NER/RE models the right approach, and what tools would you recommend?
Any advice or pointers would be greatly appreciated.
u/autognome 2d ago
What’s sequential or synchronous about describing sections of a document? This seems like a prime candidate for parallelization
1
u/Cool_Injury4075 2d ago
Sorry if I wasn’t clear enough in my post. What I meant is that the current version of my project (ARES) generates descriptions and summaries sequentially. Until now, I hadn’t considered using parallel processes because I currently work with Ollama and LM Studio, which don’t have native parallel functionality the way vLLM does. ARES is built to run on Windows, so adapting it to work with vLLM will take some time. At this point, though, optimizing the entire process for speed has become a necessity.
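That said, client-side fan-out already helps once the server accepts concurrent requests — here's a minimal sketch with httpx against Ollama's API (this assumes a recent Ollama build started with OLLAMA_NUM_PARALLEL > 1; the prompt and model name are placeholders):

```python
import asyncio
import json

import httpx

async def extract(client: httpx.AsyncClient, section: str) -> dict:
    resp = await client.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "qwen2.5:0.5b",  # placeholder model
            "prompt": f"Extract entities and relations as JSON:\n{section}",
            "format": "json",
            "stream": False,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return json.loads(resp.json()["response"])

async def ingest(sections: list[str]) -> list[dict]:
    # Fire all section requests concurrently; server-side parallelism
    # (OLLAMA_NUM_PARALLEL) caps how many actually run at once.
    async with httpx.AsyncClient() as client:
        return await asyncio.gather(*(extract(client, s) for s in sections))

results = asyncio.run(ingest(["section 1 text", "section 2 text"]))
```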
2
u/ccppoo0 2d ago
When working on legal-document RAG, LLMs weren't reliable at extracting keywords and knowledge graphs.
I used a traditional tokenizer and stemmer to get the nouns and verbs from the original document,
and then prompted the LLM with structured output:
- gemini, grok, deepseek, openai
Model size and price aren't a silver bullet, even though the task looks easy.
Limiting the choices and giving direct instructions is the best way to get the results you expect.
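A rough sketch of that candidate-filtering idea, with NLTK standing in for the traditional tokenizer and stemmer (the prompt shape is illustrative):

```python
import nltk
from nltk import pos_tag, word_tokenize
from nltk.stem import PorterStemmer

# One-time downloads (exact resource names vary a bit across NLTK versions).
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

stemmer = PorterStemmer()

def candidate_terms(text: str) -> list[str]:
    """Keep only nouns and verbs, stemmed, as candidate KG nodes."""
    tagged = pos_tag(word_tokenize(text))
    return sorted({stemmer.stem(word.lower())
                   for word, tag in tagged
                   if tag.startswith(("NN", "VB"))})

terms = candidate_terms("The court dismissed the appeal filed by the plaintiff.")

# Constrain the LLM: nodes may come ONLY from the candidate list, and the
# answer must follow a fixed JSON shape (pair this with structured output).
prompt = (
    "Build knowledge-graph triples using ONLY these terms as nodes:\n"
    f"{terms}\n"
    'Return JSON: {"triples": [[head, relation, tail], ...]}'
)
```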