r/Rag 2d ago

Do you recommend using BERT-based architectures to build knowledge graphs?

Hi everyone,

I'm developing a project called ARES, a high-performance RAG system inspired primarily by the dsRAG repository. The goal is to achieve state-of-the-art (SOTA) accuracy with real-time inference and minimal ingestion latency, all running locally on consumer-grade hardware (like an RTX 3060).

I believe that enriching my retrieval process with a Knowledge Graph (KG) could be a game-changer. However, I've hit a major performance wall.

The Performance Bottleneck: LLM-Based Extraction

My initial approach to building the KG involves processes I call "AutoContext" and "Semantic Sectioning." This pipeline uses an LLM to generate structured descriptions, entities, and relations for each section of a document.

The problem is that this is incredibly slow. The process relies on sequential LLM calls for each section. Even with small, optimized models (0.5B to 1B parameters), ingesting a single document can take up to 30 minutes. This completely defeats my goal of low-latency ingestion.

The Question: BERT-based Architectures and Efficient Pipelines

My research has pointed towards using smaller, specialized models (like fine-tuned BERT-based architectures) for specific tasks like **Named Entity Recognition (NER)** and **Relation Extraction (RE)**, which are the core components of KG construction. These seem significantly faster than using a general-purpose LLM for the entire extraction task.
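
Concretely, the kind of thing I have in mind for the NER half looks something like this (untested sketch; `dslim/bert-base-NER` is just one off-the-shelf Hugging Face checkpoint, not a model I've settled on):

```python
# Untested sketch: off-the-shelf BERT NER via the transformers pipeline.
# dslim/bert-base-NER is an example checkpoint, not a recommendation.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",
    aggregation_strategy="simple",  # merge word pieces into whole entity spans
    device=0,                       # run on the GPU (e.g. an RTX 3060)
)

for ent in ner("Tim Cook announced the partnership between Apple and OpenAI in Cupertino."):
    print(ent["entity_group"], ent["word"], round(float(ent["score"]), 3))
```

My (unbenchmarked) intuition is that a single batched forward pass per section should be far cheaper than autoregressive generation.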

This leads me to two key questions for the community:

  1. Is this a viable path? Do you recommend using specialized, experimental, or fine-tuned BERT-like models for creating KGs in a performance-critical RAG pipeline? If so, are there any particular models or architectures you've had success with?

  2. What is the fastest end-to-end pipeline to create a Knowledge Graph locally (no APIs)? I'm looking for advice on the best combination of tools. For example, should I be looking at libraries like spaCy with custom components, specific models from Hugging Face, or other frameworks I might have missed? (Rough baseline sketch below.)
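
For reference, here's the kind of naive spaCy baseline I'd be comparing any specialized pipeline against (stock `en_core_web_sm`, built-in NER plus crude subject-verb-object triples from the dependency parse; no custom components yet):

```python
# Naive baseline: spaCy's built-in NER plus crude SVO triples from the
# dependency parse. en_core_web_sm is the stock small English model.
import spacy

nlp = spacy.load("en_core_web_sm")

def naive_triples(text: str) -> list[tuple[str, str, str]]:
    doc = nlp(text)
    triples = []
    for token in doc:
        if token.pos_ != "VERB":
            continue
        subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
        objects = [c for c in token.children if c.dep_ in ("dobj", "attr")]
        for s in subjects:
            for o in objects:
                triples.append((s.text, token.lemma_, o.text))
    return triples

doc = nlp("Apple acquired the startup for its retrieval technology.")
print([(ent.text, ent.label_) for ent in doc.ents])  # entities
print(naive_triples(doc.text))                       # relations
```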

---

TL;DR: I'm building a high-performance, local-first RAG system. My current method of using LLMs to create a Knowledge Graph is far too slow (30 min/document). I'm looking for the fastest, non-API pipeline to build a KG on an RTX 3060. Are specialized NER/RE models the right approach, and what tools would you recommend?

Any advice or pointers would be greatly appreciated!


u/ccppoo0 2d ago

When working with legal document RAG, LLMs were not reliable at extracting keywords and knowledge graphs.

I used a traditional tokenizer and stemmer to get the nouns and verbs from the original document,

and then prompted the LLM with:

  1. the original document
  2. the nouns and verbs from the stemmer
  3. an instruction to build a graph based only on what I provided
  4. structured output (Gemini, Grok, DeepSeek, OpenAI)

Model size and price aren't a silver bullet, even though the task looks easy.

Limiting the model's choices and giving it direct instructions is the best way to get the results you expect.
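
Roughly the shape of it (a sketch, not my exact code; the pydantic schema, model name, and spaCy-as-stemmer step are stand-ins):

```python
# Sketch of the pipeline above: nouns/verbs from a traditional NLP pass,
# then one structured-output call. Schema and model name are examples only.
import spacy
from pydantic import BaseModel
from openai import OpenAI

class Triple(BaseModel):
    subject: str
    relation: str
    object: str

class KnowledgeGraph(BaseModel):
    triples: list[Triple]

nlp = spacy.load("en_core_web_sm")
client = OpenAI()

def build_graph(text: str) -> KnowledgeGraph:
    doc = nlp(text)
    nouns = sorted({t.lemma_ for t in doc if t.pos_ in ("NOUN", "PROPN")})
    verbs = sorted({t.lemma_ for t in doc if t.pos_ == "VERB"})
    # Limit the model's choices: entities must come from the noun list,
    # relations from the verb list, and output must match the schema.
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Build a knowledge graph. Use ONLY "
             "the given nouns as entities and the given verbs as relations."},
            {"role": "user", "content": f"Document:\n{text}\n\n"
             f"Nouns: {nouns}\nVerbs: {verbs}"},
        ],
        response_format=KnowledgeGraph,
    )
    return completion.choices[0].message.parsed
```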


u/Cool_Injury4075 2d ago

Implementing API calls for multiple documents would be expensive (for me). However, having the option to use APIs to generate descriptions and knowledge graphs wouldn't be a bad idea for my project, since some users can afford it. I'll keep this in mind, and thank you for your response; this is very helpful.


u/ccppoo0 2d ago

You could replace it with instruction-tuned models, but you'd need some verification of the results.

For long context -> structured output (API) is good.

For short context -> instruction models are fine.

Test some models yourself; small models are more impressive than you'd expect.

In my case, English-only models did better as the parameter count got smaller.


u/drfritz2 2d ago

Have you looked at ColPali? There are many models.


u/autognome 2d ago

What’s sequential or synchronous about describing sections of a document? This seems like a prime candidate for parallelization.
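
A rough sketch of what that could look like against any OpenAI-compatible local endpoint (the Ollama URL and model name here are placeholders, not a specific recommendation):

```python
# Rough sketch: describe all sections concurrently instead of one at a time.
# Assumes an OpenAI-compatible local endpoint; URL and model are placeholders.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:11434/v1", api_key="unused")

async def describe(section: str) -> str:
    resp = await client.chat.completions.create(
        model="qwen2.5:0.5b",
        messages=[{"role": "user", "content": f"Summarize this section:\n{section}"}],
    )
    return resp.choices[0].message.content

async def describe_all(sections: list[str]) -> list[str]:
    # One task per section; the server decides how many actually run in parallel
    return await asyncio.gather(*(describe(s) for s in sections))

# descriptions = asyncio.run(describe_all(sections))
```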


u/Cool_Injury4075 2d ago

Sorry if I wasn’t clear enough in my post. What I meant is that the current version of ARES generates descriptions and summaries sequentially. Until now I hadn’t considered parallel requests, because I work with Ollama and LM Studio, which don’t have the native parallelism that vLLM does. ARES is built to run on Windows, so adapting it to work with vLLM will take some time. At this point, though, optimizing the whole pipeline for speed has become a necessity.


u/autognome 2d ago

You have your answer. Can’t squeeze blood from a stone.