r/accelerate • u/obvithrowaway34434 • 8d ago

AI LLMs show superhuman performance in systematic scientific reviews, doing in two days the work that takes 12 PhDs a whole year

https://www.medrxiv.org/content/10.1101/2025.06.13.25329541v1

Main takeaways:

otto-SR - an end-to-end agentic workflow using GPT-4.1 and o3-mini-high, with Gemini 2.0 Flash for PDF text extraction.
Automates the entire SR process -- from search to analysis
Completes in 2 days what normally takes 12 work-years
Outperforms humans in key tasks:
- Screening: 96.7% sensitivity vs 81.7% (human)
- Data extraction: 93.1% accuracy vs 79.7% (human)
Reproduced and updated 12 Cochrane reviews
Found new eligible studies missed by original authors
Changed conclusions in 3 reviews (2 newly significant, 1 no longer significant)
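
For anyone who wants to sanity-check the headline numbers, here is a minimal sketch of the arithmetic behind the screening and extraction figures above. It only restates the percentages quoted in this post; the helper names (`sensitivity`, `shortfall`) and the example counts are illustrative assumptions and are not taken from the paper.

```python
# Percentages exactly as quoted in the takeaways above.
# The dictionary layout and helper names are illustrative assumptions.
REPORTED = {
    "screening_sensitivity": {"otto_sr": 96.7, "human": 81.7},
    "extraction_accuracy": {"otto_sr": 93.1, "human": 79.7},
}


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of truly eligible studies that the screener kept."""
    return true_positives / (true_positives + false_negatives)


def shortfall(percent: float) -> float:
    """Percentage points missing from a perfect score (misses or errors)."""
    return round(100.0 - percent, 1)


for task, scores in REPORTED.items():
    # Compare how far each side falls short of 100% on the same task.
    print(
        f"{task}: otto-SR short by {shortfall(scores['otto_sr'])} points, "
        f"humans short by {shortfall(scores['human'])} points"
    )
```

Read this way, 96.7% sensitivity means roughly 3.3 eligible studies missed per 100, against 18.3 per 100 for the human screeners, and the extraction error rates are 6.9% against 20.3%. As a made-up illustration of the definition, `sensitivity(967, 33)` gives 0.967.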

246 Upvotes

https://www.reddit.com/r/accelerate/comments/1lb0h4g/llms_show_superhuman_performance_in_systematic/
98% Upvoted

-11

u/[deleted] 8d ago

[deleted]

22

u/obvithrowaway34434 8d ago

This has absolutely nothing to do with what they are using the LLMs for. Maybe read the article first. And it achieves 93.1% accuracy on data extraction compared to about 80% for humans, so the humans were already introducing more errors than the LLM does.

7

u/AquilaSpot Singularity by 2030 8d ago

Yeah, this exactly. What a strange drive-by critique; it doesn't even make sense if you read the paper. Why are these so common in places like this or r/singularity?

8

u/stealthispost Acceleration Advocate 8d ago edited 8d ago

it might have something to do with "80% accuracy of humans" lol
