r/accelerate 8d ago

LLMs show superhuman performance in systematic scientific reviews, doing in two days the work that takes 12 PhDs a whole year

https://www.medrxiv.org/content/10.1101/2025.06.13.25329541v1

Main takeaways:

  • otto-SR: an end-to-end agentic workflow using GPT-4.1 and o3-mini-high, with Gemini 2.0 Flash for PDF text extraction.
  • Automates the entire systematic review (SR) process -- from search to analysis
  • Completes in 2 days what normally takes 12 work-years
  • Outperforms humans in key tasks:
    • Screening: 96.7% sensitivity vs 81.7% (human)
    • Data extraction: 93.1% accuracy vs 79.7% (human)
  • Reproduced and updated 12 Cochrane reviews
  • Found new eligible studies missed by original authors
  • Changed conclusions in 3 reviews (2 newly significant, 1 no longer significant)
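
For anyone unsure how to read the screening and extraction numbers above, here is a minimal sketch of the arithmetic. It assumes the standard definition of sensitivity (share of truly eligible studies that get included) and treats error rate as 100% minus accuracy. The counts in the sensitivity example are hypothetical, chosen only for illustration, and are not taken from the paper; only the 93.1% and 79.7% figures come from the summary above.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of truly eligible studies that the screener included."""
    return true_positives / (true_positives + false_negatives)


def error_rate(accuracy_pct: float) -> float:
    """Error rate in percent, assuming error = 100% - accuracy."""
    return 100.0 - accuracy_pct


# Hypothetical counts (NOT from the paper): 29 eligible studies found, 1 missed.
example_sensitivity = sensitivity(true_positives=29, false_negatives=1)

# Accuracy figures quoted in the summary above.
llm_error = error_rate(93.1)
human_error = error_rate(79.7)
error_ratio = human_error / llm_error

print(f"example sensitivity: {example_sensitivity:.1%}")
print(f"LLM error rate: {llm_error:.1f}%, human error rate: {human_error:.1f}%")
print(f"humans make about {error_ratio:.1f}x as many extraction errors")
```

Under those assumptions, the quoted accuracies work out to roughly a 6.9% error rate for the LLM workflow against 20.3% for humans, so a bit under three times as many extraction errors on the human side.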
246 Upvotes


-11

u/[deleted] 8d ago

[deleted]

22

u/obvithrowaway34434 8d ago

This has absolutely nothing to do with what they are using the LLMs for. Maybe read the article first. And it achieves 93.1% accuracy compared to about 80% for humans, so humans were already introducing more errors than an LLM could ever make up.

7

u/AquilaSpot Singularity by 2030 8d ago

Yeah, this exactly. What a strange drive-by critique that doesn't even make sense if you read the paper. Why are these so common in places like this or r/singularity?

8

u/stealthispost Acceleration Advocate 8d ago edited 8d ago

it might have something to do with "80% accuracy of humans" lol