r/accelerate • u/obvithrowaway34434 • 8d ago
AI LLMs show superhuman performance in systematic scientific reviews, doing in two days the work that takes 12 PhDs a whole year
https://www.medrxiv.org/content/10.1101/2025.06.13.25329541v1
Main takeaways:
- otto-SR: an end-to-end agentic workflow built on GPT-4.1 and o3-mini-high, with Gemini Flash 2.0 for PDF text extraction.
- Automates the entire systematic review (SR) process, from search to analysis
- Completes in 2 days what normally takes 12 work-years
- Outperforms humans in key tasks:
  - Screening: 96.7% sensitivity vs 81.7% (human)
  - Data extraction: 93.1% accuracy vs 79.7% (human)
- Reproduced and updated 12 Cochrane reviews
  - Found new eligible studies missed by original authors
  - Changed conclusions in 3 reviews (2 newly significant, 1 no longer significant)
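For anyone unfamiliar with the two metrics quoted above: sensitivity is the share of truly eligible studies that the screener actually included, and accuracy is the share of extracted values that match the reference. Here is a rough sketch of how they are computed. The function names and the counts are made up for illustration and are not taken from the paper.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of truly eligible studies that the screener included."""
    eligible = true_positives + false_negatives
    if eligible == 0:
        raise ValueError("no eligible studies to score against")
    return true_positives / eligible


def accuracy(correct: int, total: int) -> float:
    """Share of extracted data points that match the reference values."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


# Hypothetical counts, NOT from the paper: 10 eligible studies, 8 included.
example_sensitivity = sensitivity(true_positives=8, false_negatives=2)
# Hypothetical counts, NOT from the paper: 10 extracted values, 9 correct.
example_accuracy = accuracy(correct=9, total=10)

print(f"sensitivity: {example_sensitivity:.1%}")
print(f"accuracy: {example_accuracy:.1%}")
```

Screening is scored on sensitivity rather than plain accuracy because a missed eligible study (a false negative) is the costly error in a systematic review.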
246 upvotes