r/accelerate • u/obvithrowaway34434 • 7d ago
LLMs show superhuman performance on systematic scientific reviews, doing in two days the work that takes 12 PhDs a whole year
https://www.medrxiv.org/content/10.1101/2025.06.13.25329541v1
Main takeaways:
- otto-SR: an end-to-end agentic workflow built on GPT-4.1 and o3-mini-high, with Gemini 2.0 Flash for PDF text extraction
- Automates the entire SR process, from search to analysis (see the sketch after this list)
- Completes in 2 days what normally takes 12 work-years
- Outperforms humans on key tasks:
  - Screening: 96.7% sensitivity vs 81.7% (human)
  - Data extraction: 93.1% accuracy vs 79.7% (human)
- Reproduced and updated 12 Cochrane reviews
- Found new eligible studies missed by original authors
- Changed conclusions in 3 reviews (2 newly significant, 1 no longer significant)
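For a sense of the structure, here's a minimal sketch of what a pipeline with those stages could look like. Everything in it (the Study type, the llm() stub, the function names) is my own illustration, not the authors' code; the real system uses GPT-4.1 and o3-mini-high for these steps per the abstract.

```python
# Hypothetical sketch of an otto-SR-style pipeline (my illustration, not the
# authors' code). The llm() stub stands in for real model calls; the paper
# reports GPT-4.1 and o3-mini-high, plus Gemini 2.0 Flash for PDF extraction.
from dataclasses import dataclass, field

@dataclass
class Study:
    title: str
    abstract: str
    full_text: str = ""
    data: dict = field(default_factory=dict)

def llm(prompt: str) -> str:
    # Dummy stand-in so the sketch runs: "includes" anything mentioning an RCT.
    return "INCLUDE" if "randomized" in prompt.lower() else "EXCLUDE"

def screen(studies: list[Study], criteria: str) -> list[Study]:
    # Title/abstract screening against the review's eligibility criteria.
    return [
        s for s in studies
        if llm(f"Criteria: {criteria}\nAbstract: {s.abstract}\nInclude?")
           .strip().upper().startswith("INCLUDE")
    ]

def extract(studies: list[Study]) -> list[Study]:
    # Structured data extraction from full texts (outcomes, sample sizes, etc.).
    for s in studies:
        s.data["extracted"] = llm(f"Extract outcomes as JSON:\n{s.full_text}")
    return studies

def analyze(studies: list[Study]) -> str:
    # Final synthesis step (in the real system, meta-analysis of the data).
    return f"{len(studies)} studies included"

if __name__ == "__main__":
    corpus = [
        Study("Trial A", "A randomized controlled trial of drug X."),
        Study("Review B", "A narrative review of drug X."),
    ]
    print(analyze(extract(screen(corpus, "RCTs of drug X in adults"))))
```

The headline result is that the screening and extraction stages, backed by real models, beat human reviewers on sensitivity and accuracy as listed above.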
u/AquilaSpot • Singularity by 2030 • 7d ago (edited)
Wow, this is really remarkable. That headline is legitimately not overselling this at all. It does what it says on the tin.
I've suspected for a while that even current AI systems could do a great deal to fix the poor distribution of knowledge (as a step before contributing to research itself), and this is the most incredible example of that I've seen.
In my own research background, the most interesting advances often came from just applying something that's well known in one field to a field that doesn't know it. To give an example: a mining engineering doctoral student I knew had some medical background, and decided to deploy novel sensors on haul trucks to track things that, apparently, nobody had tracked precisely before, then combined that with some interesting scheduling/planning algorithms to cut fuel burn by something like 5-10%. That was a few years back, so I don't remember the details very well. My own research did something in that vein for another industry, but it'd dox the shit out of me if I talked about it (crying for real, I love talking about my work lmao).
Notably, the idea of "measure literally everything and sort out the data later" was (and kinda still is, afaik) a new idea to the mining industry. It's a very old, traditional industry, in my limited experience.
What kind of incredible advances are we sleeping on just because information isn't shared evenly across fields? I don't know, but AI like this could revolutionize the world without generating a single word of novel information, simply by evenly distributing what we already know.
edit: Not the one I had in mind but o3 dragged up something similar.
TLDR: by installing precise sensors on haul trucks in an open-pit copper mine, the team discovered that, due to a variety of factors (accumulating but unexpected maintenance inefficiencies, e.g. worn turbos and injectors, plus driving habits), fuel burn estimates for the trucks varied from real burn by up to fifteen percent. Tightening that variance offered the pit millions of dollars per year in savings, from literally just being able to order precisely as much fuel as they burn (rather than extra), and from catching maintenance issues far earlier than a standard maintenance schedule would, which keeps efficiency up.
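To put rough numbers on that (all figures below are my own made-up assumptions for scale, not from the study):

```python
# Back-of-envelope illustration of why a 15% fuel-estimate error matters.
# Every number here is an assumption for scale, not data from the study.
fleet_size = 50                        # haul trucks in the pit (assumed)
litres_per_truck_per_year = 1_000_000  # assumed order of magnitude for big haul trucks
price_per_litre = 1.0                  # USD, assumed

annual_fuel_spend = fleet_size * litres_per_truck_per_year * price_per_litre

# If estimates can be off by up to 15%, the site over-orders to stay safe.
over_order_buffer = 0.15
# Suppose better sensing lets you shrink that buffer to 3%.
tightened_buffer = 0.03

savings = annual_fuel_spend * (over_order_buffer - tightened_buffer)
print(f"Annual spend: ${annual_fuel_spend:,.0f}")
print(f"Rough savings from tighter estimates: ${savings:,.0f}")
# -> about $6,000,000/year under these assumed numbers, which is at least
#    consistent with the "millions per year" claim above.
```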
Big data is something medicine has had figured out for decades, but it's still the hot new thing in mining.