The trawl turned up 20,500 articles on the topic, but, shockingly, fewer than 1 percent of them were scientifically robust enough for their claims to be trusted, the authors say. Of those, only 25 tested their deep learning models on unseen data, and only 14 actually compared performance against health professionals on the same test sample.
Nonetheless, when the researchers pooled the data from the 14 most rigorous studies, they found that the deep learning systems correctly detected disease in 87 percent of the cases where it was present, compared with 86 percent for health professionals. The systems also did well on the equally important metric of correctly ruling out disease in patients who don't have it, getting it right 93 percent of the time compared with 91 percent for health professionals. (These two measures are known as sensitivity and specificity, respectively.)
Ultimately, then, the review's results are broadly positive for AI, but they are damning both of the hype that has built up around the technology and of the research practices of most of those trying to apply it to medical diagnosis.