Large-scale Long-tailed Disease Diagnosis on Radiology Images


Qiaoyu Zheng*1,2
Weike Zhao*1,2
Chaoyi Wu*1,2
Xiaoman Zhang1,2
Lisong Dai1,3
Hengyu Guan4,5

Yuehua Li1,3
Ya Zhang1,2
Yanfeng Wang1,2
Weidi Xie1,2

1CMIC, Shanghai Jiao Tong University
2Shanghai AI Laboratory
3Shanghai Sixth People's Hospital, Affiliated to Shanghai Jiao Tong University
4Department of Reproductive Medicine, Ren Ji Hospital, Shanghai Jiao Tong University School of Medicine
5Shanghai Key Laboratory for Assisted Reproduction and Reproductive Genetics

Code [GitHub]

Paper [arXiv]

Model [HuggingFace]


Abstract

Developing a generalist radiology diagnosis system can greatly enhance clinical diagnostics. In this paper, we introduce RadDiag, a foundational model supporting 2D and 3D inputs across various modalities and anatomies, using a transformer-based fusion module for comprehensive disease diagnosis. Due to patient privacy concerns and the lack of large-scale radiology diagnosis datasets, we utilize high-quality, clinician-reviewed radiological images with diagnosis labels that are publicly available online. Our dataset, RP3D-DiagDS, contains 40,936 cases with 195,010 scans covering 5,568 disorders (930 unique ICD-10-CM codes). Experimentally, our RadDiag achieves 95.14% AUC on internal evaluation with the knowledge enhancement strategy. Additionally, RadDiag can be applied zero-shot or fine-tuned on external diagnosis datasets sourced from various hospitals, demonstrating state-of-the-art results. In conclusion, we show that publicly shared medical data on the Internet is a tremendous and valuable resource that can potentially support building a generalist AI for healthcare.



The RP3D-DiagDS Dataset

Overall, the proposed dataset contains 40,936 cases with 195,010 images from 9 diverse imaging modalities and 7 human anatomical regions; note that each case may contain images from multiple scans. The data covers 5,568 different disorders, which have been manually mapped to 930 ICD-10-CM codes.

Specifically, cases in our dataset are sourced from the Radiopaedia website, a growing, peer-reviewed educational radiology resource that allows clinicians to upload 3D volumes to better reflect real clinical scenarios. All privacy issues are resolved by the uploading clinicians at submission time.

For each case in Radiopaedia, the 'Related Radiopaedia articles' section links to articles named after the corresponding disorders shown in the radiology images. We treat these article titles as diagnosis labels; they have been meticulously peer-reviewed by experts on the Radiopaedia Editorial Board.

In total, we obtain 40,936 cases containing 195,010 images, labeled with 5,568 disorder classes and 930 ICD-10-CM classes. We will continue to maintain the dataset and grow the number of cases.
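For illustration, a single case can be thought of as a record like the one below. The field names and file layout are hypothetical and only indicate how scans, modalities, and the two label granularities relate; they are not the actual release format.

# Hypothetical structure of a single RP3D-DiagDS case (illustrative field names only).
case = {
    "case_id": "rp3d_000001",
    "scans": [
        {"modality": "CT", "anatomy": "chest", "images": ["scan_0.nii.gz"]},
        {"modality": "X-ray", "anatomy": "chest", "images": ["scan_1.png"]},
    ],
    "disorder_labels": ["pulmonary embolism"],  # one of the 5,568 disorder classes
    "icd10cm_labels": ["I26"],                  # one of the 930 ICD-10-CM classes
}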

Analysis of the Cases in the RP3D-DiagDS Dataset

RP3D-DiagDS comprises images from 9 modalities, namely computed tomography (CT), magnetic resonance imaging (MRI), X-ray, ultrasound, fluoroscopy, nuclear medicine, mammography, DSA (angiography), and barium enema. Each case may include images from multiple modalities to ensure precise and comprehensive diagnosis of disorders. Overall, approximately 19.4% of the cases comprise images from two modalities, while around 2.9% involve images from three to five modalities; the remaining cases contain scans from a single modality.

RP3D-DiagDS comprises images from various anatomical regions, including the head and neck, spine, chest, breast, abdomen and pelvis, upper limb, and lower limb, providing comprehensive coverage of the entire human body. In total, the 5,568 disorders are mapped to 930 ICD-10-CM classes. We define the "head class" category as classes with more than 100 cases, the "body class" (medium) category as classes with 30 to 100 cases, and the "tail class" category as classes with fewer than 30 cases.
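As a minimal sketch of this split (the thresholds come from the text above; the counting code itself is a hypothetical illustration, not the dataset tooling):

from collections import Counter

def split_head_body_tail(case_labels):
    """Assign each class to head (>100 cases), body/medium (30-100), or tail (<30)."""
    counts = Counter(label for labels in case_labels for label in labels)
    head = {c for c, n in counts.items() if n > 100}
    body = {c for c, n in counts.items() if 30 <= n <= 100}
    tail = {c for c, n in counts.items() if n < 30}
    return head, body, tail

# case_labels is a list of per-case label lists,
# e.g. [["pneumonia"], ["pneumonia", "pleural effusion"], ...]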



Architecture

Overview of our method. The three parts illustrate the proposed visual encoders, the fusion module, and the knowledge enhancement strategy, respectively. a, The three types of vision encoders, i.e., ResNet-based, ViT-based, and ResNet-ViT mixing. b, The architecture of the fusion module: a transformer-based module enabling case-level information fusion. c, The knowledge enhancement strategy: we first pre-train a text encoder on extra medical knowledge with contrastive learning, leveraging synonyms, descriptions, and the disorder hierarchy, and then use the text embeddings as a natural classifier to guide the diagnosis classification.
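To make panels b and c concrete, below is a minimal PyTorch-style sketch of case-level fusion over per-scan embeddings and of using text embeddings as classifier weights. Dimensions, layer counts, the learnable fusion token, and the cosine-similarity head are assumptions for illustration, not the released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CaseFusion(nn.Module):
    """Transformer-based fusion over a variable number of per-scan embeddings (panel b, sketch)."""
    def __init__(self, dim=768, num_layers=4, num_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.fuse_token = nn.Parameter(torch.zeros(1, 1, dim))  # learnable case-level token

    def forward(self, scan_embeds):                  # scan_embeds: (B, num_scans, dim)
        token = self.fuse_token.expand(scan_embeds.size(0), -1, -1)
        x = torch.cat([token, scan_embeds], dim=1)
        return self.encoder(x)[:, 0]                 # case-level embedding: (B, dim)

def knowledge_enhanced_logits(case_embed, class_text_embeds):
    """Treat pre-trained text embeddings of disorder knowledge as classifier weights (panel c, sketch)."""
    case_embed = F.normalize(case_embed, dim=-1)
    class_text_embeds = F.normalize(class_text_embeds, dim=-1)
    return case_embed @ class_text_embeds.t()        # (B, num_classes) similarity logits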



Results

R1: Classification results on Disorders and ICD-10-CM levels.

In the table, "FM" denotes the fusion module and "KE" denotes the knowledge enhancement strategy. We report results on the Head/Medium/Tail class sets separately.

R2: ROC curves on Disorders and ICD-10-CM.

As depicted in the ROC curves above, the shaded region indicates the 95% confidence interval (CI); FM and KE are short for Fusion Module and Knowledge Enhancement.
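For reference, one common way to obtain such a confidence band is case-level bootstrapping of the AUC. The snippet below is a generic scikit-learn sketch under that assumption, not the paper's evaluation code.

import numpy as np
from sklearn.metrics import roc_auc_score

def auc_with_bootstrap_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap (1 - alpha) confidence interval."""
    rng = np.random.default_rng(seed)
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))   # resample cases with replacement
        if len(np.unique(y_true[idx])) < 2:               # skip resamples with a single class
            continue
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return roc_auc_score(y_true, y_score), (lo, hi)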

R3: Zero-shot results on 6 external datasets.

We also conduct zero-shot evaluation on the fine-grained diagnosis tasks introduced by representative external datasets, spanning six anatomies, including the head and neck, chest, breast, abdomen and pelvis, and spine. These datasets cover five modalities, i.e., CT, MRI, X-ray, ultrasound, and mammography, allowing for a comprehensive evaluation of our model's capabilities. As baselines, we include RadIN and BiomedCLIP for comparison in the zero-shot setting.

R4: Finetune results on various external datasets.

When fine-tuning our model on external datasets, we observe significant performance improvements on all 22 external datasets across different data fractions, compared to models trained from scratch. In most cases, our model even surpasses specialist SOTAs carefully designed for the target task, demonstrating that publicly shared medical data on the Internet is a tremendous and valuable resource that can serve as a superior large-scale supervised training dataset for the medical domain.

R5: Analysis on normal/abnormal diagnosis.

We directly adopt the "normal" classifier head trained on our RP3D-DiagDS to obtain the prediction probability score and compare it with other foundation models, RadIN [53] and BiomedCLIP [82], in terms of AUC and AP on the five zero-shot datasets.
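Both reported metrics are straightforward to compute with scikit-learn once the per-case probability scores are available; the arrays below are placeholders, not values from the paper.

from sklearn.metrics import roc_auc_score, average_precision_score

# Placeholder labels and scores; in practice the scores come from the "normal" classifier head.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]                   # 1 = abnormal, 0 = normal
y_prob = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3]   # predicted probability of the positive class

print("AUC:", roc_auc_score(y_true, y_prob))
print("AP: ", average_precision_score(y_true, y_prob))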

R6: Probability distribution and saliency map visualization

We pick six diseases in total and show the prediction probability distribution over test-set cases for each class. We use Score-CAM, a visual explanation method based on class activation maps, to visualize the regions that contribute most to the decision for the target disease class.
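One way to reproduce this kind of saliency map is with the open-source pytorch-grad-cam package, which ships a Score-CAM implementation. The backbone, target layer, input, and class index below are placeholders for whichever model and disorder class are being inspected, not the RadDiag setup.

import torch
from torchvision.models import resnet18
from pytorch_grad_cam import ScoreCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

model = resnet18(weights=None).eval()            # placeholder backbone, not the RadDiag encoder
input_tensor = torch.randn(1, 3, 224, 224)       # placeholder preprocessed image
class_idx = 0                                    # placeholder index of the target disorder class

# Score-CAM weights each activation map by its effect on the class score (no gradients needed).
cam = ScoreCAM(model=model, target_layers=[model.layer4[-1]])
saliency = cam(input_tensor=input_tensor,
               targets=[ClassifierOutputTarget(class_idx)])[0]   # (224, 224) saliency map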

For more detailed ablation studies and results, please refer to our paper.



BibTeX


        @article{zheng2023large,
          title={Large-scale Long-tailed Disease Diagnosis on Radiology Images},
          author={Zheng, Qiaoyu and Zhao, Weike and Wu, Chaoyi and Zhang, Xiaoman and Zhang, Ya and Wang, Yanfeng and Xie, Weidi},
          journal={arXiv preprint arXiv:2312.16151},
          year={2023}
        }