Large-scale Long-tailed Disease Diagnosis on Radiology Images


Qiaoyu Zheng*1,2
Weike Zhao*1,2
Chaoyi Wu*1,2
Xiaoman Zhang1,2
Lisong Dai1,3
Hengyu Guan4,5

Yuehua Li1,3
Ya Zhang1,2
Yanfeng Wang1,2
Weidi Xie1,2

1CMIC, Shanghai Jiao Tong University
2Shanghai AI Laboratory
3Shanghai Sixth People's Hospital, Affiliated to Shanghai Jiao Tong University
4Department of Reproductive Medicine, Ren Ji Hospital, Shanghai Jiao Tong University School of Medicine
5Shanghai Key Laboratory for Assisted Reproduction and Reproductive Genetics

Code [GitHub]

Paper [arXiv]

Model [HuggingFace]


Abstract

Developing a generalist radiology diagnosis system can greatly enhance clinical diagnostics. In this paper, we introduce RadDiag, a foundational model supporting 2D and 3D inputs across various modalities and anatomies, using a transformer-based fusion module for comprehensive disease diagnosis. Due to patient privacy concerns and the lack of large-scale radiology diagnosis datasets, we utilize high-quality, clinician-reviewed radiological images with diagnosis labels that are publicly available online. Our dataset, RP3D-DiagDS, contains 40,936 cases with 195,010 scans covering 5,568 disorders (930 unique ICD-10-CM codes). Experimentally, our RadDiag achieves 95.14% AUC on internal evaluation with the knowledge enhancement strategy. Additionally, RadDiag can be applied zero-shot or fine-tuned on external diagnosis datasets sourced from various hospitals, demonstrating state-of-the-art results. In conclusion, we show that publicly shared medical data on the Internet is a tremendous and valuable resource that can potentially support building a generalist AI for healthcare.



The RP3D-DiagDS Dataset

Overall, the proposed dataset contains 40,936 cases with 195,010 images from 9 diverse imaging modalities and 7 human anatomical regions; note that each case may contain images from multiple scans. The data covers 5,568 different disorders, which have been manually mapped to 930 ICD-10-CM codes.

Specifically, cases in our dataset are sourced from the Radiopaedia website, a growing, peer-reviewed educational radiology resource that allows clinicians to upload 3D volumes to better reflect real clinical scenarios. All privacy issues are resolved by the uploading clinicians at submission time.

For each case in Radiopaedia, the 'Related Radiopaedia articles' section links to articles named after the corresponding disorders shown in the radiology images. We treat these article titles as diagnosis labels; they have been meticulously peer-reviewed by experts on the Radiopaedia Editorial Board.

In total, we obtain 40,936 cases containing 195,010 images, labeled with 5,568 disorder classes and 930 ICD-10-CM classes. We will continue to maintain the dataset and grow the number of cases.
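For illustration, a single case can be thought of as a record like the one below. The field names and file layout are hypothetical and only indicate how scans, modalities, and the two label granularities relate; they are not the actual release format.

# Hypothetical structure of a single RP3D-DiagDS case (illustrative field names only).
case = {
    "case_id": "rp3d_000001",
    "scans": [
        {"modality": "CT", "anatomy": "chest", "images": ["scan_0.nii.gz"]},
        {"modality": "X-ray", "anatomy": "chest", "images": ["scan_1.png"]},
    ],
    "disorder_labels": ["pulmonary embolism"],  # one of the 5,568 disorder classes
    "icd10cm_labels": ["I26"],                  # one of the 930 ICD-10-CM classes
}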

Analysis of the Cases in the RP3D-DiagDS Dataset

RP3D-DiagDS comprises images from 9 modalities, namely computed tomography (CT), magnetic resonance imaging (MRI), X-ray, ultrasound, fluoroscopy, nuclear medicine, mammography, DSA (angiography), and barium enema. Each case may include images from multiple modalities to ensure precise and comprehensive diagnosis of disorders. Overall, approximately 19.4% of the cases comprise images from two modalities, while around 2.9% involve images from three to five modalities; the remaining cases contain scans from a single modality.

RP3D-DiagDS comprises images from various anatomical regions, including the head and neck, spine, chest, breast, abdomen and pelvis, upper limb, and lower limb, providing comprehensive coverage of the entire human body. In total, the 5,568 disorders are mapped to 930 ICD-10-CM classes. We define the "head class" category as classes with more than 100 cases, the "body class" (medium) category as classes with 30 to 100 cases, and the "tail class" category as classes with fewer than 30 cases.
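As a minimal sketch of this split (the thresholds come from the text above; the counting code itself is a hypothetical illustration, not the dataset tooling):

from collections import Counter

def split_head_body_tail(case_labels):
    """Assign each class to head (>100 cases), body/medium (30-100), or tail (<30)."""
    counts = Counter(label for labels in case_labels for label in labels)
    head = {c for c, n in counts.items() if n > 100}
    body = {c for c, n in counts.items() if 30 <= n <= 100}
    tail = {c for c, n in counts.items() if n < 30}
    return head, body, tail

# case_labels is a list of per-case label lists,
# e.g. [["pneumonia"], ["pneumonia", "pleural effusion"], ...]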



Architecture

Overview of our method. The three parts illustrate the proposed visual encoders, the fusion module, and the knowledge enhancement strategy, respectively. a, The three types of vision encoders, i.e., ResNet-based, ViT-based, and ResNet-ViT mixing. b, The architecture of the fusion module: a transformer-based module enabling case-level information fusion. c, The knowledge enhancement strategy: we first pre-train a text encoder on extra medical knowledge with contrastive learning, leveraging synonyms, descriptions, and the disorder hierarchy, and then use the text embeddings as a natural classifier to guide the diagnosis classification.
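To make panels b and c concrete, below is a minimal PyTorch-style sketch of case-level fusion over per-scan embeddings and of using text embeddings as classifier weights. Dimensions, layer counts, the learnable fusion token, and the cosine-similarity head are assumptions for illustration, not the released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CaseFusion(nn.Module):
    """Transformer-based fusion over a variable number of per-scan embeddings (panel b, sketch)."""
    def __init__(self, dim=768, num_layers=4, num_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.fuse_token = nn.Parameter(torch.zeros(1, 1, dim))  # learnable case-level token

    def forward(self, scan_embeds):                  # scan_embeds: (B, num_scans, dim)
        token = self.fuse_token.expand(scan_embeds.size(0), -1, -1)
        x = torch.cat([token, scan_embeds], dim=1)
        return self.encoder(x)[:, 0]                 # case-level embedding: (B, dim)

def knowledge_enhanced_logits(case_embed, class_text_embeds):
    """Treat pre-trained text embeddings of disorder knowledge as classifier weights (panel c, sketch)."""
    case_embed = F.normalize(case_embed, dim=-1)
    class_text_embeds = F.normalize(class_text_embeds, dim=-1)
    return case_embed @ class_text_embeds.t()        # (B, num_classes) similarity logits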



Results

R1: Classification results on Disorders and ICD-10-CM levels.

In the table, "FM" denotes the fusion module and "KE" denotes the knowledge enhancement strategy. We report results on the Head/Medium/Tail class sets separately.

R2: ROC curves on Disorders and ICD-10-CM.

As depicted in the ROC curves above, the shaded region indicates the 95% confidence interval (CI); FM and KE are short for Fusion Module and Knowledge Enhancement.
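For reference, one common way to obtain such a confidence band is case-level bootstrapping of the AUC. The snippet below is a generic scikit-learn sketch under that assumption, not the paper's evaluation code.

import numpy as np
from sklearn.metrics import roc_auc_score

def auc_with_bootstrap_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap (1 - alpha) confidence interval."""
    rng = np.random.default_rng(seed)
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))   # resample cases with replacement
        if len(np.unique(y_true[idx])) < 2:               # skip resamples with a single class
            continue
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return roc_auc_score(y_true, y_score), (lo, hi)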

R3: Zero-shot results on 6 external datasets.

We also conduct zero-shot evaluation on the fine-grained diagnosis tasks introduced by representative external datasets, spanning six anatomies, including the head and neck, chest, breast, abdomen and pelvis, and spine. These datasets cover five modalities, i.e., CT, MRI, X-ray, ultrasound, and mammography, allowing for a comprehensive evaluation of our model's capabilities. As baselines, we include RadIN and BiomedCLIP for comparison in the zero-shot setting.

R4: Finetune results on various external datasets.

When fine-tuning our model on external datasets, we observe significant performance improvements on all 22 external datasets across different data fractions, compared to models trained from scratch. In most cases, our model even surpasses specialist SOTAs carefully designed for the target task, demonstrating that publicly shared medical data on the Internet is a tremendous and valuable resource that can serve as a superior large-scale supervised training dataset for the medical domain.

R5: Analysis on normal/abnormal diagnosis.

We directly adopt the "normal" classifier head trained on our RP3D-DiagDS to obtain the prediction probability score and compare it with other foundation models, RadIN [53] and BiomedCLIP [82], in terms of AUC and AP on the five zero-shot datasets.
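Both reported metrics are straightforward to compute with scikit-learn once the per-case probability scores are available; the arrays below are placeholders, not values from the paper.

from sklearn.metrics import roc_auc_score, average_precision_score

# Placeholder labels and scores; in practice the scores come from the "normal" classifier head.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]                   # 1 = abnormal, 0 = normal
y_prob = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3]   # predicted probability of the positive class

print("AUC:", roc_auc_score(y_true, y_prob))
print("AP: ", average_precision_score(y_true, y_prob))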

R6: Probability distribution and saliency map visualization

We pick six diseases in total and show the prediction probability distribution over test-set cases for each class. We use Score-CAM, a visual explanation method based on class activation maps, to visualize the regions that contribute most to the decision for the target disease class.
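One way to reproduce this kind of saliency map is with the open-source pytorch-grad-cam package, which ships a Score-CAM implementation. The backbone, target layer, input, and class index below are placeholders for whichever model and disorder class are being inspected, not the RadDiag setup.

import torch
from torchvision.models import resnet18
from pytorch_grad_cam import ScoreCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

model = resnet18(weights=None).eval()            # placeholder backbone, not the RadDiag encoder
input_tensor = torch.randn(1, 3, 224, 224)       # placeholder preprocessed image
class_idx = 0                                    # placeholder index of the target disorder class

# Score-CAM weights each activation map by its effect on the class score (no gradients needed).
cam = ScoreCAM(model=model, target_layers=[model.layer4[-1]])
saliency = cam(input_tensor=input_tensor,
               targets=[ClassifierOutputTarget(class_idx)])[0]   # (224, 224) saliency map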

For more detailed ablation studies and results, please refer to our paper.



BibTeX


        @article{zheng2023large,
          title={Large-scale Long-tailed Disease Diagnosis on Radiology Images},
          author={Zheng, Qiaoyu and Zhao, Weike and Wu, Chaoyi and Zhang, Xiaoman and Zhang, Ya and Wang, Yanfeng and Xie, Weidi},
          journal={arXiv preprint arXiv:2312.16151},
          year={2023}
        }