Large-scale Long-tailed Disease Diagnosis on Radiology Images


Qiaoyu Zheng*1,2
Weike Zhao*1,2
Chaoyi Wu*1,2
Xiaoman Zhang1,2

Ya Zhang1,2
Yanfeng Wang1,2,
Weidi Xie1,2,

1CMIC, Shanghai Jiao Tong University
2Shanghai AI Laboratory

Code [GitHub]

Paper [arXiv]

Dataset [HuggingFace]


Abstract

In this study, we investigate the problem of large-scale, large-vocabulary disease classification for radiology images, which can be formulated as a multi-modal, multi-anatomy, multi-label, long-tailed classification problem. Our main contributions are threefold: (i) on dataset construction, we build an academically accessible, large-scale diagnostic dataset that encompasses 5,568 disorders linked to 930 unique ICD-10-CM codes, containing 39,026 cases (192,675 scans); (ii) on model design, we present a novel architecture that can process an arbitrary number of input scans from various imaging modalities, trained with knowledge enhancement to leverage rich domain knowledge; (iii) on evaluation, we introduce a new benchmark for multi-modal, multi-anatomy, long-tailed diagnosis, on which our method shows superior results. Additionally, our final model serves as a pre-trained model and can be finetuned to benefit diagnosis on various external datasets.



The RP3D-DiagDS Dataset

Overall, the proposed dataset contains 39,026 cases, comprising 192,675 images from 9 diverse imaging modalities and 7 human anatomical regions; note that each case may contain images from multiple scans. The data covers 5,568 different disorders, which have been manually mapped to 930 ICD-10-CM codes.

Specifically, cases in our dataset are sourced from the Radiopaedia website -- a growing peer-reviewed educational radiology resource that allows clinicians to upload 3D volumes to better reflect real clinical scenarios. Additionally, all privacy issues have already been resolved by the clinicians at upload time.

For each case on Radiopaedia, the 'Related Radiopaedia articles' section links to articles named after the disorders shown in the radiology images. We treat these article titles as diagnosis labels; they have been meticulously peer-reviewed by experts on the Radiopaedia Editorial Board.

After article filtering, manual mapping, and the addition of normal cases, we obtain 39,026 cases containing 192,675 images, labeled with 5,568 disorder classes and 930 ICD-10-CM classes. We will continually maintain the dataset and grow the number of cases.

Analysis of the Cases in RP3D-DiagDS dataset

RP3D-DiagDS comprises images from 9 modalities, namely, computed tomography (CT), magnetic resonance imaging (MRI), X-ray, Ultrasound, Fluoroscopy, Nuclear medicine, Mammography, DSA (angiography), and Barium Enema. Each case may include images from multiple modalities to ensure precise and comprehensive diagnosis of disorders. Overall, approximately 19.4% of the cases comprise images from two modalities, while around 2.9% involve images from three to five modalities. The remaining cases are associated with image scans from a single modality.

RP3D-DiagDS comprises images from various anatomical regions, including head and neck, spine, chest, breast, abdomen and pelvis, upper limb, and lower limb, providing comprehensive coverage of the entire human body.

For both disorder and ICD-10-CM classification, each case can correspond to multiple disorders, making RP3D-DiagDS a long-tailed, multi-label classification dataset. We define the `head class' category as classes with more than 100 cases, the `body class' category as classes with between 30 and 100 cases, and the `tail class' category as classes with fewer than 30 cases.
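The head/body/tail partition above can be sketched as a simple bucketing over per-class case counts. The helper below is illustrative (the function name and input format are our own, not from the released code); it counts how many cases carry each label and assigns each class to a bucket by the thresholds stated above.

```python
from collections import Counter

def bucket_classes(case_labels):
    """Partition label classes into head/body/tail buckets by case count.

    case_labels: a list with one entry per case, each entry being the
    list of labels for that case (multi-label).
    Thresholds follow the definitions above: head > 100 cases,
    body 30-100 cases (inclusive), tail < 30 cases.
    """
    counts = Counter(label for labels in case_labels for label in labels)
    buckets = {"head": [], "body": [], "tail": []}
    for label, n in counts.items():
        if n > 100:
            buckets["head"].append(label)
        elif n >= 30:
            buckets["body"].append(label)
        else:
            buckets["tail"].append(label)
    return buckets
```

Because the dataset is multi-label, the bucket case counts sum to more than the number of cases: one case contributes to the count of every label it carries.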



Architecture

The architecture of our proposed visual encoder and fusion module, together with the knowledge enhancement strategy. (a) shows the details of the vision encoder. We design two variants to fit the two main visual backbones, i.e., ResNet and ViT. (b) shows the transformer-based fusion module, enabling case-level information fusion. (c) shows the knowledge enhancement strategy. We first pre-train a text encoder, termed the knowledge encoder, via contrastive learning on extra medical knowledge, i.e., synonyms, descriptions, and hierarchy, and then treat the resulting text embeddings as a natural classifier to guide the diagnosis classification.
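A minimal sketch of the knowledge-enhancement idea in (c): the frozen text embeddings from the knowledge encoder act as classifier weights, so each class is scored by the similarity between the fused case embedding and that class's text embedding. Function names, normalization choice, and dimensions here are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def knowledge_guided_logits(case_embedding, class_text_embeddings):
    """Score each class by cosine similarity between the fused visual
    case embedding and the knowledge encoder's text embedding.

    case_embedding: shape (d,), fused visual feature for one case.
    class_text_embeddings: shape (num_classes, d), one frozen text
    embedding per disorder from the pre-trained knowledge encoder.
    Returns per-class logits for multi-label (sigmoid) classification.
    """
    # L2-normalize both sides so the dot product is cosine similarity.
    v = case_embedding / np.linalg.norm(case_embedding)
    t = class_text_embeddings / np.linalg.norm(
        class_text_embeddings, axis=1, keepdims=True)
    return t @ v  # shape (num_classes,)
```

Using text embeddings as the classifier ties semantically related disorders together, which is what lets knowledge from head classes transfer to tail classes.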



Results

R1: Classification results on Disorders and ICD-10-CM levels.

In the table, ``FM'' denotes the fusion module and ``KE'' the knowledge enhancement strategy. We report results on the Head/Body/Tail class sets separately.

R2: ROC curves on Disorders and ICD-10-CM.

As depicted in the ROC curves above, the shaded region shows the 95% confidence interval (CI); FM and KE are short for Fusion Module and Knowledge Enhancement, respectively.

R3: The AUC Score Comparison on Various External Datasets.

For each dataset, we carry out experiments with different training-data portions, denoted 1% to 100% in the table. For example, 30% means we use 30% of the downstream training set, either for finetuning our model or for training from scratch. ``SOTA'' denotes the best performance of prior work (with the corresponding reference) on each dataset, and ``Zero-shot'' denotes directly evaluating our model on the external datasets. We mark the gap between our model and training from scratch as subscripts of the up-arrows in the table.
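The data-portion protocol above can be sketched as a seeded random subsample of the downstream training set. This helper is hypothetical (the paper's exact split procedure, e.g. whether it is stratified by class, is not specified here); it only illustrates drawing a fixed fraction reproducibly.

```python
import random

def sample_portion(train_cases, portion, seed=0):
    """Draw a fixed random portion (e.g. 0.01 to 1.0) of the downstream
    training set, as in the 1%-100% columns of the table.

    A fixed seed keeps the subset identical between the finetuning run
    and the training-from-scratch run, so the comparison is fair.
    """
    rng = random.Random(seed)
    k = max(1, round(len(train_cases) * portion))
    return rng.sample(train_cases, k)
```

For example, `sample_portion(cases, 0.3)` returns 30% of the cases, and calling it again with the same seed returns the same subset.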

For more detailed ablation studies and results, please refer to our paper.



BibTeX


        @article{zheng2023large,
          title={Large-scale Long-tailed Disease Diagnosis on Radiology Images},
          author={Zheng, Qiaoyu and Zhao, Weike and Wu, Chaoyi and Zhang, Xiaoman and Zhang, 
            Ya and Wang, Yanfeng and Xie, Weidi},
          journal={arXiv preprint arXiv:2312.16151},
          year={2023}
        }