
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


CheXclusion: Fairness gaps in deep chest X-ray classifiers

Laleh Seyyed-Kalantari, Guanxiong Liu, Matthew McDermott, Irene Y. Chen, Marzyeh Ghassemi

2020-02-14
Tags: Fairness, Medical Diagnosis, Multi-Label Learning, Diagnostic, All, Multi-Label Classification

Abstract

Machine learning systems have received much attention recently for their ability to achieve expert-level performance on clinical tasks, particularly in medical imaging. Here, we examine the extent to which state-of-the-art deep learning classifiers trained to yield diagnostic labels from X-ray images are biased with respect to protected attributes. We train convolutional neural networks to predict 14 diagnostic labels in 3 prominent public chest X-ray datasets (MIMIC-CXR, Chest-Xray8, and CheXpert), as well as in a multi-site aggregation of all those datasets. We evaluate TPR disparity, the difference in true positive rates (TPR), across subgroups defined by protected attributes such as patient sex, age, race, and insurance type (a proxy for socioeconomic status). We demonstrate that TPR disparities exist in the state-of-the-art classifiers in all datasets, for all clinical tasks, and for all subgroups. The multi-source dataset yields the smallest disparities, suggesting one way to reduce bias. We find that TPR disparities are not significantly correlated with a subgroup's proportional disease burden. As clinical models move from papers to products, we encourage clinical decision makers to carefully audit for algorithmic disparities prior to deployment. Our code is available at https://github.com/LalehSeyyed/CheXclusion
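The audit described in the abstract starts with per-subgroup true positive rates for each diagnostic label. The sketch below computes them for a single label and summarizes the disparity as the max-minus-min TPR gap across subgroups. That summary is an assumption made for illustration, and the paper may define the per-subgroup disparity differently. The function names and the toy data are hypothetical, and predictions are assumed to have already been binarized at some threshold.

```python
from collections import defaultdict


def true_positive_rate(y_true, y_pred):
    """TPR = TP / (TP + FN) for 0/1 labels; returns None when there are no positives."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_on_positives:
        return None
    # Every positive case predicted as 1 is a true positive.
    return sum(preds_on_positives) / len(preds_on_positives)


def tpr_disparity(y_true, y_pred, groups):
    """Per-subgroup TPRs for one diagnostic label, plus the max-minus-min TPR gap.

    groups[i] is the protected-attribute value (e.g. sex or an age bin) of sample i.
    The max-minus-min gap is an assumed summary, not necessarily the paper's definition.
    """
    labels_by_group = defaultdict(list)
    preds_by_group = defaultdict(list)
    for t, p, g in zip(y_true, y_pred, groups):
        labels_by_group[g].append(t)
        preds_by_group[g].append(p)

    rates = {}
    for g in labels_by_group:
        r = true_positive_rate(labels_by_group[g], preds_by_group[g])
        if r is not None:  # skip subgroups with no positive cases for this label
            rates[g] = r

    if not rates:
        return rates, None
    gap = max(rates.values()) - min(rates.values())
    return rates, gap


# Hypothetical toy data: one label, patient sex as the protected attribute.
y_true = [1, 1, 1, 1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 1, 1]
groups = ["F", "F", "F", "M", "M", "M", "M", "F"]
rates, gap = tpr_disparity(y_true, y_pred, groups)
print(rates, gap)
```

In this toy data, the female subgroup has a TPR of 2/3, the male subgroup has a TPR of 1.0, and the gap is 1/3. In a full audit, the same computation would be repeated for each of the 14 labels and for each protected attribute (sex, age, race, insurance type).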

Results

Task | Dataset | Metric | Value | Model
Multi-Label Classification | ChestX-ray14 | Average AUC (14 labels) | 0.849 | DenseNet-121
Multi-Label Classification | MIMIC-CXR | Average AUC (14 labels) | 0.834 | DenseNet-121
Multi-Label Classification | CheXpert | Average AUC (14 labels) | 0.805 | DenseNet-121
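The "average AUC" metric in the table is most plausibly the unweighted (macro) mean of the per-label ROC AUCs. The sketch below assumes that reading. It computes each label's AUC with the Mann-Whitney formulation: the probability that a randomly chosen positive image scores higher than a randomly chosen negative one, with ties counted as 1/2. For clarity it uses a quadratic pairwise loop instead of a library routine such as scikit-learn's `roc_auc_score`. The function names and the two-label toy data are hypothetical stand-ins for the 14 real labels.

```python
def binary_auc(y_true, scores):
    """ROC AUC via the Mann-Whitney statistic: P(random positive outscores random
    negative), counting ties as 1/2. Returns None if either class is missing."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_auc(label_matrix, score_matrix):
    """Unweighted mean of per-label AUCs; rows are images, columns are diagnostic labels."""
    n_labels = len(label_matrix[0])
    aucs = []
    for j in range(n_labels):
        auc = binary_auc([row[j] for row in label_matrix],
                         [row[j] for row in score_matrix])
        if auc is not None:  # labels lacking positives or negatives are skipped
            aucs.append(auc)
    return sum(aucs) / len(aucs)


# Hypothetical toy data: 4 images, 2 labels.
labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.4], [0.1, 0.5]]
print(average_auc(labels, scores))  # 0.875
```

In the toy data, the first label is perfectly ranked (AUC 1.0), while the second has one misordered positive/negative pair (AUC 0.75). Their mean is 0.875.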

Related Papers

- Hear Your Code Fail, Voice-Assisted Debugging for Python (2025-07-20)
- A Reproducibility Study of Product-side Fairness in Bundle Recommendation (2025-07-18)
- Smart fault detection in satellite electrical power system (2025-07-18)
- FedGA: A Fair Federated Learning Framework Based on the Gini Coefficient (2025-07-17)
- Demographic-aware fine-grained classification of pediatric wrist fractures (2025-07-17)
- Looking for Fairness in Recommender Systems (2025-07-16)
- FADE: Adversarial Concept Erasure in Flow Models (2025-07-16)
- Trustworthy Tree-based Machine Learning by $MoS_2$ Flash-based Analog CAM with Inherent Soft Boundaries (2025-07-16)