Explainable post-training bias mitigation with distribution-based fairness metrics
Ryan Franks, Alexey Miroshnikov
Abstract
We develop a novel optimization framework with distribution-based fairness constraints for efficiently producing demographically blind, explainable models across a wide range of fairness levels. This is accomplished through post-processing, which avoids the need for retraining. Our framework, based on stochastic gradient descent, applies to many model types, with particular emphasis on post-processing gradient-boosted decision trees. Additionally, building on previous work, we design a broad class of interpretable global bias metrics compatible with our method. We empirically test our methodology on a variety of datasets and compare it to other methods.
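The abstract names three ingredients: SGD-based post-processing of a frozen model, a demographically blind transform, and a distribution-based bias measure. As an illustration only, not the paper's actual algorithm, the sketch below post-processes fixed model scores with an affine map trained by SGD against a cross-entropy utility plus an empirical Wasserstein-1 penalty between the two groups' score distributions; the synthetic data, the affine form of the post-processor, and the penalty weight `lam` are all hypothetical choices made for the example.

```python
import torch

torch.manual_seed(0)

# Hypothetical synthetic stand-in for a trained model's outputs: base scores s,
# binary labels y, and a protected-group indicator g (not from the paper).
n = 2000
g = torch.randint(0, 2, (n,))
s = torch.randn(n) + 0.8 * g.float()            # group-dependent score shift, i.e. bias
y = ((s + 0.3 * torch.randn(n)) > 0.4).float()

# Demographically blind post-processor: an affine map of the score alone.
# It never sees g at prediction time; g enters only through the training penalty.
theta = torch.nn.Parameter(torch.tensor([1.0, 0.0]))  # (scale, shift)
opt = torch.optim.SGD([theta], lr=0.05)

def w1(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Empirical Wasserstein-1 distance between equal-size subsamples:
    the mean absolute gap between matched order statistics."""
    k = min(len(a), len(b))
    a = a[torch.randperm(len(a))[:k]]
    b = b[torch.randperm(len(b))[:k]]
    return (torch.sort(a).values - torch.sort(b).values).abs().mean()

lam = 2.0  # fairness knob; sweeping it yields models at different fairness levels
for step in range(500):
    t = theta[0] * s + theta[1]                                  # post-processed score
    utility = torch.nn.functional.binary_cross_entropy_with_logits(t, y)
    bias = w1(t[g == 0], t[g == 1])                              # distribution-based bias
    loss = utility + lam * bias
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Sweeping `lam` and re-running the loop produces a family of post-processed models at different points on the accuracy-bias trade-off, all without retraining the base model, which is the regime the abstract describes.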