Deep Reinforcement Learning for Data-Agnostic Post-Training Debiasing of Black-Box Machine Learning Models

Pinkava, Thomas

Files

Pinkava_washington_0250O_26926.pdf (7.25 MB)

Date

2024-09-09

Abstract

As reliance on Machine Learning systems in real-world decision-making processes grows, ensuring these systems are free of bias against sensitive demographic groups is of increasing importance. Existing techniques for automatically debiasing ML models generally require access to either the models’ internal architectures, the models’ training datasets, or both. In this paper, we outline the reasons why such requirements are disadvantageous and present a novel alternative debiasing system that is both data- and model-agnostic. We implement this system as a Reinforcement Learning Agent and employ it to debias four target ML model architectures over three datasets. Our results show performance comparable to state-of-the-art debiasers that rely on access to training data, model internals, or both.