Machine learning algorithms form biases, like humans, based on the data they observe. However, unlike humans, the algorithms can readily admit their biases when probed appropriately. Using publicly available lists of names, we enumerate biases in an unsupervised fashion from word embeddings trained on public data. Gender, racial, and religious biases emerge, among others. We then analyze the effects of these biases on a problem motivated by recommending jobs to candidates. To collect data for this task, we extract hundreds of thousands of third-person bios from the web. The straightforward application of machine learning is found to amplify some biases. However, unlike with humans, it is easy to put algorithmic corrections in place to mitigate this bias amplification.
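One common way to probe an embedding for gender bias, in the spirit of the unsupervised enumeration described above, is to project word vectors onto a direction defined by a gendered word pair and inspect the sign and magnitude of the projection. The sketch below uses tiny made-up vectors (the words, values, and helper name are illustrative assumptions, not the authors' actual data or method):

```python
import numpy as np

# Hypothetical 2-d word vectors, for illustration only.
vecs = {
    "he":       np.array([1.0, 0.0]),
    "she":      np.array([-1.0, 0.0]),
    "nurse":    np.array([-0.6, 0.8]),
    "engineer": np.array([0.7, 0.7]),
}

def bias_score(word: str) -> float:
    """Projection of a unit word vector onto the he-she axis.

    Positive values lean toward "he", negative toward "she".
    """
    axis = vecs["he"] - vecs["she"]
    axis = axis / np.linalg.norm(axis)
    v = vecs[word] / np.linalg.norm(vecs[word])
    return float(v @ axis)

print(bias_score("nurse"))     # negative: leans toward "she"
print(bias_score("engineer"))  # positive: leans toward "he"
```

With real embeddings (e.g., word2vec or GloVe) the same projection, applied to occupation words or to lists of names, surfaces the kinds of associations the abstract refers to.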
Joint work with: Maria De Arteaga (CMU); Alexey Romanov (UMass Lowell); Nat Swinger (Lexington HS); Tom Heffernan (Shrewsbury HS); Christian Borgs, Jennifer Chayes, and Hanna Wallach (MSR); Alex Chouldechova (CMU); Mark Leiserson (UMD); Sahin Geyik and Krishnaram Kenthapadi (LinkedIn)