Identifying Unconscious Bias in Predictive Models

Sriram Parthasarathy
Published in Product Coalition · Mar 21, 2019



Organizations are turning to machine learning (ML) to leverage vast amounts of data and make more informed decisions. ML is used across industries and can assist with tasks such as claim approval, customer churn prediction, advertising, and even medical diagnosis. Lately, a lot of emphasis has been placed on how these ML models and algorithms handle bias, particularly bias related to gender, age, and race.

This guide provides practical examples to help product managers understand the impact of model bias when incorporating artificial intelligence (AI) insights into their applications. Though an algorithm itself can be biased, most of the time the problem lies in the data, which may encode existing prejudice.

Let’s discuss a few example applications where biased models have a significant impact.

Hiring Process: Bias in the Input Data

A company’s sales team may consist mostly of 25-year-old white males. An algorithm trained on this data learns that the ideal salesperson is a 25-year-old white male.

If someone older than 25, or a woman, applies for the position, they will not get a good hiring score. The company may intend to be unbiased in its hiring, but the imbalance in the input data leads to skewed predictive hiring scores, which adds unintended bias to the process.
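One simple way to spot this kind of imbalance before training is to profile the demographic make-up of the historical hiring data. Below is a minimal sketch using pandas; the file name and column names (age, gender, hired) are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical historical hiring data with columns: age, gender, hired
df = pd.read_csv("hiring_history.csv")

# Share of each gender in the training data
print(df["gender"].value_counts(normalize=True))

# Age distribution summary
print(df["age"].describe())

# Hire rate broken out by gender -- a large gap here is a pattern
# the model is likely to learn and reproduce
print(df.groupby("gender")["hired"].mean())
```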

In this scenario, should age and gender be excluded from the model?

Rental Contract: Bias from Correlated Input Variables

Simply removing gender or age from the input data set does not solve the problem of bias. Unconscious bias can still creep in through highly correlated proxy variables that remain in the model. For example, zip code can be a proxy for race: even though race is not present in the input, including zip code in the model can reintroduce its effect.

Consider a real estate company renting out houses. A rental property could be in a neighborhood with a high concentration of young, educated Asians. The algorithm may then learn that the ideal rental applicant is a young Asian. Anyone who applies from outside that demographic may receive a lower rental score, even though they may be an ideal tenant.
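Even when protected attributes are dropped from the model’s inputs, it is worth checking whether the remaining features can stand in for them. One rough check is to see how well the candidate features predict the protected attribute itself. The sketch below uses scikit-learn; the file name and column names are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical applicant data; "race" is kept only for this audit,
# not as a model input
df = pd.read_csv("rental_applications.csv")

# Candidate model features (protected attributes excluded)
features = pd.get_dummies(df[["zip_code", "income", "credit_score"]],
                          columns=["zip_code"])

# If these features predict race far better than chance, at least one
# of them is acting as a proxy for it
proxy_check = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(proxy_check, features, df["race"], cv=5)
print("Mean accuracy predicting race from model features:", scores.mean())
```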

Bias in the Training Sample Used

Sample bias occurs when the distribution of the training data doesn’t reflect the actual environment the ML model will run in. An example of this is an organization using only East Coast training data to predict sales for the entire country. Another sample-bias scenario is training self-driving cars only on images and videos taken in daylight, when in reality you’d want the cars to drive at all times of day.

Both models are biased because they are trained on biased samples.
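A quick way to catch sample bias is to compare the distribution of a key feature in the training set against data from the environment where the model will actually run. The sketch below uses a two-sample Kolmogorov-Smirnov test from SciPy; the data files and column name are hypothetical.

```python
import pandas as pd
from scipy.stats import ks_2samp

# Hypothetical data: East Coast training sample vs. nationwide production data
train = pd.read_csv("training_sales.csv")
production = pd.read_csv("nationwide_sales.csv")

# Two-sample KS test: a small p-value suggests the training data does not
# reflect the environment the model will run in
stat, p_value = ks_2samp(train["order_value"], production["order_value"])
print(f"KS statistic: {stat:.3f}, p-value: {p_value:.4f}")
```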

Failed Sensors: Error in Data Capture

An error in data capture is another scenario in which bias can occur. Consider a manufacturing company with shops in many locations that monitors its machines with sensors. The company relies on the sensor values to run its business efficiently. Say one of the sensors in the New York shop registers values 10 percent higher than those at other locations, even though the machine itself is performing correctly. This type of bias could skew the ML results in a particular direction.

A lack of precision in measurement introduces noise, which might average out over time, but a systematic distortion of the sensor values in one direction can skew the results of a model.
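A systematic offset like this can often be caught by comparing each location’s readings against a fleet-wide baseline before the data reaches the model. Below is a minimal sketch, assuming a table of sensor readings with hypothetical location and value columns.

```python
import pandas as pd

# Hypothetical sensor readings with columns: location, sensor_id, value
readings = pd.read_csv("sensor_readings.csv")

# Mean reading per location vs. the overall mean
location_means = readings.groupby("location")["value"].mean()
overall_mean = readings["value"].mean()

# Flag locations whose average deviates more than 5% from the baseline --
# random noise tends to average out, a systematic offset does not
deviation = (location_means - overall_mean) / overall_mean
print(deviation[deviation.abs() > 0.05])
```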

Final Thoughts

As you can see, there are several ways bias can creep into a model, directly or indirectly. As a product manager, the first step is to keep an open mind and acknowledge that there could be bias in your model or data. Incorporating bias testing into the model validation process can help you reach your long-term goals. To start, look for bias in standard attributes such as race, gender, and age, as well as in proxy attributes like zip code.
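As a starting point, a bias test can be as simple as comparing the model’s positive-prediction rate across groups on a held-out validation set before each release. The sketch below computes a demographic parity gap; the file and column names are placeholders.

```python
import pandas as pd

# Hypothetical validation set with the model's predictions already attached:
# columns include gender, age_band, zip_code, and predicted_approval (0/1)
results = pd.read_csv("validation_predictions.csv")

# Approval rate by group for each attribute worth auditing
for attribute in ["gender", "age_band", "zip_code"]:
    rates = results.groupby(attribute)["predicted_approval"].mean()
    gap = rates.max() - rates.min()
    print(f"{attribute}: approval rates\n{rates}\nparity gap = {gap:.2%}\n")
```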
