Lentis/Algorithmic Bias

Algorithmic bias refers to undesirable results from a computer system that incorrectly or unfairly prioritizes one group over another. This chapter currently focuses on algorithmic bias in the United States.

Background
Of particular importance is algorithmic bias in Machine Learning (ML) and Artificial Intelligence (AI).

Companies and individuals are increasing their reliance on AI systems. Search engine results, social media recommendations, hiring decisions, stock market predictions, and policing practices use information from AI predictive modeling. Algorithmic bias in these models is particularly dangerous due to human Automation Bias, the tendency for humans to disregard contradictory information when presented with a computer-generated solution. This increases the likelihood that bias in the AI system will result in unfair or inequitable outcomes where the system is in use.

How does algorithmic bias occur in AI systems? Often, AI systems operate on massive data sets amalgamated from existing sources without refinement. Any existing bias, often institutional or implicit, is then passed along to the AI system. Consider an AI hiring algorithm created to give a competitive advantage in finding the best possible candidates for a computer science position. Because computer science is currently a male-dominated field, the hiring algorithm might erroneously prioritize male applicants; Amazon's hiring algorithm, discussed below, is an example of exactly this. In this way, AI systems replicate existing bias and perpetuate existing prejudice in the status quo.
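A minimal sketch of this mechanism, using invented resumes and hiring outcomes: a naive model that learns per-token hire rates from historically skewed decisions reproduces that skew when scoring new candidates.

```python
from collections import defaultdict

# Hypothetical historical data: (resume tokens, hired?) pairs.
# Past outcomes skew male, mirroring the field's demographics.
history = [
    ({"executed", "captured", "python"}, True),
    ({"executed", "java", "leadership"}, True),
    ({"women's", "python", "leadership"}, False),
    ({"captured", "python"}, True),
    ({"women's", "java"}, False),
]

def token_scores(data):
    """Naive per-token hire rate learned from historical outcomes."""
    hires, totals = defaultdict(int), defaultdict(int)
    for tokens, hired in data:
        for t in tokens:
            totals[t] += 1
            hires[t] += hired
    return {t: hires[t] / totals[t] for t in totals}

def score(resume, learned):
    """Average learned hire rate over the resume's known tokens."""
    known = [learned[t] for t in resume if t in learned]
    return sum(known) / len(known) if known else 0.5

learned = token_scores(history)
# Two equally qualified candidates; one also mentions "women's".
a = score({"python", "leadership"}, learned)
b = score({"python", "leadership", "women's"}, learned)
assert b < a  # the model penalizes the gender-correlated token
```

Nothing in the model mentions gender explicitly; the penalty emerges purely from the historical labels.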

Assembled bias, a proposed type of algorithmic bias unique to AI and ML systems, describes novel biases introduced by AI systems that do not currently exist in society. Assembled bias arises from the idea that AI models are generative rather than purely statistical. A toy example to better understand this phenomenon is AI generation of realistic images. When asked to create an image of a spider, an AI model might produce a highly detailed, realistically textured image of an eleven-legged spider. Even though the AI has only ever been trained on images of eight-legged spiders, it has no concept of how to count legs; instead, it creates its own metrics for what defines a spider.

Diversity is notably lacking from AI development and research positions. Only about 20% of AI researchers are women, and other minority groups are similarly underrepresented. Because the end goal when developing an AI system is often quantifying a nebulous concept, the perspectives of those present and working on the system strongly influence the outcome. The lack of diversity in AI positions represents another possible source of bias because the majority perspective will be overvalued in the outcome.

Currently, the unfair outcomes of algorithmic bias in AI systems appear primarily to disadvantage women and racial or sexual minorities. Combating this unintended result of AI and ML is an ongoing field of research.

Amazon's Hiring Algorithm
In 2014, Amazon initiated a program that used AI to review job resumes. The purpose of the program was to reduce the time spent on finding good candidates to fill job openings. The AI was trained on resumes submitted to the company during the previous 10 years. The demographics of the submitted resumes were similar to those of most tech companies: composed mostly of men, especially in technical roles. Consequently, the AI produced an algorithm that favored male resumes over female resumes, as most of the previously successful candidates were male. This manifested in the algorithm penalizing resumes that included the word “women’s” and rewarding those with words more commonly found on male resumes, such as “captured” or “executed”. The algorithm was also found to downgrade candidates who graduated from all-women’s colleges.

To combat the tool's gender-biased outcomes, Amazon made the program neutral to gender-correlated terms. By doing so, Amazon engaged in internal auditing, a technique designed to reduce algorithmic bias by ensuring any emerging bias in a machine learning model is caught and stopped. However, this did not address the crux of the problem, which was the underlying data. A different approach would have been to analyze the existing data set to predict where the algorithm could be biased, evaluating and modifying assumptions about the data as necessary. This approach has been successful in other AI applications. In this case, however, Amazon left the underlying data unaddressed. This allowed the algorithm to find additional discriminatory ways of sorting the candidates. The tool was retired in 2017 after “executives lost hope for the project.”
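Term neutralization of this kind can be sketched as a denylist filter; the denylist and resume below are invented illustrations, not Amazon's actual terms. The sketch also shows why the fix was incomplete: proxy signals survive the filter, so a model trained on biased outcomes can still discriminate through them.

```python
# Hypothetical denylist of explicitly gender-correlated terms.
GENDER_CORRELATED = {"women's", "sorority"}

def neutralize(resume_tokens):
    """Strip explicitly gender-correlated terms before scoring."""
    return {t for t in resume_tokens if t not in GENDER_CORRELATED}

resume = {"women's", "python", "smith college"}
cleaned = neutralize(resume)

assert "women's" not in cleaned        # the flagged term is removed...
assert "smith college" in cleaned      # ...but a proxy (a women's college) remains
```

Denylists only catch the correlations someone thought to list; the underlying data keeps supplying new ones.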

Critics of Amazon argue this algorithm perpetuated the gender disparity in its workforce. Amazon does not dispute that its recruiters looked at the recommendations generated by the tool, but maintains the tool “was never used by [them] to evaluate candidates.” Opponents of Amazon argue the two claims are incompatible: recruiters were surely influenced by a tool that rates candidates from 1 to 5 stars, even if they claim they were not. Some critics, like ACLU Attorney Rachel Goodman, further contend that AI hiring tools “are not eliminating human bias – they are merely laundering it through software.” Proponents of AI hiring technology are more optimistic: in a 2017 survey from CareerBuilder, 55% of U.S. human resources managers said AI would be a regular part of their work within the next five years. However, as Vice President of LinkedIn Talent Solutions John Jersin explains, the technology has much room for improvement before it can “make a hiring decision on its own.”

Amazon has since revived the project, hoping that emphasizing diversity will lead to more equitable algorithmic outcomes. However, without critically evaluating the data the models are trained on, it is unclear why an emphasis on diversity alone would produce non-discriminatory outcomes.

Facial Recognition Algorithms
Several companies, including IBM and Microsoft, have developed facial recognition algorithms that identify people from images of their faces. These algorithms are available to the public and have been used by police departments in conjunction with video surveillance programs. In 2018, Joy Buolamwini studied biases in gender classification for facial recognition algorithms from IBM, Microsoft, and Face++, and discovered that all of them performed better on "lighter faces than darker faces" and performed worst on "darker female faces". All of the algorithms had an accuracy gap between "lighter males and darker females" of over 20%, with IBM's algorithm performing the worst at a gap of 34.4%. In 2019, the National Institute of Standards and Technology (NIST) confirmed that algorithmic gender and racial bias is an industry-wide problem: of the 189 facial recognition algorithms studied, most were least accurate for people of color, and in particular for women of color. In response to the research, IBM stopped working on its facial recognition algorithm, Microsoft and Face++ released improved versions of their algorithms that reduced the accuracy gap by over 19%, and Amazon stopped police departments from using its facial recognition algorithm.
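An accuracy gap of the kind Buolamwini measured is simply the spread between per-subgroup accuracies. A minimal sketch on invented predictions (not the actual study data):

```python
def accuracy(preds, labels):
    """Fraction of predictions matching the true labels."""
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

def accuracy_gap(results_by_group):
    """Spread between best- and worst-served demographic subgroups.
    results_by_group maps group name -> (predictions, true labels)."""
    accs = {g: accuracy(p, l) for g, (p, l) in results_by_group.items()}
    return max(accs.values()) - min(accs.values()), accs

# Invented gender-classification results for two subgroups.
results = {
    "lighter male":  ([1, 1, 1, 1, 1], [1, 1, 1, 1, 1]),  # all correct
    "darker female": ([1, 0, 0, 1, 0], [1, 1, 1, 1, 1]),  # 2 of 5 correct
}
gap, accs = accuracy_gap(results)
assert round(gap, 2) == 0.60  # a 60-point gap on this toy data
```

Reporting only overall accuracy would hide this gap entirely, which is why subgroup breakdowns were the study's key methodological move.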

Loan Lending Algorithms
Financial technology (Fintech) is a field in which lending algorithms have begun to replace face-to-face meetings. In the U.S., 45% of the largest mortgage lenders offer software-based loan solutions, but these have been scrutinized for discriminatory pricing. A study by Bartlett et al. found that, across online platforms and face-to-face meetings, Black and Latinx borrowers paid on average 7.9 basis points more interest on purchase loans than comparable borrowers, and 3.6 basis points more on refinance loans. This difference costs Black and Latinx borrowers $756M annually. Researchers found the discrimination in algorithms to be about 40 percent less than in face-to-face lending: when Fintech algorithms are used, underrepresented borrowers pay 5.3 basis points more than their counterparts (2.6 points lower than traditional methods). The study analyzed 30-year, fixed-rate, single-family residential loans issued by Fannie Mae and Freddie Mac between 2008 and 2015, and found that lenders made 11 to 17 percent higher profits from purchase loans to underrepresented groups. Although lending algorithms demonstrate less bias than traditional face-to-face lenders, the persistence of bias of any degree emphasizes the need to examine these algorithms and determine the source of the bias.
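For readers unfamiliar with the unit, one basis point is 0.01 percentage points (0.0001 as a fraction). A short sketch converts a rate premium into a first-year dollar cost; the $300,000 loan amount is an invented illustration, not a figure from the study.

```python
BASIS_POINT = 0.0001  # one basis point = 0.01 percentage points

def extra_annual_interest(principal, bp_premium):
    """Extra first-year interest from a rate premium given in basis points."""
    return principal * bp_premium * BASIS_POINT

# A 7.9 basis-point premium on a hypothetical $300,000 purchase loan:
assert round(extra_annual_interest(300_000, 7.9), 2) == 237.00
```

Small per-loan premiums of this size aggregate to the hundreds of millions of dollars the study reports across all affected borrowers.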

Loaning algorithms are based on machine learning and big data, which use a wide array of customer attributes to set prices. For example, geography can play a major role since the algorithm can target locations where comparison shopping is less likely. The algorithm may determine areas that are financial deserts where financial reserves are low and applicants are faced with monopoly pricing instead of having access to many options. The algorithm may not specifically target underrepresented applicants, but its logic may set a higher price knowing the applicant is more likely to accept it.

Underlying historical discrimination in training data may also lead algorithms to disfavor underrepresented groups and further wealth disparity. Fintech loaning algorithms use the prospective borrower’s credit histories, employment status, assets, debts, and the size of the loan requested to set interest rates. “If the data that you’re putting in is based on historical discrimination, then you’re basically cementing the discrimination at the other end,” says Aracely Panameño, director of Latino affairs for the Center for Responsible Lending. Research also shows that payday loan sellers often prey on neighborhoods predominantly populated with people of color since they typically have fewer bank branches. Banks report both positive and negative credit behavior while payday loan services only report missed payments. As a result, underrepresented groups from these neighborhoods find themselves with incomplete or skewed credit histories that are later fed into loan financing algorithms.

COMPAS

The Algorithm
The Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) algorithm is a machine learning algorithm for judicial decision-making during criminal sentencing. Used in Wisconsin, New York, California, and Florida, COMPAS predicts a defendant's risk of recidivism. Created by Northpointe, Inc. (now Equivant), this commercial algorithm uses a questionnaire to categorize defendants as "low-risk" (scores 1-4), "medium-risk" (5-7), or "high-risk" (8-10). Factors such as age, gender, and criminal history are used, while race is not. COMPAS is still used today to advise on bail, sentencing, and early release, and many question the validity and fairness of its advice, particularly with respect to race.
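The published score-to-band mapping can be expressed directly; a minimal sketch:

```python
def risk_category(score):
    """Map a 1-10 COMPAS decile score to its published risk band."""
    if not 1 <= score <= 10:
        raise ValueError("COMPAS scores range from 1 to 10")
    if score <= 4:
        return "low-risk"
    if score <= 7:
        return "medium-risk"
    return "high-risk"

assert risk_category(3) == "low-risk"
assert risk_category(6) == "medium-risk"
assert risk_category(9) == "high-risk"
```

The binning matters for the bias debate: the studies below compare groups by how their scores distribute across these bands, not by raw score alone.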

In 2016, the Pulitzer Prize-winning nonprofit news organization ProPublica conducted a study to assess COMPAS for racial bias. A prediction was counted as correct if a defendant scored medium or high risk and recidivated within two years, or scored low risk and did not recidivate within two years. ProPublica found that correct predictions were made for white and black defendants 59 percent and 63 percent of the time, respectively; roughly the same rate. Upon questioning, the company said it had devised the algorithm to achieve this goal, arguing that a test that is correct in equal proportions for all groups cannot be biased. Northpointe, Inc. even cites other studies within its Practitioner's Guide confirming the success and neutrality of its algorithm.

ProPublica found that Black defendants' scores were spread roughly uniformly from 1 to 10, while White defendants' scores were predominantly low. After adjusting for Black defendants having higher recidivism rates overall, ProPublica found that COMPAS, while maintaining a similar accuracy, is "more likely to misclassify a black defendant as higher risk than a white defendant...the test tended to make the opposite mistake with whites."
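This apparent contradiction, equal accuracy yet unequal treatment, can be reproduced on toy data. The confusion-matrix sketch below uses invented numbers (not ProPublica's data) to show how a classifier can be equally accurate for two groups while its errors fall on opposite sides.

```python
def rates(preds, outcomes):
    """preds: 1 = predicted high risk; outcomes: 1 = reoffended."""
    tp = sum(p and o for p, o in zip(preds, outcomes))
    fp = sum(p and not o for p, o in zip(preds, outcomes))
    fn = sum(not p and o for p, o in zip(preds, outcomes))
    tn = sum(not p and not o for p, o in zip(preds, outcomes))
    return {
        "accuracy": (tp + tn) / len(preds),
        "false_positive_rate": fp / (fp + tn),  # non-recidivists flagged high risk
        "false_negative_rate": fn / (fn + tp),  # recidivists rated low risk
    }

# Invented data: group 1 has a higher base rate of recidivism.
g1 = rates([1, 1, 1, 1, 0, 0, 1, 1, 0, 0], [1, 1, 1, 1, 1, 1, 0, 0, 0, 0])
g2 = rates([1, 1, 0, 0, 1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 0, 0, 0, 0, 0, 0])

assert g1["accuracy"] == g2["accuracy"] == 0.6       # equally "correct"...
assert g1["false_positive_rate"] > g2["false_positive_rate"]  # ...unequal harm
```

When base rates differ between groups, equal accuracy and equal error rates are mathematically impossible to satisfy at once, which is the crux of the Northpointe-ProPublica dispute.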

Perpetuated Systemic Racism
COMPAS perpetuates the historic and structural bias found in the criminal justice system. For example, COMPAS considers one's area of residence. Minority-dominated areas are often more heavily policed, which inflates arrest statistics; if geography correlates with recidivism, then race indirectly does too. Such proxy correlations are dangerous when left unexamined. As a private company, Northpointe has no legal obligation to share details on how COMPAS calculates its scores or weighs its variables.

Defendants labeled high- or medium-risk may be subject to harsher sentencing. The United States Sentencing Commission found that offenders sentenced to incarceration lengths of 60 to 120 months, or over 120 months, were significantly less likely to recidivate. If a defendant is suspected of being high-risk, they may be given a longer sentence and be less likely to receive bail or early release. COMPAS’s mislabeling of Black defendants may thus result in negative consequences despite its reported racial neutrality, and could further perpetuate racial inequity in the criminal justice system.

Health Care Risk Prediction

The Algorithm
Several U.S. health care systems use commercial algorithms to guide health decisions and target patients for "high-risk care management" programs to help ensure that appropriate care is provided. Most of these programs are beneficial to patients with critical conditions because they have teams of specialized nurses, extra primary care appointment slots, and other scarce resources. As a result, hospitals and insurance companies have relied on one specific algorithm to minimize the cost for patients while targeting sicker patients that would benefit most from these programs.

In 2019, Obermeyer et al. conducted a study funded by the National Institute for Health Care Management Foundation on this commercial care management algorithm in response to “the growing concern that algorithms may reproduce racial and gender disparities via the people building them or through the data used to train them." This predictive-risk algorithm generated a risk score for each patient to infer their medical needs, based primarily on the patient's previous health care spending. Depending on the score, a patient was automatically enrolled in the program, referred to their primary care physician for consultation, or given no recommendation.

The study found that Black patients with the same level of algorithm-predicted risk as White patients had 26.3% more chronic illnesses. Furthermore, researchers found that when looking at specific biomarkers that index the severity of various chronic illnesses (hypertension, diabetes, bad cholesterol, etc.), Black patients had more severe illnesses than White patients with the same risk score. This was because Black patients generated lower expected medical costs than White patients with the same chronic illnesses. These spending differences produced disparities in the perceived level of sickness of Black and White patients, since the algorithm used patient health care spending as its primary mechanism for producing risk scores.
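The label-choice problem can be sketched with invented patient records: when prior spending stands in for health need, equally sick patients in a lower-spending group receive lower risk scores and are less likely to clear an enrollment threshold.

```python
# Hypothetical patients: (group, chronic condition count, annual spending).
# Spending understates need for group B (access barriers, discrimination, etc.).
patients = [
    ("A", 3, 9000), ("A", 1, 4000), ("A", 5, 15000),
    ("B", 3, 6000), ("B", 1, 2500), ("B", 5, 10000),
]

def risk_score(spending, max_spending):
    """Risk proxy of the kind the study describes: normalized prior spending."""
    return spending / max_spending

top = max(s for _, _, s in patients)
scored = [(g, c, risk_score(s, top)) for g, c, s in patients]

# Equally sick patients (same condition count) score lower in group B.
a3 = next(r for g, c, r in scored if g == "A" and c == 3)
b3 = next(r for g, c, r in scored if g == "B" and c == 3)
assert b3 < a3
```

The study's proposed remedy follows directly from the sketch: predicting health need (e.g., condition counts) instead of cost removes the disparity the proxy introduces.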

Perpetuated Systemic Racism
The study identifies two systemic causes of reduced health care spending by Black patients that contributed to this algorithm's bias. First, poor patients face several barriers that can impede their access to health care despite having health insurance (geography, transportation access, competing demands from jobs or child care, knowledge of reasons to seek care, etc.). Second, race can impact patient spending through either direct discrimination by physicians or changes to the doctor-patient relationship, causing patients to not be recommended for further care or to choose not to seek it. These observations demonstrate that even accurate decision-making models can indirectly disadvantage specific groups in society and perpetuate existing systemic racism.

Conclusion
In the future, algorithmic biases will have an increasing impact on human activity. It is clear that unmonitored, unrestrained production and application of algorithms can be detrimental to various groups of people. The ease with which companies and industries can utilize these algorithms creates an environment where inequitable outcomes are perpetuated. Possible avenues for reducing algorithmic bias include legislation, internal auditing, and community adjudication, as well as increased awareness and a culture of responsibility. Many of these avenues are primarily social and non-technical, which accords with the socio-technical nature of algorithmic bias. The increased prevalence of algorithmic bias is simply one way in which technology is impacting human life in the U.S., and more broadly, the world. Many other novel technologies with seemingly unlimited potential must be examined carefully before their widespread adoption, as failure to do so can result in harmful consequences, as is the case for many applied algorithms today. Finally, further research is recommended on the impact of legislation on algorithmic bias, the impact of algorithmic bias in countries outside the U.S., and new bias-mitigation techniques.