By John Spooner, head of Artificial Intelligence, EMEA at H2O.ai
It’s not news that there are serious concerns about bias in Artificial Intelligence (AI): recruiting software for programmers that skips women applicants, police systems that seem to be racist, and, unfortunately, the list goes on.
But be assured: the AI sector knows this. And while there is no silver bullet that will resolve the issue today, there are practical ways to address bias and move toward the concept of ‘responsible’ AI.
We need to appreciate that there is going to be bias in AI models. The underlying challenge is that the data that you’re feeding into AI systems is a representation of reality, and machine learning has inherited bias. Why? Because these models are a representation of the world, and bias exists in the world.
Is bias too endemic to solve?
It’s broadly a given that humans are imperfect creatures, and we don’t always use pure logic to make sense of the world. Bias has always been there in any decision that we make. But if bias is endemic, is it too big a problem to solve? Absolutely not; there are things that you can do to address those particular issues. We’re putting the onus on the machine learning and data science people to fix this, and we’re rising to the challenge, but I suspect that there is some form of regulation that’s needed too. However we progress this, the first step is to ensure that we’re educating ourselves that bias exists in different forms.
So, how do we uncover the bias and build a framework that enables people to trust the systems and the results that emerge? Well, if we think about machine learning and what it’s trying to do, it’s ultimately trying to create systems that learn through experience. And the only way it can do that is to build on data.
This means we need to be aware, before we build machine learning models and incorporate them into AI systems, that bias will exist within this process. So how do we make sure that we’re putting things in place to prevent bias systematically passing through that machine learning process? At present, the majority of the time these models are optimised for accuracy. Data scientists are trying to squeeze out the extra percentage point, but by focusing on accuracy they forget about optimising for fairness.
So the challenge for data scientists is to make sure, when they’re building those machine learning models, that the data is clean, accurate, and free from any bias that could skew results. Sometimes machine learning algorithms can very quickly spot the biases that are naturally within us humans, and within society, but we always need to be very conscious of not introducing even more bias into these systems when we’re selecting data, or identifying the types of data that we’re looking to collect.
Bias often also creeps in during the actual model building process. The key here is to ensure that we document all of the steps that are taken to collect and select the data, and make sure that we’re checking for bias in the data that we put into those models. That means that, practically speaking, everyone and anyone who is building AI, whether vendor, business person, open source developer, government organisation or citizen, must take the right steps to ensure that bias is not leaking into the decisioning of these machine learning platforms.
Broadening data science diversity
The next step is to make sure that we’re exploring all the different angles. This is about knowing where to look for bias. One of the big challenges that we have in data science is the kind of person who typically works in IT, i.e. male and nerdy, as the stereotype goes. So we need to broaden the diversity of the people working in data science if possible, because if we have more voices in the team, we’re better able to work out where biases exist in the decision-making process.
We must also aim to broaden the diversity of the data that we’re collecting and analysing. We normally just jump in and analyse a subset of data, whereas we should be thinking: this is just a subset, so how could we create a wider selection of data that this machine learning model could build from? Many organisations go straight in and build the most accurate machine learning model they can: we’ve got some data, so let’s build a model and increase the accuracy. Instead, they should ask: What are we trying to solve here? What are the business decisions that we’re going to be making off the back of this machine learning model, and what data are we going to be using to feed it? Then look at that data first of all, and see if there are any challenges with it before building the machine learning model. I don’t believe that we spend enough time doing that, but we absolutely can and should. Ideally, you need to ensure that your data is as true a representation of the relevant characteristics as is practical.
So, these are some of the simple steps for identifying bias and creating the building blocks of Responsible AI. But can this process be automated? I don’t believe so, yet. What you can automate is the process of checking the quality of the data across a number of different dimensions: for example, checking that you’ve got a representative sample of males and females, or a representative sample of each of the protected characteristics that you care about.
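To make the automated data check concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any particular product: the function name `representation_report`, the `min_share` threshold of 20% and the applicant records are all assumptions invented for this example.

```python
from collections import Counter

def representation_report(records, attribute, min_share=0.2):
    """Return each group's share of the records and whether it meets min_share.

    min_share is an arbitrary, illustrative threshold; the right value
    depends on the population the model is meant to serve.
    """
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    return {
        group: {"share": n / total, "ok": n / total >= min_share}
        for group, n in counts.items()
    }

# Hypothetical applicant records, invented purely for illustration.
applicants = [{"gender": "female"}] + [{"gender": "male"}] * 9

report = representation_report(applicants, "gender", min_share=0.2)
for group, stats in report.items():
    status = "ok" if stats["ok"] else "under-represented"
    print(f"{group}: {stats['share']:.0%} ({status})")
```

The same check could be repeated for each protected characteristic before any model is trained, with the results recorded as part of the documentation of how the data was collected and selected.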
One potential blockage is that there is no central standard for what de-biased data needs to look like, but there’s no reason at all we couldn’t move toward such a standard. We do also now have tools that can help measure how fair a model is, such as checking for any bias toward different groups via ‘disparate impact analysis’. This works by selecting protected characteristics, comparing the groups defined by them in the dataset, and checking whether you are getting similar types of results across those groups (e.g. if you were to look at gender, is the accuracy of the model the same for males and females?).
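As a rough sketch of what such a comparison might look like, the Python below computes, for each group, the share of favourable predictions and the accuracy, then expresses each group’s favourable rate as a ratio of a reference group’s rate. The function `disparate_impact`, the field names and the eight rows of data are assumptions made up for this example. A ratio below 0.8 is often treated as a warning sign (the ‘four-fifths’ rule of thumb), though that cut-off is a convention rather than a universal standard.

```python
def rate(values):
    """Mean of a list of 0/1 values; 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0

def disparate_impact(rows, group_key, reference_group):
    """Summarise favourable-outcome rate and accuracy per group.

    Each row is a dict holding group_key, 'predicted' (1 = favourable
    outcome) and 'actual' (the true label).
    """
    by_group = {}
    for row in rows:
        by_group.setdefault(row[group_key], []).append(row)

    summary = {}
    for group, members in by_group.items():
        summary[group] = {
            "approval_rate": rate([m["predicted"] for m in members]),
            "accuracy": rate([int(m["predicted"] == m["actual"]) for m in members]),
        }

    # Ratio of each group's favourable rate to the reference group's rate.
    ref_rate = summary[reference_group]["approval_rate"]
    for stats in summary.values():
        stats["impact_ratio"] = (
            stats["approval_rate"] / ref_rate if ref_rate else float("nan")
        )
    return summary

# Hypothetical model outputs, invented purely for illustration.
rows = [
    {"gender": "male", "predicted": 1, "actual": 1},
    {"gender": "male", "predicted": 1, "actual": 1},
    {"gender": "male", "predicted": 1, "actual": 0},
    {"gender": "male", "predicted": 0, "actual": 0},
    {"gender": "female", "predicted": 1, "actual": 1},
    {"gender": "female", "predicted": 0, "actual": 1},
    {"gender": "female", "predicted": 0, "actual": 0},
    {"gender": "female", "predicted": 0, "actual": 0},
]

result = disparate_impact(rows, "gender", reference_group="male")
for group, stats in result.items():
    print(group, {name: round(value, 2) for name, value in stats.items()})
```

In this made-up data the accuracy is identical for both groups, yet the favourable-outcome rates differ sharply, which is why it helps to look at more than one measure per group.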
However, models do ‘go off’ easily and become less reliable. You need to constantly monitor, review and rebuild those machine learning models to ensure that bias does not creep into the decisioning frameworks. So a very good way of avoiding bias is to make sure that you’ve got a governance process around building models.
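One simple way to monitor whether a model is ‘going off’ is to compare the data it sees in production with the data it was trained on. The sketch below uses the population stability index (PSI), a widely used drift measure; it is offered as one possible monitoring check under stated assumptions, not as the only or the recommended one. The binned shares and the review threshold of 0.1 are illustrative assumptions.

```python
import math

def psi(expected_shares, actual_shares, eps=1e-6):
    """Population stability index between two binned distributions.

    Both arguments are lists of bin shares that each sum to 1.
    eps guards against log(0) when a bin is empty.
    """
    total = 0.0
    for expected, actual in zip(expected_shares, actual_shares):
        expected, actual = max(expected, eps), max(actual, eps)
        total += (actual - expected) * math.log(actual / expected)
    return total

# Hypothetical share of customers per score band, invented for illustration.
training = [0.25, 0.25, 0.25, 0.25]   # when the model was built
live = [0.10, 0.20, 0.30, 0.40]       # what the model sees today

score = psi(training, live)
needs_review = score > 0.1  # illustrative rule-of-thumb threshold
print(f"PSI = {score:.3f}, review needed: {needs_review}")
```

A governance process could run a check like this on a schedule, alongside the per-group fairness measures, and trigger a review or rebuild when the numbers move.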
We are working with a number of global financial services organisations on this issue, because they are very focused on eliminating bias. For example, a major US card issuer is using our technology to speed up the process of checking its models, and is now able to create measures that break down individual machine learning predictions into their components. Very sophisticated machine learning algorithms work out which customers to accept or decline, but the brand is also able to give detailed explanations for its decisions. That means that for every single credit application it accepts or declines, the company can give an individual the specific reasons why their application was approved or declined.
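To illustrate the general idea of breaking a prediction into its components, here is a deliberately simplified Python sketch based on a linear scoring model, where each feature’s contribution is its weight multiplied by how far the applicant sits from a baseline. This is not the card issuer’s system or any vendor’s actual method; the weights, baseline, threshold and applicant values are all hypothetical, and real models typically need more sophisticated attribution techniques.

```python
# Hypothetical model parameters, invented purely for illustration.
WEIGHTS = {"income": 0.4, "utilisation": -0.5, "missed_payments": -0.8}
BASELINE = {"income": 0.5, "utilisation": 0.3, "missed_payments": 0.0}
BIAS = 0.2        # score of an applicant who sits exactly at the baseline
THRESHOLD = 0.0   # scores at or above this are approved

def explain(applicant):
    """Score an applicant and break the score into per-feature contributions."""
    contributions = {
        feature: WEIGHTS[feature] * (applicant[feature] - BASELINE[feature])
        for feature in WEIGHTS
    }
    score = BIAS + sum(contributions.values())
    decision = "approved" if score >= THRESHOLD else "declined"
    # Features ordered from most negative contribution to most positive.
    reasons = sorted(contributions, key=contributions.get)
    return decision, score, contributions, reasons

# Hypothetical applicant with features already scaled, for illustration.
applicant = {"income": 0.4, "utilisation": 0.9, "missed_payments": 1.0}
decision, score, contributions, reasons = explain(applicant)

print(f"Decision: {decision} (score {score:.2f})")
for feature in reasons:
    print(f"  {feature}: {contributions[feature]:+.2f}")
```

Because the contributions add up to the score, the features at the top of the list are the ones that pushed this particular applicant furthest toward a decline, which is the kind of individual explanation described above.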
Genuine concern about gender or racial bias
In summary, AI does need to address this core bias challenge, but it also needs to be acknowledged that AI developers are aware of the issue. Increasingly, brands are becoming more open about using machine learning technology to speed up a process, but they’re also very conscious that these ML models have a reputation for being a black box. Every company I speak to is genuinely very conscious that it cannot have machine learning-based decisioning processes open to any gender or racial bias, and they are increasingly checking their decisions for as much bias as they can catch.
From a commercial perspective, firms of course need to consider the reputational risk: if you become known in the market for automating decisions but not offering the right products to people from a particular background, that could hurt the business.
The bias issue can potentially damage the business, and you want to avoid that as much as possible. So let’s consider the challenge and work to manage it in the most effective way, for everyone’s benefit.