We have a cross-functional team that includes people from risk management, legal, product management, and data science, and they ask the right questions. No one is going to say, “You can’t do this because it’s risky.” They’re going to say, “Do you have the right controls? Have you looked at the documentation? Have you looked at the agreement?” That’s the first place to start.
Then when it comes to detecting and mitigating bias, one of the things we do is think about the intervention. That’s why scope is important: no one should build a model just for the sake of it, as an intellectual exercise, right?
If you have a model that can move people up a level, say to a preferred class in the context of a life insurance application, that’s a supportive action. So the measure for that model is different from the measure for a model that can push someone down. Once you define the intervention, you define it as a performance test: you look at accuracy, precision, and recall, and you also take the bias measure you’ve defined and look at it across subgroups. When we look at the demographic variables, for example age, we categorize them, say 18 to 35, 35 to 45, 45 to 55, and we calculate the measure for each subgroup. Then you define a reference group, which is the one that is historically advantaged, or the one that is the largest in your population.
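A minimal sketch of that subgroup calculation, assuming a pandas DataFrame with illustrative column names (age, y_true, y_pred), example bucket edges, and recall as the chosen measure; the real schema, buckets, and metric would come from the intervention definition:

```python
# Minimal sketch: one performance/bias measure per age subgroup.
# Column names ("age", "y_true", "y_pred"), the bucket edges, and the
# choice of recall as the measure are assumptions for illustration.
import pandas as pd
from sklearn.metrics import recall_score

def subgroup_measure(df: pd.DataFrame) -> pd.Series:
    # Bucket age as described: 18-35, 35-45, 45-55, plus an open-ended bucket.
    buckets = pd.cut(df["age"], bins=[18, 35, 45, 55, 120],
                     labels=["18-35", "35-45", "45-55", "55+"])
    # Calculate the chosen measure within each bucket.
    return df.groupby(buckets, observed=True).apply(
        lambda g: recall_score(g["y_true"], g["y_pred"])
    )
```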
So, you know, for example, for us, maybe 45 to 55 is the largest population in that group, so we set that as the denominator. We take the measures for the other subgroups, divide them by the reference group’s measure, and apply the 80% rule. It’s just a yardstick, the one you get from the Equal Employment Opportunity Commission. As a data scientist, you already have to tell us why you chose the measure, what the goal of your intervention is, and how you evaluate it. If your model is consistent with the measure you’ve set, you’re good to go. If it’s not, you have to mitigate and figure out what’s going on. Sometimes it can be the population you’re applying your model to. For example, we had someone build a model that didn’t perform very well for people 60 and older, but that didn’t really matter because the model was only supposed to be used for candidates who were 60 or younger. So that’s the kind of mitigation: you can update the intervention, change the data, or maybe look at the labels. To me, bias testing is just as much a part of performance testing, because you don’t want to do worse for a subgroup. Right? That just doesn’t make sense.
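And a minimal sketch of that ratio check, building on the subgroup measures above; the “45-55” reference label and the 0.8 yardstick are assumptions used only for illustration:

```python
# Minimal sketch of the 80% rule check: divide each subgroup's measure by
# the reference group's measure, then flag anything below the 0.8 yardstick.
# The reference label and threshold are assumptions for illustration.
import pandas as pd

def disparity_ratios(measures: pd.Series, reference: str = "45-55") -> pd.Series:
    # Each subgroup's measure relative to the reference (largest) group.
    return measures / measures[reference]

# Hypothetical usage:
#   ratios = disparity_ratios(subgroup_measure(scored_applicants))
#   if (ratios < 0.8).any():
#       mitigate: revisit the intervention, the data, or the labels
```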
This is the framework we use. It’s really simple: it’s easy for legal departments to understand, and it’s easy for a data scientist to implement and understand.
When a data scientist has built a model that’s ready to deploy or use, they have to share those metrics and put them on a piece of paper: describe the data and all the features they’re using, what the final iteration was, and give us the performance metrics and the bias metrics, side by side, so we can make sure everything is there. That helps, because if the models aren’t accepted, they’re not going to go to legal review or be ready to use. You leave the responsibility on the developer to iterate. The most important thing is to make sure we have a culture where this is important, a culture where this is supported, so that no one gets penalized, and a culture that is open to discussing mitigations.
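A minimal sketch of what that side-by-side handoff could look like, reusing the illustrative column names from above; the field names and the shape of the report are assumptions, not the team’s actual template:

```python
# Minimal sketch of the handoff: performance metrics and subgroup bias
# ratios in one structure for review. Field names and the report shape
# are assumptions for illustration.
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score

def model_report(df: pd.DataFrame, bias_ratios: pd.Series) -> dict:
    return {
        "features": [c for c in df.columns if c not in ("y_true", "y_pred")],
        "performance": {
            "accuracy": accuracy_score(df["y_true"], df["y_pred"]),
            "precision": precision_score(df["y_true"], df["y_pred"]),
            "recall": recall_score(df["y_true"], df["y_pred"]),
        },
        "bias_ratios_vs_reference": bias_ratios.to_dict(),
    }
```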