Trusting AI: when accuracy is not enough

Conventional software is inherently deterministic: it is built from instructions that can be reviewed at a human scale. We trust such systems because the flow of the code makes their behaviour predictable and traceable. Machine learning based solutions, by contrast, pass large data sets through their algorithms, and the resulting complexity makes it hard to interpret how a decision came about. Neural networks and gradient boosting models, for example, are tuned for predictive accuracy, and it is difficult to understand the weighting each feature carries within the model. In the hunt for accuracy and competitive advantage, developers of Black Box AI systems have neglected the fundamental role of trust in driving engagement.
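This opacity can at least be probed from the outside. One common model-agnostic technique, permutation importance, shuffles a single feature's values across a data set and measures how much accuracy drops. The Python sketch below is a minimal illustration under stated assumptions: `black_box_predict` is a hypothetical stand-in for an opaque trained model, the feature names are invented, and the data are synthetic.

```python
import random

def black_box_predict(row):
    # Hypothetical opaque model: in practice this would be a trained neural
    # network or gradient boosting ensemble whose internals we cannot read.
    income, debt, postcode_digit = row
    return 1 if 0.7 * income - 0.5 * debt > 10 else 0

def accuracy(rows, labels, predict):
    correct = sum(predict(r) == y for r, y in zip(rows, labels))
    return correct / len(rows)

def permutation_importance(rows, labels, predict, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(rows, labels, predict)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with feature j replaced by a shuffled value
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(shuffled, labels, predict))
        importances.append(sum(drops) / n_repeats)
    return importances

# Synthetic data; labels come from the model itself, so baseline accuracy is 1.0
rng = random.Random(42)
rows = [(rng.uniform(0, 40), rng.uniform(0, 30), rng.randint(0, 9)) for _ in range(500)]
labels = [black_box_predict(r) for r in rows]

names = ["income", "debt", "postcode_digit"]
for name, imp in zip(names, permutation_importance(rows, labels, black_box_predict)):
    print(f"{name}: {imp:.3f}")
```

Because `postcode_digit` never influences the hypothetical model, shuffling it leaves accuracy unchanged and its importance is zero, while income and debt show a drop. Permutation importance indicates which inputs matter overall; it does not explain why an individual decision was made.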

Trust – the challenge for AI

A lack of transparency in how decisions or predictions are made is problematic for many critical use cases, such as healthcare and autonomous driving. The gains AI services promise to enterprises can only be achieved if stakeholders trust the reasoning and learning behind the service. The recent bad publicity around perceived gender bias in the Apple Card is a good example. When a couple with shared finances applied separately for the new Apple Card, the husband was offered 20 times his wife's credit limit, even though her credit score was higher when they paid to check it. The customer service department then compounded the trust and understanding issues by deflecting complaints and blaming “the algorithm”.

The lack of trust in AI stems from three aspects: Bias, Opacity and Volatility. Recognising that these aspects originate with us, as designers and customers of AI services, rather than blaming the algorithm, can go a long way towards rebuilding trust. AI systems do not create controversial outcomes in isolation.

Just as we don’t blame children for bad behaviour without considering the influence of their parents and peers, uncomfortable AI output may reflect the biases and inputs of its designers and data sets.

Bias

Incomplete data sets, volatile data inputs and skewed weighting in the collection process all contribute to bias. Users won’t trust a system perceived to be biased, but they might accept bias where the outcome is seen to be fair. We’re all biased in some way every day. Subconsciously, we use heuristics when making decisions, going with our “gut feel” when information is incomplete. Heuristics are influenced by social circumstance, practice and cultural norms, all of which make some bias inevitable. Similarly, machine learning algorithms look for patterns and trends in data, which can lead to prejudice. We may not be able to remove bias completely, but building in transparency would put the decisions presented into context.

Like the gender bias example above, the racial bias found in COMPAS, a program that scores the risk of criminal defendants reoffending, is a good case study. Here, African American defendants who did not go on to reoffend were nearly twice as likely as Caucasian defendants to be wrongly classified as high risk: the model generated far more false positives for one group. In the Wisconsin Supreme Court's 2016 review, Northpointe, the developer of COMPAS, refused to reveal its proprietary algorithm but noted that the program was designed as an assistive tool rather than a sentencing tool.

‘COMPAS Software Results’, Julia Angwin et al. (2016)
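The disparity reported for COMPAS is a gap in error rates. As a minimal sketch of how such a gap can be measured, assume each record holds a group label, whether the model flagged the person as high risk, and whether they actually reoffended; the false positive rate for a group is then the share of people who did not reoffend but were flagged anyway. The records below are invented toy values, not COMPAS data, and `false_positive_rates` is a hypothetical helper.

```python
from collections import defaultdict

def false_positive_rates(records):
    """False positive rate per group: FP / (FP + TN), i.e. among actual negatives."""
    fp = defaultdict(int)
    negatives = defaultdict(int)
    for group, flagged_high_risk, reoffended in records:
        if not reoffended:
            negatives[group] += 1
            if flagged_high_risk:
                fp[group] += 1
    return {g: fp[g] / negatives[g] for g in negatives}

# Toy records: (group, flagged_high_risk, reoffended); invented for illustration
records = [
    ("A", True, False), ("A", True, False), ("A", False, False), ("A", False, False),
    ("A", True, True),
    ("B", True, False), ("B", False, False), ("B", False, False), ("B", False, False),
    ("B", True, True),
]

rates = false_positive_rates(records)
print(rates)                    # {'A': 0.5, 'B': 0.25}
print(rates["A"] / rates["B"])  # 2.0
```

In general, two groups can receive similar overall accuracy and still show a gap like this, so checking accuracy alone can hide bias.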

Opacity

Human brains don’t work like neural networks, nor can they process the enormous data sets and algorithms involved. To trust a decision, we need to understand the correlations and underlying mechanisms that justify it. Current Black Box AI lacks the human judgement needed to assess the context in which the algorithm operates and the implications of its decisions.

IBM Watson for Oncology failed to win over physicians not because of its results but because IBM’s customers, the physicians, didn’t understand the decision process behind its predictions and, without transparency in the model, couldn’t manage the risk of false positives or false negatives.

Volatility

Without an understanding of the underlying data set, the decision-making process or the curation and feed of incoming data, a Black Box AI is prone to brittle performance. Artifacts introduced into the data set can degrade accuracy without warning. Where regulation is critical, as in healthcare, new data can compromise an approved data set even if it would improve efficiency.
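A basic guard against this brittleness is to compare incoming data with the data set a model was validated on before the new data is used. The sketch below computes the Population Stability Index (PSI), one common drift measure, over fixed bins of a single feature. The bin edges, the toy samples and the 0.25 alert threshold (a widely quoted rule of thumb, not a regulatory standard) are all assumptions for illustration.

```python
import math

def histogram(values, edges):
    """Share of values falling in each bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return [c / len(values) for c in counts]

def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between a reference sample and new data."""
    e = histogram(expected, edges)
    a = histogram(actual, edges)
    # eps avoids log(0) when a bin is empty in one of the samples
    return sum((ai - ei) * math.log((ai + eps) / (ei + eps)) for ei, ai in zip(e, a))

edges = [0, 20, 40, 60, 80, 101]
approved = [10, 25, 30, 45, 50, 55, 65, 70, 85, 95]  # reference (validated) sample
incoming = [15, 35, 55, 62, 68, 72, 78, 84, 90, 97]  # new batch, shifted upwards

score = psi(approved, incoming, edges)
print(f"PSI = {score:.2f}")
if score > 0.25:  # assumed alert threshold
    print("Significant drift: re-validate before using this data")
```

A drift check does not say whether the new data is better or worse, only that the model would now be operating on data it was not validated against.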

The lack of trust is core to the criticisms of Black Box AI. We don’t always understand human reasoning or the processes behind expert judgements and decisions, but trust allows us to accept them. Trust can be earned in many ways, and AI solutions need to build it before they will be accepted and allowed to deliver their promised benefits.
