From the course: Using Data Effectively and Reliably with AI Analytics
AI reliability and biases
- One reason AI tools are so appealing is that they can make a complex task appear superficially simple. Let's say your boss asks you whether it makes financial sense for your company to continue its social media marketing campaign that's been ongoing for the last month. If you were to enter that question into an AI tool and ask for an analysis to support the conclusion, you'd get a response. For example, it could compare revenues in the month when the campaign was active to the prior month. If it finds an increase, it may recommend continuing the campaign. While this is an oversimplified example, one can see the draw of such a powerful tool, which can quickly and easily complete complex and time-intensive tasks.
- However, the appearance of simplicity is a facade. The AI tool is doing every step that you, the analyst, would've done if you were completing the task without its help. This includes defining the question of interest, generating a testable hypothesis, identifying the relevant data, building a model, running the analysis, and communicating the results. This means that you need to consider and account for the same errors and issues whether you use an AI tool or not.
- Why is this important? Because AI is not always right. A survey on the state of AI by McKinsey & Company found that companies reported inaccuracy as the most common risk of using AI. And a survey of global CEOs by PwC found that 52% see generative AI as likely to increase the spread of misinformation in their companies. Further, surveys of practitioners and AI users also find challenges. A survey by Aporia found that 89% of machine learning engineers in companies that use LLMs and generative AI models, like chatbots and virtual assistants, say their models show signs of hallucination. The severity of these hallucinations can range from factual errors to content that's biased and even dangerous.
Similarly, only 7% of desk workers in Slack's workforce survey found AI to be completely trustworthy.
- However, because the capabilities of AI tools will augment and accelerate workflows, auditing and validating the data and the methodological choices and assumptions underneath work prepared by AI will become a more critical function of an analyst role. Let's start with the data, because data, sometimes immense amounts of it, is the foundation of an AI model. An analyst must ensure that an AI tool is using the right data, not just the available data, if they're going to trust that the results are accurate and meaningful. Data errors and biases can come in many forms. For example, a model may exhibit measurement bias if the metrics collected and used to train the model differ from what we actually want to measure. Imagine trying to measure total sales in a retail store when you only have data on the number of people who came into the store that day. There can be biases due to missing data, especially when that data is not missing at random. For example, one medical study found that patients in contact isolation have fewer recorded entries for their vital signs than other patients. An algorithm trying to identify clinical deterioration in contact-isolated patients may be biased because it is basing its predictions on less complete data. There can be historical bias, in which the underlying data already contains the bias, such as past data in which more men were promoted than women, so that a model trained on this data favors male candidates. Or an algorithm can suffer from representation bias if it lacks sufficient training data from all of the sources it is used to make predictions for. For example, a model in the medical space that is trained on data from hospitals may perform poorly when applied to outpatient facilities.
While this is by no means an exhaustive list of biases in data, these examples highlight why investigating the data a model uses is critical in an AI world where it is tempting to let the data be more opaque.
- Separate from the data itself, understanding, testing, and substantiating the methodology employed by an AI tool is equally important. AI tools are trained using machine learning algorithms that can be hard to explain or appear akin to a black box. This can lead to errors such as learning bias, where the way a model is defined causes systematic errors. Furthermore, AI tools will give you the answers but, just like a human, may not necessarily provide a step-by-step accounting of how they reached that answer. Applying your analytical skillset to interrogate the assumptions, conduct robustness checks, and investigate counterintuitive results will ensure that a sound methodology has been employed. For instance, if we return to the simplistic example of whether your company should continue its social media marketing campaign, one can begin to unpack the assumptions and choices of the AI model with the same questions you would ask of your own work. For example: Why is aggregate online revenue the selected metric? Did profit also increase, or was the campaign too costly? Were certain products more successful than others? Did some advertising platforms provide a higher ROI and deserve a larger investment going forward? Does comparing revenues from the campaign month to the prior month account for seasonal differences or holiday differences? How did the ROI of the social media marketing campaign compare to an alternative use of those funds?
- The different paths that these questions could take us down are an important reminder that what makes AI special is its ability to synthesize data to help make choices and provide answers. However, an analyst should not assume that an AI model is correct simply because it's a computer. It is not a calculator adding two numbers.
Just as analysis conducted by humans can be unreliable because of the decisions that were made, so too can the work of AI. Assessing AI analytics for biases and errors will be crucial for those who want to succeed in a changing analytic space.
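To make the campaign questions above concrete, here is a minimal Python sketch of two of those checks: whether the month-over-month revenue lift survives a seasonal baseline, and whether profit rose once the campaign's cost is included. This is an illustration under stated assumptions, not the method any particular AI tool uses. The `MonthlyFigures` class, the `campaign_checks` function, and every number below are hypothetical, and subtracting last year's change for the same pair of months is only one simple way to approximate seasonality.

```python
from dataclasses import dataclass


@dataclass
class MonthlyFigures:
    """Hypothetical monthly totals for one month of business."""
    revenue: float
    cost_of_goods: float
    campaign_cost: float = 0.0  # zero for months with no campaign

    @property
    def profit(self) -> float:
        # Profit net of both product costs and campaign spend.
        return self.revenue - self.cost_of_goods - self.campaign_cost


def pct_change(new: float, old: float) -> float:
    """Percentage change from `old` to `new`."""
    return (new - old) * 100 / old


def campaign_checks(campaign, prior_month, same_month_last_year, prior_month_last_year):
    """Compare the naive revenue lift with a seasonal baseline and with profit."""
    # The naive analysis: campaign month versus the month before it.
    naive_lift = pct_change(campaign.revenue, prior_month.revenue)
    # Assumed seasonal baseline: how the same two months compared a year ago.
    seasonal = pct_change(same_month_last_year.revenue, prior_month_last_year.revenue)
    return {
        "naive_revenue_lift_pct": naive_lift,
        "seasonal_baseline_pct": seasonal,
        "lift_net_of_seasonality_pct": naive_lift - seasonal,
        "profit_change": campaign.profit - prior_month.profit,
    }


# Made-up figures, chosen only to show how the conclusions can diverge.
results = campaign_checks(
    campaign=MonthlyFigures(revenue=120_000.0, cost_of_goods=72_000.0, campaign_cost=15_000.0),
    prior_month=MonthlyFigures(revenue=100_000.0, cost_of_goods=60_000.0),
    same_month_last_year=MonthlyFigures(revenue=103_500.0, cost_of_goods=62_100.0),
    prior_month_last_year=MonthlyFigures(revenue=90_000.0, cost_of_goods=54_000.0),
)

for name, value in results.items():
    print(f"{name}: {value:,.1f}")
```

With these made-up inputs, revenue rises 20% over the prior month, but the same two months differed by 15% a year earlier, and profit falls by 7,000 once the campaign's cost is counted. A revenue-only comparison would recommend continuing the campaign. The fuller picture is much less clear.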