QA for AI/ML applications requires a different approach from QA for traditional applications. Traditional applications follow set business rules with defined outputs, whereas the continuously evolving nature of AI models makes their outcomes ambiguous and unpredictable. QA methodologies need to adapt to this complexity and address challenges around comprehensive scenario coverage, security, privacy, and trust.
How to test AI and ML applications?
The standard approach to AI model creation, also known as the cross-industry standard process for data mining (CRISP-DM), starts with data acquisition, preparation, and cleansing. The resulting data is then tried iteratively against multiple modeling approaches before the best-performing model is finalized. Testing this model starts with a subset of the data that has gone through the same preparation. This test data is fed into the model, and multiple combinations of hyperparameters or model variations are run to gauge its correctness or accuracy using appropriate metrics.
Groups of such test data are generated randomly from the original data set and applied to the model. Much like simulating the arrival of new data, this process indicates how accurately the AI model is likely to perform as it scales in the future.
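The random test groups and hyperparameter runs described above can be sketched roughly as follows. This is a minimal illustration, not a prescribed method: the k-nearest-neighbour classifier, the toy two-cluster data, and names such as evaluate_grid and make_toy_data are all assumptions made for the example.

```python
import random
from statistics import mean

def knn_predict(train, point, k):
    """Classify `point` by majority vote among its k nearest training rows."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point)),
    )[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

def accuracy(train, test, k):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

def evaluate_grid(data, k_values, n_rounds=5, test_fraction=0.25, seed=0):
    """Score each hyperparameter value on several random train/test splits."""
    rng = random.Random(seed)
    scores = {k: [] for k in k_values}
    for _ in range(n_rounds):
        shuffled = data[:]
        rng.shuffle(shuffled)  # a fresh random test group every round
        cut = int(len(shuffled) * test_fraction)
        test, train = shuffled[:cut], shuffled[cut:]
        for k in k_values:
            scores[k].append(accuracy(train, test, k))
    return {k: mean(v) for k, v in scores.items()}

def make_toy_data(n=80, seed=1):
    """Hypothetical data set: two Gaussian clusters, one per class."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 3.0 * label
        data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return data

if __name__ == "__main__":
    results = evaluate_grid(make_toy_data(), k_values=[1, 3, 5])
    for k, acc in results.items():
        print(f"k={k}: mean accuracy {acc:.2f}")
```

Averaging over several random splits, rather than trusting a single split, gives a steadier picture of how each hyperparameter setting might behave on data the model has not seen.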
Challenges in data-led QA for AI/ML applications
The data-led testing and QA approach outlined above suffers from several issues, some of which are described below.
The decision-making algorithms of AI models have always been perceived as black boxes. Of late, there is a strong move towards making them transparent by explaining how the model has arrived at a set of outcomes based on a set of inputs. This helps teams understand and improve model performance, and helps the recipients of decisions grasp the model's behavior. It matters even more in compliance-heavy areas like insurance or health care systems. Multiple countries have also started mandating that an AI model be accompanied by an explanation of the decisions it makes.
Post facto analysis is key to addressing explainability. By retrospectively analyzing specific instances misclassified by an AI model, data scientists can understand which part of the data set the model actively focused on to arrive at its decision. Along similar lines, correctly classified instances are also analyzed.
Combining both helps to understand the relative contribution made by each part of the data set and how the model weights specific attribute classes to reach its decision. It further enables data scientists to reach out to domain experts, evaluate whether the data needs to be improved to get more variation across sensitive variables, and decide whether the decision-making parameter set used by the model needs to be re-engineered. In short, the data science process itself is being changed to incorporate explainability.
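One common way to estimate the relative contribution of each attribute is permutation importance: shuffle one attribute at a time and measure how much accuracy drops. The sketch below is only an illustration of that idea, not the specific analysis described above; toy_model, make_rows, and the two-feature data are hypothetical.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows for which the model's output matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, n_repeats=10, seed=0):
    """Mean accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the label
            permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

def toy_model(row):
    """Hypothetical model: decides on feature 0 only and ignores feature 1."""
    return int(row[0] > 0.5)

def make_rows(n=200, seed=1):
    """Hypothetical data: two uniform features, label driven by feature 0."""
    rng = random.Random(seed)
    rows = [(rng.random(), rng.random()) for _ in range(n)]
    labels = [int(r[0] > 0.5) for r in rows]
    return rows, labels

if __name__ == "__main__":
    rows, labels = make_rows()
    for j, drop in enumerate(permutation_importance(toy_model, rows, labels)):
        print(f"feature {j}: mean accuracy drop {drop:.3f}")
```

A large drop suggests the model leans heavily on that attribute, which is a useful starting point for a conversation with domain experts. A drop near zero suggests the attribute contributes little to the decision.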
The decision-making ability of an AI model hinges to a large extent on the quality of the data it is exposed to. Numerous instances show biases seeping into the input data or into how models are trained, such as Facebook’s gender-discriminatory ads or Amazon’s AI-based automated recruiting system that discriminated against women.
The historical data that Amazon used for its system was heavily skewed because men had dominated its workforce and the wider tech industry for over a decade. Even large models such as those from OpenAI or GitHub Copilot suffer from world biases percolating into them, since they are trained on global data sets that are themselves biased. To remove biases, it’s essential to understand what has gone into data selection and which feature sets contribute to decision-making.
Detecting bias in a model requires identifying the attributes that influence the model excessively compared to other attributes. The attributes so unearthed are then tested to see whether they represent all available data points.
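As a rough sketch of such a check, the code below reports how well each group of a sensitive attribute is represented in the data and how often the model predicts a positive outcome for it. The record layout, the field names "group" and "predicted", and the ratio of lowest to highest positive rate are assumptions chosen for illustration, not a complete fairness audit.

```python
from collections import Counter

def group_report(records, attribute, prediction_key="predicted"):
    """Per group: share of the data set and rate of positive predictions."""
    counts = Counter(r[attribute] for r in records)
    positives = Counter(r[attribute] for r in records if r[prediction_key] == 1)
    total = len(records)
    return {
        group: {"share": n / total, "positive_rate": positives[group] / n}
        for group, n in counts.items()
    }

def disparate_impact_ratio(report):
    """Lowest positive rate divided by the highest (1.0 means parity)."""
    rates = [v["positive_rate"] for v in report.values()]
    return min(rates) / max(rates) if max(rates) > 0 else 1.0

# Hypothetical model outputs: group A is over-represented and favoured
RECORDS = (
    [{"group": "A", "predicted": 1}] * 6
    + [{"group": "A", "predicted": 0}] * 2
    + [{"group": "B", "predicted": 1}] * 1
    + [{"group": "B", "predicted": 0}] * 1
)

if __name__ == "__main__":
    report = group_report(RECORDS, "group")
    for group, stats in sorted(report.items()):
        print(group, stats)
    print("ratio of lowest to highest positive rate:",
          round(disparate_impact_ratio(report), 2))
```

A skewed share points to a representation problem in the data, while a low ratio points to unequal outcomes. Either finding is a prompt for closer review rather than proof of bias on its own.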
According to Deloitte’s State of AI in the Enterprise survey, 62% of respondents view cyber security risks as a significant concern while adopting AI. ‘The Emergence Of Offensive AI’ report from Forrester Consulting found that 88% of decision-makers in the security industry believe offensive AI is coming.
Since AI models are built on the principle of becoming smarter with each iteration of real-life data, attacks on such systems also tend to become smarter. The matter is further complicated by the rise of adversarial attackers who target AI models by modifying a small aspect of the input data, even as little as a single pixel in an image. Such small changes can cause disproportionately large changes in the model's output, leading to misclassifications and erroneous outcomes.
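To make the idea concrete, here is a minimal sketch of an adversarial perturbation against a linear classifier. It nudges every feature by the same small amount in the direction that hurts the current prediction, which is similar in spirit to gradient-sign attacks. The weights, bias, input, and function names are hypothetical, and real attacks on image models are considerably more involved.

```python
def predict(weights, bias, x):
    """Linear classifier: class 1 if the weighted sum is positive, else 0."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if score > 0 else 0

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def perturb(weights, bias, x, epsilon):
    """Shift each feature by epsilon in the direction that pushes the score
    towards the opposite class."""
    direction = -1 if predict(weights, bias, x) == 1 else 1
    return [v + direction * epsilon * sign(w) for w, v in zip(weights, x)]

def smallest_flipping_epsilon(weights, bias, x, step=0.01, n_steps=100):
    """Search upwards in multiples of `step` for the first epsilon that
    changes the predicted class; return None if none is found."""
    original = predict(weights, bias, x)
    for i in range(1, n_steps + 1):
        epsilon = i * step
        if predict(weights, bias, perturb(weights, bias, x, epsilon)) != original:
            return epsilon
    return None

# Hypothetical model and input
WEIGHTS = [2.0, -1.0, 0.5]
BIAS = -0.2
X = [0.3, 0.2, 0.1]

if __name__ == "__main__":
    eps = smallest_flipping_epsilon(WEIGHTS, BIAS, X)
    print("original class:", predict(WEIGHTS, BIAS, X))
    print("smallest epsilon that flips the class:", eps)
```

A QA team could run a search like this across a test set to see how small a change is enough to flip predictions, which gives a rough measure of the model's robustness.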
The starting point for overcoming such security issues is to understand the types of attacks and the vulnerabilities in the model that hackers can exploit. It is critical to gather literature on such attacks, along with domain knowledge, and build a repository that can help predict similar attacks in the future. Adopting AI-based cybersecurity systems is an effective technique to thwart hacking attempts, since such a system can predict hacker behavior much as it predicts other outcomes.
With the growing adoption of privacy regulations such as GDPR and CCPA across all applications and data systems, AI models have also come under scrutiny. This is all the more so because AI systems depend heavily on large volumes of real-time data for intelligent decisions, and that data can reveal a tremendous amount of information about a person’s demographic, behavior, and consumption attributes, at the minimum.
To address privacy concerns, the AI model in question needs to be audited to evaluate how it leaks information. A privacy-aware AI model takes adequate measures to anonymize or pseudonymize data, or uses cutting-edge techniques such as differential privacy. The model can be evaluated for privacy leakage by analyzing how attackers could recover input training data from the model and reverse engineer it to get access to PII (Personally Identifiable Information). A two-stage process of first detecting the inferable training data through inference attacks and then identifying the presence of PII in that data can help identify privacy concerns when the model is deployed.
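The two-stage process can be sketched roughly as follows. Stage 1 uses a simple confidence threshold as a stand-in for a membership inference attack, and stage 2 scans the flagged records for e-mail and phone patterns. The threshold, the regular expressions, and the toy_confidence function that simulates a model memorising its training records are all assumptions for illustration. Real audits use far more thorough attacks and PII detectors.

```python
import re

# Simplified patterns; production PII detection needs much broader coverage
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

def infer_members(confidence_fn, candidates, threshold=0.9):
    """Stage 1: flag candidates the model is suspiciously confident about,
    treating high confidence as a sign the record was in the training data."""
    return [c for c in candidates if confidence_fn(c) >= threshold]

def find_pii(texts):
    """Stage 2: report which records contain e-mail or phone patterns."""
    findings = {}
    for text in texts:
        hits = EMAIL.findall(text) + PHONE.findall(text)
        if hits:
            findings[text] = hits
    return findings

# Hypothetical training records that the toy model has memorised
TRAINING = {"contact jane.doe@example.com for renewal", "policy 12 approved"}

def toy_confidence(text):
    """Hypothetical model confidence: very high on memorised records."""
    return 0.99 if text in TRAINING else 0.55

if __name__ == "__main__":
    candidates = [
        "contact jane.doe@example.com for renewal",
        "policy 12 approved",
        "call 555-123-4567 tomorrow",
    ]
    members = infer_members(toy_confidence, candidates)
    print("likely training records:", members)
    print("PII found in them:", find_pii(members))
```

Only records that pass both stages are reported, which keeps the audit focused on PII that an attacker could plausibly recover from the deployed model.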
Ensuring accuracy in QA for AI/ML applications
Accurate testing of AI-based applications calls for extending the notion of QA beyond the confines of performance, reliability, and stability to the newer dimensions of explainability, security, bias, and privacy. The international standards community has also embraced this notion by expanding the conventional ISO 25010 standard to include these facets. As AI/ML model development progresses, focus across all these facets will lead to better-performing, continuously learning, compliant models that can generate far more accurate and realistic results.