Improve Data Quality to Enhance Explainability of Artificial Intelligence

Dr. P.H. Anantha Desik

Today, artificial intelligence (AI) is so widespread that it spans many domains, directly and indirectly, and makes a significant difference in almost every field of work. AI applications are used in domains as varied as security, finance, healthcare, medicine, and criminal justice. Successful AI models have become increasingly complex and are difficult to express in terms that humans easily understand, which hampers their adoption. AI model explainability depends on the quality of the data. If the data is of poor quality, the model produces inaccurate insights, leading to unpredictable decisions when it is applied in human contexts.

Accuracy and interpretability in AI

AI model algorithms can be classified by how interpretable they are and by how accurate they are. Models that capture linear and smooth relationships, such as simple classifiers, linear regression, decision trees, and k-nearest neighbors (KNN), are easy to interpret but tend to be less accurate. Support vector machines, random forests, and boosting algorithms, on the other hand, are less interpretable but typically more accurate. Neural network models are highly accurate but less interpretable. Models of medium and low interpretability need more explanation and typically rely on more explainability tools and frameworks. However, much depends on the quality of the data the model is built on or exposed to.

AI explainability models

AI explainability follows two approaches: model-based, where interpretable ML models are built, and post-hoc, where an explanation is derived for a complex model after it has been built. The model-based approach is linear and rests mainly on explaining the relationship between inputs and outputs. The post-hoc approach applies whether or not the model's design is known, and it is based on the data and its relationships. Explainability approaches can further be categorized as model-agnostic, applicable to all model types, or model-specific, applicable only to particular models. Explainable AI also depends on the data type, such as image, text, numeric, or audio, and on the explanation type, such as visual, data points, or feature importance. These explainability parameters can be mapped, derived, and processed from the original data types, parameters, distributions, and relations of the chosen approach. Everything depends on the quality and trustworthiness of the data, since without data there is no explainability.
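
One widely used post-hoc, model-agnostic technique that yields a feature-importance explanation is permutation importance. The sketch below is only an illustration, not a method taken from this article: the `black_box` function, the synthetic data, and all names are hypothetical, and the model is queried purely through its predictions.

```python
import random

def mean_abs_error(model, rows, targets):
    """Average absolute gap between the model's predictions and the targets."""
    return sum(abs(model(r) - t) for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, n_repeats=20, seed=0):
    """Shuffle one feature at a time and record how much the error grows.
    Only model(row) is called, so any model type can be plugged in."""
    rng = random.Random(seed)
    baseline = mean_abs_error(model, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mean_abs_error(model, shuffled, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical black box: leans on feature 0, slightly on feature 1, ignores feature 2
def black_box(row):
    return 3.0 * row[0] + 0.5 * row[1]

# Synthetic, illustrative data (assumption: three numeric features in [0, 1])
data_rng = random.Random(42)
rows = [[data_rng.uniform(0, 1) for _ in range(3)] for _ in range(200)]
targets = [black_box(r) for r in rows]

importances = permutation_importance(black_box, rows, targets)
print(importances)
```

Because the importances are computed from the data itself, noisy or biased rows would distort them in the same way they distort the model.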

Explainability and interpretability are often used interchangeably in basic discussions of AI explainability. To ensure fair explainability for complex models, both interpretability and fidelity are required. Interpretability provides a single rationale for the input and output relationships in a compact model form, while fidelity refers to an accurate, detailed description of the model. Fidelity supplies the sufficient and truthful information needed for a complete explanation of the model.
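
One common way to put a number on fidelity, assumed here for illustration rather than taken from this article, is the share of cases in which a compact, interpretable surrogate reproduces the complex model's decision. Both models below and the grid of inputs are hypothetical.

```python
def fidelity(black_box, surrogate, rows):
    """Fraction of rows on which the surrogate matches the black box's decision."""
    agree = sum(1 for r in rows if black_box(r) == surrogate(r))
    return agree / len(rows)

# Hypothetical complex model: approves when a weighted score crosses a threshold
def black_box(row):
    income, debt = row
    return 1 if 7 * income - 3 * debt > 200 else 0

# Compact, interpretable surrogate: a single rule on one input
def surrogate(row):
    income, _ = row
    return 1 if income > 40 else 0

# Illustrative grid of inputs (assumption: integer income and debt values)
rows = [[income, debt] for income in range(0, 101, 10) for debt in range(0, 51, 10)]
score = fidelity(black_box, surrogate, rows)
print(f"surrogate fidelity: {score:.3f}")
```

The single-rule surrogate is easy to state but disagrees with the black box near the decision boundary, which is the trade-off between interpretability and fidelity in miniature.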

AI dependence on data quality

Both the model-based and post-hoc approaches to explainability depend on data inputs, feature importance, predictions, and business rules. Explainability depends on whether the data fed to the AI model is qualitative or quantitative and on the extent to which it is biased. Data patterns are based on data quality and relations. Low data quality leads to a skewed model and incorrect explanations, which undermines human trust in the models.
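
A small sketch can show how low data quality skews both a model and its explanation. It assumes a one-feature linear model whose slope serves as the explanation of the feature's effect; the data and the corruption (lost readings recorded as 0) are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical clean data following y = 2x + 1
xs = list(range(1, 11))
clean_ys = [2 * x + 1 for x in xs]

# Same data, but the last three readings were lost and recorded as 0
dirty_ys = clean_ys[:7] + [0, 0, 0]

clean_slope, _ = fit_line(xs, clean_ys)
dirty_slope, _ = fit_line(xs, dirty_ys)
print(f"slope on clean data: {clean_slope:.2f}")
print(f"slope on dirty data: {dirty_slope:.2f}")
```

On the clean data the slope is positive; with three values miscoded as 0 it turns negative, so the explanation would claim the feature lowers the outcome when it actually raises it.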

AI models rely on several dimensions of data quality, such as accuracy, relevance, completeness, and consistency in data standardization. Separately defined metrics measure trustworthiness before data is considered for the AI model. Data bias plays an important role here: it must be controlled to obtain good data quality and to keep model explainability unaffected in both the model-based and post-hoc approaches. If it is not controlled, the impact is high and the model's outcomes are skewed.
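
Two of these dimensions, completeness and consistency with a standard format, can be measured before data reaches a model. The sketch below is a minimal illustration; the records, the `signup_date` field, and the choice of ISO dates as the standard are all hypothetical.

```python
import re

def completeness(records, field):
    """Share of records where the field is present and non-empty."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

def consistency(records, field, pattern):
    """Share of non-empty values that match the expected standard format."""
    values = [r[field] for r in records if r.get(field) not in (None, "")]
    if not values:
        return 0.0
    return sum(1 for v in values if re.fullmatch(pattern, v)) / len(values)

# Hypothetical records; the field name and date standard are illustrative
records = [
    {"id": 1, "signup_date": "2021-03-04"},
    {"id": 2, "signup_date": "04/03/2021"},
    {"id": 3, "signup_date": ""},
    {"id": 4, "signup_date": "2021-07-19"},
]
ISO_DATE = r"\d{4}-\d{2}-\d{2}"

report = {
    "completeness": completeness(records, "signup_date"),
    "consistency": consistency(records, "signup_date", ISO_DATE),
}
print(report)
```

Scores like these could be compared against an agreed threshold to decide whether a data set is trustworthy enough to use.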

AI model development generally follows a sequential approach to data processing, with data operations ranging from treatment and wrangling to model building. It is important to capture logs of data changes and distributions, since they are vital for post-transformation explainability after the model is built. Data engineering logs play a significant role in explaining the data types, relations, data patterns, and changes in the post-hoc approach. If data quality issues are not addressed early, they can snowball into larger issues in successive stages, affecting explainability, human decisions, and society at large.
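
A data engineering log of this kind can be as simple as a before-and-after profile recorded for each processing step. The following sketch assumes a single numeric column and two illustrative steps; the helper names (`profile`, `apply_logged`) and the sample values are hypothetical.

```python
import statistics

def profile(values):
    """Summarize a column's distribution so later explanations can refer back to it."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present) if present else None,
        "stdev": statistics.stdev(present) if len(present) > 1 else None,
    }

def apply_logged(step_name, transform, values, log):
    """Run one transformation and append a before/after profile to the log."""
    result = transform(values)
    log.append({"step": step_name, "before": profile(values), "after": profile(result)})
    return result

def impute_mean(values):
    """Replace missing entries with the mean of the observed ones."""
    fill = statistics.mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical column with two missing readings
ages = [23, 35, None, 41, 29, None, 52]
log = []
ages = apply_logged("impute_mean", impute_mean, ages, log)
ages = apply_logged("min_max_scale", min_max_scale, ages, log)

for entry in log:
    print(entry)
```

In this example the log shows that mean imputation leaves the mean unchanged but shrinks the standard deviation, the kind of distribution change that is otherwise invisible once the model has been built.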

AI model explainability is important for humans to understand models and make decisions based on data, while data quality is crucial to providing fair explanations. Research is ongoing to identify new explainability tools for complex AI models, and it will bear directly on data quality, its associated challenges, and the way forward in overcoming the hurdles to adopting AI models for human applications.

About the author

Dr. P.H. Anantha Desik
Dr. Anantha Desik currently spearheads data analytics, modelling, machine learning, and deep learning on the integrated data management platform TCS MasterCraft™ DataPlus at Tata Consultancy Services. He also played a significant role in setting up the data analytics and actuarial COE for the insurance business at TCS. He has over 30 years of experience spanning business consulting and delivery across industry domains such as insurance, healthcare, and finance. He has published many papers on digital technologies in conferences and journals. Dr. Desik holds a Ph.D. in operations research and an Executive MBA from IIM Kozhikode.