BUSINESS AND TECHNOLOGY INSIGHTS

Making the Foundation Strong: The Importance of Data Processing in Machine Learning/Artificial Intelligence

 
August 13, 2019

“I underwent data analytics training by paying $X, but I am getting data issues after applying the algorithm.”

“I have applied the algorithm, but I am not getting the business results.”

“I don’t know what data to use and analyze.”

These three statements are a constant refrain among data science students and developers. The cause is usually one of two things: either they have not yet realized the importance of data processing in machine learning (ML) and artificial intelligence (AI) algorithms, or they have not learned the process of model development rigorously.

The importance of data

Data is the core of any ML/AI algorithm, and it must be supplied in a form that the algorithm understands. The main function of ML/AI algorithms is to unlock the information and knowledge concealed in the data. If the data arrives in a form the algorithm cannot interpret, the algorithm will produce incorrect, misleading insights, which can mean lost revenue for the project or company.

Pyramid of Needs

This brings us to the crux of the story: how important it is for a data scientist to recognize that the steps toward process maturity come in a definite order. Data science and AI advisor Monica Rogati cautions companies keen to implement ML/AI by asking them to imagine AI at the top of a ‘pyramid of needs’. According to her, AI, like self-actualization, is amazing, but the first needs in all cases are the basics such as data literacy, data collection, and infrastructure.

Maslow’s Hierarchy of Needs

This can be explained with the closely related hierarchy of needs proposed by Abraham Maslow, a theory widely used in human resources (HR). In Motivation and Personality, published in 1954, the humanistic psychologist put forth a theory stating that there exists a universal blueprint of ‘needs recognition and satisfaction that people follow in generally the same sequence’. He also postulated the concept of prepotency: individuals cannot identify newer, higher needs until their current one is properly satisfied. The theory is often illustrated with a pyramid that has survival needs at its broad base and self-actualization at its narrow top.

AI Hierarchy of Needs

According to Rogati, the same applies to developers who want to grow into data analysts or data scientists. They need to spend their time becoming literate about data, and collecting, cleaning, exploring, and transforming it, before acquainting themselves with ML/AI algorithms. To help ML/AI developers grasp and appreciate these concepts, create sound ML/AI models, and become reliable data analysts or scientists, here is a figure that describes the hierarchy of needs.

This is not a new idea at all: the concept is also famously used in business model development through the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology.

Basic steps in data processing

Data collection: Situated at the bottom of the pyramid, this step covers what data is needed to build the right dataset for ML/AI and whether that data is available. For example, are all relevant user interactions being logged, and how is the data coming in from a sensor?

Data analysis and treatment: This step covers the initial understanding of each data variable (its type, value range, missing data, and summary statistics) so that the data can be understood and treated.

Data exploration and transformation: This step involves exploring the relationships among the given variables, dropping the insignificant variables on that basis, and deriving new variables through transformation. It is crucial to making the base of the pyramid strong.

Data training: Only after this can business intelligence or analytics be built, which in turn forms the basis for the ultimate goal of building artificial intelligence. This means knowing what you want to predict and organizing training data with labels.

Data experimentation: For ML/AI, A/B testing or experimentation is important for mitigating potential problems and for getting a rough idea of the effect of changes before they reach a wider user base.
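The analysis, treatment, and transformation steps above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the records, the column names (age, income, clicks, visits), the choice of median imputation, and the derived click_rate variable are all hypothetical and do not come from any particular project or product.

```python
from statistics import mean, median

# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 34, "income": 52000.0, "clicks": 12, "visits": 4},
    {"age": None, "income": 61000.0, "clicks": 7, "visits": 7},
    {"age": 45, "income": None, "clicks": 0, "visits": 1},
    {"age": 29, "income": 48000.0, "clicks": 20, "visits": 5},
]


def profile(rows, column):
    """Analysis: summarize one variable (type, value range, missing count)."""
    present = [r[column] for r in rows if r[column] is not None]
    return {
        "type": type(present[0]).__name__ if present else "unknown",
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": mean(present) if present else None,
        "missing": len(rows) - len(present),
    }


def impute_median(rows, column):
    """Treatment: fill missing values with the column median (one option of many)."""
    fill = median(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def add_click_rate(rows):
    """Transformation: derive a new variable from two existing ones."""
    return [{**r, "click_rate": r["clicks"] / r["visits"]} for r in rows]


age_profile = profile(records, "age")
cleaned = impute_median(impute_median(records, "age"), "income")
transformed = add_click_rate(cleaned)
```

Median imputation is used here only because it is simple and robust to outliers; whether it is appropriate depends on why the values are missing, which is exactly what the analysis step is meant to uncover.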

Now, with the data cleaned and organized, the right things can be measured and daily experimentation becomes possible. AI can be rolled out, meaningful improvements can be delivered for users and the company, and a success story can be scripted. If problems arise, the company will learn new methods, develop new perspectives, and gain hands-on experience with AI.
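As a rough illustration of the experimentation step, the sketch below compares the conversion rates of a control group (A) and a variant group (B) with a two-proportion z-test. The article does not prescribe any particular test; the z-test, the function name, the counts, and the 0.05 threshold are all assumptions chosen for the example.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: conversions out of users exposed to each version.
z, p_value = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=160, n_b=1000)
significant = p_value < 0.05  # assumed significance threshold
```

Running such a check on a small sample of users gives a rough idea of whether a change helps before it reaches the wider user base.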

It is interesting to note that even ML/AI product vendors are moving in this direction and incorporating features such as data collection, cleaning, and transformation in their products. This has resulted in data pipelining, a core feature of big data products, and other analytical products are being introduced with this feature as well. As a data scientist matures through further ML/AI model development, the hierarchy of data needs plays a crucial role in his or her learning process.

 

In the absence of diligent data processing…

Given the rapid advancement of digital technologies, conventional developers and businesses are constantly trying to catch up with the ML/AI bandwagon. In their hurry, they rarely take the time needed to diligently follow every required step. They omit steps in their race to the top and sooner or later face a resounding ML/AI failure or end up with immature models. It is therefore crucial to keep the hierarchy of needs in mind in order to remain relevant as a data scientist or data analysis organization.

Dr. Anantha Desik is currently spearheading Data Analytics, Modelling, Machine Learning & Deep Learning with the Data Analytics Service proprietary integrated data management platform, TCS MasterCraft™ DataPlus at Tata Consultancy Services (TCS). He has also played a significant role in setting up the Data Analytics & Actuarial COE for the Insurance business in TCS. He has over 25 years of experience cutting across Business Consulting, Delivery and functional experience in different industry domains such as Insurance, Healthcare & Finance. He has published many papers in conferences and journals on digital technologies.