Machine Learning Warehouse – A Myth or Reality

June 30, 2020

I am tossing out a new technical buzzword today: Machine Learning Warehouse. Sounds overwhelming? Let me try to explain what it could mean in practice.

Let me ask you a question! Can we create a warehouse framework by stacking up machine learning models, templates, pre-trained models, and a set of customized ML algorithms, and deploy it for enterprise use? Why not? If we can put our enterprise data (structured or unstructured) in a data warehouse or a data lake, then the same should be possible for machine learning models as well! A data warehouse holds integrated data from multiple, disparate sources to serve analytical queries. Likewise, an ML warehouse can hold ML models or ML templates along with integrated machine learning techniques, customized models, model parameters, the input data to the models, the output from the models, metadata, and an information exchange mechanism between models, all to solve business puzzles.

Let’s compare an ML warehouse with a data warehouse to understand the concept.

Kimball school of thought

The data warehouse (DW) concept was first developed in the late 1980s. The main challenge in building a successful DW was the infrastructure, which had to process massive volumes of enterprise data and respond quickly to user queries. Once powerful compute engines and cheap memory devices became available, building a DW became a much easier task.

Story of God Particles

On a lighter note (though not entirely), the existence of the Higgs boson was first predicted theoretically in 1964 by Peter Higgs and others, who had no experimental setup to confirm it. Decades later, in 2012, the detectors at the Large Hadron Collider (LHC) observed a particle consistent with the Higgs boson through its decay into other particles. So, sometimes we must wait until we have the proper environment to test a concept.

Concept of ML Warehousing

Many advanced Artificial Intelligence (AI)/ML algorithms were developed long ago, but testing them required huge computing power and very large datasets. Today we have petabytes of data and powerful GPUs and TPUs to train and test our advanced ML models. So why should we not take a step forward and build an ML warehouse?

Another interesting, futuristic way to picture an ML warehouse is to compare it with the anatomy of the human brain. Our brain has several distinct regions that handle different activities such as vision, controlling the limbs, smelling, hearing and so on. These activities are synchronized by exchanging information from one region to another, and the two hemispheres communicate through the corpus callosum, a bundle of millions of axons. As a simplified example, if I want to draw a picture of an object, the right hemisphere of my brain first captures the image and passes it, via the corpus callosum, to the left hemisphere that controls my drawing hand.

Similarly, an ML warehouse will have an Intelligent Information Exchange Bus (IIEB), through which the output of one or more ML models is passed as input to other models to generate more effective and collaborative predictions.
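To make the bus idea a little more concrete, here is a minimal sketch in Python. It assumes a simple publish/subscribe design, which is only one possible reading of the IIEB. The class and function names (`InformationExchangeBus`, `upstream_model`, `DownstreamModel`), the topic name, and the toy arithmetic standing in for trained models are all hypothetical and not part of any existing product.

```python
from typing import Callable, Dict, List, Optional


class InformationExchangeBus:
    """Toy stand-in for the IIEB: routes one model's output to subscribed models."""

    def __init__(self) -> None:
        # topic name -> handlers that consume messages published on that topic
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = {}

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Deliver the message to every handler; unknown topics are ignored.
        for handler in self._subscribers.get(topic, []):
            handler(message)


def upstream_model(reading: float) -> dict:
    """Hypothetical rule in place of a trained model: a risk score in [0, 1]."""
    return {"risk_score": min(max(reading / 100.0, 0.0), 1.0)}


class DownstreamModel:
    """Hypothetical consumer that uses the upstream score as an input feature."""

    def __init__(self, base_load: float) -> None:
        self.base_load = base_load
        self.last_prediction: Optional[float] = None

    def on_upstream_output(self, message: dict) -> None:
        # Assumed toy relationship: a higher upstream risk raises the prediction.
        self.last_prediction = self.base_load + 50.0 * message["risk_score"]


bus = InformationExchangeBus()
consumer = DownstreamModel(base_load=10.0)
bus.subscribe("upstream.risk", consumer.on_upstream_output)
bus.publish("upstream.risk", upstream_model(50.0))
print(consumer.last_prediction)
```

The point of the design is that the producing model never needs to know who consumes its output; new models can join the exchange simply by subscribing to a topic.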

Machine Learning Warehousing

What Makes an Ideal ML Warehouse?

  1. Machine Learning Mart: As per Kimball, a DW consists of multiple Data Marts, which are basically star schemas that hold denormalized data specific to a business unit. Likewise, domain-specific ML models or model-building templates are deployed in ML Marts to be consumed by user groups. We can imagine ‘Networking and Data Center’ as one ML mart. It can have multiple models that answer questions such as the following (compare with the concept of KPIs in a data mart).

When could a network failure occur?

How much time will it take to recover from failure?

What could cause a high load in the data center, and when?

As described earlier, the output of a ‘Network Failure Forecasting’ model could be one of the vital inputs to the model that forecasts high load on the network, since we could assume that the load on the network will increase after any such failure.

  2. Collaborative Governance: A strong governance framework would be maintained to control how relevant, influential information is exchanged from one model to another through the Intelligent Information Exchange Bus.

  3. ML Data Store: All the model-specific input and output data would be stored in the Data Store area. A Data Service Management component would be responsible for managing this layer.

  4. Easy and Fast Access: The access mechanism in the ML warehouse should be extremely fast and easy, whether it is retrieving saved models or accessing the prediction output from a model. All of this would be done through REST APIs built into the solution.

  5. Offline Workload: Users would be able to download the models and the respective data or feature datasets from the ML marts or the data store, and then train, test and fine-tune them in a local environment. Offline fine-tuning, such as adding new features or optimizing model hyper-parameters, could be done by the users themselves. Once the final model is ready, the user would be able to push it back to the warehouse to serve enterprise requirements.
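The building blocks above (marts, stored models, pull for offline tuning, push back) can be sketched as a small in-memory registry. This is only an illustration under stated assumptions: the names `MLWarehouse`, `ModelRecord`, `push`, `pull` and `list_models`, the mart and model names, and the versioning scheme are all hypothetical. A real solution would sit behind the REST APIs and a persistent data store rather than a Python dictionary.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ModelRecord:
    name: str
    version: int
    params: Dict[str, Any]  # e.g. hyper-parameters or learned weights
    metadata: Dict[str, Any] = field(default_factory=dict)


class MLWarehouse:
    """Hypothetical in-memory warehouse: ML marts holding versioned models."""

    def __init__(self) -> None:
        # mart name -> model name -> list of versions (oldest first)
        self._marts: Dict[str, Dict[str, List[ModelRecord]]] = {}

    def push(self, mart: str, name: str, params: Dict[str, Any],
             metadata: Optional[Dict[str, Any]] = None) -> ModelRecord:
        """Store a model as a new version; earlier versions are kept."""
        versions = self._marts.setdefault(mart, {}).setdefault(name, [])
        record = ModelRecord(name, len(versions) + 1,
                             copy.deepcopy(params), dict(metadata or {}))
        versions.append(record)
        return record

    def pull(self, mart: str, name: str,
             version: Optional[int] = None) -> ModelRecord:
        """Return a private copy for offline work (latest version by default)."""
        versions = self._marts[mart][name]
        record = versions[-1] if version is None else versions[version - 1]
        return copy.deepcopy(record)

    def list_models(self, mart: str) -> List[str]:
        return sorted(self._marts.get(mart, {}))


warehouse = MLWarehouse()
mart = "networking_and_data_center"
warehouse.push(mart, "network_failure_forecast", {"learning_rate": 0.1})

# Offline workload: pull a copy, tune it locally, then push it back.
local_copy = warehouse.pull(mart, "network_failure_forecast")
local_copy.params["learning_rate"] = 0.05
warehouse.push(mart, "network_failure_forecast", local_copy.params,
               {"note": "tuned offline"})
```

Because `pull` hands out a deep copy, local experiments cannot silently change what the enterprise is serving; only an explicit `push` creates a new version.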

The Future Looks Promising

If we can envisage the ML warehouse as an integral part of a Cognitive Decision Support System (CDSS) that meets wide-ranging enterprise needs, then all the building blocks mentioned above can probably be positioned appropriately to solve various kinds of business puzzles.

Biswanath Bal is a Machine Learning practitioner in the Innovation and Product Engineering group within TCS Platform Solutions. He is responsible for building embedded cognitive intelligence and machine learning capabilities into the TCS Platform Solutions products for Human Resource and Procurement. Prior to this, he worked on several strategic EDW and Business Intelligence engagements for clients from different industries including Telecom, Manufacturing and Banking. His special interests include computer vision, natural language processing, real-time prediction, forecasting on time series data, and Machine Learning on Cloud. He holds a post-graduate degree in Computer Applications (MCA) from North Bengal University, West Bengal.