Data science derives experimental conclusions about a hypothesis through experiments with data and statistical methods. As a branch of data science, machine learning is an engineering discipline in which machines learn from data and model algorithms are refined until a conclusion is accepted. Many machine learning algorithms involve neural networks, but the size and depth of a network is not the default indicator of model success; what matters more is the ability to adjust experiments toward a desired result.
However, the priority is not primarily the algorithms but the data input. The right kinds of data must be considered, without limiting the model to a single data type or source. Multimodal data types and models provide a richer representation and more closely resemble human decision making.
Integrated data design
Therefore, integrating various data sources helps capture human behaviours, attributes and decision-making processes, translating real-life experience and knowledge into machine-readable data. This involves structured and unstructured data, processed in streams or batches, in near real-time. The data processing step precedes the deployment of models, and the right data sources help provide a good illustration of the use case.
To establish a common understanding of the datasets used, including meta-data in model development lets the model treat the data as common knowledge and work from a single data source. For example, a graph structure that describes data relationships on a unified data store prevents duplication and provides data linkages with performance tracking.
This also supplements the model’s learning ability and provides domain knowledge beyond the limited training data available, a limitation known as the few-shot learning challenge. In a way, the model learns from prior experiences and related knowledge before attempting its own learning. This concept of using graphs to share information draws on research papers on graph neural networks (GNNs).
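The graph structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference design: the `MetadataGraph` class, the node ids (`customers`, `support_calls`, `churn_model_v1`) and the `trained_on` relation are all hypothetical names invented for the example, and a real unified data store would sit on a graph database rather than in-memory dictionaries.

```python
from collections import defaultdict


class MetadataGraph:
    """Tiny in-memory meta-data graph (hypothetical, for illustration only).

    Nodes are data assets or models; edges are named relationships.
    """

    def __init__(self):
        self.nodes = {}                 # node id -> attribute dict
        self.edges = defaultdict(set)   # node id -> {(relation, target id)}

    def add_node(self, node_id, **attrs):
        # Re-registering an existing id merges attributes instead of
        # creating a second copy, which is what prevents duplication.
        self.nodes.setdefault(node_id, {}).update(attrs)

    def add_edge(self, source, relation, target):
        # Only allow links between assets that are already registered.
        for node_id in (source, target):
            if node_id not in self.nodes:
                raise KeyError(f"unknown node: {node_id}")
        self.edges[source].add((relation, target))

    def related(self, node_id, relation):
        """Return the sorted ids linked from node_id by the given relation."""
        return sorted(t for r, t in self.edges.get(node_id, ()) if r == relation)


graph = MetadataGraph()
graph.add_node("customers", kind="table", source="crm")
graph.add_node("support_calls", kind="audio", source="call_centre")
graph.add_node("customers", owner="sales")   # same id: merged, not duplicated
graph.add_node("churn_model_v1", kind="model")

# Data linkages: which datasets fed which model.
graph.add_edge("churn_model_v1", "trained_on", "customers")
graph.add_edge("churn_model_v1", "trained_on", "support_calls")

print(graph.related("churn_model_v1", "trained_on"))
```

Keying every asset by a single id is the design choice that matters here: structured and unstructured sources share one registry, so a second registration enriches the existing entry rather than forking it.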
Enabling MLOps with meta-data
Meta-data provides a shared knowledge base and enables MLOps to be embedded in wider organizational functions. Furthermore, meta-data can meet the different requirements of data science research teams and engineering/development teams: researchers get graphs for dataset creation, and engineers get document meta-data for tracking development pipeline performance. This also helps with audits and with explaining a solution to a wider audience. MLOps focuses on constant, iterative data feedback, model training, validation and testing; with meta-data included, it provides a common library for both machine and human stakeholders to use machine learning to meet business objectives.
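As a rough illustration of document meta-data used to track pipeline performance, the sketch below records one entry per training, validation or testing run and lets either team query the history. Everything in it is an assumption made for the example: `RunRecord`, `RunLog`, the field names and the sample values are hypothetical and do not come from any particular MLOps tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class RunRecord:
    """One meta-data document describing a single pipeline run (hypothetical schema)."""
    model: str
    dataset_version: str
    stage: str      # "training", "validation" or "testing"
    metric: str
    value: float
    logged_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class RunLog:
    """Append-only log of run records shared by researchers and engineers."""

    def __init__(self):
        self._records = []

    def log(self, record):
        self._records.append(record)

    def history(self, model, stage, metric):
        # Records come back in the order they were logged, which supports audits.
        return [r for r in self._records
                if r.model == model and r.stage == stage and r.metric == metric]

    def best(self, model, stage, metric):
        # Assumes a higher metric value is better (true for accuracy, not for loss).
        runs = self.history(model, stage, metric)
        return max(runs, key=lambda r: r.value) if runs else None


log = RunLog()
log.log(RunRecord("churn_model", "v1", "validation", "accuracy", 0.81))
log.log(RunRecord("churn_model", "v2", "validation", "accuracy", 0.86))
log.log(RunRecord("churn_model", "v2", "testing", "accuracy", 0.84))

best_run = log.best("churn_model", "validation", "accuracy")
print(best_run.dataset_version, best_run.value)
```

Because each record names the dataset version alongside the metric, the same log answers an engineer's question (is the pipeline improving?) and an auditor's question (which data produced this result?).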
To learn more, pick any publication you like and we can discuss it over a video call, walking through the reports, blogs and exhibits and adding further insights or perspectives. Email me at firstname.lastname@example.org
Below are related reports to this blog:
MLOps Part 1: From Machine Learning Innovation to Production
MLOps Part 2: Examples of Enterprise Machine Learning Deployment Providers
Introduction to Graph Data Design: Alternative Database and Tools