The world is at the beginning of a radical transformation that will affect every aspect of our societies and economies: the Fourth Industrial Revolution, the first technological shift in history to be identified and named before it has actually happened. Artificial Intelligence (AI) is one of the technologies that will contribute most to it, thanks to the huge progress made in recent years, especially in Machine Learning (ML) and Deep Learning (DL). New hardware platforms, specialised cloud services and software frameworks have brought a new golden age for AI after several cycles of winters and summers. Although some AI-based applications have demonstrated real and relevant business impact, several challenges must still be addressed to unlock the full potential of AI/ML/DL in Business to Business (B2B) scenarios, industrial use cases and critical applications. Indeed, the European Commission has identified some of these challenges, and the emerging AI Act, the first regulation of this technology to be published worldwide, is going to require different technical and non-technical mechanisms to address them.
Nowadays, most of the effort in developing services and applications based on data-driven AI technologies is focused on data pre-processing, analytics, selecting the most appropriate model or neural network, training the model and optimising the hyperparameters. There is a gap in transferring the resulting models to production environments using industrial practices that guarantee appropriate performance, robustness and scalability. In this sense, the convergence of the Machine Learning and DevOps paradigms is becoming a very active research area called Machine Learning Operations (MLOps). MLOps brings ML models into the software production process. It connects ML applications with DevOps principles, so that the deployment and maintenance of ML models can be automated in the production environment. MLOps systems should be collaborative, continuous, reproducible, tested and monitored in order to achieve an organisation's MLOps goals. The MLOps development life cycle consists of three major components: data, model and code [2].
Following this trend, multiple open-source and proprietary solutions are being released that cover specific parts of the MLOps flow: data labelling, data versioning, feature engineering, experiment tracking, hyperparameter optimisation, model deployment, model serving and model monitoring. The main cloud providers are also integrating MLOps into their portfolios of ML services, addressing the orchestration of the workloads needed to create, deploy, scale or reproduce ML workflows. This is the case with AWS, GCP and Azure. An overview of some of these tools is available in the following article: https://research.aimultiple.com/mlops-tools/.
Three of the most promising open-source alternatives at the moment are:
- Kubeflow: an MLOps platform designed to support the deployment of ML workflows on any infrastructure managed by Kubernetes.
- MLflow: a platform focused on improving the reproducibility of ML experiments by tracking and recording detailed information about all aspects of the process, and on facilitating the packaging of the resulting models.
- Airflow: a more generic solution for the management and orchestration of Directed Acyclic Graphs (DAGs).
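To make the idea of orchestrating a workflow as a Directed Acyclic Graph more concrete, the following minimal sketch uses only the Python standard library (`graphlib`, Python 3.9 or later) and not the Airflow API itself. The step names and the `PIPELINE` dictionary are hypothetical and simply mirror typical MLOps stages; a real orchestrator would additionally schedule, retry and distribute these steps.

```python
from graphlib import TopologicalSorter

# Hypothetical ML workflow: each step maps to the set of steps it depends on.
PIPELINE = {
    "preprocess": {"ingest"},
    "train": {"preprocess"},
    "track_experiment": {"train"},
    "evaluate": {"train"},
    "package": {"evaluate"},
    "deploy": {"package"},
    "monitor": {"deploy"},
}


def execution_order(dag):
    """Return one valid execution order in which every step runs after its dependencies.

    Raises graphlib.CycleError (a ValueError) if the graph contains a cycle.
    """
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for position, step in enumerate(execution_order(PIPELINE), start=1):
        print(position, step)
```

Steps with no mutual dependency, such as `track_experiment` and `evaluate` here, may appear in either order, which is what allows an orchestrator to run them in parallel.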
In the IoT-NGIN project, we are investigating the use and extension of the Kubeflow framework to support MLOps workflows optimised for infrastructures that span the computing continuum and include edge devices. This means being able to consume streaming data coming from IoT devices, use ML models or neural networks that run efficiently on these resources, train models in a distributed manner using Federated Learning, package the results in an appropriate format, deploy the models and later monitor their performance and results.
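As an illustration of the distributed training step, the sketch below shows one aggregation round of federated averaging (FedAvg), a common Federated Learning scheme in which a server combines the model weights trained locally on each device, weighted by the number of local samples. This is an assumption made for illustration only: the text does not state which aggregation scheme the project uses, and the function name `federated_average` and the flat list representation of weights are hypothetical.

```python
def federated_average(client_updates):
    """Combine local model weights into global weights (one FedAvg round).

    client_updates: list of (weights, num_samples) pairs, where weights is a
    flat list of floats and num_samples is the size of that client's local
    dataset. Illustrative representation, not a project API.
    """
    if not client_updates:
        raise ValueError("at least one client update is required")
    total_samples = sum(num_samples for _, num_samples in client_updates)
    if total_samples <= 0:
        raise ValueError("total number of samples must be positive")

    dim = len(client_updates[0][0])
    global_weights = [0.0] * dim
    for weights, num_samples in client_updates:
        if len(weights) != dim:
            raise ValueError("all clients must report weights of the same size")
        # Clients with more local data contribute proportionally more.
        share = num_samples / total_samples
        for i, w in enumerate(weights):
            global_weights[i] += w * share
    return global_weights


if __name__ == "__main__":
    # Two hypothetical edge devices holding 1 and 3 local samples.
    print(federated_average([([1.0, 2.0], 1), ([3.0, 4.0], 3)]))
```

Only the weights and sample counts leave each device in this scheme; the raw IoT data stays local, which is the main reason Federated Learning suits edge scenarios.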
[2] Hewage, N., & Meedeniya, D. (2022). Machine Learning Operations: A Survey on MLOps Tool Support. arXiv preprint arXiv:2202.10169.