![](https://crypto4nerd.com/wp-content/uploads/2023/07/18j_-7_QUe_c9hogVGx48SQ-1024x1032.png)
This Medium post is deeply inspired by the paper Machine Learning Operations (MLOps): Overview, Definition and Architecture [1].
At the heart of any ML project is the data scientist, crafting intricate and powerful models in the development stage. In reality, these models, while sophisticated, often lack the robustness needed for a smooth transition into production [1]. This limitation is largely due to the nature of the models themselves: typically built in isolation, they are not suited to handle the complexity and variability of a live environment.
> In reality, ML models, while sophisticated, often lack the robustness needed for a smooth transition into production (D. Kreuzberger et al., 2023).
What follows is a frustrating disconnect: a data scientist’s development work does not always align with the real-world production landscape. Because of this mismatch, ML endeavors often fail to deliver on their promises, and with them the expected business value, stifling the growth of a data-driven and data-informed culture within organizations.
Enter MLOps, an emerging field with promising solutions, being described as a system of best practices, concepts, and development culture designed to operationalize ML products [1].
> MLOps is a paradigm, including aspects like best practices, sets of concepts, as well as a development culture when it comes to the end-to-end conceptualization, implementation, monitoring, deployment and scalability of machine learning products (D. Kreuzberger et al., 2023).
MLOps stands as the bridge between development and production, ensuring that the models designed by data scientists can be effectively automated and operationalized.
MLOps products are built from technical components that follow a set of principles, where a principle is a guide to behavior indicating how MLOps processes should occur.
An overview of the main MLOps components is given in the following sections.
CI/CD Pipelines
The CI/CD component guarantees the seamless integration, uninterrupted delivery, and continuous deployment of software. It handles essential stages such as building, testing, delivering, and deploying. By promptly providing feedback to developers on the outcome of these steps, it effectively enhances overall productivity.
Examples are: GitHub Actions and GitLab CI/CD
Source Code Repository
The training, inference, and application source code are stored in a version-controlled repository. This enables multiple developers to contribute by committing and merging their changes.
Examples are: GitHub and Bitbucket
Model Training Infrastructure
The infrastructure for model training offers essential computational resources such as CPUs, RAM, and GPUs. This infrastructure can be either distributed or non-distributed. Generally, it is advisable to use a scalable and distributed infrastructure.
Examples are: Vertex AI and Databricks
Feature Stores
A feature store system guarantees the centralized storage of frequently used features. It consists of two configured databases: an offline feature store database, which provides features with regular latency for experimentation purposes, and an online store database, which serves features with low latency for predictions in production.
Examples are: Vertex AI Feature Store and Databricks Feature Store
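The offline/online split described above can be sketched in a few lines of plain Python. This is a hypothetical in-memory illustration, not the API of any real feature store product: the offline store keeps the full feature history for reproducible training sets, while the online store keeps only the latest values for low-latency lookups at prediction time.

```python
from dataclasses import dataclass, field

@dataclass
class FeatureStore:
    """Hypothetical in-memory feature store with an offline/online split."""
    offline: dict = field(default_factory=dict)  # full history, for training
    online: dict = field(default_factory=dict)   # latest values, for serving

    def write(self, entity_id, features):
        # Offline store appends every version, so past training sets
        # can be reproduced.
        self.offline.setdefault(entity_id, []).append(features)
        # Online store overwrites with the latest values for fast reads.
        self.online[entity_id] = features

    def get_training_rows(self, entity_id):
        return self.offline.get(entity_id, [])

    def get_online_features(self, entity_id):
        return self.online.get(entity_id)

store = FeatureStore()
store.write("user_42", {"avg_spend": 10.0})
store.write("user_42", {"avg_spend": 12.5})
print(store.get_online_features("user_42"))     # only the latest values
print(len(store.get_training_rows("user_42")))  # 2 (full history)
```

Real feature stores back these two interfaces with separate databases tuned for batch throughput and serving latency, respectively.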
Model Registry
The model registry centrally stores the trained ML models along with their metadata. It serves two primary purposes: storing the ML artifact and storing the ML metadata.
Examples are: MLflow Model Registry and Kubeflow
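To make the two purposes concrete, here is a minimal sketch of a registry that stores a versioned artifact together with its metadata. The class and its methods are hypothetical, for illustration only; tools like MLflow expose a far richer API.

```python
class ModelRegistry:
    """Hypothetical registry: versioned artifacts plus their metadata."""

    def __init__(self):
        self._models = {}  # model name -> list of versioned entries

    def register(self, name, artifact, metadata):
        # Each registration creates a new, monotonically increasing version.
        versions = self._models.setdefault(name, [])
        versions.append({
            "version": len(versions) + 1,
            "artifact": artifact,   # the serialized ML model
            "metadata": metadata,   # e.g. metrics, framework, training date
        })
        return versions[-1]["version"]

    def latest(self, name):
        return self._models[name][-1]

registry = ModelRegistry()
v = registry.register(
    "churn-model",
    artifact=b"...model binary...",
    metadata={"accuracy": 0.91, "framework": "sklearn"},
)
print(v)  # 1
print(registry.latest("churn-model")["metadata"]["accuracy"])  # 0.91
```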
Monitoring Component
The monitoring component is responsible for continuously monitoring the performance of the model serving, such as prediction accuracy. Moreover, monitoring of the ML infrastructure, CI/CD, and orchestration is necessary.
Examples are: Great Expectations and Vertex AI
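One way to picture performance monitoring of model serving is a sliding-window accuracy check that raises an alert when recent accuracy drops below a threshold. The class below is an illustrative sketch, not part of any monitoring product; window size and threshold are assumed values.

```python
from collections import deque

class AccuracyMonitor:
    """Hypothetical sliding-window monitor for serving accuracy."""

    def __init__(self, window=100, threshold=0.8):
        self.results = deque(maxlen=window)  # keep only recent outcomes
        self.threshold = threshold

    def record(self, prediction, actual):
        # Store whether the served prediction matched the observed label.
        self.results.append(prediction == actual)

    def accuracy(self):
        if not self.results:
            return 1.0  # no evidence of degradation yet
        return sum(self.results) / len(self.results)

    def alert(self):
        return self.accuracy() < self.threshold

monitor = AccuracyMonitor(window=4, threshold=0.75)
for pred, actual in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    monitor.record(pred, actual)
print(monitor.accuracy(), monitor.alert())  # 0.5 True
```

In production, the same idea extends to latency, throughput, and data-drift metrics, alongside monitoring of the CI/CD and orchestration infrastructure itself.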
ML Metadata Stores
ML metadata stores enable the tracking of diverse types of metadata for every task in an orchestrated ML workflow pipeline. Additionally, a separate metadata store can be set up within the model registry to monitor and record the metadata associated with each training job. This includes details such as the training date and time, duration, utilized parameters, resulting performance metrics, as well as the model lineage encompassing the data and code utilized.
Examples are: MLflow Tracking
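The kind of record such a store keeps can be sketched as follows. This is a hypothetical stand-in in the spirit of MLflow Tracking, not its actual API: each training run logs its parameters, metrics, duration, and lineage (the data and code used).

```python
import time

class MetadataStore:
    """Hypothetical metadata store: one record per training run."""

    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics, lineage, duration_s):
        run = {
            "logged_at": time.time(),  # training date and time
            "duration_s": duration_s,  # how long the job took
            "params": params,          # e.g. hyperparameters
            "metrics": metrics,        # resulting performance metrics
            "lineage": lineage,        # data and code versions used
        }
        self.runs.append(run)
        return run

store = MetadataStore()
run = store.log_run(
    params={"learning_rate": 0.01},
    metrics={"rmse": 0.31},
    lineage={"data_version": "v3", "code_commit": "hypothetical-sha"},
    duration_s=482,
)
print(run["metrics"]["rmse"])  # 0.31
```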
Workflow Orchestration
The workflow orchestration component enables the orchestration of ML workflows through directed acyclic graphs (DAGs). These graphs depict the sequence of execution and the utilization of artifacts for individual steps within the workflow. For instance, a workflow may utilize packaged code artifacts during various process steps such as data extraction, training, inference, or embedding of a model binary into an application.
Examples are: Apache Airflow, GCP Composer and Azure Data Factory
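A DAG of workflow steps can be expressed directly with Python's standard library. The step names below are illustrative; each key lists the steps it depends on, and a topological sort yields a valid execution order, which is exactly what orchestrators compute before scheduling tasks.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Illustrative ML workflow DAG: each step maps to the set of steps
# it depends on (its upstream tasks).
dag = {
    "extract": set(),
    "train": {"extract"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}

order = list(TopologicalSorter(dag).static_order())
print(order)  # ['extract', 'train', 'evaluate', 'deploy']
```

Orchestrators like Airflow wrap the same idea with scheduling, retries, and artifact passing between steps.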
Model Serving
The model serving component can be configured to serve various purposes. For instance, it can be used for online inference to make real-time predictions or for batch inference when handling large amounts of input data. Serving can be achieved through a REST API, for example. It is advisable to establish a scalable and distributed model serving infrastructure as a fundamental layer of the system.
Examples are: Vertex AI Serving and Databricks Serving
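The online/batch distinction can be sketched as two thin wrappers around the same scoring function. Everything here is a hypothetical stand-in (the model is a toy threshold rule); in production the online path would sit behind a REST endpoint and the batch path behind a scheduled job.

```python
def model_predict(features):
    # Toy stand-in for a real model's scoring function.
    return 1 if features["score"] > 0.5 else 0

def online_infer(record):
    # Online inference: one record in, one real-time prediction out.
    return model_predict(record)

def batch_infer(records):
    # Batch inference: score a large collection of inputs in one pass.
    return [model_predict(r) for r in records]

print(online_infer({"score": 0.7}))                    # 1
print(batch_infer([{"score": 0.2}, {"score": 0.9}]))   # [0, 1]
```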
Data Scientist
A Data Scientist is responsible for researching and applying advanced statistical techniques, such as regression, classification, clustering, and optimization, to automate processes that impact business operations or customer-facing products. They work closely with Software Engineers or ML Engineers to deploy and monitor their models. Data Scientists also utilize inferential and experimental techniques to choose among different versions of a product or a process. Candidates interested in a Data Science position often have a graduate degree in a quantitative field [1][2].
Data Engineer
Data Engineers play a crucial role in architecting and maintaining databases, building data pipelines, and developing tools used for analytics and ML. They are responsible for ingesting data from various sources, ensuring its quality, and designing and maintaining the pipelines that move the data between databases. Data Engineers typically have a strong background in computer science or analytics and are proficient in programming languages like SQL and Python. They also work with modern data tools and solutions, such as cloud platforms (AWS, GCP), distributed systems, and workflow management tools like Airflow [1][2].
ML Engineer
Machine Learning Engineers focus on designing efficient tools, or an ML platform, for model deployment, management, and monitoring over time, optimizing time-to-market for the delivery of model business value. ML Engineers combine experience in their own area with related knowledge from Data Science, Data Engineering, Software Engineering, DevOps Engineering, and Backend Engineering, often with graduate degrees in quantitative fields. They collaborate with Data Scientists and Data Engineers to deploy machine learning models and integrate them into production systems [1][2].
Software Engineer
Software Engineers are responsible for the design, development, and maintenance of software applications and systems. They work on both frontend and backend components, writing code, debugging, and ensuring the overall functionality and performance of the software. Software Engineers collaborate with cross-functional teams, including designers, product managers, and quality assurance engineers, to deliver high-quality software products. They have expertise in programming languages, software development methodologies, and frameworks relevant to their specific domain [1].
DevOps Engineer
DevOps Engineers focus on the intersection of software development and IT operations, aiming to streamline the development, deployment, and maintenance of software systems. They are responsible for automating processes, managing infrastructure, and ensuring efficient and reliable software delivery. DevOps Engineers work with tools and technologies like containerization (Docker), configuration management (Ansible), and continuous integration/continuous deployment (CI/CD) pipelines. They collaborate closely with software development teams to enable faster and more reliable software releases [1][2].
Backend Engineer
A Backend Engineer is responsible for the development and maintenance of the server-side logic of a software application or system. They focus on the implementation and optimization of the core functionalities that run on the server, ensuring efficient data processing, storage, and retrieval. Backend Engineers often work with databases, APIs, and frameworks to build robust and scalable server-side solutions. They collaborate with Frontend Engineers and other stakeholders to ensure seamless integration between the frontend and backend components of an application [1].