![](https://crypto4nerd.com/wp-content/uploads/11RRYLX3FnaqLi93JFKv-sA.png)
By Jamila REJEB, Data Scientist at LittleBigCode 🚀
If you are reading this article, you are probably working on a machine learning project and eager to learn about ways to develop and deliver better models. As a matter of fact, during the lifecycle of a machine learning project, creating an ML model is merely the first step; deploying and monitoring your models, data, and experiments is where things get complicated. Addressing this challenge calls for a methodical approach and a set of practices and tools. That is why, at LittleBigCode, we have decided to help you tackle this issue through a series of articles, starting with a gentle introduction to MLOps and moving on to smart dataset management and experiment tracking.
In this article, we will dive into a particular tool designed to support and automate key steps in the MLOps life cycle, such as experiment tracking, dataset versioning and model management.
Without further ado, let us discover what Weights & Biases (W&B) has to offer.
Weights & Biases (W&B) is introduced as "the developer-first MLOps platform": a platform geared towards developers, designed for building better machine learning models more efficiently.
Weights & Biases is a web-based subscription service. You start by creating a free account, which comes with 100 GB of storage for data and artifacts.
Using this cloud-based service, you can host your experiments in a single central repository; if you have a private infrastructure, Weights & Biases can also be deployed on it.
There are two main components that make up W&B: a Python API and the Workspace.
a. The Python API is what you use to integrate your ML code with W&B and get insights from your experiments in the Workspace.
b. The Workspace, on the other hand, contains the dashboard and the navigation bar, where you can access recent projects and get a visual understanding of your datasets and experiments:
- Experiments: lightweight experiment tracking
- Reports: collaborative dashboards
- Artifacts: dataset and model versioning
- Tables: interactive data visualization
- Sweeps: hyperparameter optimization
Every ML project starts with understanding the data in order to build interesting features.
The Tables feature helps sort, filter, group and create charts directly from tabular data.
You can also use this functionality to understand and visualize your machine learning model's predictions. For example, you can group the table by the guess column to see which examples are being misclassified.
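To make this concrete, here is a minimal sketch of the kind of prediction table you might log. The column names, the example rows, and the "wandb_example" project name are illustrative assumptions, not taken from the article:

```python
# Hypothetical prediction log: each row pairs an example id and its true
# label with the model's guess, so grouping by "guess" (or filtering rows
# where label != guess) reveals misclassifications.
columns = ["id", "label", "guess"]
rows = [
    [0, "cat", "cat"],
    [1, "dog", "cat"],  # misclassified example
    [2, "cat", "cat"],
    [3, "dog", "dog"],
]

# Logging this as an interactive W&B Table (requires wandb and an account):
#
#   import wandb
#   run = wandb.init(project="wandb_example")
#   run.log({"predictions": wandb.Table(columns=columns, data=rows)})

# The same grouping idea, applied locally:
misclassified = [row for row in rows if row[1] != row[2]]
```

Once logged, the table can be sorted, filtered, and grouped interactively in the Workspace, exactly as described above.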
Tracking Experiments using the dashboard
Every machine learning project starts with an experimental phase in which different models, features, and hyperparameters are tested. However, too often we find ourselves lost in all the folders, Excel files, and notebooks we used to track and compare these experiments' performance.
Using W&B dashboard you will be able to compare your experiments using the graphs created from all the metrics that you have logged.
More practically, you start by adding five lines of code to your existing Python script:
```python
# Flexible integration for any Python script
import wandb

# 1. Start a W&B run
wandb.init(project="wandb_example")

# 2. Save model inputs and hyperparameters
config = wandb.config
config.learning_rate = 0.01

# Model training here
# ...

# 3. Log metrics over time to visualize performance
wandb.log({"loss": loss})
```
And just like that, you have created a project named "wandb_example".
Model optimization with sweeps
An important step in choosing the best model for a specific task is optimizing the hyperparameters. This search, however, tends to be computationally heavy and time-consuming. In addition, we generally end up, once again, with a pile of graphs and saved models with cryptic names.
W&B has a dedicated feature for this task. On the left of your project space, you will find an icon that looks like a broom: this is where you create your sweep files. Once your sweep file is configured, the search runs and you can see its results directly on the dashboard. You can even visualize which hyperparameters most affect the metrics you care about.
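As a sketch, a sweep can also be defined in code rather than through the UI. The metric name, hyperparameter ranges, and project name below are illustrative assumptions, not values from the article:

```python
# Hypothetical sweep configuration: random search over two hyperparameters,
# maximizing a validation-accuracy metric logged by the training function.
sweep_config = {
    "method": "random",  # grid, random, or bayes
    "metric": {"name": "val_accuracy", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"min": 1e-4, "max": 1e-1},
        "batch_size": {"values": [16, 32, 64]},
    },
}

# Registering the sweep and launching an agent (requires wandb installed
# and a logged-in account; `train` is your own training function, which
# reads hyperparameters from wandb.config and logs "val_accuracy"):
#
#   import wandb
#   sweep_id = wandb.sweep(sweep_config, project="wandb_example")
#   wandb.agent(sweep_id, function=train, count=10)
```

Each run launched by the agent then shows up on the dashboard, where the parallel-coordinates and importance plots reveal which hyperparameters matter most.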
Versioning with Artifacts
Reproducibility is very important in any project, especially those involving machine learning models. You should save your model at each training run, along with the dataset it was trained on. By building a dependency graph, W&B lets you trace the flow of data through your pipeline, so you know exactly which datasets feed into which models thanks to the graph view. In the example below, the same project has two versions of the dependency graph; each uses a different version of the dataset with a different training script. In this case, the feature helps us better visualize our pipeline.
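A minimal sketch of the artifact workflow behind such a dependency graph, assuming a dataset file and the hypothetical artifact name "my-dataset" (the wandb calls are shown as comments because they require an account):

```python
import os
import tempfile

# Create a small file standing in for one version of a dataset (illustrative).
data_dir = tempfile.mkdtemp()
train_path = os.path.join(data_dir, "train.csv")
with open(train_path, "w") as f:
    f.write("feature,label\n0.1,cat\n0.9,dog\n")

# Versioning it with W&B Artifacts:
#
#   import wandb
#
#   # Producer run: log a new dataset version (v0, then v1, ...)
#   run = wandb.init(project="wandb_example", job_type="dataset-upload")
#   artifact = wandb.Artifact("my-dataset", type="dataset")
#   artifact.add_file(train_path)
#   run.log_artifact(artifact)
#   run.finish()
#
#   # Consumer run: declaring the dependency is what draws the edge
#   # between dataset and model in the graph view.
#   run = wandb.init(project="wandb_example", job_type="training")
#   dataset = run.use_artifact("my-dataset:latest")
#   local_dir = dataset.download()
```

Because each `use_artifact` call records which version a run consumed, retraining with a new dataset version produces a new branch of the graph, as in the example above.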
Collaborative analysis using Reports
Another useful feature is Reports. By creating a report, you can easily share updates and outcomes of your machine learning projects with your coworkers: add text explaining how your model works, show graphs, compare model versions, and demonstrate progress towards milestones. Your coworkers can also edit and comment on the report.
After exploring every feature of the W&B platform and testing it on a dummy project, we can now create a small review to summarize the main advantages and drawbacks of this tool.
Let us start with the Pros.
- Ease of use: creating an account is very straightforward. The platform is neat and very easy to play with, and the free 100 GB is a good start for testing the product.
- A central, user-friendly, and interactive dashboard where you can view your experimentations and track their performance.
- Tracking every part of the model training process, visualizing models, and comparing experiments.
- Automated hyperparameter tuning with Sweeps, which explores hyperparameter combinations to improve both model performance and your understanding of it.
- Collaborative reports for teams, where you can add visualizations, organize, explain and share your model performance, model versions, and progress.
- End-to-end artifact tracking of the machine learning pipeline, from data preparation to model deployment.
- Easy integration with frameworks like Tensorflow, Pytorch, Keras, Hugging Face, and more.
- Collaborative work in a team with multiple features for sharing, experimenting, etc.
These are all useful features that Weights & Biases provides, which makes it a good tool for research teams looking to discover, learn and gain insights into machine learning experiments.
However, there are still a couple of cons that we deemed worthy of listing:
- ML lifecycle management: managing the complete lifecycle of a model, from data sourcing to model deployment, is important because it lets teams correctly monitor and debug issues at any stage of development; W&B does not cover this lifecycle end to end.
- Production use cases: for production-oriented teams or projects, Weights & Biases is not a good option because it lacks a production engine to deploy your model and generate new predictions as data arrives.
- Model inference: an important part of research is testing models and carrying out real-time inference, which requires deploying a model right after building and evaluating it; this is not something W&B provides.
In conclusion, Weights & Biases provides multiple useful features, which makes it a good tool for research teams looking to discover, learn and gain insights into machine learning experiments. However, when it comes to delivery, Weights & Biases is not always the best option.