Tracking your ML experiments is fundamental to model development for many reasons, including debugging, compliance, and cost savings. On Vertex AI, you can run your experiments using Vertex AI Training and track your parameters and metrics using Vertex AI Experiments.
Until now, as a data scientist who wanted to train a model using Vertex AI Training and log parameters and metrics in Vertex AI Experiments, you had to:
- Install the google-cloud-aiplatform dependency
- Hard-code lines to initialize an experiment and an experiment run, then call the logging APIs (log_params, log_metrics) to track parameters and metrics in the training task script (inner script)
- (Optional) Manually log Vertex AI Training job information (job name, job type, etc.) as parameters in the script that submits the training job (outer script)
Below is an example of logging parameters and metrics in Vertex AI Experiments from your training task.
# Import libraries
from google.cloud import aiplatform

# Define constants
params = read_config('config.yaml')

# Initialize vertex ai experiment
aiplatform.init(project=project, experiment='your-experiment')

# Initialize vertex ai experiment run
with aiplatform.start_run('your-experiment-run', resume=True):
    # log training params
    aiplatform.log_params(params)
    ...
    # train model
    model = train_model(x_train, y_train, params)
    # evaluate model
    accuracy = evaluate_model(model, x_test, y_test)
    # log metrics (log_metrics expects a dict of metric names to values)
    aiplatform.log_metrics({'accuracy': accuracy})
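The snippet above assumes a few helper functions (read_config, train_model, evaluate_model) that are not part of the Vertex AI SDK. Here is a minimal sketch of what they might look like, using PyYAML and scikit-learn; the names and hyperparameters are illustrative:

import yaml
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def read_config(path):
    # load training parameters from a YAML file
    with open(path) as f:
        return yaml.safe_load(f)

def train_model(x_train, y_train, params):
    # fit a model with the configured hyperparameters
    model = RandomForestClassifier(n_estimators=params['n_est'])
    model.fit(x_train, y_train)
    return model

def evaluate_model(model, x_test, y_test):
    # return the test-set accuracy as a plain float
    return accuracy_score(y_test, model.predict(x_test))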
Next, submit the corresponding training job using the Vertex AI Python SDK.
# Initiate a custom training job from the script
job = aiplatform.CustomJob.from_local_script(
    ...
    script_path="your_training_script.py",
    ...
)
job.submit()

# Log training job info in vertex ai experiments
with aiplatform.start_run('your-experiment-run', resume=True):
    job_info = {
        "job_id": job.name,
        "job_type": "Custom Job",
        ...
    }
    aiplatform.log_params(job_info)
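To verify what was logged, you can pull the experiment's runs into a pandas DataFrame with the SDK's get_experiment_df helper. A quick sketch; the param. and metric. column names depend on what your scripts actually logged:

# fetch all runs of the experiment as a pandas DataFrame,
# one row per run, with logged params and metrics as columns
experiment_df = aiplatform.get_experiment_df(experiment='your-experiment')
print(experiment_df[['run_name', 'param.job_id', 'metric.accuracy']])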
Thanks to the new integration between Vertex AI Training and Vertex AI Experiments, you no longer have to write this glue code. To minimize code changes in both the inner (training task) and outer (training job) scripts while automatically logging as much CustomJob data as possible in Experiments, the new integration:
- Automatically provides the Vertex AI Experiments dependency (google-cloud-aiplatform) in both pre-built training containers and custom containers.
- Enables experiment configuration inheritance between the code you use to run the custom training job and the model training script you use to run an experiment.
- Integrates Vertex AI Experiments autologging with Custom Training.
- Automatically logs the job’s metadata to Vertex AI Experiments after the training job is submitted.
In terms of code, assuming you use Vertex AI Experiments autologging, see how simple the training code becomes thanks to the new integration.
# Import libraries
from google.cloud import aiplatform

# Define constants
params = {'data_path': "gs://your-bucket", ..., 'n_est': 3}
...
# train model
model = train_model(x_train, y_train, params)
# evaluate model
accuracy = evaluate_model(model, x_test, y_test)
And after the training job successfully completes, you can retrieve the logged metadata as shown below.
# Initialize vertex ai experiment
aiplatform.init(experiment='your-experiment')

# Initiate a custom training job from the script
job = aiplatform.CustomJob.from_local_script(
    ...
    script_path="your_training_script.py",
    enable_autolog=True,
    ...
)
job.submit(experiment="your-experiment")

# Get logged custom training metadata
experiment_df = aiplatform.get_experiment_df()
experiment_run = experiment_df.run_name.iloc[0]
with aiplatform.start_run(experiment_run, resume=True) as run:
    # get the latest logged custom job
    logged_job = run.get_logged_custom_jobs()[-1]
    # print custom job spec (example)
    print(logged_job.job_spec)
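If you also want the parameters and metrics that autologging recorded, the same run exposes them through the ExperimentRun class. A short sketch reusing the experiment_run name retrieved above:

# read back what the run logged, including autologged values
run = aiplatform.ExperimentRun(experiment_run, experiment='your-experiment')
print(run.get_params())   # e.g. training hyperparameters
print(run.get_metrics())  # e.g. training/evaluation metrics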
The Vertex AI ML platform keeps evolving, and Vertex AI Experiments and Vertex AI Training are now better integrated. As a data scientist, when you run an experiment using Vertex AI Training and want to log parameters and metrics using Vertex AI Experiments:
- You don’t need to install the Vertex AI Experiments dependency (google-cloud-aiplatform) to log your experiments.
- You don’t need to hard-code experiment and experiment run names in your training scripts.
- Your training parameters and metrics can be automatically logged to Experiments without any change in your training scripts.
- No additional code is required for tracking Vertex AI Training job metadata.
In other words, from now on, logging the experiments you run on Vertex AI is much easier thanks to the new integration between Vertex AI Experiments and Vertex AI Training.
Note that, in this blog, I only show how the new integration simplifies training code using autologging. For more, see the documentation and the official sample notebooks.
In the meantime, I hope you found the article interesting. If so, clap or leave a comment. And feel free to reach out to me on LinkedIn or Twitter for further discussion, or, if you have a question about Vertex AI, check out the Vertex AI Q&A initiative.
Thanks to Ann Farmer for feedback and suggestions.