Tracking your ML experiments is fundamental to model development for many reasons, including debugging, compliance, and cost savings. On Vertex AI, you can run your experiments using Vertex AI Training and track your parameters and metrics using Vertex AI Experiments.
Until now, as a data scientist who wanted to train a model using Vertex AI Training and log parameters and metrics in Vertex AI Experiments, you had to:
- Install the google-cloud-aiplatform dependency
- Hard-code lines to initialize an experiment and an experiment run, then call the logging APIs (log_params, log_metrics) to track parameters and metrics in the training task script (inner script)
- (Optional) Manually log Vertex AI Training job information (job name, job type, etc.) as parameters in the script that submits the training job (outer script)
Below is an example of logging parameters and metrics in Vertex AI Experiments from your training task.
# Import libraries
from google.cloud import aiplatform

# Define constants
params = read_config('config.yaml')

# Initialize vertex ai experiment
aiplatform.init(project=project, experiment='your-experiment')

# Initialize vertex ai experiment run
with aiplatform.start_run('your-experiment-run', resume=True):
    # log training params
    aiplatform.log_params(params)
    ...
    # train model
    model = train_model(x_train, y_train, params)
    # evaluate model
    accuracy = evaluate_model(model, x_test, y_test)
    # log metrics (log_metrics expects a dict of metric names to values)
    aiplatform.log_metrics({'accuracy': accuracy})
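The snippet above assumes a few helper functions (read_config, train_model, evaluate_model) that are not part of the Vertex AI SDK. Here is a minimal sketch of what they might look like, using PyYAML and scikit-learn; the names and hyperparameters are illustrative:

import yaml
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def read_config(path):
    # load training parameters from a YAML file
    with open(path) as f:
        return yaml.safe_load(f)

def train_model(x_train, y_train, params):
    # fit a model with the configured hyperparameters
    model = RandomForestClassifier(n_estimators=params['n_est'])
    model.fit(x_train, y_train)
    return model

def evaluate_model(model, x_test, y_test):
    # return the test-set accuracy as a plain float
    return accuracy_score(y_test, model.predict(x_test))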
Next, submit the corresponding training job using the Vertex AI Python SDK.
# Initiate a custom training job from the script
job = aiplatform.CustomJob.from_local_script(
    ...
    script_path="your_training_script.py",
    ...
)
job.submit()

# Log training job info in vertex ai experiments
with aiplatform.start_run('your-experiment-run', resume=True):
    job_info = {
        "job_id": job.name,
        "job_type": "Custom Job",
        ...
    }
    aiplatform.log_params(job_info)
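To verify what was logged, you can pull the experiment's runs into a pandas DataFrame with the SDK's get_experiment_df helper. A quick sketch; the param. and metric. column names depend on what your scripts actually logged:

# fetch all runs of the experiment as a pandas DataFrame,
# one row per run, with logged params and metrics as columns
experiment_df = aiplatform.get_experiment_df(experiment='your-experiment')
print(experiment_df[['run_name', 'param.job_id', 'metric.accuracy']])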
Thanks to the new integration between Vertex AI Training and Vertex AI Experiments, you no longer have to write this glue code. To minimize code changes in both the inner (training task) and outer (training job) scripts while automatically logging as much CustomJob data as possible in Experiments, the new integration:
- Automatically provides the Vertex AI Experiments dependency (google-cloud-aiplatform) in both pre-built training containers and custom containers.
- Enables experiment configuration inheritance between the code you use to run the custom training job and the model training script you use to run an experiment.
- Integrates Vertex AI Experiments autologging with Custom Training.
- Automatically logs the job’s metadata to Vertex AI Experiments after the training job is submitted.
In terms of code, assuming you use Vertex AI Experiments autologging, see how simple the training code becomes thanks to the new integration.
# Import libraries
from google.cloud import aiplatform

# Define constants
params = {'data_path': "gs://your-bucket", ..., 'n_est': 3}
...
# train model
model = train_model(x_train, y_train, params)
# evaluate model
accuracy = evaluate_model(model, x_test, y_test)
And after the training job successfully completes, you can retrieve the logged metadata as shown below.
# Initialize vertex ai experiment
aiplatform.init(experiment='your-experiment')

# Initiate a custom training job from the script
job = aiplatform.CustomJob.from_local_script(
    ...
    script_path="your_training_script.py",
    enable_autolog=True,
    ...
)
job.submit(experiment="your-experiment")

# Get logged custom training metadata
experiment_df = aiplatform.get_experiment_df()
experiment_run = experiment_df.run_name.iloc[0]
with aiplatform.start_run(experiment_run, resume=True) as run:
    # get the latest logged custom job
    logged_job = run.get_logged_custom_jobs()[-1]
    # print custom job spec (example)
    print(logged_job.job_spec)
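If you also want the parameters and metrics that autologging recorded, the same run exposes them through the ExperimentRun class. A short sketch reusing the experiment_run name retrieved above:

# read back what the run logged, including autologged values
run = aiplatform.ExperimentRun(experiment_run, experiment='your-experiment')
print(run.get_params())   # e.g. training hyperparameters
print(run.get_metrics())  # e.g. training/evaluation metrics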
The Vertex AI ML platform keeps evolving, and Vertex AI Experiments and Vertex AI Training are now better integrated. As a data scientist, when you run an experiment using Vertex AI Training and want to log parameters and metrics using Vertex AI Experiments:
- You don’t need to install the Vertex AI Experiments dependency (google-cloud-aiplatform) to log your experiments.
- You don’t need to hard-code experiment and experiment run names in your training scripts.
- Your training parameters and metrics can be automatically logged to Experiments without any change in your training scripts.
- No additional code is required for tracking Vertex AI Training job metadata.
In other words, from now on, logging the experiments you run on Vertex AI is much easier thanks to the new integration between Vertex AI Experiments and Vertex AI Training.
Note that, in this blog, I only show how the new integration simplifies training code using autologging. For more, see the documentation and the official sample notebooks.
In the meantime, I hope you found the article interesting. If so, clap or leave a comment. And feel free to reach out to me on LinkedIn or Twitter for further discussion, or, if you have a question about Vertex AI, check out the Vertex AI Q&A initiative.
Thanks to Ann Farmer for feedback and suggestions.