Abstract
You can run Physics-Informed Neural Networks (PINNs) on Vertex AI using DeepXDE, a popular library for PINNs backed by Tensorflow, PyTorch or Jax. Vertex AI Experiments provides a useful mechanism to keep track of your work, while Vertex AI training can automate some aspects of it.
In this article we describe how to set up a Google Colab Enterprise environment with DeepXDE in Vertex AI. We then develop a model to solve Navier-Stokes equations in a simplified form for demonstration purposes.
Introduction
Physics-Informed Neural Networks (PINNs) utilize existing knowledge of physical laws to learn a model of a system from representative data. They are particularly useful when such data are scarce or difficult to obtain. Embedding the physical laws governing the system allows for a more robust model than would otherwise be possible to derive.
Mathematical formulation
According to the universal approximation theorem, a neural network with a non-linear activation can approximate any arbitrary function. We can take advantage of that property to find numerical solutions for the differential equations that govern the behavior of physical systems.
Let us consider an ordinary differential equation

du/dt = f(u(t), t)

where we know some values of f(u,t) at given times from experimental observations, and the initial condition u(0) = u0, but we do not know u(t).
It is possible to define a neural network NN(t) that approximates u(t), i.e. NN(t) ≈ u(t), so

dNN/dt ≈ f(NN(t), t)
If we use a loss function such as the sum of squared errors

L(ω) = Σᵢ [dNN/dt(tᵢ; ω) − f(NN(tᵢ; ω), tᵢ)]² + [NN(0; ω) − u0]²

where ω is the set of tunable network parameters, we can find an ω that minimizes the loss by training our network with stochastic gradient descent methods. In doing so, we’ll also find a useful approximation of u(t) in the network NN(t).
In other words, we introduce our physical law f(u(t),t) as a regularization term in the training of a neural network and use that to approximate a function that we did not know a priori. Luckily Google has just the tool to create and train neural networks, namely Tensorflow!
An example with Tensorflow
As a first attempt, let us consider an equation whose analytical solution is known:

du/dt = cos(2πt)

with initial condition u(0) = 1.
The analytical solution is

u(t) = sin(2πt)/(2π) + 1
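As a quick sanity check (a throwaway snippet, not part of the original notebook), we can verify numerically that this u(t) satisfies both the ODE and the initial condition:

```python
import numpy as np

t = np.linspace(0, 2, 2001)
u = np.sin(2 * np.pi * t) / (2 * np.pi) + 1    # candidate solution

du_dt = np.gradient(u, t)                      # numerical derivative du/dt
rhs = np.cos(2 * np.pi * t)                    # right-hand side of the ODE

residual = np.max(np.abs(du_dt - rhs)[1:-1])   # ignore one-sided edge estimates
print(u[0], residual)                          # u(0) = 1, interior residual ~1e-5
```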
We create a neural network with one input, one output and 32 neurons in one hidden layer.
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

tf.keras.backend.set_floatx('float64')  # match the float64 tensors used in the loss below

NN = tf.keras.models.Sequential([
    tf.keras.layers.Input((1,)),
    tf.keras.layers.Dense(units = 32, activation = 'tanh'),
    tf.keras.layers.Dense(units = 1)
])
NN.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 dense_6 (Dense)             (None, 32)                64
 dense_7 (Dense)             (None, 1)                 33
=================================================================
Total params: 97 (776.00 Byte)
Trainable params: 97 (776.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
We define the loss function

L = mean[(dNN/dt − cos(2πt))² + (NN(0) − 1)²]

in Tensorflow:
def L_w(t, nn):
    t = t.reshape(-1, 1)
    t = tf.constant(t, dtype = tf.float64)
    t_0 = tf.zeros((1, 1), dtype = tf.float64)
    one = tf.ones((1, 1), dtype = tf.float64)
    with tf.GradientTape() as tape:
        tape.watch(t)
        u = nn(t)                               # compute neural network output
    u_t = tape.gradient(u, t)                   # compute derivative
    loss = u_t - tf.math.cos(2*np.pi*t)         # ODE residual loss
    loss0 = nn(t_0) - one                       # initial condition loss
    loss2 = tf.square(loss) + tf.square(loss0)  # sum of squares
    total_loss = tf.reduce_mean(loss2)          # compute MSE
    return total_loss
then we train using the Adam optimizer.
opt = tf.keras.optimizers.Adam(learning_rate = 1e-3)

# generate some training samples
train_t = (np.random.rand(30)*2.0).reshape(-1, 1)
train_loss_record = []  # to keep history of loss values

# eager execution to view intermediate outputs
tf.config.run_functions_eagerly(True)

for i in range(6000):
    with tf.GradientTape() as tape:
        train_loss = L_w(train_t, NN)  # compute loss
    train_loss_record.append(train_loss)
    grad_w = tape.gradient(train_loss, NN.trainable_variables)  # compute gradients
    opt.apply_gradients(zip(grad_w, NN.trainable_variables))    # back propagation
    if i % 100 == 0:
        print(train_loss.numpy())
and plot the results
plt.figure(figsize = (10,8))
plt.plot(train_loss_record)
plt.show()
Now we can predict over a test dataset and verify that the neural network provides a decent approximation of u(t)
# generate some test points
test_t = np.linspace(0, 2, 100)
train_u = np.sin(2*np.pi*train_t)/(2*np.pi) + 1
true_u = np.sin(2*np.pi*test_t)/(2*np.pi) + 1
# compute approximation
pred_u = NN.predict(test_t).ravel()
plt.figure(figsize = (8,6))
plt.plot(train_t, train_u, 'ok', label = 'Training')
plt.plot(test_t, true_u, '-g',label = 'True')
plt.plot(test_t, pred_u, '--b', label = 'Predicted')
plt.legend(fontsize = 15)
plt.xlabel('t', fontsize = 10)
plt.ylabel('u', fontsize = 10)
plt.show()
Now with DeepXDE
DeepXDE is a library for scientific machine learning that can be used to define PINNs (and other algorithms). Its pre-defined functions provide a useful level of abstraction to formulate scientific problems in multiple dimensions.
Several frameworks (Tensorflow, PyTorch, Jax and PaddlePaddle) are supported.
Let’s rewrite our example for DeepXDE
import deepxde as dde
from deepxde.backend.set_default_backend import set_default_backend

set_default_backend("tensorflow")
Using the Jacobian (first-order derivative) operator, the loss can now be defined as:
pi = tf.constant(np.pi)

def L_w(t, u):
    du_t = dde.grad.jacobian(u, t)
    return du_t - tf.math.cos(2*pi*t)
The initial condition applies where t is close to zero:
def boundary(t, on_initial):
    return on_initial and np.isclose(t[0], 0)
The geometry of the problem is one time dimension, and the value of the function u(t) on the boundary is 1.
# time domain between 0 and 2
geom = dde.geometry.TimeDomain(0, 2)

# initial condition u(boundary) = 1
ic = dde.IC(geom, lambda t: 1, boundary)
# Analytical solution to compute error
def true_solution(t):
return np.sin(2*np.pi*t)/(2*np.pi) + 1
Our ordinary differential equation problem can then be defined as:
data = dde.data.PDE(geom, # geometry
L_w, # loss
ic, # initial conditions
num_domain = 30, # training samples
num_boundary = 2, # boundary samples
solution = true_solution, # true solution
num_test = 100 # test samples
)
The neural network approximation will be:
layer_size = [1] + [32] + [1]
activation = "tanh"
initializer = "Glorot uniform"

NN = dde.maps.FNN(layer_size, activation, initializer)
model = dde.Model(data, NN)
model.compile("adam", lr = 0.001)
We train it and plot the results:
## Train
losshistory, train_state = model.train(iterations = 6000)
dde.saveplot(losshistory, train_state, issave = False, isplot = True)
Equations of Navier-Stokes
The Navier-Stokes equations are considered the foundations of fluid dynamics. They derive from three conservation laws:
1. The conservation of mass, which states that the difference of mass entering and exiting a region is zero (nothing is created, nothing is destroyed). In mathematical terms, this can be written as:

∂𝜌/∂t + ∇·(𝜌V) = 0

where 𝜌 is the density of the fluid and V = [u, v, w] its 3D velocity vector. For an incompressible fluid, such as water at room temperature, density is constant, so this equation becomes

∇·V = 0

or in 3D space

∂u/∂x + ∂v/∂y + ∂w/∂z = 0

2. The conservation of momentum (an object in motion remains in motion unless a force acts upon it). Momentum is defined as mass × velocity, m·V. Its variation per unit of volume is:

𝜌 DV/Dt = −∇p + µ∇²V + 𝜌g

where D/Dt is the material derivative (∂/∂t + V·∇), µ the viscosity coefficient (assumed to be constant), p the static pressure and 𝜌g a force per unit of volume applied to the whole body of fluid (e.g. gravity). Since

DV/Dt = ∂V/∂t + u ∂V/∂x + v ∂V/∂y + w ∂V/∂z

we can rewrite the 2nd conservation law in 3D space as:

𝜌(∂V/∂t + u ∂V/∂x + v ∂V/∂y + w ∂V/∂z) = −∇p + µ∇²V + 𝜌g

3. The conservation of energy (total energy remains constant, hence any variation of energy in a fluid in motion is the sum of variations in thermodynamic and mechanical work):

𝜌 cp DT/Dt = k∇²T + qg + Φ

where T is the temperature, cp the specific heat at constant pressure, qg the heat generated per unit of volume, k the thermal conductivity and Φ the viscous dissipation.
Simplified form
An analytical solution of such equations in their general form is not known, hence the importance of numerical approximations. For the sake of our experiment, we’ll consider a simplified use case shown in Figure 5:
- The fluid is in a steady state, so there is no variation over time.
- Temperature is constant and no heat is generated (hence we can ignore the third equation)
- The geometry is 2D, hence we can ignore the w terms.
Imagine, if you will, a shallow canal inlet, or a shallow river passing under a bridge arch, seen from top down. In that case, we want to solve the following equations:

u ∂u/∂x + v ∂u/∂y = −(1/𝜌) ∂p/∂x + (µ/𝜌)(∂²u/∂x² + ∂²u/∂y²)
u ∂v/∂x + v ∂v/∂y = −(1/𝜌) ∂p/∂y + (µ/𝜌)(∂²v/∂x² + ∂²v/∂y²)
∂u/∂x + ∂v/∂y = 0
We will also impose the following boundary conditions:
- No-slip at the walls: u = 0, v = 0
- Only one direction of motion at the inlet: u = u_in, v = 0
- Only one direction of motion at the outlet, no pressure: v = 0, p = 0
and assume the following properties:

𝜌 = 1, µ = 1, u_in = 1, D = 1, L = 2

(They do not represent a realistic fluid, but they simplify our calculations for demonstration purposes.)
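To build intuition for these equations, consider the textbook fully developed (Poiseuille) flow between two plates, u(y) = u_max(1 − (2y/D)²), v = 0. The snippet below (an illustration only, not the PINN's inlet profile) checks numerically that it satisfies the steady x-momentum balance dp/dx = µ d²u/dy² with a constant pressure gradient:

```python
import numpy as np

D, mu, u_max = 1.0, 1.0, 1.0           # same units-free properties as above
y = np.linspace(-D/2, D/2, 201)
u = u_max * (1 - (2 * y / D) ** 2)     # parabolic velocity profile, v = 0

# continuity du/dx + dv/dy = 0 holds trivially (u depends only on y, v = 0);
# x-momentum reduces to dp/dx = mu * d2u/dy2, which must be constant
d2u_dy2 = np.gradient(np.gradient(u, y), y)
dp_dx = mu * d2u_dy2

print(dp_dx[100])                      # interior value: -8*mu*u_max/D**2 = -8.0
```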
Vertex AI Experiment
It is useful to treat our calculations as experiments, so we can keep track of parameters and results. While Tensorflow provides a way to log metrics during training via Tensorboard, as of writing DeepXDE does not support it, although it does capture those metrics. We can however still make use of Vertex AI experiments for this purpose. If we do this in Colab Enterprise, we’ll have easy access to the metrics in a side panel.
Setup
Let’s set the experiment up:
from google.cloud import aiplatform
aiplatform.init(
project=PROJECT_ID,
staging_bucket=BUCKET_URI,
location=REGION,
experiment=EXPERIMENT_NAME)
Let us also make sure that we use the best precision available. Tensorflow defaults to float32. That may not be sufficient and affect the numerical stability of our model, especially if we want to calculate second-order gradients. Thus, we’ll switch to float64.
from deepxde.backend.set_default_backend import set_default_backend
set_default_backend("tensorflow")
dde.config.set_default_float("float64")
To all practical effects, we are training an equation solver, so we want the most accurate solution that is practical to achieve on our computing infrastructure.
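The effect of precision is easy to demonstrate outside of Tensorflow (an illustrative snippet, not from the notebook): a second-order finite difference divides tiny round-off errors by h², so single precision can destroy the estimate entirely:

```python
import numpy as np

def second_derivative(f, x, h, dtype):
    # central second-order finite difference, computed in the given dtype
    x, h, two = dtype(x), dtype(h), dtype(2)
    return (f(x + h) - two * f(x) + f(x - h)) / (h * h)

exact = -np.sin(1.0)    # d2/dx2 sin(x) at x = 1
h = 1e-4
err32 = abs(second_derivative(np.sin, 1.0, h, np.float32) - exact)
err64 = abs(second_derivative(np.sin, 1.0, h, np.float64) - exact)
print(err32, err64)     # the float32 error is orders of magnitude larger
```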
Next, we define the properties and boundaries as explained before.
# Properties
rho = 1
mu = 1
u_in = 1
D = 1
L = 2
def boundary_wall(X, on_boundary):
    on_wall = np.logical_and(np.logical_or(np.isclose(X[1], -D/2), np.isclose(X[1], D/2)), on_boundary)
    return on_wall

def boundary_inlet(X, on_boundary):
    return on_boundary and np.isclose(X[0], -L/2)

def boundary_outlet(X, on_boundary):
    return on_boundary and np.isclose(X[0], L/2)
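A quick way to make sure the predicates select the intended edges (a throwaway check; DeepXDE calls these functions internally during sampling) is to probe a few points by hand:

```python
import numpy as np

D, L = 1.0, 2.0  # channel width and length, as above

def boundary_wall(X, on_boundary):
    on_wall = np.logical_and(np.logical_or(np.isclose(X[1], -D/2), np.isclose(X[1], D/2)), on_boundary)
    return on_wall

def boundary_inlet(X, on_boundary):
    return on_boundary and np.isclose(X[0], -L/2)

def boundary_outlet(X, on_boundary):
    return on_boundary and np.isclose(X[0], L/2)

print(boundary_wall(np.array([0.0, 0.5]), True))    # point on the top wall
print(boundary_inlet(np.array([-1.0, 0.0]), True))  # point on the inlet
print(boundary_outlet(np.array([1.0, 0.0]), True))  # point on the outlet
print(boundary_wall(np.array([0.0, 0.0]), True))    # interior point: not a wall
```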
Keeping in mind that DeepXDE's Jacobian operator gives first derivatives and its Hessian second derivatives, we describe the system with two momentum equations and one mass conservation (continuity) equation.
def pde(X, Y):
    du_x = dde.grad.jacobian(Y, X, i = 0, j = 0)
    du_y = dde.grad.jacobian(Y, X, i = 0, j = 1)
    dv_x = dde.grad.jacobian(Y, X, i = 1, j = 0)
    dv_y = dde.grad.jacobian(Y, X, i = 1, j = 1)
    dp_x = dde.grad.jacobian(Y, X, i = 2, j = 0)
    dp_y = dde.grad.jacobian(Y, X, i = 2, j = 1)
    du_xx = dde.grad.hessian(Y, X, i = 0, j = 0, component = 0)
    du_yy = dde.grad.hessian(Y, X, i = 1, j = 1, component = 0)
    dv_xx = dde.grad.hessian(Y, X, i = 0, j = 0, component = 1)
    dv_yy = dde.grad.hessian(Y, X, i = 1, j = 1, component = 1)

    # x-momentum, y-momentum and continuity residuals
    pde_u = Y[:,0:1]*du_x + Y[:,1:2]*du_y + 1/rho * dp_x - (mu/rho)*(du_xx + du_yy)
    pde_v = Y[:,0:1]*dv_x + Y[:,1:2]*dv_y + 1/rho * dp_y - (mu/rho)*(dv_xx + dv_yy)
    pde_cont = du_x + dv_y

    return [pde_u, pde_v, pde_cont]
Now we define the geometry, which in this simplified case is a rectangle as per figure 5, then apply the boundary conditions.
geom = dde.geometry.Rectangle(xmin=[-L/2, -D/2], xmax=[L/2, D/2])

bc_wall_u = dde.DirichletBC(geom, lambda X: 0., boundary_wall, component = 0)
bc_wall_v = dde.DirichletBC(geom, lambda X: 0., boundary_wall, component = 1)
bc_inlet_u = dde.DirichletBC(geom, lambda X: u_in, boundary_inlet, component = 0)
bc_inlet_v = dde.DirichletBC(geom, lambda X: 0., boundary_inlet, component = 1)
bc_outlet_p = dde.DirichletBC(geom, lambda X: 0., boundary_outlet, component = 2)
bc_outlet_v = dde.DirichletBC(geom, lambda X: 0., boundary_outlet, component = 1)
We then generate the training data for the system given such geometry.
data = dde.data.PDE(geom,
pde,
[bc_wall_u, bc_wall_v, bc_inlet_u, bc_inlet_v, bc_outlet_p, bc_outlet_v],
num_domain = 2000,
num_boundary = 200,
num_test = 100,
)
plt.figure(figsize = (10,8))
plt.scatter(data.train_x_all[:,0], data.train_x_all[:,1], s = 0.5)
plt.xlabel('x-direction length')
plt.ylabel('Distance from the middle of plates (m)')
plt.show()
and obtain a set of sample points like in figure 6
Build and train a model
We start with a fully connected network and compile it with the Adam optimizer.
layer_size = [2] + [32] * 5 + [3]
activation = "tanh"
initializer = "Glorot uniform"

net = dde.maps.FNN(layer_size, activation, initializer)
model = dde.Model(data, net)
model.compile("adam", lr = 1e-3)
The model takes a 2D input, has 5 hidden layers of 32 neurons each and produces 3 outputs, namely u, v and p (velocity along x, velocity along y, and pressure).
We then set up a training run for it. It can take a while, so we’ll checkpoint it over time. We will also be able to restore from the best checkpoint when the training is done.
import datetime

checker = dde.callbacks.ModelCheckpoint(
    "model/model.ckpt", save_better_only=True, period=1000
)

ITERATIONS1=10000 #@param {type: "integer"}
OUTPUT_BLOB="<bucket where you want to save the model>"

# Optional - name and start experiment run
run=aiplatform.start_run(run="pinns-adam"+datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S"))

losshistory, train_state = model.train(iterations = ITERATIONS1, callbacks=[checker], display_every=1000)
DeepXDE won’t produce tensorboard logs, but it does record training progress, which we can display after the fact.
Also, since the output is a Tensorflow model, we can use the built-in support for cloud storage to save it in a bucket.
# if best model is not the last iteration
# model.restore("model/model.ckpt-<iteration n.>", verbose=1)

model.save(OUTPUT_BLOB)
dde.saveplot(losshistory, train_state, issave = False, isplot = True)
Record run
We can also record the progress made so far as a set of metrics and parameters associated with the run in a Vertex AI experiment.
for i in range(len(losshistory.steps)):
    print(np.sum(losshistory.loss_train[i]), np.sum(losshistory.loss_test[i]), losshistory.steps[i])
    aiplatform.log_time_series_metrics({"train_loss": np.sum(losshistory.loss_train[i]),
                                        "test_loss": np.sum(losshistory.loss_test[i])},
                                       step = losshistory.steps[i])

aiplatform.log_metrics({"best_loss_train": model.train_state.best_loss_train.astype(float),
                        "best_loss_test": model.train_state.best_loss_test.astype(float),
                        "best_step": int(model.train_state.best_step),
                        "optimizer": "adam",
                        "iterations": ITERATIONS1
                        })
The model itself is an artifact of the training process, but it is not captured as such automatically. We can register it manually as a Vertex AI artifact and conclude our first training run.
artifact = aiplatform.Artifact.create(
schema_title="system.Artifact",
uri=OUTPUT_BLOB,
#resource_id=resource_id,
display_name="pinns-adam"+datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S"),
#schema_version=schema_version,
description="Pinns model with adam optimizer",
#metadata=metadata,
#project=project,
#location=location,
)
# End experiment run if manually started above
aiplatform.end_run()
Finally, let us visualize the predicted u, v and p.
samples = geom.random_points(500000)
result = model.predict(samples)
color_legend = [[0, 1.5], [-0.3, 0.3], [0, 35]]
for idx in range(3):
plt.figure(figsize = (20, 4))
plt.scatter(samples[:, 0],
samples[:, 1],
c = result[:, idx],
cmap = 'jet',
s = 2)
plt.colorbar()
plt.clim(color_legend[idx])
plt.xlim((0-L/2, L-L/2))
plt.ylim((0-D/2, D-D/2))
plt.tight_layout()
plt.savefig("pinns-adam-plot"+str(idx)+".png")
plt.show()
Second-order optimizer
The Adam optimizer that we chose before is based on the first-order gradient of the error. It is relatively simple to use (the default parameters are sufficient in most cases), converges quickly compared to alternatives and is therefore widely adopted for training neural networks. Alas, it is prone to stopping at local minima of the loss and may not be as effective as second-order (quasi-Newton) methods [2] such as L-BFGS. Those are considerably more computationally intensive and, as of writing, not included in the Tensorflow library. For the small, mostly synthetic data sets we find with PINNs we can still use them, but they will run on CPU only.
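To see the appeal of a quasi-Newton method on a small, smooth problem, here is a standalone toy example (using SciPy's L-BFGS-B implementation, not DeepXDE's wrapper, and not part of the article's notebook):

```python
import numpy as np
from scipy.optimize import minimize

def rosenbrock(x):
    # classic banana-shaped test function, minimum at (1, 1)
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

res = minimize(rosenbrock, x0=np.array([-1.5, 2.0]), method="L-BFGS-B")
print(res.success, res.x)  # converges close to (1, 1)
```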
For that reason, we'll use both in combination: the Adam run above, followed by a smaller number of training iterations with L-BFGS.
We record this in a second experiment run.
ITERATIONS2=3000 #@param {type: "integer"}
OUTPUT_BLOB="<bucket for second model>"

dde.optimizers.config.set_LBFGS_options(maxiter = ITERATIONS2)
model.compile("L-BFGS")

aiplatform.start_run(run="pinns-lbfgs"+datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S"))
losshistory, train_state = model.train()
model.save(OUTPUT_BLOB)
dde.saveplot(losshistory, train_state, issave = False, isplot = True)
As expected, the better optimizer finds further room for improvement in the loss. We store metrics and artifacts with the experiment as before, then plot a prediction for u, v and p.
We can also save the plots with the model in cloud storage.
import glob
from google.cloud import storage

# Set the name of the bucket where you want to upload the files.
bucket_name = "<your output bucket>"

# Create a list of the files you want to upload.
files = glob.glob("*.png")
# Create a storage client.
client = storage.Client()
# Get the bucket.
bucket = client.bucket(bucket_name)
# Upload each file to the bucket.
for file in files:
blob = bucket.blob(file)
blob.upload_from_filename(file)
# Print a message to let the user know that the files have been uploaded.
print("The files have been uploaded to the bucket.")
Vertex AI Job
When we are happy with the results of our experiments, we can automate them by creating a training script and a job to run it.
The script will contain the code that we’ve used so far and accept the fluid and/or geometry parameters as input.
%%writefile navier-stokes.py

PROJECT_ID = "<insert project id here>"
REGION = "<insert region here>"
BUCKET_URI = "<insert path to staging bucket>"
EXPERIMENT_NAME = "<insert experiment name>"
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import random
import deepxde as dde
import os
import glob
from google.cloud import storage
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--rho', dest='rho',
default=1.0, type=float,
help='Density in Kg/m3')
parser.add_argument('--mu', dest='mu',
default=1.0, type=float,
help='Dynamic viscosity')
parser.add_argument('--u_in', dest='u_in',
default=1.0, type=float,
help='Initial velocity along x in m/s')
parser.add_argument('--D', dest='D', type=float, default=1.0,
help='Distance between plates in m')
parser.add_argument('--L', dest='L', type=float, default=2.0,
help='Length of plates in m')
parser.add_argument('--ITERATIONS1', dest='ITERATIONS1', type=int, default=10000,
help='Iterations w. Adam optimizer')
parser.add_argument('--ITERATIONS2', dest='ITERATIONS2', type=int, default=3000,
help='Iterations w. L-BFGS optimizer')
parser.add_argument('--BUCKET', dest='BUCKET', type=str, default="gm-pinns",
help='Target GS bucket')
args = parser.parse_args()
dde.config.set_default_float("float64")
from deepxde.backend.set_default_backend import set_default_backend
set_default_backend("tensorflow")
# Properties
rho = args.rho
mu = args.mu
u_in = args.u_in
D = args.D
L = args.L
<...code continues...>
The job will require the Vertex Tensorflow training and serving runtimes, so we can define it as follows:
JOB_NAME="navier-stokes-job"+datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
BUCKET="<insert bucket name>" #target bucket for artifacts
CMDARGS = [
"--rho=" + str(rho),
"--mu=" + str(mu),
"--u_in=" + str(u_in),
"--D=" + str(D),
"--L=" + str(L),
"--ITERATIONS1=" + str(ITERATIONS1),
"--ITERATIONS2=" + str(ITERATIONS2),
"--BUCKET=" + BUCKET,
]

job = aiplatform.CustomTrainingJob(
display_name=JOB_NAME,
script_path="navier-stokes.py",
container_uri="us-docker.pkg.dev/vertex-ai/training/tf-gpu.2-14.py310:latest",
requirements=["google-cloud-aiplatform", "google-cloud-aiplatform[autologging]", "deepxde"],
model_serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf-gpu.2-14.py310:latest",
)
When we launch it, we’ll want to make sure that we get a GPU instance for those complex calculations. In our example a T4 or L4 will suffice.
MODEL_DISPLAY_NAME = "pinns-model"+datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")

# Start the training
vertex_model = job.run(
model_display_name=MODEL_DISPLAY_NAME,
machine_type="g2-standard-4",
accelerator_type="NVIDIA_L4",
accelerator_count=1,
replica_count=1,
base_output_dir="gs://"+BUCKET,
args=CMDARGS,
)
If the required instances are not available, the job is automatically queued via Dynamic Workload Scheduler for up to 7 days. When capacity becomes available, the instances are provisioned and the job runs.
After submitting, a link to a custom training pipeline will be shown.
INFO:google.cloud.aiplatform.utils.source_utils:Training script copied to:
gs://.../aiplatform-2024-04-11-02:15:22.969-aiplatform_custom_trainer_script-0.1.tar.gz.
INFO:google.cloud.aiplatform.training_jobs:Training Output directory:
gs://...
INFO:google.cloud.aiplatform.training_jobs:View Training:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/...
INFO:google.cloud.aiplatform.training_jobs:CustomTrainingJob projects/.../locations/us-central1/trainingPipelines/...
current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.training_jobs:View backing custom job:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/...
The job configuration parameters, status and resource utilization are accessible on the Vertex AI console, and so are the logs.
The models thus trained will also be registered in Vertex AI model registry as “imported”. This will be useful for further reference, but you won’t be able to deploy them from the console, because Vertex does not provide a DeepXDE runtime. You can, however, provide a custom deployment script.
Summary
We have shown how to set up a Google Colab Enterprise instance with DeepXDE, a popular library for physics-informed neural networks (PINNs). We created experiments in Vertex AI to solve the Navier-Stokes equations with DeepXDE and recorded our progress in the form of metrics and parameters. We also stored the artifacts we produced (plots, models) in Google Cloud Storage and registered the models with the Vertex AI model registry for further use. Finally, we automated the process with Vertex AI training jobs, so experiments can easily be repeated with different sets of parameters.
References
[1] M. Raissi, P. Perdikaris, G. Karniadakis, “Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations”, 2017
[2] J. Taylor, W. Wang, B. Bala, T. Bednarz, “Optimizing the optimizer for data driven deep neural networks and physics informed neural networks”, 2022
[3] D.J. Acheson, “Elementary Fluid Dynamics”, Oxford University Press, 1990