Part 1: Simple Matrices, and Linear Regression
Series Motivation
I’ve been an MLOps Engineer for the past 5 years, and although my day-to-day job rarely involves writing low-level Machine Learning functionality, I wanted to make sure that I understand how things work under the hood.
In addition to that I want to learn/get better at Rust. And from the looks of it — so do you!
My goal is to learn rust, and to get a hands-on understanding of Machine Learning techniques. In order to do that, I want to write a very simple library that’s able to take an array of any size and dimension, and then perform some very simple Machine Learning techniques on it. The ultimate test will be using this library for deep learning image recognition.
Starting Remarks
My favourite way to learn anything is by starting from a naive point of view, and then encountering problems of ever-growing complexity and solving them. This way there’s less room for the logical or intellectual leaps that usually frustrate me when I’m trying to learn.
This iterative process also helps with direction setting, and it grounds the development process in a certain discipline, discouraging work on things that are not directly related to the task at hand.
I’m going to follow this approach, and so I’ll start very simple, and make mistakes (sometimes deliberate) along the way.
Part 1 Description
In part 1 of this series we’ll start with absolutely nothing, and end up with a very simple Matrix structure, and a LinearRegression algorithm that works with such a Matrix as input and output.
What API do we want the linear regression to have?
It’s always very good to start the development process by thinking through how our code is going to be used. We want our users to be able to create LinearRegression structures, whatever they might include. We want them to be able to fit the LinearRegression line to some data, and we want them to be able to use that LinearRegression instance to predict values for other data points. The following API is what comes to mind first.
struct LinearRegression { /* some fields */ }

impl LinearRegression {
pub fn new() -> Self { todo!() }
pub fn fit(&mut self, x: Matrix, y: Matrix) { todo!() }
pub fn predict(&self, x: f64) -> f64 { todo!() }
}
What type should Matrix be?
Matrix is a two dimensional array with rows and columns. A very common way of representing matrices is as a nested list of lists:
let matrix = [
[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
];
and we could very easily start by just defining our Matrix type as rust vectors!
type Matrix = Vec<Vec<f64>>;
Problem number 1 — keeping the data valid
By using the above declaration of the Matrix type — we’re immediately running head-first into a very big problem. What happens if we write code like this?
let input: Matrix = vec![
    vec![1.],
    vec![3.],
    vec![4., 5.],
];
let output: Matrix = vec![
    vec![0.],
    vec![1.],
    vec![2.],
];

let mut regression = LinearRegression::new();
regression.fit(input, output);
input is not a valid matrix — its last row has more elements than the rows that precede it. What would we do if we then tried to use this matrix to fit our regression line? We could validate! The fit function would then have to return a Result, because it could potentially fail if the passed matrix is not valid.
impl LinearRegression {
/* ... */
pub fn fit(&mut self, x: Matrix, y: Matrix) -> Result<(), String> {
        if !validate_matrix(&x) {
            return Err("x is not a valid Matrix!".to_owned());
        }
        if !validate_matrix(&y) {
            return Err("y is not a valid Matrix!".to_owned());
        }
todo!()
}
/* ... */
}
But rust has a powerful type system, and we can use it to our advantage here. As explained extremely well in “Parse, don’t validate” — we could instead make sure that a Matrix in our rust code is always valid!
How can we do that? By wrapping the Vec<Vec<f64>> in our own struct — and then parsing any data that is used to initialise the Matrix.
struct Matrix {
data: Vec<Vec<f64>>,
}

impl Matrix {
pub fn new(data: Vec<Vec<f64>>) -> Result<Self, String> {
let inner_dim = match data.first() {
Some(first) => first.len(),
None => 0,
};
for (dim, inner_vec) in data.iter().enumerate() {
if inner_vec.len() != inner_dim {
return Err(format!("vector in position {dim} does not match the inferred inner dimension of {inner_dim}"));
}
}
Ok(Self{ data })
}
}
In this way — if we ever try to pass invalid data to the Matrix::new function — it will return the Err variant of Result.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_only_valid_matrices() {
        // This matrix is fine!
        let matrix = Matrix::new(
            vec![
                vec![1., 2., 3.],
                vec![1., 2., 3.],
                vec![1., 2., 3.],
            ]
        );
        assert!(matrix.is_ok());
        // This matrix should not be possible - its last row has fewer elements
        // than all the other rows
        let matrix = Matrix::new(
            vec![
                vec![1., 2., 3.],
                vec![1., 2., 3.],
                vec![1., 2.],
            ]
        );
        assert!(matrix.is_err());
    }
}
We now have certainty that if we accept a Matrix as an input to our LinearRegression::fit function — it will always be valid! By parsing data at the point of instantiation — we can remove any validation that would otherwise have been needed.
In a similar vein — if we ever instantiate our LinearRegression struct like so:
let mut regression = LinearRegression::new();
/* HERE! */
regression.fit(Matrix::new(vec![vec![1.]]).unwrap(), Matrix::new(vec![vec![10.]]).unwrap());
between line 1 and 3 (where I’ve added a comment saying “HERE!”) the linear regression is kind of… useless! Its line hasn’t been modelled on any data yet, and so it will always return the same (and rather garbage) predictions. And here again — we could rely on the user to make sure they are using our library appropriately, but as we’re using rust — we can prevent such a “useless” LinearRegression state from ever even compiling!
struct LinearRegression { /* some fields */ }

impl LinearRegression {
// The difference is here:
// 1. I removed the `pub fn new` line
// 2. And instead made the `pub fn fit` be the
// entrypoint. I also dropped the `Result` type
// as it's not needed - the `Matrix` type is always valid!
pub fn fit(x: Matrix, y: Matrix) -> Self { todo!() }
    pub fn predict(&self, x: f64) -> f64 { todo!() }
}
Problem 1: What happens if the users of our library pass invalid data to linear regression?
Solution 1: We parse all data as the Matrix is created, so that a Matrix always contains only valid data!
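The compile-time guarantee is worth demonstrating. The following is a minimal, self-contained sketch; the "model" here is a stand-in that just predicts the mean of the training outputs, not real linear regression:

```rust
// A toy stand-in for the real struct: with no `new` function,
// `fit` is the only way to obtain a LinearRegression, so an
// unfitted (useless) instance can never exist.
pub struct LinearRegression {
    intercept: f64,
}

impl LinearRegression {
    // Toy "fit": model the mean of the training outputs.
    pub fn fit(y: &[f64]) -> Self {
        let intercept = y.iter().sum::<f64>() / y.len() as f64;
        Self { intercept }
    }

    pub fn predict(&self) -> f64 {
        self.intercept
    }
}

fn main() {
    // The only way to construct the struct is to fit it first:
    let model = LinearRegression::fit(&[1.0, 2.0, 3.0]);
    assert_eq!(model.predict(), 2.0);
}
```

Trying to write LinearRegression::new() here is simply a compile error, because no such function exists.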
Problem number 2 — flattening the data
This problem requires a little bit of future-gazing and a small leap of faith. It has to do with the shape of our matrix, and what happens if we ever want to extend it in the future. Our matrix has only 2 dimensions, but in the future we might want to generalise our library to N-dimensional arrays. In addition, what would happen if we wanted to “reshape” our matrix? Let’s say that our matrix represents a grayscale image — each f64 number is a pixel value between 0 and 1 representing how white or black it is.
A very pixelated image representing 0 might look something like this as our matrix:
let matrix = Matrix::new(
    vec![
        vec![1., 1., 1.],
        vec![1., 0., 1.],
        vec![1., 1., 1.],
    ],
).unwrap();
Its dimensions are 3 x 3 — 3 rows and 3 columns. In the future, however — if we want to pass this data to, for instance, a neural network — we’d actually want to flatten this data into a 1 x 9 array. (If this is not obvious — this is where a small leap of faith is required.)
let matrix = matrix.reshape(vec![1, 9]).unwrap();
What would such a reshape function look like?
impl Matrix {
    /* ... */

    pub fn reshape(self, new_shape: Vec<usize>) -> Result<Self, String> {
        // The old shape is [number of rows, length of each row]
        let old_shape: Vec<usize> = vec![
            self.data.len(),
            self.data.first().map_or(0, |inner| inner.len()),
        ];
        let old_total_size: usize = old_shape.iter().product();
let new_total_size: usize = new_shape.iter().product();
if old_total_size != new_total_size {
return Err(format!("Matrix of shape {:?} cannot be reshaped into Matrix of shape {:?}", old_shape, new_shape));
}
let mut data = vec![vec![0.0; new_shape[1]]; new_shape[0]];
let mut num_iter = self.data.into_iter().flatten();
for outer_idx in 0..new_shape[0] {
for inner_idx in 0..new_shape[1] {
// .unwrap() here is safe because we already checked that the dimensions match
// if .unwrap() here panics - it means that our old matrix was somehow invalid to begin with,
// and that's a bug with the library, not an error that we should propagate to the user!
data[outer_idx][inner_idx] = num_iter.next().unwrap();
}
}
Ok(Self{ data })
}
}
Do you see the big issue here? We’re accessing the numbers from self.data in order — num_iter.next() in each loop iteration. And so the flattened data in both the old and the new Matrix are going to be exactly the same; the only difference is the dimensions of the outer and inner Vec. To drive this point home, consider the following example:
let matrix1 = Matrix::new(vec![vec![1., 2., 3.]]); // shape (1 x 3)
let matrix2 = Matrix::new(vec![vec![1.], vec![2.], vec![3.]]); // shape (3 x 1)
Data in matrix1 and matrix2 is the same, if not for the fact that we’re “distributing” it differently between the outer and inner vectors. And even though all the numbers inside of matrix1 and matrix2 are the same — we need to re-create (copy) the inner and outer vectors on each reshape. This is a waste — exactly because of the invariant that we found (the numbers, in order, are the same). Instead of copying all the numbers on each reshape, what if we stored the numbers in the flattened form to start with, and, in order to know what the inner and outer dimensions are, stored the shape together with the data like so:
struct Matrix {
data: Vec<f64>,
shape: [usize; 2],
}

fn check_shape_matches_data(data_size: usize, new_shape: &[usize; 2]) -> Result<(), String> {
let shape_size = new_shape[0] * new_shape[1];
if data_size != shape_size {
return Err(format!("shape {new_shape:?} doesn't match data of size {data_size}"));
}
Ok(())
}
impl Matrix {
pub fn new(data: Vec<f64>, shape: [usize; 2]) -> Result<Self, String> {
check_shape_matches_data(data.len(), &shape)?;
Ok(Self { data, shape })
}
pub fn reshape(mut self, new_shape: [usize; 2]) -> Result<Self, String> {
check_shape_matches_data(self.data.len(), &new_shape)?;
self.shape = new_shape;
Ok(self)
}
}
[usize; 2] is like Vec<usize>, but we guarantee that it will always have exactly 2 elements (as our matrix has only 2 dimensions). You can see how we now just need to compare the total size of the data vec to the shape (in both Matrix::new and Matrix::reshape). We don’t need to iterate through any inner vectors, check whether inner dimensions match, etc. (In the future — when we generalise to N dimensions — we’ll also see how the lack of nested vecs makes that generalisation easier. Can you see how already?)
Most importantly though — the Matrix::reshape function now doesn’t manipulate the underlying matrix data at all! Each reshape is now a very cheap operation.
Problem 2: Reshaping a Matrix based on nested Vecs is relatively complex and expensive.
Solution 2: Flatten the data field and store the shape alongside!
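To see this end to end, here is the flat Matrix again (the same struct and helper as above, repeated so the snippet compiles on its own), showing that a valid reshape touches no data while an invalid one is rejected:

```rust
struct Matrix {
    data: Vec<f64>,
    shape: [usize; 2],
}

fn check_shape_matches_data(data_size: usize, new_shape: &[usize; 2]) -> Result<(), String> {
    let shape_size = new_shape[0] * new_shape[1];
    if data_size != shape_size {
        return Err(format!("shape {new_shape:?} doesn't match data of size {data_size}"));
    }
    Ok(())
}

impl Matrix {
    pub fn new(data: Vec<f64>, shape: [usize; 2]) -> Result<Self, String> {
        check_shape_matches_data(data.len(), &shape)?;
        Ok(Self { data, shape })
    }

    pub fn reshape(mut self, new_shape: [usize; 2]) -> Result<Self, String> {
        check_shape_matches_data(self.data.len(), &new_shape)?;
        self.shape = new_shape;
        Ok(self)
    }
}

fn main() {
    // The pixelated "0" image from before, now as flat data plus a shape.
    let image = Matrix::new(vec![1., 1., 1., 1., 0., 1., 1., 1., 1.], [3, 3]).unwrap();
    // Flattening to 1 x 9 moves no numbers around: only the shape changes.
    let flat = image.reshape([1, 9]).unwrap();
    assert_eq!(flat.shape, [1, 9]);
    // A shape that doesn't fit the 9 numbers is rejected.
    assert!(flat.reshape([2, 2]).is_err());
}
```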
Problem number 3 — accessing the data
Our Matrix is looking better and better! But as we’ve now settled on a pretty good basic solution to how the data is going to be stored — we need to start thinking about how it’s going to be accessed. You see, currently, if we want to do something like this:
let matrix = Matrix::new(vec![1., 2., 3.], [3, 1]).unwrap();
do_something_with_data(matrix.data);
do_something_with_shape(matrix.shape);
we will get an error! Neither data nor shape are public fields! So we can’t access them directly, unless we make them public. Should we do that?
struct Matrix {
pub data: Vec<f64>,
pub shape: [usize; 2],
}
This is a terrible idea! Let’s consider the following example:
let mut matrix = Matrix::new(vec![1., 2., 3.], [3, 1]).unwrap();
matrix.data.push(10.);
Can you see what’s wrong? We have now allowed the users of Matrix to invalidate its internal consistency. The data and shape are mismatched and there’s nothing stopping this! We have broken our contract — a Matrix is not always valid!
The alternative is to use encapsulation — keep the fields private, and only expose functionality as needed through associated functions and methods. One such functionality might be the ability to see the shape of our matrix (without the ability to change it — reshape already provides that).
struct Matrix {
data: Vec<f64>,
shape: [usize; 2],
}

impl Matrix {
/* ... */
pub fn get_shape(&self) -> &[usize; 2] {
&self.shape
}
}
Sweet! Nice and simple!
But what if we wanted to see what values are in the data field, or what if we wanted to change them?
It is tempting to start writing something like this:
impl Matrix {
/* ... */
pub fn get(&self, index: [usize; 2]) -> f64 { todo!() }
pub fn get_mut(&mut self, index: [usize; 2]) -> &mut f64 { todo!() }
}
But we can make this behaviour consistent with rust’s generic indexing operator by implementing the Index and IndexMut operator traits for our Matrix.
use std::ops::{Index, IndexMut};

fn convert_index(shape: &[usize; 2], index: &[usize]) -> usize {
index[0] * shape[1] + index[1]
}
impl Index<[usize; 2]> for Matrix {
type Output = f64;
fn index(&self, index: [usize; 2]) -> &Self::Output {
let data_index = convert_index(&self.shape, &index);
&self.data[data_index]
}
}
impl IndexMut<[usize; 2]> for Matrix {
fn index_mut(&mut self, index: [usize; 2]) -> &mut Self::Output {
let data_index = convert_index(&self.shape, &index);
&mut self.data[data_index]
}
}
impl Index<&[usize]> for Matrix {
type Output = f64;
fn index(&self, index: &[usize]) -> &Self::Output {
if index.len() != self.shape.len() {
panic!("index must have length {}", self.shape.len());
}
let data_index = convert_index(&self.shape, index);
&self.data[data_index]
}
}
impl IndexMut<&[usize]> for Matrix {
fn index_mut(&mut self, index: &[usize]) -> &mut Self::Output {
if index.len() != self.shape.len() {
panic!("index must have length {}", self.shape.len());
}
let data_index = convert_index(&self.shape, index);
&mut self.data[data_index]
}
}
This allows the users of our Matrix to index it directly like so:
let mut matrix = Matrix::new(vec![1., 2., 3., 4.], [2, 2]).unwrap();
println!("{}", matrix[[1, 0]]); // prints "3"
matrix[[1, 0]] = 10.;
println!("{}", matrix[[1, 0]]); // prints "10"
Problem 3: The users of Matrix can access neither data nor shape.
Solution 3: We decided to keep the fields private, but expose certain functionality, such as indexing or borrowing of the shape, through methods and operator trait implementations.
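The row-major arithmetic inside convert_index deserves a second look: for a shape of [rows, cols], the element at [r, c] lives at flat position r * cols + c (skip r full rows, then step c more). A standalone check:

```rust
// Row-major flattening: index [r, c] in a [rows, cols] matrix
// lands at position r * cols + c in the flat data vector.
fn convert_index(shape: &[usize; 2], index: &[usize]) -> usize {
    index[0] * shape[1] + index[1]
}

fn main() {
    let shape = [2, 3]; // 2 rows of 3 columns, laid out back to back
    // Flat layout: [a00, a01, a02, a10, a11, a12]
    assert_eq!(convert_index(&shape, &[0, 0]), 0);
    assert_eq!(convert_index(&shape, &[0, 2]), 2);
    // [1, 2]: skip one full row (3 elements), then 2 more -> position 5
    assert_eq!(convert_index(&shape, &[1, 2]), 5);
}
```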
Linear Regression
We now have everything we need to implement rudimentary linear regression. I am going to use the following formula for modelling the regression line: y = a + b * x, where a is the intercept and b is the coefficient (the slope of the line).
LinearRegression::fit takes two matrices — x and y. x is the input data of shape [N, 1] — there are N rows, and each row has only 1 dimension (x on the number line). y is the expected output. It must have the same dimensions as x.
pub struct LinearRegression {
intercept: f64, // a in the equation
coefficient: f64, // b in the equation
}

impl LinearRegression {
pub fn fit(x: Matrix, y: Matrix) -> Self {
let shape_x = x.get_shape();
let shape_y = y.get_shape();
if shape_x[1] != 1 {
panic!("LinearRegression works only with 1 dimensional data points");
}
if shape_y[1] != 1 {
panic!("LinearRegression works only with 1 dimensional data points");
}
if shape_x[0] != shape_y[0] {
panic!("shape of x and y do not match");
}
// Calculate mean_x and mean_y
let n = shape_x[0];
let mut mean_x = 0.;
let mut mean_y = 0.;
for i in 0..n {
mean_x += x[[i, 0]];
mean_y += y[[i, 0]];
}
mean_x /= n as f64;
mean_y /= n as f64;
// Calculate numerator and denominator;
let mut numerator = 0.;
let mut denominator = 0.;
for i in 0..n {
let x_err = x[[i, 0]] - mean_x;
let y_err = y[[i, 0]] - mean_y;
numerator += x_err * y_err;
denominator += x_err * x_err;
}
let coefficient = numerator / denominator;
let intercept = mean_y - coefficient * mean_x;
Self { intercept, coefficient }
}
pub fn predict(&self, x: f64) -> f64 {
self.intercept + x * self.coefficient
}
}
This works!
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
fn test_predict() {
let regression = LinearRegression { intercept: 5., coefficient: 2.3 };
assert_eq!(regression.predict(10.), 28.);
assert_eq!(regression.predict(99.), 232.7);
}
#[test]
fn test_fit() {
let x = Matrix::new(vec![1., 2., 3., 4., 5., 6.], [6, 1]).unwrap();
let y = Matrix::new(vec![1., 1.25, 1.5, 1.75, 2., 2.25], [6, 1]).unwrap();
let regression = LinearRegression::fit(x, y);
assert_eq!(regression.intercept, 0.75);
assert_eq!(regression.coefficient, 0.25);
assert_eq!(regression.predict(99.), 25.5);
}
}
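As a final sanity check, the closed-form math inside fit can be exercised independently of the Matrix type. This is a standalone sketch on a tiny dataset where y = 2x exactly, so the expected coefficient is 2 and the expected intercept is 0:

```rust
fn main() {
    let x = [1.0_f64, 2.0, 3.0];
    let y = [2.0_f64, 4.0, 6.0]; // y = 2x

    let n = x.len() as f64;
    let mean_x: f64 = x.iter().sum::<f64>() / n;
    let mean_y: f64 = y.iter().sum::<f64>() / n;

    // coefficient = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
    let numerator: f64 = x
        .iter()
        .zip(&y)
        .map(|(xi, yi)| (xi - mean_x) * (yi - mean_y))
        .sum();
    let denominator: f64 = x.iter().map(|xi| (xi - mean_x).powi(2)).sum();

    let coefficient = numerator / denominator;
    let intercept = mean_y - coefficient * mean_x;

    // All intermediate values here are exactly representable in f64,
    // so exact comparisons are safe for this particular dataset.
    assert_eq!(coefficient, 2.0);
    assert_eq!(intercept, 0.0);
}
```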
In part 2 I’m going to clean up our linear regression code, implement additional operator traits to enable arithmetic operations on vectors, implement broadcasting, and start generalising the matrix to N-dimensions.
Feedback
I’m @bamwasylow on Twitter — any feedback is appreciated!