This is a series of blog posts in which we will learn about LoRA. Let's first understand SVD and low-rank matrix factorization, and then move on to LoRA theory and code.

In this blog post, we will deep dive into Singular Value Decomposition (SVD) and explore how it relates to low rank matrix factorization. We will also demonstrate these concepts with Python code using PyTorch.

Singular Value Decomposition (SVD) is a powerful matrix factorization technique used in various fields such as data science, machine learning, and signal processing. It decomposes a rectangular matrix into three constituent matrices, providing valuable insights into the structure and properties of the original matrix.

For a matrix *W* of size *m*×*n*, the SVD can be expressed as:

`W = U S V^T`

Where:

- *U* is an *m*×*m* unitary matrix containing the left singular vectors.
- *S* is an *m*×*n* diagonal matrix with the singular values arranged in descending order.
- *V^T* is the conjugate transpose of an *n*×*n* unitary matrix *V* containing the right singular vectors.

The singular values in *S* are non-negative and represent the importance of each singular vector in capturing the variability of the original matrix.
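As a quick numerical sanity check of these properties (using an arbitrary random matrix, chosen only for illustration), we can verify them with PyTorch:

```python
import torch

torch.manual_seed(0)
M = torch.randn(6, 4)  # an arbitrary example matrix

# torch.linalg.svd returns U, the singular values S, and V^T (named Vh)
U, S, Vh = torch.linalg.svd(M)

# Singular values are non-negative and sorted in descending order
print(torch.all(S >= 0).item(), torch.all(S[:-1] >= S[1:]).item())  # True True

# U and V are unitary (orthogonal, since M is real)
print(torch.allclose(U.T @ U, torch.eye(6), atol=1e-5))   # True
print(torch.allclose(Vh @ Vh.T, torch.eye(4), atol=1e-5))  # True

# U @ diag(S) @ Vh (with diag(S) padded to 6x4) reconstructs M
S_mat = torch.zeros(6, 4)
S_mat[:4, :4] = torch.diag(S)
print(torch.allclose(U @ S_mat @ Vh, M, atol=1e-5))  # True
```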

Low rank matrix factorization is a technique used to approximate a matrix by reducing its rank, thereby capturing essential information with fewer parameters. This can be especially beneficial in scenarios where memory or computational resources are limited.

Let’s dive into the code example provided to understand SVD and low rank matrix factorization better.

```python
import torch
import numpy as np

torch.random.manual_seed(0)

# Define dimensions and low rank
d, n = 10, 10
low_rank = 2

# Create a low-rank matrix W
W = torch.randn(d, low_rank) @ torch.randn(low_rank, n)
print(W)
# tensor([[-1.0797,  0.5545,  0.8058, -0.7140, -0.1518,  1.0773,  2.3690,  0.8486, -1.1825, -3.2632],
#         [-0.3303,  0.2283,  0.4145, -0.1924, -0.0215,  0.3276,  0.7926,  0.2233, -0.3422, -0.9614],
#         [-0.5256,  0.9864,  2.4447, -0.0290,  0.2305,  0.5000,  1.9831, -0.0311, -0.3369, -1.1376],
#         [ 0.7900, -1.1336, -2.6746,  0.1988, -0.1982, -0.7634, -2.5763, -0.1696,  0.6227,  1.9294],
#         [ 0.1258,  0.1458,  0.5090,  0.1768,  0.1071, -0.1327, -0.0323, -0.2294,  0.2079,  0.5128],
#         [ 0.7697,  0.0050,  0.5725,  0.6870,  0.2783, -0.7818, -1.2253, -0.8533,  0.9765,  2.5786],
#         [ 1.4157, -0.7814, -1.2121,  0.9120,  0.1760, -1.4108, -3.1692, -1.0791,  1.5325,  4.2447],
#         [-0.0119,  0.6050,  1.7245,  0.2584,  0.2528, -0.0086,  0.7198, -0.3620,  0.1865,  0.3410],
#         [ 1.0485, -0.6394, -1.0715,  0.6485,  0.1046, -1.0427, -2.4174, -0.7615,  1.1147,  3.1054],
#         [ 0.9088,  0.1936,  1.2136,  0.8946,  0.4084, -0.9295, -1.2294, -1.1239,  1.2155,  3.1628]])

# Compute the rank of W using SVD
rank = np.linalg.matrix_rank(W)
print(f"Rank of W is {rank}")
# Rank of W is 2
```

In the code snippet above:

- We generate a low-rank matrix *W* of dimensions 10×10 using random tensors and matrix multiplication.
- We then compute the rank of *W* using NumPy's `linalg.matrix_rank` function, which internally utilizes SVD.
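As a side note, PyTorch also provides a rank routine directly, so the round-trip through NumPy is optional. A minimal sketch:

```python
import torch

torch.manual_seed(0)
# Rebuild a rank-2 matrix the same way as above
W = torch.randn(10, 2) @ torch.randn(2, 10)

# torch.linalg.matrix_rank is also SVD-based: it counts
# singular values above a tolerance
print(torch.linalg.matrix_rank(W))  # tensor(2)
```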

Next, let’s understand SVD in more detail and reconstruct the matrix *W* using low rank factorization.

```python
# Perform SVD on W
U, S, V = torch.svd(W)

# Take the low-rank approximation
U_r = U[:, :low_rank]
S_r = torch.diag(S[:low_rank])
V_r = V[:, :low_rank].t()

# Reconstruct the matrix using the low-rank approximation
W_r = U_r @ S_r @ V_r

print(f"Original Matrix W:\n{W[:, :5]}")
print(f"Reconstructed Matrix W_r:\n{W_r[:, :5]}")
# Original Matrix W
# tensor([[-1.0797,  0.5545,  0.8058, -0.7140, -0.1518],
#         [-0.3303,  0.2283,  0.4145, -0.1924, -0.0215],
#         [-0.5256,  0.9864,  2.4447, -0.0290,  0.2305],
#         [ 0.7900, -1.1336, -2.6746,  0.1988, -0.1982],
#         [ 0.1258,  0.1458,  0.5090,  0.1768,  0.1071],
#         [ 0.7697,  0.0050,  0.5725,  0.6870,  0.2783],
#         [ 1.4157, -0.7814, -1.2121,  0.9120,  0.1760],
#         [-0.0119,  0.6050,  1.7245,  0.2584,  0.2528],
#         [ 1.0485, -0.6394, -1.0715,  0.6485,  0.1046],
#         [ 0.9088,  0.1936,  1.2136,  0.8946,  0.4084]])
#
# Reconstructed Matrix W_r
# tensor([[-1.0797,  0.5545,  0.8058, -0.7140, -0.1518],
#         [-0.3303,  0.2283,  0.4145, -0.1924, -0.0215],
#         [-0.5256,  0.9864,  2.4447, -0.0290,  0.2305],
#         [ 0.7900, -1.1336, -2.6746,  0.1988, -0.1982],
#         [ 0.1258,  0.1458,  0.5090,  0.1768,  0.1071],
#         [ 0.7697,  0.0050,  0.5725,  0.6870,  0.2783],
#         [ 1.4157, -0.7814, -1.2121,  0.9120,  0.1760],
#         [-0.0119,  0.6050,  1.7245,  0.2584,  0.2528],
#         [ 1.0485, -0.6394, -1.0715,  0.6485,  0.1046],
#         [ 0.9088,  0.1936,  1.2136,  0.8946,  0.4084]])
```

Here’s what happens in this part of the code:

- We perform SVD on *W* to obtain the singular vectors and values.
- We then extract the components needed for the desired low-rank approximation.
- Using these components, we reconstruct the matrix *W* as *W_r* with reduced rank.
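Because *W* was constructed from rank-2 factors and we kept the top two singular components, the reconstruction is essentially exact, not merely approximate. A self-contained check (re-creating a rank-2 matrix the same way):

```python
import torch

torch.manual_seed(0)
low_rank = 2
W = torch.randn(10, low_rank) @ torch.randn(low_rank, 10)

U, S, Vh = torch.linalg.svd(W)
W_r = U[:, :low_rank] @ torch.diag(S[:low_rank]) @ Vh[:low_rank, :]

# Singular values beyond the true rank are numerically zero,
# so the rank-2 truncation reproduces W up to float error
print(torch.allclose(W, W_r, atol=1e-4))  # True
```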

```python
# Check the equivalence of matrix multiplication using W and B@A
B = U_r @ S_r  # shape (d, low_rank)
A = V_r        # shape (low_rank, n)

x = torch.randn(n)     # a random input vector
bias = torch.randn(d)  # a random bias vector

y = W @ x + bias
y1 = (B @ A) @ x + bias

# Verify that y and y1 are approximately equal
print(f"Y using W: {y}")
print(f"Y using B@A: {y1}")
print(f"Matrix multiplication is close to equal: {torch.allclose(y, y1)}")
# Y using W:   tensor([ 2.4027,  1.2373,  0.5844, -2.9918,  1.1609,  0.9889, -3.3747, -0.9453, -4.6008, -0.7128])
# Y using B@A: tensor([ 2.4027,  1.2373,  0.5844, -2.9918,  1.1609,  0.9889, -3.3747, -0.9453, -4.6008, -0.7128])
# Matrix multiplication is close to equal: True
# (the exact values depend on the random x and bias)
```

In this section, we:

- Define matrices *B* and *A* based on the low-rank components obtained from SVD.
- Compute *y* using the original matrix *W*, and then compute *y1* using the low-rank factorization *B*@*A*.
- Verify that both computations yield approximately equal results, showcasing the effectiveness of low-rank matrix factorization.

It’s essential to note the significant reduction in parameters achieved by low-rank factorization:

```python
print(f"Total Parameters of W: {W.nelement()}")                      # 100
print(f"Total Parameters of B and A: {A.nelement() + B.nelement()}")  # 40
```

This comparison clearly demonstrates the efficiency gains obtained through low rank matrix factorization, making it a valuable technique for optimizing computational tasks involving large matrices.
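To see why this matters at the scales where LoRA is applied, compare the parameter counts in general: a *d*×*n* weight stores *d·n* values, while a rank-*r* factorization *B* (*d*×*r*) times *A* (*r*×*n*) stores only *r·(d+n)*. The dimensions below are illustrative, not taken from the code above:

```python
# Full d x n weight vs. rank-r factors B (d x r) and A (r x n)
d, n, r = 1024, 1024, 8  # hypothetical LoRA-style sizes

full_params = d * n          # parameters in the full weight matrix
factored_params = r * (d + n)  # parameters in the two low-rank factors

print(full_params)                     # 1048576
print(factored_params)                 # 16384
print(full_params // factored_params)  # 64
```

At rank 8, the factorization stores 64× fewer parameters than the full matrix, which is the core efficiency argument behind LoRA.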

In conclusion, Singular Value Decomposition (SVD) is a fundamental matrix factorization technique that plays a crucial role in various applications. By leveraging SVD and low rank matrix factorization, we can effectively reduce the number of parameters while preserving important information in matrices, leading to efficient computations and resource utilization.

Through the code examples and explanations provided, we have gained insights into how SVD works, how to perform low rank matrix factorization, and how these concepts can be applied in practice to handle large matrices effectively.

Thank you for reading, and I hope this blog post provides a clear understanding of SVD and low rank matrix factorization. Feel free to experiment with the code examples provided to deepen your understanding further.

In the next blog post, we will write LoRA code from scratch to understand the concepts.