Singular Value Decomposition (SVD) is a powerful algorithm in the realm of machine learning. This article delves into the practical application of SVD for data reduction and dimensionality reduction.
Before we dive into singular value decomposition, let’s understand the concept of dimensionality reduction. Imagine you have data, like images or text, described with lots and lots of values. The goal of dimensionality reduction is to express this data in a simpler way. Why? Simplifying the description makes it easier to compare, process, and even plot the data. It’s like telling the story of our data using fewer and more straightforward terms.
But why SVD over other methods?
SVD stands out in cases where we prioritize simplicity, efficiency, and robustness. It goes beyond mere dimensionality reduction; SVD is about efficiently capturing and preserving the core components of data. It provides a practical way to untangle data into independent components.
SVD:
In linear algebra, the Singular Value Decomposition (SVD) of a matrix is a factorization of that matrix into three matrices. SVD states that any matrix A can be factorized as:
A = USVᵀ
Where,
• U is an orthogonal matrix whose columns are the orthonormal eigenvectors of AAᵀ,
• S is a diagonal matrix whose r non-zero elements are the square roots of the positive eigenvalues of AAᵀ or AᵀA (both matrices have the same positive eigenvalues),
• V is an orthogonal matrix whose columns are the orthonormal eigenvectors of AᵀA.
This formulation is represented by:
Aₙₓₘ = Uₙₓₙ Sₙₓₘ Vᵀₘₓₘ
An important detail to highlight is the construction of the diagonal matrix S.
Here, we arrange the singular values in descending order along the diagonal of S, and we order the corresponding eigenvectors to match, so that vectors associated with larger eigenvalues come before those with smaller ones.
This lays the foundation for expressing matrix A as a sum of matrices, providing us with a unique insight into the underlying structure of the data. We’ll come to that later.
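As a quick sanity check, here is a minimal NumPy sketch (the small matrix below is chosen arbitrarily for illustration): it computes the factorization, confirms that U S Vᵀ reproduces A, and verifies that the singular values are the square roots of the positive eigenvalues of AᵀA.

```python
import numpy as np

# A small example matrix (values chosen arbitrarily for illustration)
A = np.array([[3., 1., 1.],
              [-1., 3., 1.]])

# Full SVD: U is 2x2, Vt is 3x3, s holds the singular values in descending order
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the diagonal matrix S with the same shape as A
S = np.zeros_like(A)
np.fill_diagonal(S, s)

# A is recovered (up to floating-point error) from the product U S V^T
print(np.allclose(A, U @ S @ Vt))                 # True

# The singular values are the square roots of the positive eigenvalues of A^T A
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]       # eigenvalues, sorted descending
print(np.allclose(s, np.sqrt(eigvals[:len(s)])))  # True
```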
Now you might have a question: "What's the point of factorizing the matrix?" Let's understand this with an example:
Let us think about a users-to-movies matrix. Say we are working for a user review website and have a matrix of users and movies: each user, based on their interests, has watched and rated a set of movies. Every column represents a different movie, and every row corresponds to a different user.
We would like to decompose matrix A into thinner, smaller matrices U, Σ, and V. Notice that we have one set of users who prefer watching sci-fi movies and another set (the bottom three users) who prefer watching non-sci-fi movies.
The movies can thus be broken into two groups, sci-fi and non-sci-fi, and the users can likewise be grouped into sci-fi lovers and non-sci-fi lovers. After applying singular value decomposition, the resulting matrices are shown below.
Left singular matrix (U): The columns of U can be thought of as concepts; the first column corresponds to the sci-fi concept and the second column to the romance concept. In other words, the first four users align with the sci-fi concept and the last three users with the romance concept. U is therefore a "users-to-concepts" similarity matrix: each value indicates how strongly a given user corresponds to a given concept (in our case, sci-fi and romance). For example, the first user corresponds to the sci-fi concept, whereas the fifth user corresponds to the romance concept.
Singular values (Σ): In this diagonal matrix, each diagonal entry is a positive value that depicts the strength of the corresponding concept. For instance, the "strength" of the sci-fi concept is higher than that of the romance concept.
Right singular matrix (V): V is a "movies-to-concepts" matrix. It shows, for instance, that the first three movies belong heavily to the first (sci-fi) concept while the last two belong to the second (romance) concept.
There is also a third concept, but its strength is so small that we can ignore it; it is nothing but noise in our data.
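To make this concrete, here is a hedged Python sketch. The exact ratings matrix from the article is not reproduced, so the matrix below is a representative stand-in: four sci-fi lovers who rate only the three sci-fi movies and three romance lovers who rate only the two romance movies.

```python
import numpy as np

# Illustrative users-to-movies ratings matrix (7 users x 5 movies).
# Rows 1-4: users who rate only the three sci-fi movies;
# rows 5-7: users who rate only the two romance movies.
A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 0, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Signs of the singular vectors may be flipped; the structure is what matters.
print(np.round(U, 2))   # users-to-concepts: column 1 ~ sci-fi users, column 2 ~ romance users
print(np.round(s, 2))   # concept strengths: ~12.37 and ~9.49 here, the rest ~0
print(np.round(Vt, 2))  # concepts-to-movies: row 1 ~ sci-fi movies, row 2 ~ romance movies
```

Because this toy matrix is perfectly block-structured, the remaining singular values come out exactly zero; in the article's matrix, overlapping ratings are what produce the small third "noise" concept.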
A as the Sum of Its Matrix Components:
Consider the matrices U and S; the product U×S yields:
Here each column vector (the orthonormal vector uᵢ obtained from AAᵀ) is scaled by the factor σᵢ.
Then the product U×S×Vᵀ results in:
The SVD can thus be recognized as a sum of matrices,
A = σ1u1v1ᵀ + σ2u2v2ᵀ + … + σnunvnᵀ
In practical scenarios involving large matrices, the significance of the last few singular values (the square roots of the eigenvalues: …, σn-2, σn-1, σn) diminishes. To approximate the matrix A efficiently, we keep only the first few singular values, the number of which is typically denoted as 'r'. This selective approximation is known as a rank-r approximation.
Then matrix A is approximated as:
A' = σ1u1v1ᵀ + σ2u2v2ᵀ + … + σrurvrᵀ
Now, you might wonder, why bother expressing matrix A as a sum of r matrices with the same dimensions (n×m)? The beauty of this SVD formulation lies in its ability to separate A into its most important components.
It provides an important way to break down an n×m array of entangled data into r components. Since uᵢ and vᵢ are unit vectors, we can even ignore terms (σᵢuᵢvᵢᵀ) with very small singular values σᵢ.
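As a small illustration of this truncation (a sketch assuming NumPy, with an arbitrarily chosen rank-2 example matrix), the helper below rebuilds a matrix from its first r rank-1 components σᵢuᵢvᵢᵀ:

```python
import numpy as np

def rank_r_approximation(A, r):
    """Approximate A by the sum of its first r rank-1 SVD components sigma_i * u_i * v_i^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(r))

# A small rank-2 matrix built from two rank-1 pieces (values chosen arbitrarily)
rng = np.random.default_rng(0)
A = np.outer(rng.normal(size=6), rng.normal(size=5)) \
  + np.outer(rng.normal(size=6), rng.normal(size=5))

print(np.allclose(A, rank_r_approximation(A, 2)))   # True: two components recover a rank-2 matrix
```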
Let’s illustrate the power of this approach with a worked example. Consider a matrix, A, with dimensions 10×9:
Instead of doing the calculations by hand, I will use Python libraries, and later give some examples of using SVD in data science applications. Employing a Python program, I extracted the matrices U, S, and Vᵀ.
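The article's exact 10×9 entries are not reproduced here, so the sketch below (an illustrative assumption) builds a stand-in matrix whose last seven columns are linear combinations of the first two, and then extracts U, S, and Vᵀ with NumPy:

```python
import numpy as np

# Representative stand-in for the 10x9 matrix in the text: the last seven columns
# are linear combinations of the first two, so the matrix has rank 2 and only
# two non-zero singular values.
rng = np.random.default_rng(42)
cols = rng.normal(size=(10, 2))        # two independent columns
mix = rng.normal(size=(2, 7))          # coefficients for the remaining seven columns
A = np.hstack([cols, cols @ mix])      # 10 x 9 matrix of rank 2

U, s, Vt = np.linalg.svd(A, full_matrices=True)

print(U.shape, s.shape, Vt.shape)      # (10, 10) (9,) (9, 9)
print(np.round(s, 3))                  # only the first two singular values are non-zero
```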
We get,
U =
S =
Vᵀ =
Upon observing the matrix S, it becomes evident that only two singular values, σ1 and σ2, contribute significantly, while the remaining singular values are essentially zero. Consequently, matrix A can be represented as the sum of the matrices σ1u1v1ᵀ and σ2u2v2ᵀ.
A = σ1u1v1ᵀ + σ2u2v2ᵀ
Where,
σ1u1v1ᵀ =
And σ2u2v2ᵀ =
In this context, the matrix A is generated from four vectors: u1 and u2, both of dimension 10×1, and v1ᵀ and v2ᵀ, both of dimension 1×9, together with two singular values, σ1 and σ2. It is important to note that not every matrix exhibits such a sharp drop in its singular values as seen in our case. Examining the 10×9 matrix under consideration, one can observe that the last seven columns are formed through linear combinations of the first two columns; thus, only two singular vectors hold significant importance.
Application in image compression:
Think of an image as a matrix of pixels with dimensions 2000×1500. Looking closely at this matrix, consecutive columns tend to be very similar: the first and second columns resemble each other, and the pattern extends to subsequent columns. The columns are therefore not entirely independent; they exhibit patterns and relationships. Expressing the matrix as a sum of terms σᵢuᵢvᵢᵀ captures these relationships through the singular values and singular vectors, and the parameter 'r' determines how much of this structure is retained or discarded, which in turn influences the quality of the final result. Through a Python program (sketched below), I generated images for varying values of 'r' to show the impact of dimensionality reduction on the image representation.
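Here is a minimal sketch of such a program, assuming a grayscale image and using NumPy and Matplotlib; the filename 'photo.png' is only a placeholder. It reconstructs the image from the first r singular values for a few values of r:

```python
import numpy as np
import matplotlib.pyplot as plt

# Load an image and convert it to a grayscale 2-D array
# ('photo.png' is a placeholder; use any image file)
img = plt.imread('photo.png')
if img.ndim == 3:                      # RGB(A) -> grayscale by averaging the colour channels
    img = img[..., :3].mean(axis=2)

U, s, Vt = np.linalg.svd(img, full_matrices=False)

# Reconstruct the image keeping only the first r singular values
for r in (5, 20, 100):
    approx = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
    plt.figure()
    plt.imshow(approx, cmap='gray')
    plt.title(f'r = {r}')
plt.show()
```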
The impact of different 'r' values on the image is shown below:
To determine the value of 'r', consider plotting the cumulative sum of the singular values. This graph reveals the cumulative energy retained as 'r' increases; identify the point where the curve levels off, indicating diminishing returns. That threshold offers a guide for selecting 'r', balancing compression against information preservation (see the sketch below for one way to produce the graph).
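A minimal sketch of that graph, again assuming the placeholder image file from the previous snippet:

```python
import numpy as np
import matplotlib.pyplot as plt

# 'photo.png' is the same placeholder image as in the previous snippet
img = plt.imread('photo.png')
if img.ndim == 3:
    img = img[..., :3].mean(axis=2)

s = np.linalg.svd(img, compute_uv=False)       # singular values only, in descending order
cumulative_energy = np.cumsum(s) / np.sum(s)   # fraction retained by the first r values

plt.plot(np.arange(1, len(s) + 1), cumulative_energy)
plt.xlabel("r (number of singular values kept)")
plt.ylabel("cumulative energy retained")
plt.show()

# Smallest r that keeps, say, 90% of the energy
r = int(np.searchsorted(cumulative_energy, 0.90)) + 1
print(r)
```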