1. Motivation
We want to decompose a matrix into a sum of outer products of vector pairs (a low-rank factorization)
2. SVD Singular Value Decomposition
2.1. Motivation
We expect the topics to be orthogonal, so that they can express the most information
2.2. Idea
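For reference, the factorization itself: any real $m \times n$ matrix $A$ decomposes as

$$
A = U \Sigma V^{T}
$$

where $U$ and $V$ have orthonormal columns and $\Sigma$ is diagonal with non-negative singular values in decreasing order. The columns of $U$ and $V$ play the role of the orthogonal "topics" above, and each singular value measures how much information its topic carries.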
2.3. Application
- recommendation systems
- calculate the pseudoinverse (see the sketch below)
- data compression
- PCA
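A minimal NumPy sketch of the pseudoinverse application (the function name and tolerance are illustrative choices, not from the notes): invert the non-zero singular values and swap the factors.

```python
import numpy as np

def pinv_via_svd(A, tol=1e-12):
    """Moore-Penrose pseudoinverse from the SVD: A^+ = V Sigma^+ U^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    cutoff = tol * s.max()
    s_inv = np.zeros_like(s)
    s_inv[s > cutoff] = 1.0 / s[s > cutoff]   # invert only the non-zero singular values
    return Vt.T @ (s_inv[:, None] * U.T)

# sanity check: pinv_via_svd(A) agrees with np.linalg.pinv(A)
```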
3. NMF Non-negative Matrix Factorization
3.1. Motivation
It is hard to interpret face images with PCA, since the components contain negative values
So this method avoids negative values, making the parts easier to interpret
Benefits: fast and easy to use
3.2. Idea
Drop the orthogonality idea; instead, constrain the factors to be non-negative
Goal: find non-negative factors $W$ ($m \times k$) and $H$ ($k \times n$) such that
$$
V \approx W H
$$
Approach: add a penalty that punishes negative elements. This means the result is only an approximation, and it is not unique
Optimize with SGD (Stochastic Gradient Descent), adding the penalty to the reconstruction loss, as sketched below
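A minimal NumPy sketch of this idea, assuming a squared-error loss with a quadratic penalty on negative entries; for simplicity it uses full-batch gradient descent rather than true SGD, and the learning rate and penalty weight `lam` are illustrative choices that may need tuning:

```python
import numpy as np

def nmf_penalized_gd(V, k, steps=2000, lr=1e-3, lam=10.0, seed=0):
    """Approximate V (m x n, non-negative) as W @ H, penalizing negative entries."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(steps):
        R = W @ H - V                              # reconstruction residual
        # gradients of ||V - WH||_F^2 + lam*sum(min(W,0)^2) + lam*sum(min(H,0)^2)
        gW = 2 * (R @ H.T) + 2 * lam * np.minimum(W, 0)
        gH = 2 * (W.T @ R) + 2 * lam * np.minimum(H, 0)
        W -= lr * gW
        H -= lr * gH
    # clip any remaining small negatives; the result is approximate and not unique
    return np.maximum(W, 0), np.maximum(H, 0)
```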
3.3. Application
- Face decomposition
- Bioinformatics
- Topic modeling (see the example below)
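For topic modeling, a hedged example using scikit-learn's `NMF`; the toy documents and `n_components=2` are invented for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

docs = [
    "the cat sat on the mat",
    "dogs and cats make friendly pets",
    "stocks fell as markets slid",
    "investors sold shares after the report",
]
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(docs)                 # non-negative documents x terms matrix
model = NMF(n_components=2, init="nndsvd", random_state=0)
W = model.fit_transform(X)                  # document-topic weights
H = model.components_                       # topic-term weights
terms = vec.get_feature_names_out()
for t, row in enumerate(H):                 # top terms per topic
    print(t, [terms[i] for i in row.argsort()[-3:]])
```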
4. Faster SVD
4.1. Motivation
- Normal SVD is pretty slow; we need to speed it up
- Data are often missing or inaccurate
- Take advantage of GPUs
Often we are only interested in the largest singular values.
So we introduce randomized algorithms that:
- are more stable
- allow the needed matrix-vector products to be computed in parallel
- come with performance guarantees
4.2. Idea
Use a smaller matrix
- Before: calculate the SVD of $A$ ($m \times n$)
Basic Math: Johnson-Lindenstrauss Lemma
A small set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved (demonstrated below).
- After: calculate the SVD of a much smaller matrix $B = Q^{T}A$ of size $r \times n$, with $r \ll \min(m, n)$ (see the computation below)
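A quick numerical illustration of the lemma (the dimensions and the Gaussian projection are arbitrary demo choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10_000))                  # 100 points in high dimension
P = rng.standard_normal((10_000, 500)) / np.sqrt(500)   # random projection, scaled to preserve norms
Y = X @ P                                               # the same points in 500 dimensions
before = np.linalg.norm(X[0] - X[1])
after = np.linalg.norm(Y[0] - Y[1])
print(before, after)   # the two distances agree to within a few percent
```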
4.3. Computation
- Find an approximation $Q$ to the range of $A$; here the range means the column space, i.e., all linear combinations of the columns of $A$.
That means $Q$ has a similar column space to $A$, but far fewer columns
$$
A\approx Q Q^{T}A
$$
Construct $B=Q^{T}A$, which is small ($r \times n$)
Compute the SVD of the small matrix: $B=S \Sigma V^{T}$
Since $A\approx Q Q^T A=QB=QS \Sigma V^{T}$
If we set $U=QS$, we get
$$
A\approx U\Sigma V^T
$$
- Done!
But how do we get $Q$ in step 1?
- Take a set of random vectors $w_i$ and stack them into a matrix $W$; the columns of $AW$ then (approximately) span the column space of $A$.
- Compute the QR decomposition $AW=QR$; the columns of $Q$ form an orthonormal basis for the range of $AW$. A full sketch of the procedure follows below.
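Putting the steps together, a minimal NumPy sketch; the oversampling parameter is a standard trick from the randomized-SVD literature, not from these notes:

```python
import numpy as np

def randomized_svd(A, r, oversample=10, seed=0):
    """Approximate rank-r SVD of A via the random-projection scheme above."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Step 1: random sketch -- the columns of A @ W roughly span the column space of A
    W = rng.standard_normal((n, r + oversample))
    Q, _ = np.linalg.qr(A @ W)            # orthonormal basis, so A ~= Q Q^T A
    # Step 2: project down to the small matrix B of size (r + oversample) x n
    B = Q.T @ A
    # Step 3: exact SVD of the small matrix, then lift back with U = Q S
    S, sigma, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ S
    return U[:, :r], sigma[:r], Vt[:r, :]

# usage: U, s, Vt = randomized_svd(A, r=20); then A ~= U @ np.diag(s) @ Vt
```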