The use of transforms in data compression algorithms is at least 40 years old. The goal of this three-part series of posts is to provide the mathematical background necessary for understanding transforms and to explain why they are a valuable part of many compression algorithms.

I’m focusing on video since that’s my particular interest. Part 1 reviewed vectors, the dot product and orthonormal bases. Part 2 (this post) will introduce the use of matrices for describing both one and two-dimensional transforms. Finally, Part 3 gives an example and explain intuitively why transforms are a valuable part of video compression algorithms.

The view of transforms introduced in Part 1 can be efficiently expressed using matrix notation. I will assume that readers are familiar with three concepts from matrix algebra: matrix multiplication, the matrix transpose and the inverse of a matrix.

Consider an arbitrary set of n orthonormal unit vectors. As discussed in Part 1, these vectors define a transform (see eq. 1 in Part 1). Eq. 1 can be compactly written in matrix form as

\underline{y}=A\underline{x}

where A is the matrix that has as its rows the orthonormal basis vectors that define our transform (recall that the superscript T means we write the vector as a row):

A=\begin{pmatrix} \underline{u}_{0}^{T} \\ \underline{u}_{1}^{T} \\ \underline{u}_{1}^{T} \\ \underline{u}_{2}^{T} \\ \vdots \\ \underline{u}_{n-1}^{T}\end{pmatrix}

We see immediately from the rule of matrix multiplication that

y_0=\underline{u_0}\cdot \underline{x};y_1=\underline{u_1}\cdot \underline{x};...

In other words y, the transform of x, is obtained by a matrix multiplication. The matrix multiplication gives us the projections of x onto the basis vectors. How about the inverse transform? With respect to eq. 1m we can left-multiply both sides by the inverse of the matrix A to obtain

A^{-1}\underline{y}=\underline{x}

or equivalently

\underline{x}=A^{-1}\underline{y}

So the inverse transform is also a matrix multiplication. Furthermore, because the rows of A are a set of orthonormal vectors, the inverse of A is just AT! (Note that I’m limiting our discussion to vectors and matrices with real-valued components; however, the ideas presented here are easy to generalize to complex numbers. Complex valued matrices and vectors are needed for the DFT, but the most popular transforms for image compression, including the DCT, can all be computed with real arithmetic.)

To see that A-1 is the transpose of A, ponder the following matrix equation until the orthonormality of the u vectors makes it clear that the equation is true:

\begin{pmatrix} \underline{u}_{0}^{T} \\ \underline{u}_{1}^{T} \\ \underline{u}_{1}^{T} \\ \underline{u}_{2}^{T} \\ \vdots \\ \underline{u}_{n-1}^{T}\end{pmatrix}(\underline{u}_{0}\underline{u}_{1}\underline{u}_{2} \cdots \underline{u}_{n-1})=\begin{pmatrix}1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ \space & \space & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix} = 1

Note the integral role that the dot product plays at every step. Multiplying two matrices together is just computing the dot product between the appropriate row and column. To compute the i, j’th component of the product of two matrices, we compute the dot product between the i‘th row of the left matrix and the j’th column of the right matrix.

Pulling all this together, we have the following sequence of equations for the inverse transform:

\underline{x}=A^{T}\underline{y}

\underline{x}=(\underline{u}_{0}\underline{u}_{1}\underline{u}_{2}\cdots \underline{u}_{n-1})\underline{y}

\underline{x}=(\underline{u}_{0}\underline{u}_{1}\underline{u}_{2}\cdots \underline{u}_{n-1})\begin{pmatrix} y_{0} \\ y_{1} \\ y_{2} \\ \vdots \\ y_{n-1} \end{pmatrix}

\underline{x}=y_{0}\underline{u}_{0}+y_{1}\underline{u}_{1}+y_{2}\underline{u}_{2}+\cdots+y_{n-1}\underline{u}_{n-1}

The last equation in this sequence makes clear that x is reconstructed as a linear combination of the orthonormal basis vectors and that the components of y provide the appropriate weights.

When dealing with a digital image, it is natural to think of the signal as being inherently two-dimensional. After all, digital images comprise multiple rows of pixels. In other words, the image to be transformed is itself a matrix! Of course, we could concatenate the rows of an image one after another and turn the 2D image signal into a one-dimensional signal, but it is better to confront the 2D nature of image data head-on. The transform perspective we outlined above is based on a 1D perspective. How can we extend it to deal with 2D data?

It turns out that a large class of important transforms, including the DFT and the DCT, are separable. This means that a two-dimensional data matrix like an image can be 2D transformed by doing a 1D transform on all the columns, storing the transformed columns in an intermediate matrix, and then doing a 1D transform on all the rows of the intermediate matrix. Fortunately, this process can also be easily expressed with matrix operations. Consider the following equation, where A is our 1D transform matrix and X is a 2D data matrix

W=AX

Now, X is nothing more than a collection of 1D column vectors, and the matrix multiply operation transforms each column of X using the matrix A. So the matrix W is the intermediate matrix mentioned above — the matrix obtained by transforming all the columns of X. Now consider the following:

Y=WA^{T}

Recall that AT is just

A^{T}=(\underline{u}_{0}\underline{u}_{1}\underline{u}_{2}\cdots\underline{u}_{n-1})

So we see that this matrix multiply operation transforms each row of W. In other words, the matrix Y is the desired 2D transform and we can write

Y=AXA^{T}

With respect to the inverse transform, recalling that A-1 is AT, we can left multiply eq. 3 by AT, and right multiply the result by A to get the following expression for the 2D inverse transform

X=A^{T}YA

It is natural at this point to ask the following question: What plays the role of basis vectors in 2D transforms? The answer is basis matrices. Before deriving an expression for the basis matrices, it is convenient to introduce some new notation. Let the matrix 1i,j be the matrix with a 1 in location i,j and a 0 in all other locations (here i indexes the row and j indexes the column). Let yi,j denote the component of Y in row i and column j. Finally, we define the “product” of two vectors a and b as follows (note that this vector product is nothing more than standard matrix multiplication of an n-by-1 matrix with a 1-by-n matrix):

\underline{a}\space\underline{b}^{T}=\begin{pmatrix} a_{0}b_{0} & a_{0}b_{1} & a_{0}b_{2} & \cdots & a_{0}b_{n-1} \\ a_{1}b_{0} & a_{1}b_{1} & a_{1}b_{2} & \cdots & a_{1}b_{n-1} \\ a_{2}b_{0} & a_{2}b_{1} & a_{2}b_{2} & \cdots & a_{2}b_{n-1} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ a_{n-1}b_{0} & a_{n-1}b_{1}\ & a_{n-1}b_{2} & \cdots & a_{n-1}b_{n-1}\end{pmatrix}

Now we’re ready to rewrite eq. 4 and obtain expressions for the basis matrices. We proceed by first writing Y as a sum of n2 appropriately weighted 1i,j matrices (i and j each vary from 0 to n-1 as we step through every row and column in the Y matrix). Then we distribute the matrix multiplications and simplify. Mathematically, we have the following:

X=A^{T}YA \newline X=A^{T}(y_{0,0}1_{0,0}+y_{0,1}1_{0,1}+ \cdots + y_{n-1,n-1}1_{n-1,n-1})A \newline X = y_{0,0}A^{T}1_{0,0}A+y_{0,1}A^{T}1_{0,1}A+\cdots+y_{n-1,n-1}A^{T}1_{n-1,n-1}A \newline X=y_{0,0}\underline{u}_{0}\underline{u}_{0}^{T}+y_{0,1}\underline{u}_{0}\underline{u}_{1}^{T} + \cdots + y_{n-1,n-1}\underline{u}_{n-1}\underline{u}_{n-1}^{T} \newline X=y_{0,0}B_{0,0}+y_{0,1}B_{0,1}+ \cdots + y_{n-1, n-1}B_{n-1, n-1}

where the i, j’th basis matrix is given by

B_{i,j}=\underline{u}_{i}\underline{u}_{j}^{T}

From eq. 5 it is clear that there are n2 basis matrices and that the i, j’th matrix is scaled by yi,j in order to reconstruct X. Furthermore, the basis matrices have a certain intuitive appeal. They are constructed by taking every possible product of the basis vectors.

In Part 3, I’ll pull all this together with an example, and discuss the discrete cosine transform (DCT), a transform useful for image compression.