*The use of transforms in data compression algorithms is at least 40 years old. The goal of this three-part series of posts is to provide the mathematical background necessary for understanding transforms and to explain why they are a valuable part of many compression algorithms. *

*I’ll focus on video since that’s my particular interest. Part 1 (this post) will review vectors, the dot product and orthonormal bases. Part 2 introduces the use of matrices for describing both one and two-dimensional transforms. Finally, Part 3 gives an example and explains intuitively why transforms are a valuable part of video compression algorithms.*

Remember back in high school when you learned the “dot product” between two vectors? It turns out that this operation is at the heart of a powerful way of thinking about transforms such as the DFT, the DCT, and the integer transform used in H.264. In this post, part 1 of a three-part series, I review some key concepts we’ll need, including vectors, the dot product and orthonormal bases. So let’s get started!

An *n* dimensional vector is just a collection of *n* numbers. By convention the numbers are usually written in a column, like this:

\underline{x}=\begin{pmatrix} x_{0} \\ x_{1} \\ x_{2} \\ \vdots \\ x_{n-1} \end{pmatrix}

Vector concepts are valuable in signal processing because a discrete, or digital, signal is nothing more than a collection of numbers. In other words, a discrete signal (of finite length) is a vector. The numbers in a vector can be samples of a music waveform, pixels in an image or samples from any signal that tickles your fancy. We call the numbers that make up a vector the *components* of the vector. Any operation that can be performed on a vector can be thought of as an operation performed on a discrete signal.

Now, an *n-*dimensional vector can be viewed in two ways:

- as a point in
*n*-dimensional space (i.e., the components of the vector are the coordinates of a point); or - as a directed line segment with its tail at the origin and its head at the point specified by its coordinates (this is the way we normally think of a vector — as an “arrow” with a definite direction).

The Pythagorean Theorem tells us how long a vector is as a function of its components:

\| \underline{x} \| = \sqrt{\displaystyle\sum_{i=0}^{n-1}x_{i}^{2}}

If you have two vectors, then the *dot product* between them is computed by multiplying their corresponding components together and summing up the products. Mathematically we express this as

\underline{x} \cdot \underline{y} = \displaystyle\sum_{i=0}^{n-1}x_{i}y_{i}

We see immediately that

\| \underline{x} \| = \sqrt{ \underline{x} \cdot \underline{x}}

It can also be shown that

\underline{a} \cdot \underline{b} = \| \underline{a} \|\| \underline{b} \| \cos \theta

This is just the vector formulation of the famous law of cosines you learned in trigonometry class, where θ is the angle between the vectors ** a** and

**. Now, recall that the cosine of a 90 degree angle is zero — so from this equation, it is clear that**

*b***and**

*a***are**

*b**orthogonal*, i.e., the angle between them is 90 degrees, if their dot product is zero.

If a vector has length one, we call it a *unit* vector. We can always form a unit vector that points in the same direction as an arbitrary vector ** b** by applying the formula

\underline{u} = \dfrac{\underline{b}}{\| \underline{b} \|}

(Recall that multiplying or dividing a vector by a scalar means multiplying or dividing each component of the vector by the scalar.)

Finally, we can find the projection of a vector ** a** in the direction of a unit vector

**by computing the dot product between them:**

*u*\underline{a} \cdot \underline{u} = \| \underline{a} \| \cos \theta

Let’s consider an example for the case of three-dimensional vectors. We note that (1, 0, 0), (0, 1, 0), and (0, 0, 1) are all unit vectors; furthermore, they are all orthogonal to each other because the dot product between any pair of them is zero. Of course (1, 0, 0) points in the direction of the *x*‑axis, (0, 1, 0) points in the direction of the *y*‑axis, and (0, 0, 1) points in the direction of the *z*‑axis. Because these three unit vectors are all orthogonal to each other we say these vectors are *orthonormal* to each other (here “normal” refers to the fact that the vectors have a length of 1—they’ve been normalized).

Now, let *x*^{T} = (*x*_{0}, *x*_{1}, *x*_{2}) be an arbitrary vector in 3D space. (The superscript “T” means transpose; when we write a vector horizontally we refer to it as the transpose of the vertical vector. But I may be sloppy at times and let the context make clear whether a vector is horizontally or vertically oriented.) The dot product of ** x **with (1, 0, 0) is

*x*

_{0}, the dot product of

**with (0, 1, 0) is**

*x**x*

_{1}, and the dot product of

**with (0, 0, 1) is**

*x**x*

_{2}. In other words,

*x*

_{0},

*x*

_{1}, and

*x*

_{2}are the projections of the vector

**in the directions of the**

*x**x*,

*y*, and

*z*axes.

Furthermore, the unit vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1) form an *orthonormal basis* for the set of all 3D vectors because any 3D vector can be written as a scaled sum of these three vectors. In particular:

(x_{0},x_{1},x_{2})=x_{0}(1,0,0)+x_{1}(0,1,0)+x_{2}(0,0,1)

Now we’re ready for a key insight into transforms. *The vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1) are* *not the only orthonormal basis vectors*. For example, one easy way to get another set of orthonormal vectors is to rotate the unit vectors that point in the *x*-axis and *y*‑axis directions by 45 degrees, while leaving the (0, 0, 1) vector unchanged. Doing this we get the three unit vectors (0.707, 0.707, 0), (-0.707, 0.707, 0), and (0, 0, 1). It is easy to compute the dot product of every pair of these vectors and show that they are mutually orthogonal; it is also easy to see that all the vectors have a length of 1. What if we take the projection of (*x*_{0}, *x*_{1}, *x*_{2}) onto each of these new vectors? We get

y_{0} = (0.707,0.707,0) \cdot (x_{0},x_{1},x_{2})=0.707x_{0}+0.707x_{1} \newline y_{1} = (-0.707,0.707,0) \cdot (x_{0},x_{1},x_{2})=-0.707x_{0}+0.707x_{1} \newline y_{2} = (0,0,1) \cdot (x_{0},x_{1},x_{2})=x_{2}

Now we can express the vector *x*^{T} as a sum of these three new unit vectors as follows:

(x_{0},x_{1},x_{2})= y_{0}(0.707,0.707,0)+y_{1}(-0.707,0.707,0)+y_{2}(0,0,1)

In other words, given the three projections *y*_{0}, *y*_{1} and *y*_{2}, we can reconstruct the numbers *x*_{0}, *x*_{1} and *x*_{2} from them by using them to scale their corresponding unit vectors and then adding up the scaled vectors.

In fact, for *any* set of three orthonormal vectors, we can compute the projection of ** x** onto each of the vectors, and given these projections, we can reconstruct

**exactly as demonstrated above. Conceptually, we “transform”**

*x***into**

*x***by computing the projections onto the unit vectors of our orthonormal basis (eq. 1); given**

*y***, we “inverse transform” back to**

*y***by using the components of**

*x***to scale the basis vectors and then add them up (eq. 2). Every set of orthonormal basis vectors defines a transform.**

*y*When the goal is to compress the vector (i.e., the discrete signal) ** x**, then it may be easier to first transform

**into a new vector**

*x***and compress**

*y***instead. Of course, given**

*y***, the receiver can apply the inverse transform to recover the original vector**

*y***. If the right orthonormal vectors are chosen — in other words, if the right transform is used — the benefit can be dramatic! — but more on that in Part 3. First, we need to develop a more efficient formulation of the above concepts using matrix notation. That will be the topic of Part 2.**

*x*