In the first post of this three-part series, I listed four points that I hope my readers will agree with at the end of this series. The second post addressed the first two points of the four. In this post, Part Three of the series, I will demonstrate the final two points:

1. Phase distortions generally have less effect on human perception than magnitude distortions; and
2. Two audio clips can be recognized by humans as matching despite having dramatically different spectrograms.

In Part One I explained the concept of a spectrogram and how it is computed using the DFT. In Part Two we looked at the effect of distortions on human aural perception. We found that in some cases phase distortions change the time domain waveform but have no effect on our perception. In other cases, phase distortions clearly affect the audio, but the distortions have no impact on our ability to easily recognize a clip. In this final part, we will look at the effect of spectral magnitude distortions. Unlike the case with phase distortions, magnitude distortions change the spectrogram.

Let’s begin with the clip x03.wav, first introduced in Part One. Recall that it is a sum of five sinusoids whose frequencies are {500 Hz, 1000 Hz, 1500 Hz, 2000 Hz, 2500 Hz} and whose magnitudes are {1000, 2000, 750, 1000, 1500}. The waveform x03.wav was formed from the following sum:

x03(n)=\displaystyle\sum_{l=1}^{5}a_{l}\cos(2 \pi f_{l}nT+\theta_{l})

where the sampling rate is 48 kHz (T = 1/48,000 s) and the five phase values θ are {0, 0, 0, 0, 0}. What happens if we change the five magnitude values a to the randomly chosen values {427, 716, 2113, 1382, 373}? The result is the file x05.wav. Waveforms for both x03.wav and x05.wav are shown below:
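For readers who want to reproduce the two waveforms, here is a minimal sketch of the sum above in plain Python. The function name `synthesize` and the choice of a 96-sample frame are my own assumptions for illustration; the actual clips are of course longer and were not necessarily generated this way.

```python
import math

FS = 48_000                               # sampling rate in Hz, so T = 1/FS
FREQS_HZ = [500, 1000, 1500, 2000, 2500]  # f_l
PHASES = [0.0] * 5                        # theta_l, all zero for both clips
MAGS_X03 = [1000, 2000, 750, 1000, 1500]  # a_l for x03.wav
MAGS_X05 = [427, 716, 2113, 1382, 373]    # a_l for x05.wav


def synthesize(mags, freqs_hz, phases, num_samples, fs=FS):
    """Return x(n) = sum_l a_l * cos(2*pi*f_l*n/fs + theta_l), n = 0..num_samples-1."""
    return [
        sum(a * math.cos(2 * math.pi * f * n / fs + th)
            for a, f, th in zip(mags, freqs_hz, phases))
        for n in range(num_samples)
    ]


# All five frequencies are multiples of 500 Hz, so the waveform repeats
# every 48000 / 500 = 96 samples. One period is enough to plot its shape.
x03 = synthesize(MAGS_X03, FREQS_HZ, PHASES, 96)
x05 = synthesize(MAGS_X05, FREQS_HZ, PHASES, 96)
```

Because every phase is zero, the first sample of each waveform is simply the sum of its magnitudes. To hear the result, the samples could be scaled to 16-bit integers and written out with Python's standard `wave` module.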

The spectrograms of these two waveforms appear below (x03.wav is on top). The frequencies are the same, although the 2500 Hz sine wave is barely visible in x05.wav because its magnitude is so small. It is clear that the intensities for each frequency in x05.wav are different from those in x03.wav; in other words, the spectrogram has changed.
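To put a number on "barely visible", one can express each component's magnitude relative to the strongest component in the same clip, in decibels. The sketch below does only that arithmetic; the helper name `relative_levels_db` is hypothetical, and how faint a given level looks depends on the color scale of the spectrogram tool, which I am not assuming here.

```python
import math

FREQS_HZ = [500, 1000, 1500, 2000, 2500]
MAGS_X03 = [1000, 2000, 750, 1000, 1500]
MAGS_X05 = [427, 716, 2113, 1382, 373]


def relative_levels_db(mags):
    """Level of each component relative to the strongest one, in dB (0 dB = strongest)."""
    peak = max(mags)
    return [20 * math.log10(a / peak) for a in mags]


for name, mags in (("x03", MAGS_X03), ("x05", MAGS_X05)):
    levels = relative_levels_db(mags)
    for f, level in zip(FREQS_HZ, levels):
        print(f"{name}: {f:5d} Hz  {level:6.1f} dB")
```

In x05.wav the 2500 Hz component comes out roughly 15 dB below the 1500 Hz component, which is now the strongest, whereas in x03.wav the strongest component is the one at 1000 Hz.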

If you listen to these two clips, you’ll hear that it is easy to tell them apart. Note how different this is from the case examined in Part Two, where a random change of the sinusoids’ phases led to a waveform that sounded the same even though its time domain graph was clearly different. This illustrates the fact that the ear is generally less sensitive to phase distortions than to magnitude distortions. Furthermore, if you compare x05.wav’s waveform above to that of x04.wav in Part Two, I think you’ll agree that it is hard to look at two time domain waveforms and know in advance whether they will sound the same as, or different from, a reference waveform (in this case, x03.wav).

In Part Two we did the experiment of keeping the magnitudes for the Spock clip the same while assigning random phases to the DFT coefficients. This does not change the spectrogram. What if we assign random magnitudes to the DFT coefficients while keeping the phases the same? This DOES change the spectrogram. Below are pictures of the spectrograms in these two cases; the corresponding audio files can be found in spock_ran_mag.wav and spock_m.wav.
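The random-magnitude experiment can be sketched in a few lines. This is an illustration under my own assumptions, not necessarily how the Spock files were produced: it uses a slow textbook DFT on a short frame, draws the new magnitudes uniformly between zero and the largest original magnitude, and mirrors each coefficient onto its conjugate so the output stays real. The names `dft`, `idft`, and `randomize_magnitudes` are hypothetical helpers.

```python
import cmath
import math
import random


def dft(x):
    """Textbook O(N^2) DFT; fine for a short demonstration frame."""
    n_samples = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                for n in range(n_samples))
            for k in range(n_samples)]


def idft(coeffs):
    """Inverse of dft() above."""
    n_samples = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * math.pi * k * n / n_samples)
                for k in range(n_samples)) / n_samples
            for n in range(n_samples)]


def randomize_magnitudes(x, rng):
    """Replace every DFT magnitude with a random value while keeping each phase."""
    n_samples = len(x)
    original = dft(x)
    peak = max(abs(c) for c in original)
    modified = list(original)
    for k in range(n_samples // 2 + 1):
        new_mag = rng.uniform(0.0, peak)  # assumed distribution
        modified[k] = cmath.rect(new_mag, cmath.phase(original[k]))
        if 0 < k < n_samples - k:
            # Keep conjugate symmetry so the inverse transform is real.
            modified[n_samples - k] = modified[k].conjugate()
    # Any imaginary part left over is rounding error, so drop it.
    return [v.real for v in idft(modified)]


# Demo on a short frame of arbitrary samples (a stand-in for real audio).
frame_rng = random.Random(1)
frame = [frame_rng.uniform(-1.0, 1.0) for _ in range(16)]
distorted = randomize_magnitudes(frame, random.Random(0))
```

Swapping the roles of magnitude and phase, that is, keeping `abs(original[k])` and drawing a random angle, gives the Part Two experiment, which leaves the magnitudes, and therefore the spectrogram, unchanged. For a full clip, a fast FFT routine would be the practical choice over this quadratic-time version.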