<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Cardinal Peak&#039;s Blog</title>
	<atom:link href="http://www.cardinalpeak.com/blog/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://www.cardinalpeak.com/blog</link>
	<description>Engineering for embedded products</description>
	<lastBuildDate>Thu, 10 May 2012 22:54:45 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.2</generator>
		<item>
		<title>How Robust is Audio Perception in the Face of Deliberate Magnitude and Phase Distortions? (Part 2)</title>
		<link>http://www.cardinalpeak.com/blog/?p=1115</link>
		<comments>http://www.cardinalpeak.com/blog/?p=1115#comments</comments>
		<pubDate>Tue, 17 Apr 2012 20:53:57 +0000</pubDate>
		<dc:creator>Mike Perkins Managing Partner</dc:creator>
				<category><![CDATA[Audio]]></category>
		<category><![CDATA[Perk]]></category>
		<category><![CDATA[Theory]]></category>
		<category><![CDATA[audio processing]]></category>
		<category><![CDATA[DFT]]></category>

		<guid isPermaLink="false">http://www.cardinalpeak.com/blog/?p=1115</guid>
		<description><![CDATA[In this post I will demonstrate that dramatically different time domain waveforms can lead to virtually the same audio perception, and two waveforms with identical spectrograms can sound quite different.
]]></description>
			<content:encoded><![CDATA[<p>In the <a href="http://www.cardinalpeak.com/blog/?p=1077">first post</a> of this three-part series, I listed four points that I hope my readers will agree with at the end of this series. In this post, Part Two of the series, I will demonstrate the first two of those four points:</p>
<ol>
<li>Dramatically different time domain waveforms can lead to virtually the same audio perception; and</li>
<li>Two waveforms with identical spectrograms can sound quite different.</li>
</ol>
<p>&nbsp;</p>
<p>In <a href="http://www.cardinalpeak.com/blog/?p=1077">Part One</a>, I summarized how a vector of <em>N</em> real-valued audio samples is transformed by the DFT into an equal-length vector of complex transform coefficients. The transform coefficients give us the magnitudes and phases of the sinusoids composing the vector of audio samples, so we sometimes refer to the transform coefficients as the <em>spectrum</em> of the audio samples. I will also use the term <em>time domain</em> when discussing the raw audio samples, and the term <em>frequency domain</em> when referring to the transformed coefficients (the spectrum).</p>
<p>Now, if we deliberately change the magnitudes of the transform coefficients, we introduce a <em>magnitude distortion</em>. When the distorted transform coefficients are used to reconstruct the time-domain audio samples, the reconstructed samples will no longer match the original audio samples. On the other hand, if we deliberately change the phases of the transform coefficients, we introduce a <em>phase distortion</em>. Both of these distortions are <em>spectral distortions</em> because they change the spectrum of the audio samples. Because there is a one-to-one relationship between a vector of audio samples and its spectrum, any change to the spectrum will cause a distortion in the reconstructed time-domain samples.</p>
<p>Imagine processing a digitized audio clip in the following manner:</p>
<ol>
<li>Break the clip into non-overlapping blocks of <em>N</em> samples each</li>
<li>Apply a Discrete Fourier Transform to each length <em>N</em> block
<ol>
<li>Generate a spectrogram from the DFTs</li>
</ol>
</li>
<li>Spectrally distort the coefficients of each block in some manner
<ol>
<li>Generate a spectrogram from the distorted DFTs</li>
</ol>
</li>
<li>Synthesize <em>N</em> audio samples from the distorted transform coefficients, by performing an inverse DFT</li>
<li>Compare the original time-domain samples to the distorted samples. We’ll both graph them and listen to them.
<ol>
<li>Compare the spectrogram of the original samples to the spectrogram of the distorted samples.</li>
</ol>
</li>
</ol>
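<p>The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to produce the clips in this post: it assumes NumPy, and <code>process_blocks</code>, <code>distort</code>, and the random stand-in clip are hypothetical names chosen for the example.</p>

```python
import numpy as np

def process_blocks(samples, n, distort):
    """Run steps 1-4: block the clip, DFT, distort, inverse DFT."""
    samples = np.asarray(samples, dtype=float)
    num_blocks = len(samples) // n                      # a trailing partial block is dropped
    blocks = samples[:num_blocks * n].reshape(num_blocks, n)
    spectra = np.fft.rfft(blocks, axis=1)               # one DFT per block (real input)
    distorted = distort(spectra)                        # any spectral distortion
    rebuilt = np.fft.irfft(distorted, n=n, axis=1)      # back to the time domain
    # The two magnitude arrays are the raw material for the two spectrograms.
    return np.abs(spectra), np.abs(distorted), rebuilt.reshape(-1)

rng = np.random.default_rng(0)
clip = rng.standard_normal(4 * 256)                     # stand-in for real audio
mag_in, mag_out, same = process_blocks(clip, 256, lambda s: s)   # no distortion
_, mag_conj, flipped = process_blocks(clip, 256, np.conj)        # negates every phase
```

<p>Passing the identity function returns the original samples, while <code>np.conj</code> is about the simplest phase-only distortion: the magnitudes are untouched but the time-domain samples change.</p>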
<p>Let’s begin with the clip <a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/x03.wav">x03.wav</a> introduced in the first post of this series. It is a sum of five sinusoids, with frequencies of {500 Hz, 1000 Hz, 1500 Hz, 2000 Hz, 2500 Hz}. The magnitudes of the five sinusoids are {1000, 2000, 750, 1000, 1500}. The waveform x03.wav was formed from the following sum:</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/04/1115-eq1.png"><img class="aligncenter size-full wp-image-1117" title="1115-eq1" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/04/1115-eq1.png" alt="" width="215" height="54" /></a></p>
<p>where the sampling rate is 48 kHz (<em>T=1/48,000</em>) and the five phase values φ are {0, 0, 0, 0, 0}. What happens if we change the phase values, in degrees, to the five randomly chosen values {0, -48, 67, 33, -62}? The result is the waveform <a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/04/x04.wav">x04.wav</a>. Note that the spectrograms of x03.wav and x04.wav are <em>identical</em> because only the <em>phases</em> are being distorted. Both spectrograms look like this:</p>
<p>When I listen to these two waveforms, I cannot tell them apart. Nevertheless, it is easy to see that they are different in the time domain. Snippets from each waveform are shown below:</p>
<div id="attachment_1127" class="wp-caption aligncenter" style="width: 510px"><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/04/x03-waveform.png"><img class="size-full wp-image-1127" title="x03-waveform" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/04/x03-waveform-reduced.png" alt="" width="500" height="268" /></a><p class="wp-caption-text">x03 waveform</p></div>
<div id="attachment_1129" class="wp-caption aligncenter" style="width: 510px"><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/04/x04-waveform.png"><img class="size-full wp-image-1129" title="x04-waveform" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/04/x04-waveform-reduced.png" alt="" width="500" height="268" /></a><p class="wp-caption-text">x04 waveform</p></div>
<p>These graphs demonstrate the first point I wanted to make: dramatically different time domain waveforms can lead to the same audio perception. Perhaps this is really not so surprising—after all, files compressed with the MP3 and AAC algorithms are commonplace. Abstractly, these algorithms can be viewed as techniques for mapping <em>M</em> bits onto <em>N</em> bits where <em>N &lt; M</em>. For algorithms such as these that achieve significant compression, <em>N</em> is much less than <em>M</em>, and the mapping distorts the waveform (we therefore say these algorithms are lossy, not lossless). Most of the time we cannot hear the difference between the original and compressed waveforms. Nevertheless, I think it is humbling and important to keep in mind that the ear can be easily fooled into thinking that two distinctly different time-domain waveforms are “identical” when in fact they are not.</p>
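<p>The x03/x04 comparison can be checked numerically. The sketch below is an approximation under assumptions: it uses NumPy, it assumes the sum is built from cosines (the equation in this post may use sines instead), and it picks a hypothetical block length of 4,800 samples so that every tone falls exactly on a DFT bin.</p>

```python
import numpy as np

FS = 48_000                     # sampling rate from the post
N = 4_800                       # hypothetical block length: bin spacing is exactly 10 Hz
FREQS = [500, 1000, 1500, 2000, 2500]
AMPS = [1000, 2000, 750, 1000, 1500]

def five_tones(phases_deg):
    """Sum of five sinusoids with the given phases in degrees (cosine form assumed)."""
    n = np.arange(N)
    phases = np.deg2rad(phases_deg)
    return sum(a * np.cos(2 * np.pi * f * n / FS + p)
               for a, f, p in zip(AMPS, FREQS, phases))

x03 = five_tones([0, 0, 0, 0, 0])
x04 = five_tones([0, -48, 67, 33, -62])

mag03 = np.abs(np.fft.rfft(x03))
mag04 = np.abs(np.fft.rfft(x04))
print(np.allclose(mag03, mag04, atol=1e-3))   # do the magnitude spectra agree?
print(np.max(np.abs(x03 - x04)))              # largest sample-by-sample difference
```

<p>Under these assumptions the two magnitude spectra should agree to within rounding error, while the time-domain samples differ by thousands of units in places.</p>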
<p>How about the case where phase distortions are applied to real music as opposed to the synthetically generated periodic waveform above? Consider the <a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/spock_m.wav">spock_m.wav</a> file introduced in Part One. What happens if we set the phase of every transform coefficient to zero? It sounds like this: <a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/04/spock_phase0.wav">spock_phase0.wav</a>. A graph of the same spot in the two waveforms is shown below (spock_m.wav is in green):</p>
<div id="attachment_1125" class="wp-caption aligncenter" style="width: 510px"><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/04/spock-phase0-waveform.png"><img class="size-full wp-image-1125" title="spock-phase0-waveform-reduced" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/04/spock-phase0-waveform-reduced.png" alt="" width="500" height="268" /></a><p class="wp-caption-text">Spock_m in green, Spock_phase0 in red</p></div>
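<p>Setting every phase to zero amounts to replacing each coefficient with its magnitude before the inverse DFT. A minimal sketch for one block, assuming NumPy; <code>zero_phase</code> is a hypothetical helper and the random block is a stand-in for audio, not the Spock clip:</p>

```python
import numpy as np

def zero_phase(block):
    """Keep each DFT magnitude, set each phase to zero, and resynthesize."""
    spectrum = np.fft.rfft(block)
    return np.fft.irfft(np.abs(spectrum), n=len(block))

rng = np.random.default_rng(1)
block = rng.standard_normal(512)        # stand-in for one block of audio
flat = zero_phase(block)
```

<p>One visible side effect: with all phases at zero, every sinusoid peaks at the first sample of the block, so each output block is symmetric about its start.</p>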
<p>In this case there is no denying that you can hear the difference between the waveforms, <em>even though they have identical spectrograms</em>! Recall that this was the second point I set out to make. (In terms of simply recognizing what you are hearing, however, I’m sure you had no difficulty in identifying <a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/04/spock_phase0.wav">spock_phase0.wav</a> as the Spock clip, even though it sounds different than <a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/spock_m.wav">spock_m.wav</a>.)</p>
<p>How about if we randomly change the phase of every coefficient in every DFT block by using a random number generator to generate a phase value between -π and π for each coefficient? Doing so we obtain <a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/04/spock_phase_ran.wav">spock_phase_ran.wav</a>. This clip is surprisingly easy to recognize, even if Spock does sound like he’s suffering from some weird space sickness. The original and distorted time domain waveforms are shown below for the same spot as graphed above.</p>
<div id="attachment_1123" class="wp-caption aligncenter" style="width: 510px"><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/04/spock-phase-ran-waveform.png"><img class="size-full wp-image-1123" title="spock-phase-ran-waveform-reduced" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/04/spock-phase-ran-waveform-reduced.png" alt="" width="500" height="268" /></a><p class="wp-caption-text">Spock_m in green, Spock_phase_ran in red</p></div>
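<p>The random-phase experiment can be sketched the same way. Assumptions: NumPy, a hypothetical helper named <code>randomize_phase</code>, and a choice of my own for this sketch to leave the DC and Nyquist coefficients alone, since those two must stay real for the output samples to be real.</p>

```python
import numpy as np

def randomize_phase(block, rng):
    """Keep each DFT magnitude but draw a new phase uniformly from -pi to pi."""
    n = len(block)
    spectrum = np.fft.rfft(block)
    phases = rng.uniform(-np.pi, np.pi, size=spectrum.shape)
    phases[0] = np.angle(spectrum[0])          # DC coefficient stays real
    if n % 2 == 0:
        phases[-1] = np.angle(spectrum[-1])    # so does the Nyquist coefficient
    return np.fft.irfft(np.abs(spectrum) * np.exp(1j * phases), n=n)

rng = np.random.default_rng(2)
block = rng.standard_normal(512)               # stand-in for one block of audio
scrambled = randomize_phase(block, rng)
```

<p>Working on the half-spectrum returned by <code>rfft</code> keeps the conjugate symmetry that a real-valued signal requires, so the magnitudes of the output block match the input exactly.</p>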
<p>Finally, just in case you are a Sherlock Holmes fan, here are the corresponding two waveforms for that theme song: <a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/04/holmes_phase0.wav">holmes_phase0.wav</a>, <a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/04/holmes_phase_ran.wav">holmes_phase_ran.wav</a>.</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cardinalpeak.com/blog/?feed=rss2&#038;p=1115</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/x03.wav" length="288046" type="audio/wav" />
<enclosure url="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/04/x04.wav" length="288046" type="audio/wav" />
<enclosure url="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/spock_m.wav" length="1761324" type="audio/wav" />
<enclosure url="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/04/spock_phase0.wav" length="1761324" type="audio/wav" />
<enclosure url="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/04/spock_phase_ran.wav" length="1761324" type="audio/wav" />
<enclosure url="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/04/holmes_phase0.wav" length="1916972" type="audio/wav" />
<enclosure url="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/04/holmes_phase_ran.wav" length="1916972" type="audio/wav" />
		</item>
		<item>
		<title>Spectral Analysis with the DFT</title>
		<link>http://www.cardinalpeak.com/blog/?p=1077</link>
		<comments>http://www.cardinalpeak.com/blog/?p=1077#comments</comments>
		<pubDate>Fri, 23 Mar 2012 20:33:46 +0000</pubDate>
		<dc:creator>Mike Perkins Managing Partner</dc:creator>
				<category><![CDATA[Audio]]></category>
		<category><![CDATA[Perk]]></category>
		<category><![CDATA[Theory]]></category>
		<category><![CDATA[audio processing]]></category>
		<category><![CDATA[DFT]]></category>

		<guid isPermaLink="false">http://www.cardinalpeak.com/blog/?p=1077</guid>
		<description><![CDATA[You may have encountered spectral analysis. The basic idea is to take a waveform, in our case an audio clip, and determine which frequency components are in it. Think of passing light through a prism and breaking it into a rainbow.]]></description>
			<content:encoded><![CDATA[<p>We have recently been working on a system that requires short audio clips to be compared against a reference set of longer audio segments to determine if a match for the short clip exists in the reference set. In many applications, including ours, the short clip may have been recorded and processed—even compressed—prior to comparing it to the segments in the reference set. Many approaches for solving audio matching and related problems (including speaker and word recognition) rely on spectrograms. See, for example:</p>
<ul>
<li><a href="http://labrosa.ee.columbia.edu/matlab/alignmidiwav/">http://labrosa.ee.columbia.edu/matlab/alignmidiwav/</a></li>
<li><a href="http://code.google.com/p/py-astm/">http://code.google.com/p/py-astm/</a></li>
<li><a href="http://laplacian.wordpress.com/2009/01/10/how-shazam-works/">http://laplacian.wordpress.com/2009/01/10/how-shazam-works/</a></li>
<li><a href="https://ccrma.stanford.edu/~jos/st/Spectrograms.html">https://ccrma.stanford.edu/~jos/st/Spectrograms.html</a></li>
</ul>
<p>However, in this three part blog series I will not delve into the details of spectrogram matching. In fact, I want to do the opposite: I want to show how robust human audio perception is in the face of deliberate spectral distortions. In this context, robustness means our ability to recognize one clip as being a distorted version of another.</p>
<p>This post provides a very brief overview of the DFT and spectrograms, and introduces the audio waveforms I’ll be using in this series of posts. Part Two will look at the effect of phase distortions on our ability to recognize a clip. Finally, Part Three will consider spectral magnitude distortions.</p>
<p>At the end of this series, I hope you’ll agree with me that:</p>
<ol>
<li>Dramatically different time domain waveforms can lead to virtually the same audio perception;</li>
<li>Two waveforms with identical spectrograms can sound quite different;</li>
<li>Phase distortions generally have less effect on human perception than magnitude distortions; and</li>
<li>Two audio clips can be recognized by humans as matching despite having dramatically different spectrograms.</li>
</ol>
<p>&nbsp;</p>
<p>You may have already encountered spectral analysis. The basic idea is to take a waveform, in our case an audio clip, and determine which frequency components are in it. Think of passing light through a prism and breaking it into a rainbow. The best-known tool for this sort of analysis is the Fourier Transform. For this series of posts we’ll use the Discrete Fourier Transform, the appropriate choice for sampled data. I’m going to provide a brief review of the DFT below, but if you want to dig deeper, I highly recommend <a href="https://ccrma.stanford.edu/~jos/mdft/">this book</a>.</p>
<p>The DFT transforms a vector of <em>N</em> real-valued samples, such as audio samples, into a vector of <em>N</em> complex transform coefficients. The DFT is invertible, so the original audio samples can be recovered from the transform coefficients. To make this a bit more concrete, let</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-1079" title="1077-eq1" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/1077-eq1.png" alt="" width="130" height="27" /></p>
<p>be <em>N</em> real-valued audio samples obtained at a sampling rate of <em>Fs.</em> The sampling period is therefore <em>T=1/Fs.</em> Let</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-1080" title="1077-eq2" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/1077-eq2.png" alt="" width="134" height="24" /></p>
<p>be the <em>N</em> complex-valued DFT transform coefficients. The original audio samples can be uniquely reconstructed from the transform coefficients via the following formula:</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-1081" title="1077-eq3" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/1077-eq3.png" alt="" width="187" height="51" /></p>
<p>Conceptually, what this formula is saying is: Each transform coefficient multiplies a complex exponential basis vector. <a href="http://en.wikipedia.org/wiki/Euler's_formula">Euler’s formula</a> tells us that these complex exponential basis vectors are sinusoidal in nature. The weighted sum of all the basis vectors yields the original waveform.</p>
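<p>To make the weighted-sum idea concrete, here is a small sketch that rebuilds a signal by literally adding up scaled basis vectors. It assumes NumPy and NumPy's normalization (the 1/<em>N</em> factor sits in the inverse transform), which may not match the scaling in the formula above; <code>inverse_dft</code> is a hypothetical name.</p>

```python
import numpy as np

def inverse_dft(coeffs):
    """Rebuild samples as a weighted sum of complex exponential basis vectors."""
    n_len = len(coeffs)
    n = np.arange(n_len)
    x = np.zeros(n_len, dtype=complex)
    for k, c in enumerate(coeffs):
        basis = np.exp(2j * np.pi * k * n / n_len)   # k-th basis vector
        x += c * basis                               # weighted by its coefficient
    return x / n_len                                 # NumPy keeps the 1/N in the inverse

samples = np.array([1.0, 2.0, 0.0, -1.0, 0.5, 3.0])
coeffs = np.fft.fft(samples)
rebuilt = inverse_dft(coeffs)
```

<p>The loop is far slower than <code>np.fft.ifft</code>, but it spells out the sum term by term.</p>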
<p>The frequency of the <em>k</em>’th DFT basis vector is given by <em>kFs/N</em>. The FFT is a fast algorithm for computing the DFT transform coefficients. I wrote a previous blog series (<a href="http://www.cardinalpeak.com/blog/?p=674">part 1</a>, <a href="http://www.cardinalpeak.com/blog/?p=740">part 2</a>, <a href="http://www.cardinalpeak.com/blog/?p=781">part 3</a>) on the use of transforms for image compression, and those posts also contain basic information about DFT-like transforms, emphasizing a matrix representation.</p>
<p>Now, a complex number, <em>a + ib</em>, can be graphed as a point in the two-dimensional plane with the real part on the “x-axis” and the imaginary part on the “y-axis”:</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-1098" title="imaginary coordinates" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/imaginary-coordinates.png" alt="" width="250" height="237" /></p>
<p>Thinking of complex numbers as two-dimensional points, it is clear that they can also be represented in polar form, where the magnitude and phase are given by</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-1082" title="1077-eq4-5" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/1077-eq4-5.png" alt="" width="112" height="60" /></p>
<p>I have chosen to write the inverse tangent in this form to emphasize that the phase can be any number between -π and π. A complex number can lie in any of the four quadrants of the plane, and the signs of <em>a</em> and <em>b</em> determine in which quadrant the phase lies. We also have the relationship</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-1083" title="1077-eq6" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/1077-eq6.png" alt="" width="95" height="25" /></p>
<p>We can apply this to the inverse DFT formula as follows. Let</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-1084" title="1077-eq7" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/1077-eq7.png" alt="" width="103" height="27" /></p>
<p>Each transform coefficient therefore has a magnitude and a phase. Substituting these terms into the expression for <em>x(n)</em> we have</p>
<div><span style="font-family: Cambria; font-size: small;"><img class="aligncenter size-full wp-image-1085" title="1077-eq8" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/1077-eq8.png" alt="" width="202" height="54" /></span></div>
<p>We see that each transform coefficient changes the magnitude and shifts the phase of its corresponding complex exponential basis vector. In other words, the transform coefficients give us the magnitudes and phases of the sinusoids composing the vector of <em>N</em> audio samples.</p>
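<p>The polar-form view can be checked numerically. In this sketch (NumPy assumed, arbitrary sample values, NumPy's 1/<em>N</em> normalization) each coefficient is split into a magnitude and a four-quadrant phase, and the samples are then rebuilt as a sum of real cosines, one per coefficient.</p>

```python
import numpy as np

samples = np.array([3.0, -1.0, 2.0, 0.5, -2.0, 1.0, 0.0, 4.0])
N = len(samples)
X = np.fft.fft(samples)

magnitude = np.abs(X)                    # sqrt(a^2 + b^2)
phase = np.arctan2(X.imag, X.real)       # four-quadrant inverse tangent, from -pi to pi

n = np.arange(N)
rebuilt = np.zeros(N)
for k in range(N):
    # Each coefficient scales one sinusoid and shifts its phase.
    rebuilt += magnitude[k] * np.cos(2 * np.pi * k * n / N + phase[k]) / N
```

<p>Because the input samples are real, the imaginary parts of the complex exponentials cancel across the sum, which is why cosines alone are enough here.</p>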
<p>So what is a spectrogram? Let’s assume that the audio clip has been sampled at 48 kHz. In its simplest form, a spectrogram can be created by: 1) breaking the audio clip into a series of non-overlapping segments (vectors) of some constant length <em>N</em>; 2) transforming each vector of samples into its corresponding DFT coefficients; 3) creating a new vector from the transform vector by taking each coefficient’s magnitude; 4) plotting the magnitudes in a 3-D graph (each magnitude vector is one row in the graph). One arrangement for the 3‑D graph is time along the y-axis, frequency along the x-axis, and magnitude along the z-axis. Using this convention, the spectrogram below illustrates a single 2 kHz sinusoidal tone. (<a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/x01.wav">Listen</a> to this tone.)</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-1089" title="x01" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/x01.png" alt="" width="500" height="35" /></p>
<p>In this spectrogram, time starts at the top and progresses downward. Think of the spectrogram as growing downward over time. Since only a single sinusoid is present, we see a thin straight line, indicating that only a single frequency is present as time progresses. The spectrogram is normalized so that, for each row in the picture, the largest magnitude DFT coefficient in that row is maximally white (i.e., it is 255 on an 8-bit scale). In terms of frequency, DC is on the left of the spectrogram, and the highest frequency is on the right. I used a DFT size of 4096. Since the sampling frequency is 48 kHz, the frequency increment between DFT basis vectors is 48,000/4,096 ≈ 11.72 Hz. However, I only plotted the first 500 coefficients, so the highest frequency present, all the way on the right of the spectrogram, is about 500 * 11.72 ≈ 5,859 Hz.</p>
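<p>The bin arithmetic is easy to reproduce. The sketch below (NumPy assumed; the one-second synthetic tone is a stand-in for x01.wav, not the actual file) builds the rows of a simple spectrogram and reports where the 2 kHz tone lands.</p>

```python
import numpy as np

FS = 48_000          # sampling rate in Hz
N = 4096             # DFT size
tone = np.cos(2 * np.pi * 2000 * np.arange(FS) / FS)     # one second of a 2 kHz tone

num_blocks = len(tone) // N                              # non-overlapping blocks
blocks = tone[:num_blocks * N].reshape(num_blocks, N)
rows = np.abs(np.fft.rfft(blocks, axis=1))[:, :500]      # keep the first 500 coefficients

bin_spacing = FS / N                                     # Hz between adjacent coefficients
peak_bins = rows.argmax(axis=1)                          # brightest column in each row
print(bin_spacing, peak_bins[0], peak_bins[0] * bin_spacing)
```

<p>With these numbers 2,000 Hz falls between two coefficients (2,000 / 11.72 is about 170.7), so the brightest column in each row should be coefficient 171, at roughly 2,004 Hz.</p>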
<p>The spectrogram below illustrates a clip with a sinusoid at 2 kHz for 1.5 seconds followed by a sinusoid at 3 kHz for 1.5 seconds. (<a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/x02.wav">Listen</a> to this sample.)</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-1090" title="x02" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/x02.png" alt="" width="500" height="35" /></p>
<p>The normalization of the spectrogram magnitudes is in dB. The highest power DFT coefficient present for a row is 0 dB, and the other coefficients in that row are displayed relative to the highest power coefficient. The display is linearly stretched so that a sinusoid 20 dB below the 0 dB coefficient would be pure black (i.e., 0 on an 8-bit gray scale).</p>
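<p>One way to implement this per-row normalization is sketched below. It assumes NumPy and a straight-line mapping from the range of -20 dB to 0 dB onto gray levels 0 to 255; <code>row_to_gray</code> and its <code>floor_db</code> parameter are hypothetical names, and the exact mapping behind the pictures in this post may differ in detail.</p>

```python
import numpy as np

def row_to_gray(magnitudes, floor_db=-20.0):
    """Map one row of DFT magnitudes to 8-bit gray levels on a dB scale."""
    power = np.asarray(magnitudes, dtype=float) ** 2
    peak = power.max()
    if peak == 0:
        return np.zeros(power.shape, dtype=np.uint8)        # silent row: all black
    # 0 dB at the strongest coefficient; the tiny floor avoids log10(0).
    db = 10.0 * np.log10(np.maximum(power, 1e-30) / peak)
    scaled = np.clip((db - floor_db) / -floor_db, 0.0, 1.0)  # floor_db maps to 0, 0 dB to 1
    return np.round(255 * scaled).astype(np.uint8)

gray = row_to_gray([1.0, 0.1, 0.01, 0.5])
```

<p>In the example row, the coefficient with one tenth of the peak magnitude sits 20 dB down and comes out black, while the one at half the peak magnitude (about 6 dB down) lands in the upper part of the gray range.</p>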
<p>The spectrogram below illustrates five sinusoidal tones at different frequencies—all multiples of 500 Hz—with different amplitudes. (<a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/x03.wav">Listen</a> to this sample.)</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-1091" title="x03" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/x03.png" alt="" width="500" height="35" /></p>
<p>I’ll be using this particular waveform more in future posts.</p>
<p>Finally, below are two spectrograms from actual music. The first spectrogram is for a short segment of the 1980’s BBC Sherlock Holmes theme song, while the second is a song by Spock.</p>
<p>Holmes:</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-1086" title="holmes_m" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/holmes_m.png" alt="" width="500" height="234" /></p>
<p>Spock:</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-1088" title="spock_m" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/spock_m.png" alt="" width="500" height="215" /></p>
<p>Listen to these clips while looking at the spectrograms (<a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/holmes_m.wav">Holmes</a>, <a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/spock_m.wav">Spock</a>). Can you see where the Spock song switches from music to voice? Can you see the violin’s vibrato in the Holmes song?</p>
<p>Coming soon: what happens to our ability to identify these clips if we intentionally distort the magnitudes and phases of the DFT coefficients from which the spectrograms are derived?</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cardinalpeak.com/blog/?feed=rss2&#038;p=1077</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/x01.wav" length="288046" type="audio/wav" />
<enclosure url="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/x02.wav" length="288046" type="audio/wav" />
<enclosure url="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/x03.wav" length="288046" type="audio/wav" />
<enclosure url="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/holmes_m.wav" length="1916972" type="audio/wav" />
<enclosure url="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/spock_m.wav" length="1761324" type="audio/wav" />
		</item>
		<item>
		<title>Measuring the Bitrate of a Video Stream</title>
		<link>http://www.cardinalpeak.com/blog/?p=1054</link>
		<comments>http://www.cardinalpeak.com/blog/?p=1054#comments</comments>
		<pubDate>Tue, 06 Mar 2012 22:27:04 +0000</pubDate>
		<dc:creator>Howdy Pierce Managing Partner</dc:creator>
				<category><![CDATA[Howdy]]></category>
		<category><![CDATA[Networking]]></category>
		<category><![CDATA[Video]]></category>
		<category><![CDATA[adaptive bit rate]]></category>
		<category><![CDATA[network capture]]></category>
		<category><![CDATA[Wireshark]]></category>

		<guid isPermaLink="false">http://www.cardinalpeak.com/blog/?p=1054</guid>
		<description><![CDATA[Occasionally we need to measure the bitrate of a particular video stream on the network. In this example I will show how to measure the data rate of a video streamed from Amazon.com.]]></description>
			<content:encoded><![CDATA[<p>Occasionally we need to measure the bitrate of a particular video stream on the network. Since I have found myself explaining somewhat regularly how to do this with <a href="http://www.wireshark.org/">Wireshark</a>, I thought it might be worthwhile to post the instructions here.</p>
<p>In this example I will show how to measure the data rate of a video streamed from Amazon.com. The same technique can be used to measure the rate of <em>any</em> network stream. And if you combine this approach with what I outlined in <a href="http://www.cardinalpeak.com/blog/?p=519">this post</a>, you can measure the bitrate of a stream that is being consumed by an embedded device like an iPad or Blu-ray player.</p>
<p>Here’s what you do:</p>
<ol>
<li>Start Wireshark and set it to capture all traffic.</li>
<li>Go to the application or website of interest, and start the video playing. Because the video is usually pretty bursty, you will want to average the measurement over an interesting period of time—at least a minute, and probably longer. In the screen shots below, I let my video play for 57 seconds.</li>
<li>After capturing the video for a period of time, stop the Wireshark capture.</li>
</ol>
<p>You probably captured quite a bit of data, so you need to narrow it down to the particular TCP stream(s) in question. In Wireshark, select Statistics &gt; Conversations, and then select the TCP tab. In my case there were 64 different TCP streams captured during the one-minute Wireshark capture. Presumably the stream we want is the one where the largest number of bytes were delivered, so to find this stream, you’ll want to scroll to the right a little and sort the Conversations window by the column “Bytes A ← B”:</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/conversations-window-annotated.png"><img class="aligncenter size-full wp-image-1060" title="Conversations window" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/conversations-window-annotated.png" alt="" width="500" height="221" /></a></p>
<p>Sure enough, one of the streams that we captured saw 14,363,612 bytes transferred from a server to our client. That’s probably the video. Doing a reverse hostname lookup on the server’s IP address gets this:</p>
<pre><strong>% host 23.3.68.6</strong></pre>
<pre class="last">6.68.3.23.in-addr.arpa domain name pointer a23-3-68-6.deploy.akamaitechnologies.com.</pre>
<p>At this point, there are two ways to determine the bitrate of this particular stream.</p>
<p><strong>The first and easiest way</strong> is to scroll the Conversations window to the right. Wireshark is telling me that this particular stream lasted for 42.7 seconds and had an average bitrate of 2.689 Mbps:</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/conversations-window-bitrate-detail.png"><img class="aligncenter size-full wp-image-1056" title="Conversations window bitrate detail" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/conversations-window-bitrate-detail.png" alt="" width="500" height="78" /></a></p>
<p>This is interesting for a couple of reasons. One: Wow, Amazon is delivering nearly 2.7 Mbps of video to us, which seems really high! (Especially since the content in question was not particularly hard to compress or delivered at stunningly high quality.) Two: I watched roughly 57 seconds of video when performing this capture, but Wireshark is saying that the video was only moving across the network for roughly 43 seconds (the “Duration” column).</p>
<p>Both observations imply that the video was buffered. We know that the 14,363,612 bytes transferred in the capture contained <em>at least</em> enough video for 57 seconds of display. Dividing those 14,363,612 bytes (114,908,896 bits) by 57 seconds reduces the average bitrate to about 2.016 Mbps, and this is a ceiling, because there was probably some unplayed video left in our decoder’s buffer when I stopped the capture.</p>
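<p>The arithmetic behind both figures is just bytes times eight, divided by seconds. A small sketch in plain Python; <code>average_mbps</code> is a hypothetical helper, and it takes 1 Mbps to mean 1,000,000 bits per second:</p>

```python
def average_mbps(total_bytes, seconds):
    """Average bitrate in megabits per second (1 Mbps = 1,000,000 bits/s)."""
    return total_bytes * 8 / seconds / 1_000_000

over_transfer = average_mbps(14_363_612, 42.7)   # time the data spent on the wire
over_playback = average_mbps(14_363_612, 57)     # time spent watching the video
print(round(over_transfer, 3), round(over_playback, 3))
```

<p>The first number comes out a hair above Wireshark’s 2.689 Mbps, presumably because the 42.7-second duration quoted here is rounded.</p>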
<p>Which brings us to the <strong>second and deeper way to understand the bitrate of a particular stream</strong>. To do this, you need to narrow the display filter so that only the TCP stream of interest is shown. The easiest way I know to do this is to select the stream in the Conversations window, and then click the “Follow Stream” button at the bottom. This will do two things: One, it will narrow the display filter in the main window to be exactly what we want. And two, it will bring up a “Follow TCP Stream” window that we don’t need for our purposes. So after clicking “Follow Stream” you should close both the Follow TCP Stream window and also the Conversations window. You should be left with something that looks like this:</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/main-window.png"><img class="aligncenter size-full wp-image-1062" title="Wireshark main window" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/main-window.png" alt="" width="500" height="309" /></a></p>
<p>Note the display filter—in this case <code>tcp.stream eq 73</code>. That’s an internal Wireshark index. You could equally construct a rule with some AND and OR operators and the IP address and TCP port numbers, but this way we got Wireshark to figure that out for us.</p>
<p>Now for the fun. Select Statistics &gt; IO Graphs. Like most of Wireshark, the window that comes up is both very powerful, and evidently designed by someone with no eye for user interfaces. To make the chart meaningful, you will want to set the Y Axis to “Bits/Tick”, as shown here:</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/IO-Graphs-window.png"><img class="aligncenter size-full wp-image-1061" title="IO Graphs window" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/03/IO-Graphs-window.png" alt="" width="500" height="313" /></a></p>
<p>Here you can see when the data transited the network, and it’s awfully interesting. It looks like Amazon is pushing out quick bursts of data, and each burst contains at least 35 seconds worth of encoded video. That first hump in the graph bursts to over 10 Mbps, but since it contains video for at least 35 seconds, the underlying video is encoded at a rate no higher than 1.1 Mbps. Cool!</p>
<p>Related posts:</p>
<ul>
<li><a href="http://www.cardinalpeak.com/blog/?p=519">Sniffing iPad Traffic</a></li>
<li><a href="http://www.cardinalpeak.com/blog/?p=775">An explanation of ABR and Progressive Download video</a></li>
</ul>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cardinalpeak.com/blog/?feed=rss2&#038;p=1054</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Future of Clutter</title>
		<link>http://www.cardinalpeak.com/blog/?p=1038</link>
		<comments>http://www.cardinalpeak.com/blog/?p=1038#comments</comments>
		<pubDate>Wed, 11 Jan 2012 14:58:32 +0000</pubDate>
		<dc:creator>Howdy Pierce Managing Partner</dc:creator>
				<category><![CDATA[Engineering Management]]></category>
		<category><![CDATA[Howdy]]></category>

		<guid isPermaLink="false">http://www.cardinalpeak.com/blog/?p=1038</guid>
		<description><![CDATA[If you’re looking to predict what technologies will be obsolete soon, visiting the Kodak booth at CES is not a bad place to start. I came to this realization as I was watching a demo from one of Kodak’s partners, [...]]]></description>
			<content:encoded><![CDATA[<p>If you’re looking to predict what technologies will be obsolete soon, visiting the <a href="http://www.kodak.com/ek/US/en/Home.htm">Kodak</a> booth at <a href="http://www.cesweb.org/">CES</a> is not a bad place to start.</p>
<p>I came to this realization as I was watching a demo from one of Kodak’s partners, <a href="http://www.unibind.com/site/index.php?lang=us">Unibind</a>. Unibind is demonstrating a new machine at CES that allows retailers to create a hardbound book out of pictures taken by a customer in a matter of minutes. The machine sells to the retailer for less than $10,000, and it was actually a pretty cool demo (with apologies for the poor smartphone snapshots):</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/01/Unibind.jpg"><img class="aligncenter size-full wp-image-1040" title="Unibind" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2012/01/Unibind.jpg" alt="" width="500" height="251" /></a></p>
<p>Now, I don’t know the first thing about this market, and perhaps Unibind’s new product will be successful for them in the short term. In the long term I wouldn’t bet on it.</p>
<p>Watching this demo, it struck me that in the not-too-distant future, everything that <em>can </em>be delivered into my house digitally <em>will</em> be delivered that way.</p>
<p>We’re in the midst of this transition now: My family no longer gets a daily newspaper, because we read newspapers on the iPad. We’re in the process of eliminating magazine subscriptions for the same reason. Music and movies no longer come on round plastic discs, but are downloaded and, increasingly, streamed from the cloud. Negatives and prints of our photos disappeared long ago. Board games are being replaced by electronic versions. We don’t keep catalogs and junk mail, because when we want to buy something we go to a web site. And although I have an emotional attachment to books, in the last nine months I’ve transitioned the majority of my reading to my Kindle.</p>
<p>Ultimately, we’ll have far less clutter floating around our family rooms, even as there continues to be a proliferation in the number of devices to read, view, and listen to all this content. Even the mess of cables that once accompanied these devices is slowly shrinking, owing both to standardization on a smaller set of connectors and the accelerating shift to wireless data and inductive charging. The clean, minimalist look of modernist architecture was just 50 years ahead of its time.</p>
<p>So thanks, but a coffee table book filled with pictures of my kids just feels so &#8230; <em>2010</em>.</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cardinalpeak.com/blog/?feed=rss2&#038;p=1038</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>We&#8217;ve Moved</title>
		<link>http://www.cardinalpeak.com/blog/?p=1013</link>
		<comments>http://www.cardinalpeak.com/blog/?p=1013#comments</comments>
		<pubDate>Tue, 15 Nov 2011 16:08:20 +0000</pubDate>
		<dc:creator>Howdy Pierce Managing Partner</dc:creator>
				<category><![CDATA[Engineering Management]]></category>
		<category><![CDATA[Howdy]]></category>

		<guid isPermaLink="false">http://www.cardinalpeak.com/blog/?p=1013</guid>
		<description><![CDATA[I’m a little late in posting this here, but it’s been a busy couple of months. As has already been reported in the local press, Cardinal Peak moved in late September. We’ve been adding some folks in the past year, [...]]]></description>
			<content:encoded><![CDATA[<p>I’m a little late in posting this here, but it’s been a busy couple of months. As has already <a href="http://www.bcbr.com/article.asp?id=60618">been reported</a> in the local press, Cardinal Peak moved in late September.</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2011/11/1380FPC.jpg"><img class="aligncenter size-full wp-image-1015" title="1380 Forest Park Circle" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2011/11/1380FPC.jpg" alt="" width="500" height="270" /></a></p>
<p>We’ve been adding some folks in the past year, and we had outgrown our previous location. But we also had a couple of other goals in addition to simply securing more space:</p>
<ul>
<li>We wanted to get our team into a single location that would encourage collaboration and also allow our developers to have natural light and, when they wanted it, silence to think.</li>
<li>We also wanted to get some flexible lab space that would lend itself to rapid reconfiguration—so we could rapidly take on new projects.</li>
</ul>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2011/11/CP-sign.jpg"><img class="aligncenter size-full wp-image-1016" title="Cardinal Peak sign" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2011/11/CP-sign.jpg" alt="" width="500" height="368" /></a></p>
<p>The new space is great: Our engineers are housed two or three to an office, which seems from my experience to be the ideal tradeoff between enough interaction to encourage collaboration and enough serenity to think through hard problems. The offices are arranged around the perimeter of the building so that every person has big windows that actually open. And, as befits our location in Boulder County, we’ve also got dedicated indoor bike parking and men’s and women’s showers.</p>
<p>In the center of the space we’ve got two large labs plus a server room that can be easily reconfigured as our project mix changes over time.</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2011/11/hardware_lab.jpg"><img class="aligncenter size-full wp-image-1017" title="hardware_lab" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2011/11/hardware_lab.jpg" alt="" width="500" height="347" /></a></p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2011/11/software_lab.jpg"><img class="aligncenter size-full wp-image-1018" title="software_lab" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2011/11/software_lab.jpg" alt="" width="500" height="341" /></a></p>
<p>If you’re in the area, let us know—we’d love to give a tour!</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cardinalpeak.com/blog/?feed=rss2&#038;p=1013</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using Lossy Video Compression in the Courtroom</title>
		<link>http://www.cardinalpeak.com/blog/?p=990</link>
		<comments>http://www.cardinalpeak.com/blog/?p=990#comments</comments>
		<pubDate>Thu, 25 Aug 2011 14:40:12 +0000</pubDate>
		<dc:creator>Howdy Pierce Managing Partner</dc:creator>
				<category><![CDATA[Howdy]]></category>
		<category><![CDATA[Video]]></category>

		<guid isPermaLink="false">http://www.cardinalpeak.com/blog/?p=990</guid>
		<description><![CDATA[I’m at the DSI conference in Las Vegas today, presenting a primer for law enforcement investigators on how video compression works and trying to answer the question of why “lossy” compression should be considered reliable for use in courtrooms. 

The lack of trust in digital media compression in a forensic setting is primarily a PR issue for the media compression industry, if such an industry can be said to exist. We use terms like “lossy compression” and “predicted blocks”—terms that have relatively precise technical meaning. But these terms also have a slightly different meaning to laymen, and that everyday meaning isn’t exactly reassuring if you’re a judge relying on testimony compressed using a lossy compression algorithm. ]]></description>
			<content:encoded><![CDATA[<p>I’m at the <a href="http://www.dsi-vegas.com/Presentations.aspx">DSI conference</a> in Las Vegas today, presenting a primer for law enforcement investigators on how video compression works and trying to answer the question of why “lossy” compression should be considered reliable for use in courtrooms. (My slides are available <a href="http://www.cardinalpeak.com/downloads/DSI_2011.pptx">here</a>, and I welcome comments on them.) I think I was invited to speak because of our <a href="http://casecracker.cardinalpeak.com">CaseCracker</a> product, which is used to record custodial interrogations, although what I’m discussing is only slightly related.</p>
<p>The lack of trust in digital media compression in a forensic setting is primarily a PR issue for the media compression industry, if such an industry can be said to exist. We use terms like “lossy compression” and “predicted blocks”—terms that have relatively precise technical meaning. But these terms also have a slightly different meaning to laymen, and that everyday meaning isn’t exactly reassuring if you’re a judge relying on testimony compressed using a lossy compression algorithm. So it’s important for lawyers and investigators working in the criminal justice system to understand how image compression works.</p>
<p>The technical meaning of “lossy compression” is that the process of encoding followed by the process of decoding doesn’t output the exact same file as the source file you started out with:</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2011/08/lossy-system-pic1.png"><img class="aligncenter size-full wp-image-992" title="Compression System" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2011/08/lossy-system-pic1.png" alt="" width="500" height="104" /></a></p>
<p>When we say the output file isn’t the same as the source file, what we mean is that a byte-for-byte comparison of the two files will fail—not that a guy protesting his innocence will be turned into a different guy admitting his guilt. In fact, with a well-implemented codec, the mathematical lossiness shouldn’t be subjectively noticeable at all. Intuitively, everyone knows that: Nobody worries about using lossy media compression for recording videos of their kids’ birthdays or pictures of their vacations.</p>
<p>But still, it’s worth asking how we can state with certainty that lossy compression algorithms should be considered reliable for courtroom use.</p>
<p>In preparing for this talk, I tried to think of all the ways that video compression is lossy. I came up with four independent sub-processes that each contribute to a codec’s overall lossiness:</p>
<ul>
<li>Resolution reduction: Often the video resolution is reduced prior to encoding, because this can dramatically diminish the number of bits to encode. The result is that the output is fuzzier and less crisp.</li>
<li>Color sub-sampling: The human eye is not equally sensitive to luminance and chrominance changes, so chroma is normally subsampled, which typically reduces the color information in the picture by a factor of 4 and the total uncompressed size of the picture by a factor of 2. The color sub-sampling is not usually perceptible except in test patterns explicitly designed to expose it.</li>
<li>Noise reduction and other pre-filtering: Sometimes video encoders, particularly expensive ones, will filter the image prior to encoding in order to remove noise and otherwise make the image easier to compress. This might result in a softer image in certain cases, but again it normally won’t make any subjective difference in the output.</li>
<li>Quantization: This is a technical term that loosely translates to “rounding”. The basic idea is that the human eye can’t usually discern small differences in intensity. So why waste a lot of bits faithfully preserving the difference between a 66% gray block and a 69% gray block, when the viewer will perceive them as the same thing anyway? By quantizing both blocks to an average value—say, 67% gray—the encoder is able to dramatically reduce the amount of information it needs to send. (The same concept applies to high frequencies in the image.) Quantization is responsible for the majority of lossiness in video compression, but again, its use is normally not perceptible except in the lab.</li>
</ul>
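<p>Two of the items in this list involve arithmetic that is easy to check with a short Python sketch. It is deliberately simplified: it assumes 8-bit samples, and it quantizes gray levels directly, whereas real codecs quantize transform coefficients. The function names and the step size of 5 are illustrative choices only.</p>

```python
def frame_bytes(width, height, chroma_divisor=1):
    """Uncompressed size of one 8-bit frame: a luma plane plus two chroma planes.

    chroma_divisor is how many luma samples share one chroma sample
    (1 = no sub-sampling, 4 = the factor-of-4 case in the list above).
    """
    luma = width * height
    chroma = 2 * (width * height // chroma_divisor)
    return luma + chroma


def quantize(value, step):
    """Map value to the midpoint of the step-wide bin that contains it."""
    return (value // step) * step + step / 2


full = frame_bytes(1920, 1080)
sub = frame_bytes(1920, 1080, chroma_divisor=4)
print(full / sub)  # ratio of uncompressed frame sizes

# Two nearby gray levels, in percent, land on the same quantized value.
print(quantize(66, 5), quantize(69, 5))
```

<p>A larger step merges more distinct input values into each bin, which saves more bits at the cost of a coarser picture.</p>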
<p>I’m not a lawyer, thank heaven, but I’m pretty sure the relevant legal issue is whether a piece of video evidence <em>accurately reproduces the event it purports to record.</em> And so in a law enforcement setting, the ultimate answer is that someone who is trusted needs to be able to testify that a particular video clip faithfully represents what happened.</p>
<p>Related posts:</p>
<ul>
<li><a href="http://www.cardinalpeak.com/blog/?p=908">Elcomsoft&#8217;s Hack of Image Authentication</a></li>
<li><a href="http://www.cardinalpeak.com/blog/?p=861">A Propeller-Head Visits Vegas, part 1</a></li>
<li><a href="http://www.cardinalpeak.com/blog/?p=674">Transforms for Video Compression, part 1: Vectors, the Dot Product, and Orthonormal Bases</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.cardinalpeak.com/blog/?feed=rss2&#038;p=990</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using Royalties to Pay for Engineering Services</title>
		<link>http://www.cardinalpeak.com/blog/?p=957</link>
		<comments>http://www.cardinalpeak.com/blog/?p=957#comments</comments>
		<pubDate>Fri, 27 May 2011 18:28:12 +0000</pubDate>
		<dc:creator>Mike Perkins Managing Partner</dc:creator>
				<category><![CDATA[Engineering Management]]></category>
		<category><![CDATA[Perk]]></category>

		<guid isPermaLink="false">http://www.cardinalpeak.com/blog/?p=957</guid>
		<description><![CDATA[At Cardinal Peak, we are willing to include royalties as part of our compensation under certain circumstances, but there are powerful constraints limiting our appetite for such deals. So in this post I’ll explain a few of our reservations about royalties.
]]></description>
			<content:encoded><![CDATA[<p>Recently, I <a href="http://www.cardinalpeak.com/blog/?p=941">wrote about</a> two different payment models for engineering services: time &amp; materials, and fixed price. I briefly mentioned the concept of using a promise of downstream participation in a product’s sales as a way to compensate the provider of engineering services.</p>
<p>At Cardinal Peak, we are willing to include royalties as part of our compensation under certain circumstances, but there are powerful constraints limiting our appetite for such deals. So in this post I’ll explain a few of our reservations about royalties.</p>
<p>From the customer’s perspective, the idea of compensating an engineering services provider like Cardinal Peak through royalties has two attractions. One, it can reduce the up-front cost and create a situation where both risk and reward are shared between the customer and Cardinal Peak. And two, relatedly, the royalty can make Cardinal Peak feel more committed to the customer’s business success.</p>
<p>Clearly, the effect on Cardinal Peak of depending on royalties for part of our compensation is to force us to share in the customer’s <em>business risk</em>—in other words, to bet on the customer’s success. Although a royalty stream can clearly be structured in such a way that Cardinal Peak’s potential return is greater than it would be under a T&amp;M model, it is important to bear in mind that <em>we have</em> <em>little or no ability to affect the customer’s</em> <em>business risk</em>. This naturally limits our willingness to make this bet.</p>
<p>We don’t run the customer. We don’t sit on the customer’s board and the customer won’t be calling us for business advice. For example, we cannot affect how many sales people the customer hires, how much it spends on marketing, how much money it raises from investors, which deals it chooses to pursue, which products it chooses to develop, which markets it targets with the product we designed for them, and so forth. There is even the risk that the customer decides at some point after introducing “our” product that it would rather focus the bulk of its effort on some other product or market that doesn’t result in a royalty payment to us.</p>
<p>Finally, there are practical difficulties associated with negotiating royalty streams. For example, are they time limited or tied to the number of units sold? Are they associated with specific products or are they a fixed percentage of the customer’s total revenue stream? Do royalties also apply to derivatives of the initial product, even if the derivatives aren’t developed by Cardinal Peak? Often we know very little about the customer’s business situation and have never read their business plan (if they have one).</p>
<p>To make a long story short, there is usually so much we don’t know about the business, and so much we can’t control about how the customer is run, that our appetite for sharing the business risk—although <em>definitely greater than zero</em>—is finite. Needless to say, the more established the customer is in its market, and the longer its track record, the more willing we are to make this bet. But in my experience, the converse is also true: The more certain the customer is that the product will sell, the less likely they are to offer us a royalty.</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cardinalpeak.com/blog/?feed=rss2&#038;p=957</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using Windows Movie Maker with the Kodak Zi8</title>
		<link>http://www.cardinalpeak.com/blog/?p=950</link>
		<comments>http://www.cardinalpeak.com/blog/?p=950#comments</comments>
		<pubDate>Wed, 18 May 2011 16:42:51 +0000</pubDate>
		<dc:creator>Ben Mesander Partner</dc:creator>
				<category><![CDATA[Ben]]></category>
		<category><![CDATA[ffmpeg]]></category>
		<category><![CDATA[Kodak Zi8]]></category>

		<guid isPermaLink="false">http://www.cardinalpeak.com/blog/?p=950</guid>
		<description><![CDATA[In a previous blog post, I mentioned I had a Kodak Zi8 video camera. This past weekend, I decided I wanted to try Windows Movie Maker (WMM) to edit videos produced with it, instead of the built-in Arcsoft MediaImpression software [...]]]></description>
			<content:encoded><![CDATA[<p>In a <a href="http://www.cardinalpeak.com/blog/?p=240">previous blog post</a>, I mentioned I had a <a href="http://store.kodak.com/store/ekconsus/en_US/pd/Zi8_Pocket_Video_Camera/productID.156585800">Kodak Zi8</a> video camera. This past weekend, I decided I wanted to try <a href="http://www.microsoft.com/windowsxp/downloads/updates/moviemaker2.mspx">Windows Movie Maker</a> (WMM) to edit videos produced with it, instead of the built-in Arcsoft MediaImpression software that the camera installs on the PC it is connected to.</p>
<p><img class="aligncenter size-full wp-image-241" title="Kodak Zi8" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2009/10/zi8-1.png" alt="Kodak Zi8" width="245" height="218" /></p>
<p>(Although I haven&#8217;t tested it, the procedure described below probably also works well with the newer Kodak cameras such as the PlaySport and PlayTouch that also use an <a href="http://www.ambarella.com/">Ambarella</a> encoder.)</p>
<p>The first issue I ran into was that WMM would not import the .MOV files the camera produces. Apparently if you use Windows 7, Windows Live Movie Maker may be able to import these files directly—I found conflicting reports on the web. But I have Windows XP, so I needed to find a way to use the older WMM.</p>
<p>I experimented with converting the output of the camera to several formats and found the format that worked best was AVI format files. I also tried WMV and MPEG-2 formats, but had issues with video artifacts in the editor.</p>
<p>First, you will need to install an appropriate set of codecs for your PC. To do this, I installed the <a href="http://ffdshow-tryout.sourceforge.net/">ffdshow-tryouts package</a> on my PC.</p>
<p>Then, I used <a href="http://www.ffmpeg.org/">ffmpeg</a> to convert the h.264/MOV format file the camera produces to MPEG-4/AVI format:</p>
<p><code>ffmpeg -i input.MOV -sameq output.AVI</code></p>
<p>In the line above, I’m transcoding the video from h.264 to MPEG-4 part 2 at the same time I am converting the container format from MOV to AVI. This seems to be the combination that WMM is happiest with, in terms of preview working well and so forth. My version of WMM seems to want an AVI container; it supports some others, but not as well. Then the question becomes what codec to use inside it, and h.264 doesn’t seem to work on XP or Vista. So the combination of MPEG-4 part 2 / AVI seems to be the answer that works well on all three platforms (XP, Vista, and Windows 7).</p>
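<p>If you have a folder full of clips, a small Python sketch like the following can generate one such command per file. The function name and folder layout are hypothetical, and the flags are copied from the command above. I believe newer ffmpeg releases no longer accept <code>-sameq</code>, so treat that flag as specific to the version used here.</p>

```python
from pathlib import Path


def ffmpeg_commands(folder):
    """Return one ffmpeg argument list per .MOV file found in folder.

    The flags mirror the command above. Note the plain ASCII hyphens:
    ffmpeg will not recognize options typed with a typographic dash.
    """
    commands = []
    for src in sorted(Path(folder).glob("*.MOV")):
        dst = src.with_suffix(".AVI")
        commands.append(["ffmpeg", "-i", str(src), "-sameq", str(dst)])
    return commands


# Print the commands for review; pass each list to subprocess.run to execute.
for cmd in ffmpeg_commands("."):
    print(" ".join(cmd))
```

<p>Printing the commands first, rather than running them straight away, gives you a chance to check the file names before anything is transcoded.</p>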
<p>I was able to import the resulting video into WMM and edit it. I did find that WMM had two peculiarities. The first is that it does not store the aspect ratio of the video in the project. For video from the Zi8, you will want to choose a 16:9 aspect ratio. The default is 4:3. To change this, in WMM, click “Tools” on the Movie Maker menu, and then choose “Options”. In the resulting dialog, click on the “Advanced” tab and choose the 16:9 widescreen aspect ratio.</p>
<p>The other limitation is somewhat disappointing: WMM will only output standard definition video (480p). I shot the original footage in 1080p, but there was no way to preserve this through the editing process. While the Arcsoft video editor that comes with the camera offers fewer creation options, it does handle HD video throughout the workflow, and the final video has fewer artifacts and is of higher quality.</p>
<p><object width="499" height="284"><param name="movie" value="http://www.youtube.com/v/cjmAWkmtah0?fs=1&amp;hl=en_US&amp;rel=0" /><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><embed type="application/x-shockwave-flash" width="499" height="284" src="http://www.youtube.com/v/cjmAWkmtah0?fs=1&amp;hl=en_US&amp;rel=0" allowfullscreen="true" allowscriptaccess="always"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cardinalpeak.com/blog/?feed=rss2&#038;p=950</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why We Engage on a Time and Materials Basis</title>
		<link>http://www.cardinalpeak.com/blog/?p=941</link>
		<comments>http://www.cardinalpeak.com/blog/?p=941#comments</comments>
		<pubDate>Tue, 10 May 2011 20:34:15 +0000</pubDate>
		<dc:creator>Mike Perkins Managing Partner</dc:creator>
				<category><![CDATA[Engineering Management]]></category>
		<category><![CDATA[Perk]]></category>

		<guid isPermaLink="false">http://www.cardinalpeak.com/blog/?p=941</guid>
		<description><![CDATA[When a prospective customer calls us, they generally have one of two engagement models in mind: time and materials (T&#038;M), or fixed price. This post explains why Cardinal Peak strongly prefers T&#038;M engagements to FP ones—and why savvy customers should, too.
]]></description>
			<content:encoded><![CDATA[<p>When a prospective customer calls us, they generally have one of two engagement models in mind: time and materials (T&amp;M), or fixed price (FP).</p>
<p>Just to define the terms, T&amp;M is a billing model where a professional services firm such as Cardinal Peak invoices our customer a set amount for every hour we work; plus we pass through expenses (“materials”) at our cost. We expend effort up front to work out a detailed, estimated schedule and budget for the project, and we work hard to meet it. But at the end of the day, the up-front estimate is precisely that—an estimate—and we do not provide a guarantee we can complete the project on the schedule and budget we supplied. (Our track record is pretty good, though: We wouldn’t have too many repeat customers if we missed our estimates often.)</p>
<p>T&amp;M contrasts with the fixed-price billing model, where, as the name suggests, a services provider quotes an up-front price, and the customer makes payments as project milestones are reached. In the FP model, all the risk is on the services provider, because the amount of money the customer is paying is set up front.<span id="more-941"></span></p>
<p>(There is a third model that occasionally comes up, where a customer wants to avoid paying the services firm up front altogether, and instead the services firm is given some sort of participation in the sales of the product once it is in the marketplace. We call this a “royalty model”, and I will have more to say about it in a subsequent post.)</p>
<p>With the introduction out of the way, this post explains why Cardinal Peak strongly prefers T&amp;M engagements to FP ones—and why savvy customers should, too.</p>
<p>First, we do understand why our customer has a desire to obtain a fixed price bid. A fixed price allows the customer to bound its implementation cost up front. However, the customer achieves this benefit by pushing the entire <em>implementation risk</em> onto us.</p>
<p>The approach we must employ to mitigate this risk is:</p>
<ul>
<li>insist on highly detailed requirements documents up front;</li>
<li>significantly design the system prior to making a bid in order to enable a reasonable estimate to be generated; and</li>
<li>substantially mark up our best estimate to provide a buffer for the unanticipated, because “stuff” happens—and in addition, risk has a cost in all economic transactions, and if we are going to carry more risk we are going to insist on additional compensation.</li>
</ul>
<p>Unfortunately, rarely is a project so simple that the customer has developed a sufficiently detailed requirements document prior to approaching us. Which leads us to the first big FP problem: Significant work needs to be done, both refining project requirements and developing potential architectures, before we can generate anything close to a reasonable estimate. And the amount of work needed is generally so much that we need to charge for it, because real engineering is occurring during this phase.</p>
<p>Of course a well-managed T&amp;M project still requires a schedule and cost estimate, but in this case such up-front work is no problem. The customer simply engages Cardinal Peak for a Phase I scoping effort, which is usually relatively short. We then work closely with the customer to solidify the requirements, and use the result to generate a design, an engineering schedule, and an estimate. This Phase I effort is very valuable. It gives the customer a chance to get to know us, and at the end of this phase they have a solid requirements document under their belt and a professional estimate of the work required to complete the project. The customer can either walk away at this point, or move on to a Phase II implementation.</p>
<p>Unfortunately, in the FP approach, customers generally feel that Phase I design and planning work should be a Cardinal Peak business development expense. The problem is that the amount of up-front work is almost always too large for this to be practical from our perspective. I suppose Cardinal Peak could try to treat such up-front work as a business overhead, but this approach increases our cost on all projects we undertake.</p>
<p>Anyway, one way or another, let’s assume that a project eventually moves into Phase II: active implementation. Here we encounter the next big problem with FP projects. It is almost guaranteed that during the course of implementation, new product possibilities will emerge that are attractive to the customer. Put another way, the business reality in which the customer lives will change, and this will lead to a desire on the part of the customer to change the project in some manner. Requests along the following lines are common: “Can we add this feature, modify that one, and drop this other one?” Or, “Can we change the architecture so that we can easily add a wireless option in the future?” In an FP engagement, most requests like this require engineering analysis to determine their impact on the total number of project hours, and then a signed change order to officially adjust the fixed price of the contract. And usually this implies substantial project management formalism that adds cost to the project and, more importantly, interferes with our ability to make progress quickly.</p>
<p>Later still the project will move into the release stage, typically a beta test. Although Cardinal Peak’s team is great, I’d be lying if I told you that the systems we develop are always bug free from day one! The problem from our perspective is that one man’s easily avoided trivial annoyance is another man’s bug. In an FP model, the customer tends to think that until the last possible annoyance is worked out of the system, the project isn’t over, which is in fact rational behavior on their part for an FP deal. Why would they think otherwise? The customer has no skin in the game with respect to trading off the cost of fixing an annoyance versus its actual impact on sales or customer satisfaction.</p>
<p>The process I’ve just described for an FP project should sound familiar, because it is basically a 1980s-style “<a href="http://en.wikipedia.org/wiki/Waterfall_model">waterfall</a>” project management model that has been pretty thoroughly debunked. It’s slow and expensive. Unfortunately the business realities of the FP deal push us into it, to the detriment of both Cardinal Peak <em>and</em> our customer.</p>
<p>Indeed, for almost all our customers, speed is <em>everything</em>. They need to get started NOW! And the cleanest, fastest deal to negotiate is always a straight T&amp;M deal. We can start work immediately once we agree on a rate chart and a couple of other standard contract terms. Anything else takes substantially more time to negotiate in order to address both Cardinal Peak’s and the customer’s legitimate concerns. Even writing the contract can be time-consuming.</p>
<p>Sometimes customers think that they’ll save money with an FP deal, but in practice this couldn’t be further from the truth. First, for the reasons outlined above, there is a lot more management overhead in an FP project, and we’ll include the cost of that overhead in our pricing. Second, that risk premium we charge is pretty large. So really the only time an FP project ends up being cheaper for the customer is if they get “lucky” and we misprice the implementation risk markup. And we’ve gotten quite good at not making that mistake…a few scars quickly teach that lesson.</p>
<p>We do understand and respect our new customers’ fears that, in a T&amp;M engagement, we will have no incentive to be efficient in our work. However, we believe this concern is best addressed in other ways. In particular, we assign an experienced project manager to every project. Among other things, his or her job is to keep close tabs on the team’s implementation progress and communicate it to our customer on a frequent basis. A tight feedback loop between our customer and us allows quick course corrections to be made when necessary, and leads to increased and mutual trust over time.</p>
<p>Furthermore, unless you’ve personally experienced it, you’ll be amazed at the strength of the inherent incentives for Cardinal Peak to feel invested in our customer’s success in a T&amp;M deal. When our customers are successful in their business and happy with the work we do, they come back to us for more and more. We want to be an important part of our customers’ engineering effort for the next decade! It is always easier for us to sell new projects to an existing customer than to find a new one.</p>
<p><em>Related: Previously, I wrote about <a href="http://www.cardinalpeak.com/blog/?p=667">irrational optimism</a> while planning a project, and my partner Howdy has written about the <a href="http://www.cardinalpeak.com/blog/?p=362">cost of an engineer-hour</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cardinalpeak.com/blog/?feed=rss2&#038;p=941</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>ElcomSoft&#8217;s Hack of Image Authentication</title>
		<link>http://www.cardinalpeak.com/blog/?p=908</link>
		<comments>http://www.cardinalpeak.com/blog/?p=908#comments</comments>
		<pubDate>Fri, 29 Apr 2011 21:15:50 +0000</pubDate>
		<dc:creator>Howdy Pierce Managing Partner</dc:creator>
				<category><![CDATA[Howdy]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[checksum protection]]></category>
		<category><![CDATA[SHA-1]]></category>

		<guid isPermaLink="false">http://www.cardinalpeak.com/blog/?p=908</guid>
		<description><![CDATA[There was some interesting news yesterday about the cracking of an image authentication mechanism built into Nikon cameras. The hack was announced in what seemed to me like a rather adolescent post on the ElcomSoft company blog: ElcomSoft Co. Ltd. [...]]]></description>
			<content:encoded><![CDATA[<p>There was some <a href="http://hardware.slashdot.org/story/11/04/28/2015211/Nikons-Image-Authentication-Insecure">interesting news</a> yesterday about the cracking of an image authentication mechanism built into Nikon cameras. The hack was announced in what seemed to me like a rather adolescent post on the <a href="http://elcomsoft.com/">ElcomSoft</a> <a href="http://blog.crackpassword.com/2011/04/nikon-image-authentication-system-compromised/">company blog</a>:</p>
<blockquote><p>ElcomSoft Co. Ltd. researched Nikon’s Image Authentication System, a secure suite validating if an image has been altered since capture, and discovered a major flaw. The flaw allows anyone producing forged pictures that will successfully pass validation with Nikon’s Image Authentication Software. The weakness lies in the manner the secure image signing key is being handled in Nikon digital cameras….</p>
<p>In order to “fix” the problem, Nikon would have to re-design the way the signing key is being stored in the camera. They would have to hire someone who knows security well, which is what they should’ve done from the very beginning. They would have to publicly admit the existence of the problem in their old cameras. They would have to revoke the old signing key via an update to Nikon Image Authentication Software. They would have to generate a new signing key.</p></blockquote>
<p>ElcomSoft has previously hacked the same feature in Canon cameras, so at a minimum we can conclude that Nikon is not alone in hiring engineers who don’t know about security. But in fact I don’t think that’s the problem; instead I think this entire feature is doomed to failure regardless of the skill of the engineers implementing it. Fundamentally, the feature is oversold and doesn’t actually promise what people want it to promise.<span id="more-908"></span></p>
<p>In addition to being found in high-end DSLRs, checksum protection is a common feature in digital video recorders (DVRs), especially those intended for law enforcement use. (We’ve implemented it ourselves in <a href="http://casecracker.cardinalpeak.com">CaseCracker</a>.) Regardless of the product, the goal is always to prove that a particular file—whether a still image or a video clip—wasn’t altered after it was recorded. This is especially desirable if the file is going to be used for legal or evidentiary purposes.</p>
<p>Let’s start by looking at the problem Nikon is trying to solve. Assume the customer is a police department, and assume the Nikon camera is being used to record crime scene photos. The police want to defend against an accusation that, after a particular picture was taken, somebody malicious edited it in order to frame an innocent person.</p>
<p>So Nikon is trying to devise an algorithm that an attacker cannot run over his edited photo to produce a fresh checksum, which he would then use to overwrite the original checksum (stored in the JPEG EXIF data). Meanwhile the algorithm needs to be self-contained: it must run entirely inside the camera. Which is in the attacker’s possession.</p>
<p>Here’s a simplified diagram of how this is usually implemented:</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2011/04/checksum-protection.png"><img class="aligncenter size-full wp-image-909" title="checksum protection" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2011/04/checksum-protection.png" alt="" width="500" height="148" /></a></p>
<p>The way the system is supposed to work is that the camera takes the digital source file, runs some “Magic Formula” over it, includes the time and date at which it runs the magic formula, and then stores the resulting checksum. In Nikon’s case, ElcomSoft says the “Magic Formula” is two SHA-1 hashes run over, respectively, the image data and the metadata, which are then encrypted (in effect, signed) with a private key hardcoded into the camera, using 1024-bit RSA.</p>
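<p>As a rough illustration of the scheme ElcomSoft describes, here is a minimal Python sketch. Everything specific in it is my assumption, made only for illustration: the function names, the way the two digests are joined, and the key itself, which is a toy RSA key built from two well-known Mersenne primes and used without padding. It is not Nikon’s actual file format or key handling.</p>

```python
import hashlib

# Toy RSA key, assumed for illustration only: two known Mersenne primes and
# no padding. A real product would use a properly generated key and a padded
# signature scheme.
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest_pair(image_data: bytes, metadata: bytes) -> int:
    """SHA-1 over the image data and over the metadata, joined into one integer."""
    joined = hashlib.sha1(image_data).digest() + hashlib.sha1(metadata).digest()
    return int.from_bytes(joined, "big")


def sign(image_data: bytes, metadata: bytes) -> int:
    """Camera side: apply the private key to the two digests."""
    return pow(digest_pair(image_data, metadata), D, N)


def verify(image_data: bytes, metadata: bytes, signature: int) -> bool:
    """Validation side: undo the signature with the public key and compare."""
    return pow(signature, E, N) == digest_pair(image_data, metadata)


photo = b"raw image bytes"
exif = b"2011-04-29 21:15:50"
checksum = sign(photo, exif)
print(verify(photo, exif, checksum))               # True
print(verify(photo + b" edited", exif, checksum))  # False
```

<p>Note that only the private exponent has to stay secret in a design like this; the hashing step and the public key can be known to everyone.</p>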
<p>(One obvious attack vector with all these systems is the security of the clock reference, since on most cameras the attacker can simply change the time and date on the camera and shoot any doctored photo he desires. Setting this aside, however….)</p>
<p>To make the system work, Nikon had to rely on keeping the “Magic Formula” and the private key secret. The problem is that a private key hardcoded into the system is hard to keep secret: a clever attacker with a debugger can discover it.</p>
<p>So that’s why, in my opinion, all checksum protection features boil down to a case of <a href="http://en.wikipedia.org/wiki/Security_through_obscurity">security through obscurity</a>. Sure, the checksum protection moves the bar by making it slightly harder to maliciously edit photos or video. But it doesn’t prevent it.</p>
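<p>To make the weakness concrete, here is a small Python sketch of the forgery, under loudly stated assumptions: the key is a toy RSA key built from two known Mersenne primes with no padding, only a single SHA-1 hash is signed, and the names (FIRMWARE_PRIVATE_KEY, validator_accepts) are hypothetical. It shows the general idea, not ElcomSoft’s actual procedure.</p>

```python
import hashlib

# Toy RSA parameters, assumed purely for illustration (known Mersenne primes,
# no padding). They stand in for the key hardcoded into the camera firmware.
P, Q, E = 2**521 - 1, 2**607 - 1, 65537
N = P * Q
FIRMWARE_PRIVATE_KEY = pow(E, -1, (P - 1) * (Q - 1))


def sha1_int(data: bytes) -> int:
    """SHA-1 digest of the data as an integer."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


def validator_accepts(photo: bytes, signature: int) -> bool:
    """Stand-in for the vendor's validation software (public key only)."""
    return pow(signature, E, N) == sha1_int(photo)


# The attacker never needs the camera's signing code. Having dumped the key
# with a debugger, he signs whatever bytes he likes on his own machine.
extracted_key = FIRMWARE_PRIVATE_KEY
doctored = b"photo edited to frame an innocent person"
forged_signature = pow(sha1_int(doctored), extracted_key, N)

print(validator_accepts(doctored, forged_signature))  # True
```

<p>Once the key is out, the validator has no way to tell a forged signature from one produced inside the camera.</p>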
<p>The only true way to prove that a photo or video hasn’t been edited since the time of recording is to subject the recording to the same rigorous <a href="http://en.wikipedia.org/wiki/Chain_of_custody">chain-of-custody</a> procedures used for other evidence.</p>
<p>I will be the first to admit that I’m not “someone who knows security well”; ElcomSoft can sneer away at me and I will nevertheless manage to get to sleep tonight. But I can only come up with two ideas for things a vendor implementing this feature might do to make it more secure, other than what Nikon did:</p>
<ul>
<li>One, the vendor could move the checksum algorithm down into an ASIC on the camera instead of running it in software. This approach implies higher development costs and much less ability to modify the feature in the future, but it would make it harder (though not impossible!) to hack.</li>
<li>Two, if we could assume the camera is always connected to the Internet, the camera could send the SHA-1 checksum of the picture to a secure, trusted server; the server would then apply a timestamp and store the checksum. This approach raises some big privacy issues, and once you have a law enforcement camera on the Internet you have some other attack vectors to worry about. We’re probably a few years away from having ubiquitous enough network connectivity for this, but it’s really the only bulletproof way I can see to solve this problem.</li>
</ul>
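<p>The second idea can be sketched in a few lines of Python. This is only a sketch under assumptions of my own: the TimestampServer class, its register and lookup methods, and the in-memory dictionary are all hypothetical stand-ins for a real networked service with durable, access-controlled storage.</p>

```python
import hashlib
from datetime import datetime, timezone


class TimestampServer:
    """Hypothetical trusted server: records when each checksum was first seen."""

    def __init__(self):
        self._first_seen = {}

    def register(self, checksum: str) -> datetime:
        # setdefault keeps the earliest timestamp, so registering the same
        # checksum again later cannot move the recorded time.
        now = datetime.now(timezone.utc)
        return self._first_seen.setdefault(checksum, now)

    def lookup(self, checksum: str):
        """Return the recorded timestamp, or None if the checksum is unknown."""
        return self._first_seen.get(checksum)


def camera_checksum(picture: bytes) -> str:
    """Camera side: SHA-1 checksum of the picture, as a hex string."""
    return hashlib.sha1(picture).hexdigest()


server = TimestampServer()
original = b"crime scene photo"
server.register(camera_checksum(original))  # sent at capture time

# Later, anyone can check a file against the server's record.
print(server.lookup(camera_checksum(original)) is not None)  # True
print(server.lookup(camera_checksum(original + b" edited")))  # None
```

<p>The point of the design is that the secret no longer lives in the camera: the timestamp and the stored checksum sit on a machine the attacker does not hold.</p>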
<p>If you can see other solutions that I’m missing, please leave a comment!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cardinalpeak.com/blog/?feed=rss2&#038;p=908</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

