<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Cardinal Peak&#039;s Blog</title>
	<atom:link href="http://www.cardinalpeak.com/blog/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://www.cardinalpeak.com/blog</link>
	<description>Engineering for embedded products</description>
	<lastBuildDate>Thu, 02 Sep 2010 21:15:38 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Did the Manhattan Transfer use Auto-Tune?</title>
		<link>http://www.cardinalpeak.com/blog/?p=639</link>
		<comments>http://www.cardinalpeak.com/blog/?p=639#comments</comments>
		<pubDate>Thu, 02 Sep 2010 21:15:38 +0000</pubDate>
		<dc:creator>Howdy Pierce Managing Partner</dc:creator>
				<category><![CDATA[Audio]]></category>
		<category><![CDATA[Howdy]]></category>
		<category><![CDATA[audio processing]]></category>
		<category><![CDATA[auto-tune]]></category>

		<guid isPermaLink="false">http://www.cardinalpeak.com/blog/?p=639</guid>
		<description><![CDATA[I recently came across an allegation on Amazon.com that got me thinking. The review in question is by Andrew Grobengieser, and it is critical of the Manhattan Transfer’s latest album, The Chick Corea Songbook. Grobengieser alleges:
As a lifetime fan, I was unbelievably excited to hear of the release of a Chick Corea songbook. And then [...]]]></description>
			<content:encoded><![CDATA[<p>I recently came across an allegation on Amazon.com that got me thinking. The review in question is by Andrew Grobengieser, and it is critical of the Manhattan Transfer’s latest album, <a href="http://www.amazon.com/gp/product/B002IVLWG0/ref=s9_simh_gw_p15_i1?pf_rd_m=ATVPDKIKX0DER&amp;pf_rd_s=center-2&amp;pf_rd_r=09YW3Y03K4MBTM0A1HZ8&amp;pf_rd_t=101&amp;pf_rd_p=470938631&amp;pf_rd_i=507846">The Chick Corea Songbook</a>. Grobengieser <a href="http://www.amazon.com/review/R15OHRG7XHGJK7/ref=cm_cr_dp_cmt?ie=UTF8&amp;ASIN=B002IVLWG0&amp;nodeID=5174#wasThisHelpful">alleges</a>:</p>
<blockquote><p>As a lifetime fan, I was unbelievably excited to hear of the release of a Chick Corea songbook. And then I listened. It only took me a moment before a sinking feeling set in, as I realized that ManTran, one of the best-blending and most in-tune vocal ensembles in recorded-music history, has succumbed to the scourge of modern recording known as &#8220;Auto-Tune&#8221;. Yes, Manhattan Transfer fans, welcome to the world GLEE and Cher. It&#8217;s all over the place on group harmonies, and even rears its ugly head on a few of the solo vocals.</p>
<p>I mean, really. Why ON EARTH would this production choice be made? It takes what are otherwise very hip and adventuresome arrangements, and makes them roboticized, metallic, cold, and inhuman.</p></blockquote>
<p>It seems to me that it’s one thing to allege that a weekly TV musical is using Auto-Tune, but quite another to level the accusation at four <a href="http://londonjazz.blogspot.com/2010/05/review-manhattan-transfer.html">vocal jazz icons</a>.</p>
<p>I am by no means anything approaching an expert on this topic—just an interested fan. But the engineer in me was curious: Is it actually possible to detect the use of Auto-Tune?</p>
<p>First, I did a little background research. Auto-Tune is a tool that can be used to correct the pitch of recorded singing. Evidently it can be used in a subtle or blatant manner; Andy Hildebrand, the inventor of Auto-Tune, <a href="http://www.pbs.org/wgbh/nova/tech/hildebrand-auto-tune.html">says</a>:</p>
<blockquote><p>At one extreme, Auto-Tune can be used very gently to nudge a note more accurately into tune. In these applications, it is impossible for skilled producers, musicians, or algorithms to determine that Auto-Tune has been used. On the other hand, when used as an effect, such as in hip-hop, Auto-Tune usage is obvious to all. Everything in between is subject to an individual&#8217;s unique listening skills.</p></blockquote>
<p>This raises the question: Assuming that the Manhattan Transfer is attempting to use Auto-Tune in a subtle manner, how can Grobengieser detect its use? (In a follow-up comment to his review, he claims he is “a trained musician with years of experience dealing with vocal group intonation.”) Frankly, I didn’t believe he could detect it, so I decided to try to learn more.</p>
<p>According to <a href="http://quezi.com/5608">one site</a>:</p>
<blockquote><p>The most important parameter is the retune speed – the time it takes Auto-Tune to glide the note to its perfect pitch. For maximum realism, the retune speed must be set to a value close to the retune speed of the singer’s natural voice. . . . But Auto-Tune’s retune speed can be set to any value right down to zero, which means that notes instantly jump to the exact pitch. This effect is decidedly un-natural. If the singer glides smoothly from one note to another, Auto-Tune will suddenly jump from one note to the next when the mid-point between them is reached.</p></blockquote>
<p>I believe you can hear the un-natural Auto-Tune effect with a zero retune speed in <a href="http://www.youtube.com/watch?v=LbXiECmCZ94">this Cher song</a>, which according to various web sources also seems to be the first use of Auto-Tune as a sound effect (in 1998).</p>
<p>But let’s assume that the Manhattan Transfer is trying to hide the use of Auto-Tune, in which case their recording engineer would presumably use a retune speed that approximates a “natural” value.</p>
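<p>To make the retune-speed idea concrete, here is a minimal Python sketch of pitch correction as the quote above frames it: snap a detected pitch to the nearest equal-tempered note, then glide toward it. It is an illustration only, not Auto-Tune's actual algorithm. The function names, the A4 = 440 Hz reference and the simple exponential glide are all my assumptions.</p>

```python
import math

A4_HZ = 440.0  # assumed tuning reference


def nearest_note_hz(freq_hz):
    """Return the equal-tempered pitch closest to freq_hz."""
    semitones = round(12 * math.log2(freq_hz / A4_HZ))
    return A4_HZ * 2 ** (semitones / 12)


def retune(pitches_hz, frame_s, retune_speed_s):
    """Glide each detected pitch toward its nearest note.

    retune_speed_s == 0 snaps instantly (the obvious effect sound);
    larger values correct more gradually. This is a hypothetical model.
    """
    out = []
    offset_cents = 0.0  # correction applied so far
    for f in pitches_hz:
        target = nearest_note_hz(f)
        wanted_cents = 1200 * math.log2(target / f)
        if retune_speed_s == 0:
            alpha = 1.0
        else:
            alpha = 1 - math.exp(-frame_s / retune_speed_s)
        offset_cents += alpha * (wanted_cents - offset_cents)
        out.append(f * 2 ** (offset_cents / 1200))
    return out
```

<p>In this toy model, a retune speed of zero moves every frame straight onto the grid, so a singer gliding between two notes jumps from one to the other at the midpoint, which is the behavior the quote describes.</p>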
<p>Hildebrand’s <a href="http://www.google.com/patents/about?id=wcAWAAAAEBAJ&amp;dq=5973252">original patent</a> for Auto-Tune, also from 1998, has a relatively clear explanation of his invention and how it works. (In my experience, the technical clarity is unusual for a patent!) If you’re interested, I recommend the discussion from the middle of column 3 to the middle of column 6.</p>
<p>I wondered whether we could detect Auto-Tune because the notes would be too perfect. The song “500 Miles High” begins with an <em>a cappella</em> intro in which it is easy to isolate the first note sung by Janis Siegel. I brought this song into <a href="http://audacity.sourceforge.net/">Audacity</a>, zoomed in to the first second of the left channel, and selected Analyze &gt; Plot Spectrum.</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/09/500-miles.png"><img class="aligncenter size-full wp-image-642" title="Siegel on &quot;500 Miles High&quot;" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/09/500-miles.png" alt="" width="500" height="229" /></a></p>
<p>This is a fairly crude method, but at a point in the music where you can isolate a single voice, it can show some interesting information. If I’m remembering my music theory class correctly, Siegel is singing an “A”. The fundamental is the first peak, which is highlighted with the thin vertical line in the screenshot above. To the right are the harmonics.</p>
<p>As you can see, the plot shows that Siegel didn’t hit a perfect “A”—that would have been at <a href="http://www.phy.mtu.edu/~suits/notefreqs.html">220 Hz</a>. Instead, she’s at 216 Hz, which would be noticeably flat. I am definitely no expert, but I’m thinking that if you’re going to use Auto-Tune, why not get the note correct?</p>
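<p>To put a number on how flat that is, the gap between two frequencies can be expressed in cents (hundredths of a semitone). The short Python sketch below does the arithmetic. The function name is mine, and the 216 Hz input is only as precise as my reading of the Audacity plot.</p>

```python
import math


def cents_off(measured_hz, target_hz):
    """Signed distance from target in cents (100 cents = one semitone)."""
    return 1200 * math.log2(measured_hz / target_hz)


print(round(cents_off(216.0, 220.0), 1))  # about -31.8
```

<p>So 216 Hz is roughly a third of a semitone below a 220 Hz “A”.</p>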
<p>There is a similar intro to the Manhattan Transfer song “Gentleman With a Family” from 1991’s <a href="http://www.amazon.com/Offbeat-Avenues-Manhattan-Transfer/dp/B0000027HM/ref=sr_1_2?ie=UTF8&amp;s=music&amp;qid=1283336849&amp;sr=8-2">The Offbeat of Avenues</a>. I picked this song because it starts out similarly to “500 Miles,” and also because 1991 puts it well before Auto-Tune would have been in use. In this case, the intro isn’t <em>a cappella</em>, so there is some instrumentation playing and thus it’s a little harder to isolate just the singer’s voice. However, selecting the left channel from 20.5 to 21.5 seconds in this song yields the following frequency analysis:</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/09/gentleman.png"><img class="aligncenter size-full wp-image-643" title="Siegel on &quot;Gentleman with a Family&quot;" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/09/gentleman.png" alt="" width="500" height="229" /></a></p>
<p>I am pretty certain that the highlight is again on the fundamental of Siegel’s voice—she is hitting a C at 262 Hz. (I believe the peaks to the left are lower tones from the instruments.) Here, before the days of Auto-Tune, she’s dead-on. Of course, she was also 19 years younger!</p>
<p>There are many more sophisticated methods of analysis that suggest themselves. It would be interesting to plot the frequencies over time—perhaps a voice held on a long note without any variation would be a likely indicator of the use of Auto-Tune. If we could isolate each singer onto a separate voice track, it would even be possible to run the pitch detection portion of the Auto-Tune algorithm; if this indicated that tuning was necessary, it would probably be a good clue that Auto-Tune wasn’t used in the studio. My colleague Kevin Gross suggested looking at the vibrato and timbre, because vibrato is removed altogether by Auto-Tune (and then artificial vibrato is usually added back in, according to the patent), and timbre would be changed when samples are added or dropped as part of the tuning process.</p>
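<p>As a rough idea of what plotting the frequencies over time could look like, here is a small Python sketch that estimates pitch frame by frame using autocorrelation. It assumes a single isolated voice, and autocorrelation is a generic technique I picked for illustration, not necessarily what the patent or Auto-Tune uses. The function names, frame sizes and search range are my own choices.</p>

```python
import math


def estimate_pitch_hz(frame, sample_rate, fmin=80.0, fmax=500.0):
    """Estimate one frame's fundamental by autocorrelation (monophonic input)."""
    lag_min = int(sample_rate / fmax)
    lag_max = int(sample_rate / fmin)
    n = len(frame) - lag_max - 1  # samples usable at every lag we try

    def acf(lag):
        return sum(frame[i] * frame[i + lag] for i in range(n))

    scores = {lag: acf(lag) for lag in range(lag_min - 1, lag_max + 2)}
    lags = range(lag_min, lag_max + 1)
    peak = max(scores[lag] for lag in lags)
    # Take the shortest lag that is a strong local maximum, which avoids
    # picking a multiple of the true period (an octave error).
    best = next(
        (lag for lag in lags
         if scores[lag] >= 0.9 * peak
         and scores[lag] >= scores[lag - 1]
         and scores[lag] >= scores[lag + 1]),
        max(lags, key=scores.get),
    )
    # Parabolic interpolation around the peak for sub-sample precision.
    a, b, c = scores[best - 1], scores[best], scores[best + 1]
    denom = a - 2 * b + c
    shift = 0.5 * (a - c) / denom if denom != 0 else 0.0
    return sample_rate / (best + shift)


def pitch_track(samples, sample_rate, frame_len=1024, hop=512):
    """Return (time_s, pitch_hz) pairs, one per analysis frame."""
    track = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        track.append((start / sample_rate, estimate_pitch_hz(frame, sample_rate)))
    return track


# Synthetic stand-in for a held note: half a second of a 216 Hz sine at 8 kHz.
tone = [math.sin(2 * math.pi * 216 * i / 8000) for i in range(4000)]
track = pitch_track(tone, 8000)
```

<p>On a real recording, a track that stays unnaturally flat over a long held note might be the kind of clue discussed above, although I have not tried this on the album.</p>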
<p>Obviously, I can’t really conclude anything from what I’ve done so far. In his Amazon review, Grobengieser doesn’t specify <em>where</em> he thinks he hears Auto-Tune on <em>The Chick Corea Songbook</em>; possibly he’s not talking about the intro to “500 Miles”. Or possibly my analysis tools are not sophisticated enough to detect the use of Auto-Tune. Or possibly if you are an audio engineer trying to sneak a little Auto-Tune into a jazz recording, you are smart enough not to correct to the exact pitch. I have no idea. To my ears the Transfer occasionally sounds just a little off-key on this album, which I ascribe to their age (but it also argues against the use of Auto-Tune). Again, though, I&#8217;m no expert.</p>
<p>I’d welcome your thoughts in the comments!</p>
<p><em>Howdy Pierce is a managing partner of Cardinal Peak with a technical background in multimedia systems, software engineering and operating systems.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cardinalpeak.com/blog/?feed=rss2&amp;p=639</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>High-School Prior Art</title>
		<link>http://www.cardinalpeak.com/blog/?p=633</link>
		<comments>http://www.cardinalpeak.com/blog/?p=633#comments</comments>
		<pubDate>Tue, 17 Aug 2010 17:23:46 +0000</pubDate>
		<dc:creator>Ben Mesander Partner</dc:creator>
				<category><![CDATA[Ben]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[Patents]]></category>

		<guid isPermaLink="false">http://www.cardinalpeak.com/blog/?p=633</guid>
		<description><![CDATA[Based on a Slashdot link, I read a blog post today about the Oracle/Google lawsuit over Java virtual machine patents.
Among the list of software patents that Oracle has decided to sue Google with are two that I’m fairly certain that I wrote prior art for. This was in 1983 or 1984, while I was a [...]]]></description>
			<content:encoded><![CDATA[<p>Based on a Slashdot link, I read a <a href="http://blog.headius.com/2010/08/my-thoughts-on-oracle-v-google.html">blog post</a> today about the Oracle/Google lawsuit over Java virtual machine patents.</p>
<p>Among the list of software patents that Oracle has decided to sue Google with are two that I’m fairly certain that I wrote prior art for. This was in 1983 or 1984, while I was a high school student. This work was published in a book by the University of Oklahoma press, and I believe I still have a copy somewhere.</p>
<p>It seems that the bar for patents is not very high if patents are granted for something a high school student could come up with a decade earlier!</p>
<p>In ’83 or ’84, I wrote a hybrid interpreter/compiler for a <a href="http://en.wikipedia.org/wiki/Forth_%28programming_language%29">Forth</a>-like language.</p>
<p>I had read the book <em><a href="http://www.amazon.com/Threaded-Interpretive-Languages-Design-Implementation/dp/007038360X/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1282062007&amp;sr=8-1">Threaded Interpretive Languages: Their Design and Implementation</a></em> and decided to write such a language for my <a href="http://en.wikipedia.org/wiki/TRS-80">TRS-80 model 1</a>. I gave it the snappy name, TIL16U, which stood for Threaded Interpretive Language, 16 bit unsigned ints. The only numeric datatype was the 16-bit unsigned integer, which also nicely represents a pointer datatype on the <a href="http://en.wikipedia.org/wiki/Zilog_Z80">Z-80 processor</a>.</p>
<p>I took a slightly different approach to writing the language than the book described. Initially I wrote a simple single-pass compiler in TRS-80 <a href="http://www.trs-80.com/trs80-info-level2.htm">Level II BASIC</a> that translated the source language to a simple bytecode.</p>
<p>Once I could generate bytecode from ASCII source, I wrote an interpreter for it in Z-80 assembly. The interpreter for such a simple stack-based language is very small, and it was dwarfed by the inclusion of a text editor (incorporated from an article in 80 Microcomputing with a source listing) and cassette I/O routines to load and save my programs on tape. I used the initial compiler written in BASIC to compile the runtime library for my interpreter.</p>
<p>My library data structure consisted of a routine name stored as an ASCII string, a length, a flag to indicate whether a given routine was implemented in Z-80 machine code or in bytecode, and then the routine body. While interpreting, the TIL16U interpreter would look at the bytecode for a call routine, and following this would be a routine name. The interpreter would look up the routine name in the library, and either jump to it directly (if it was in machine code), or continue interpreting at the start of the routine (if it was in bytecode).</p>
<p>At the time, I was very interested in writing videogames (my parents wouldn’t allow me to have an Atari 2600), and the 1.77 MHz processor on the TRS-80 was not particularly powerful. So the lookup step bothered me performance-wise. I knew I could implement a symbol table and a second pass in the compiler to patch in the addresses of routines, but this made the language less dynamic. I wanted to be able to type in code interactively and have it work, and it was also a hassle to store addresses in the code because whenever I built a new version of the interpreter or the runtime library they might have to change.</p>
<p>So I hit upon the expedient of making the interpreter selectively compile bits of the running program to machine code as it went. It overwrote the bytecode with machine code that could be jumped to directly, and it pushed the address of the interpreter on the stack so that when a Z-80 RET instruction executed, the interpreter continued executing. I thought it was cool that the more you ran a program, the faster it would go.</p>
<p>In particular, the bytecode to invoke a library routine could be replaced by a three-byte Z-80 CALL instruction (a one byte opcode plus two bytes of address).</p>
<p>So when the interpreter found such a bytecode sequence, rather than execute it, it would look up the symbol address, overwrite the bytecode with the machine instruction, PUSH the return address of the interpreter on the stack and then jump to the routine. In the future, the interpreter would look at the opcode byte and know to jump directly to the code sequence. (As an aside, it is interesting how much of the Z-80 instruction set is still burned into my brain. I can still remember an unconditional jump, or JP, is 0xC3, and the RET instruction is 0xC9, thanks to writing this code.)</p>
<p>I believe that resolving a symbolic routine name to a numeric address is the mechanism described in <a href="http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&amp;Sect2=HITOFF&amp;p=1&amp;u=/netahtml/PTO/search-bool.html&amp;r=1&amp;f=G&amp;l=50&amp;co1=AND&amp;d=PTXT&amp;s1=RE38,104.PN.&amp;OS=PN/RE38,104&amp;RS=PN/RE38,104">one of the patents</a> in the Oracle lawsuit, granted in 1992, and last updated on April 29, 2003.</p>
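<p>The patching trick is easier to see in a toy model. The Python sketch below is not the original TIL16U code (which was Z-80 assembly); the opcode names, the tuple-based bytecode and the routine table are all hypothetical stand-ins. It shows a call-by-name instruction being overwritten with a direct call the first time it runs, plus the byte encoding of a Z-80 CALL instruction.</p>

```python
# Toy model: "addresses" are indexes into a routine table; opcodes are made up.
CALL_BY_NAME = "CALL_BY_NAME"   # slow form emitted by the compiler
CALL_DIRECT = "CALL_DIRECT"     # patched form, analogous to a Z-80 CALL nn
PUSH = "PUSH"


def z80_call(addr):
    """Encode CALL nn: opcode 0xCD plus a little-endian 16-bit address."""
    return bytes([0xCD, addr % 256, addr // 256])


class Interpreter:
    def __init__(self, library):
        self.names = list(library)                 # routine name for each address
        self.routines = [library[n] for n in self.names]
        self.stack = []
        self.lookups = 0                           # counts slow name searches

    def resolve(self, name):
        self.lookups += 1
        return self.names.index(name)              # linear search by name

    def run(self, code):
        for pc, (op, arg) in enumerate(code):
            if op == PUSH:
                self.stack.append(arg % 65536)     # 16-bit unsigned values only
            elif op == CALL_BY_NAME:
                addr = self.resolve(arg)
                code[pc] = (CALL_DIRECT, addr)     # overwrite the bytecode in place
                self.routines[addr](self.stack)
            elif op == CALL_DIRECT:
                self.routines[arg](self.stack)     # no lookup on later runs


def add(stack):
    b, a = stack.pop(), stack.pop()
    stack.append((a + b) % 65536)


program = [(PUSH, 2), (PUSH, 3), (CALL_BY_NAME, "ADD")]
vm = Interpreter({"ADD": add})
vm.run(program)   # first run resolves "ADD" and patches the call
vm.run(program)   # second run takes the direct path
```

<p>After the first run the program itself has changed, so the second run never touches the name table. That is the "faster the more you run it" effect in miniature.</p>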
<p>Additionally, <a href="http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&amp;Sect2=HITOFF&amp;p=1&amp;u=/netahtml/PTO/search-bool.html&amp;r=1&amp;f=G&amp;l=50&amp;co1=AND&amp;d=PTXT&amp;s1=6,910,205.PN.&amp;OS=PN/6,910,205&amp;RS=PN/6,910,205">this patent</a>, also referenced in the lawsuit, covers overwriting virtual machine instructions with native code and executing those instead of the bytecode. This patent was originally filed in 1997, and was last updated in 2005.</p>
<p>I’m not claiming that I have perfect prior art for these two patents—I haven’t even studied them very closely—but I believe the techniques I used are at least highly similar to what is patented. Only a patent court can decide if something infringes, anyway.</p>
<p>I’d like to claim this was entirely because of my brilliance, but I think it is more likely that the US Patent Office has been granting software patents that are so obvious that a reasonably bright teenager with a computer can come up with them.</p>
<p>To finish the story, the most successful application of TIL16U did actually turn out to be video games. My parents bought me a <a href="http://www.trs-80.org/chromatrs">ChromaTRS</a> color video board for my TRS-80 (&#8220;with 15 vivid colors!&#8221;), and I wrote several games for it, one of which I remember was a parachute jump game, where you jumped out of a plane and tried to land on a target. The plane speed and wind speed would vary, so you had to judge how the windsock was looking to get the high score.</p>
<p><em>Ben Mesander has more than 18 years of experience—not counting high school!—leading software development teams and implementing software. His strengths include Linux, C, C++, numerical methods, control systems and <a href="../../expertise/signalprocessing.php">digital signal processing</a>. His experience includes <a href="../../expertise/embeddedsoftware.php">embedded software</a>, scientific software and enterprise software development environments.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cardinalpeak.com/blog/?feed=rss2&amp;p=633</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Real-Time Ethernet</title>
		<link>http://www.cardinalpeak.com/blog/?p=625</link>
		<comments>http://www.cardinalpeak.com/blog/?p=625#comments</comments>
		<pubDate>Mon, 16 Aug 2010 14:17:46 +0000</pubDate>
		<dc:creator>Ben Mesander Partner</dc:creator>
				<category><![CDATA[Ben]]></category>
		<category><![CDATA[Networking]]></category>
		<category><![CDATA[Software Development]]></category>

		<guid isPermaLink="false">http://www.cardinalpeak.com/blog/?p=625</guid>
		<description><![CDATA[A recent project for a customer was to implement a transport for a real-time signal processing application over Gigabit Ethernet (GbE). The project was especially interesting, because our customer’s requirement was for extremely low latency: The transport needed to be able to send replies to incoming packets with a custom EtherType within 74.4 μsec. This [...]]]></description>
			<content:encoded><![CDATA[<p>A recent project for a customer was to implement a transport for a real-time signal processing application over Gigabit Ethernet (GbE). The project was especially interesting, because our customer’s requirement was for extremely low latency: The transport needed to be able to send replies to incoming packets with a custom <a href="http://en.wikipedia.org/wiki/EtherType">EtherType</a> within 74.4 μsec. This exceeded the capabilities of existing Ethernet protocols.</p>
<p>There are several approaches possible for real-time networking under Linux, including <a href="http://www.rtai.org">RTAI</a> with the <a href="http://www.rtnet.org">RTnet</a> hard real-time network stack.</p>
<p>However, the customer was already using <a href="http://www.redhat.com">Red Hat Linux</a> servers to perform the signal processing, so we decided to try the less intrusive Red Hat <a href="http://www.redhat.com/mrg">MRG</a> kernel to see if it could be modified to incorporate the desired networking protocol. MRG incorporates many kernel patches to improve real-time <a href="http://www.redhat.com/mrg/realtime">performance</a> and lower latency.</p>
<p>Since the timing requirements were so tight, I decided that only an in-kernel implementation was likely to succeed. If the code had to switch between kernel mode and user mode, this would require additional overhead, interactions with the Linux scheduler, and moving the data between the kernel and user address spaces.</p>
<p>I started development by building two server-class Linux machines running Red Hat Linux with the MRG kernel, connected back-to-back with a GbE cable. Then I wrote a user-mode test program to send an Ethernet packet of the appropriate type via a raw socket. I now had a way to send a packet from one machine to another. I set up Wireshark to monitor the traffic between the two machines.</p>
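<p>For readers who have not used raw sockets, here is a minimal Python sketch of what such a user-mode test sender might look like. It is not the program from this project. The EtherType shown is the IEEE local experimental value, used as a stand-in for the customer's custom type, and the function names, MAC addresses and payload are made up.</p>

```python
import socket
import struct

ETHERTYPE_EXPERIMENTAL = 0x88B5  # IEEE "local experimental" EtherType, a stand-in


def build_frame(dst_mac, src_mac, ethertype, payload):
    """Return an Ethernet II frame (without FCS) as bytes."""
    header = struct.pack("!6s6sH", dst_mac, src_mac, ethertype)
    body = payload.ljust(46, b"\x00")  # pad to the 46-byte minimum payload
    return header + body


def send_frame(ifname, frame):
    """Send one raw frame on a Linux interface (requires root)."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as s:
        s.bind((ifname, 0))
        s.send(frame)


frame = build_frame(
    b"\xff" * 6,                       # broadcast destination (assumption)
    bytes.fromhex("020000000001"),     # locally administered source (assumption)
    ETHERTYPE_EXPERIMENTAL,
    b"ping",
)
```

<p>Calling <code>send_frame("eth0", frame)</code> as root would put the frame on the wire, where the interface name is again just an example.</p>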
<p>Initially, I attempted to modify the device-independent portion of the Ethernet stack within the Linux kernel to detect the incoming packet and respond to it. Unfortunately, I had to discard this approach when I found it could not meet the customer’s latency requirement.</p>
<p>So I experimented with modifying the Ethernet drivers for various GbE PCIe Ethernet cards in the MRG kernel. The basic approach I took was to invoke the outgoing packet interrupt service routine from the incoming packet interrupt service routine. The driver has to be in just the right state for this to work correctly, so it required some study of the driver source code. Fortunately most Linux Ethernet drivers share some common framework, so it did not require a complete re-engineering effort to try different cards, but there are definitely some differences.</p>
<p>I started with a <a href="http://www.realtek.com/">Realtek</a> card and driver, and had some success. I installed the kernel driver on both machines, so that once I sent a single packet from one machine to another via my user-space program, the two machines would continuously exchange Ethernet frames with each other at the maximum speed achievable.</p>
<p>This made measurement challenging. My modifications to the driver were at a low enough level that the outgoing packets did not go through the device-independent Ethernet stack in Linux, and thus were not captured by Wireshark. Additionally, Wireshark was too slow to keep up and often dropped packets. I was, however, able to use tcpdump to capture packet headers to a file and then look at them later with Wireshark to make measurements. The latency could be estimated by looking at the inter-packet timing of packets sent by the other machine and dividing by two. Using tcpdump got rid of most of the packet losses, but it would still sometimes drop some. I resorted to putting a serial number in each packet, which I could then examine to determine whether I had lost a packet.</p>
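<p>The measurement arithmetic can be sketched in a few lines of Python. This assumes the capture has already been reduced to (serial number, receive time) pairs for packets from one sender, and that each sender increments its serial number by one per packet. Both are my assumptions for illustration, and the sample numbers are made up rather than taken from the project.</p>

```python
def estimate_latency_us(packets):
    """Half the mean gap between consecutive packets from the other machine.

    packets: list of (serial, rx_time_us) in capture order. Gaps that span a
    dropped capture are skipped so they do not inflate the estimate.
    """
    gaps = [
        t1 - t0
        for (s0, t0), (s1, t1) in zip(packets, packets[1:])
        if s1 == s0 + 1
    ]
    return sum(gaps) / len(gaps) / 2


def missing_serials(packets):
    """Return the serial numbers that never showed up in the capture."""
    missing = []
    for (s0, _), (s1, _) in zip(packets, packets[1:]):
        missing.extend(range(s0 + 1, s1))
    return missing


# Made-up capture: serial 3 was dropped by the capture tool.
capture = [(1, 0.0), (2, 120.0), (4, 360.0), (5, 480.0)]
print(estimate_latency_us(capture), missing_serials(capture))  # 60.0 [3]
```

<p>Dividing by two works because each gap between packets from the other machine covers one reply in each direction.</p>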
<p>I also tried inserting a GbE switch between the two machines and monitoring packets from a third machine. However, I found the switch introduced 20 μsec of additional latency on average, and greatly increased the deviation of the measurements. And still, sometimes packets were lost. So I discarded this approach and used tcpdump running on one of the machines being tested.</p>
<p>I was unable to reliably meet the 74.4 μsec spec under load with the Realtek card, so the next card I tried was an Intel GbE card. This card used the Intel e1000e driver, which is supported directly by Intel, unlike the reverse-engineered driver the Realtek card used. Unfortunately I was unable to find a card supported by the version of the e1000e driver contained in the MRG kernel, so I downloaded the <a href="http://sourceforge.net/projects/e1000">latest</a> Intel driver. This driver was significantly more complex than the MRG version of the driver, but I was able to modify it and make some measurements.</p>
<p>Finally, I modified the Broadcom <a href="http://www.broadcom.com/support/Ethernet_nic/faq_drivers.php">bnx2</a> driver in the MRG kernel. This driver is directly supported by Broadcom. I was able to use the MRG kernel driver with my card, and so I made appropriate modifications.</p>
<p>I found this card, like the Intel card, was also able to keep up with the desired data rate. The Broadcom card ended up having slightly lower latency measurements than the Intel card. I suspect this is due to the MRG kernel driver for bnx2 having less locking overhead and shorter code paths than the stock Intel e1000e driver.</p>
<p>In the end, with the Broadcom bnx2 driver with my modifications, we achieved an average latency measurement of 58 μsec, which was comfortably under the 74.4 μsec requirement. Additionally, we tested continuously over a several day period, monitoring for missing packet serial numbers, and none were detected.</p>
<p>The customer&#8217;s initial protocol required sending several smaller packets in either direction, but ultimately an additional speedup could have been realized by using <a href="http://en.wikipedia.org/wiki/Jumbo_frame">jumbo frames</a>. RTAI/RTnet could also have been used, but I do not think it would have been significantly faster, although it may have reduced the variability of the latency.</p>
<p><em>Ben Mesander has more than 18 years of experience leading software development teams and implementing software. His strengths include Linux, C, C++, numerical methods, control systems and <a href="../../expertise/signalprocessing.php">digital signal processing</a>. His experience includes <a href="../../expertise/embeddedsoftware.php">embedded software</a>, scientific software and enterprise software development environments.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cardinalpeak.com/blog/?feed=rss2&amp;p=625</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>If only we had better test content&#8230;</title>
		<link>http://www.cardinalpeak.com/blog/?p=621</link>
		<comments>http://www.cardinalpeak.com/blog/?p=621#comments</comments>
		<pubDate>Thu, 12 Aug 2010 20:19:10 +0000</pubDate>
		<dc:creator>Mike Perkins Managing Partner</dc:creator>
				<category><![CDATA[Perk]]></category>
		<category><![CDATA[Video]]></category>
		<category><![CDATA[video quality_measurement]]></category>

		<guid isPermaLink="false">http://www.cardinalpeak.com/blog/?p=621</guid>
		<description><![CDATA[I just saw this news about research that says you notice compression artifacts less if you like the content of a particular video clip:
Using four studies, Kortum, along with co-author Marc Sullivan of AT&#38;T Labs, showed 100 study participants 180 movie clips encoded at nine different levels, from 550 kilobits per second up to DVD [...]]]></description>
			<content:encoded><![CDATA[<p>I just saw <a href="http://scienceblog.com/37469/video-quality-less-important-when-youre-enjoying-what-youre-watching/">this news</a> about research that says you notice compression artifacts less if you like the content of a particular video clip:</p>
<blockquote><p>Using four studies, Kortum, along with co-author Marc Sullivan of AT&amp;T Labs, showed 100 study participants 180 movie clips encoded at nine different levels, from 550 kilobits per second up to DVD quality. Participants viewed the two-minute clips and then were asked about the video quality of the clips and desirability of the movie content.</p>
<p>Kortum found a strong correlation between the desirability of movie content and subjective ratings of video quality.</p></blockquote>
<p>(The original paper seems to be <a href="http://hfs.sagepub.com/content/early/2010/05/12/0018720810366020.abstract">here</a>, beyond a pay wall.)</p>
<p>Makes me wonder about the classic test footage with the calendar and the model train!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cardinalpeak.com/blog/?feed=rss2&amp;p=621</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Creating Single Frame Movies</title>
		<link>http://www.cardinalpeak.com/blog/?p=594</link>
		<comments>http://www.cardinalpeak.com/blog/?p=594#comments</comments>
		<pubDate>Fri, 09 Jul 2010 19:43:38 +0000</pubDate>
		<dc:creator>Ben Mesander Partner</dc:creator>
				<category><![CDATA[Ben]]></category>
		<category><![CDATA[Video]]></category>
		<category><![CDATA[ffmpeg]]></category>
		<category><![CDATA[flickr]]></category>
		<category><![CDATA[MP4]]></category>

		<guid isPermaLink="false">http://www.cardinalpeak.com/blog/?p=594</guid>
		<description><![CDATA[My camera (an Olympus SP-570UZ) allows me to optionally record a four-second audio clip with each photo I take. I haven’t used this feature much, because I typically upload my photos to Flickr, and there’s been no good way to associate the audio with the photo. Ideally, I would like an audio player to appear [...]]]></description>
			<content:encoded><![CDATA[<p>My camera (an <a href="http://www.olympusamerica.com/cpg_section/cpg_archived_product_details.asp?fl=&amp;id=1367">Olympus SP-570UZ</a>) allows me to optionally record a four-second audio clip with each photo I take. I haven’t used this feature much, because I typically upload my photos to <a href="http://www.flickr.com/">Flickr</a>, and there’s been no good way to associate the audio with the photo. Ideally, I would like an audio player to appear below the photo, but there aren’t really any public audio sharing websites with much longevity. And, in any case, Flickr won’t allow me to embed an audio player in my photo description.</p>
<p>Recently, it occurred to me that since Flickr allows short movies (up to 1:30 long), maybe I could create a single-frame movie with the still picture as the frame and the audio as the sound track. Then the Flickr movie player would serve as the control for the audio, and the audio and the video would stay associated with each other.</p>
<p>I decided to try to use <a href="http://www.ffmpeg.org/">ffmpeg</a> to create the movie, since it seems to be able to do almost anything with video and audio. The command line for ffmpeg is a bit obscure, so this blog post documents about two hours of my time spent getting it to work.</p>
<p>My camera produces 3648×2736 JPEG images, and the audio files are 8 kHz sample rate, mono, 8 bit unsigned PCM samples in WAV file format. I decided my goal would be to create a motion JPEG (MJPEG) encoded AVI file with maximum quality.</p>
<p>I started by searching the web to see if anyone had done this before. By studying the examples I found and experimenting, I came up with the following ffmpeg command line:</p>
<pre class="last">ffmpeg.exe -loop_input -shortest -f image2 -r 0.25 -i P910033.jpg -i P910033.wav -vcodec mjpeg -qscale 1 -t 4 foo.avi</pre>
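<p>The numbers in that command line follow from simple arithmetic, which this small Python sketch spells out. The variable names are mine. The values come from the command above and from the camera's audio format described earlier.</p>

```python
frame_rate_fps = 0.25      # from -r 0.25: one frame every four seconds
duration_s = 4             # from -t 4
frames = frame_rate_fps * duration_s

sample_rate_hz = 8000      # camera audio: 8 kHz
bits_per_sample = 8        # 8-bit unsigned PCM
channels = 1               # mono
audio_kbps = sample_rate_hz * bits_per_sample * channels / 1000

print(frames, audio_kbps)  # 1.0 64.0
```

<p>So a four-second movie at 0.25 frames per second holds exactly one video frame, and the uncompressed audio runs at 64 kb/s.</p>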
<p>Most of my attempts caused ffmpeg to hang. But eventually, I got the error message below:</p>
<pre>Duration: 00:00:04.00, start: 0.000000, bitrate: N/A</pre>
<pre>Stream #0.0: Video: mjpeg, yuvj422p, 3648x2736, 0.25 tbr, 0.25 tbn, 0.25 tbc</pre>
<pre>[wav @ 01a80050]Estimating duration from bitrate, this may be inaccurate</pre>
<pre>Input #1, wav, from 'P6060033.wav':</pre>
<pre>Duration: 00:00:04.02, bitrate: 64 kb/s</pre>
<pre>Stream #1.0: Audio: pcm_u8, 8000 Hz, 1 channels, u8, 64 kb/s</pre>
<pre>[mp2 @ 01ac6310]Sampling rate 8000 is not allowed in mp2</pre>
<pre>Output #0, avi, to 'foo.avi':</pre>
<pre>Stream #0.0: Video: mjpeg, yuvj422p, 3648x2736, q=2-31, 200 kb/s, 90k tbn, 0.25 tbc</pre>
<pre>Stream #0.1: Audio: mp2, 8000 Hz, 1 channels, s16, 64 kb/s</pre>
<pre>Stream mapping:</pre>
<pre>Stream #0.0 -&gt; #0.0</pre>
<pre>Stream #1.0 -&gt; #0.1</pre>
<pre class="last">Error while opening encoder for output stream #0.1 - maybe incorrect parameters such as bit_rate, rate, width or height</pre>
<p>At last I understood the problem: the MP2 audio encoder that ffmpeg picked for the AVI output does not accept an 8 kHz sample rate, so the audio needed to be resampled to a different rate. I decided to use <a href="http://audacity.sourceforge.net/">Audacity</a>, another open source application, to upsample the sound. However, Audacity could not import the WAV file in this format.</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/07/error-importing.jpg"><img class="aligncenter size-full wp-image-596" title="Error Importing" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/07/error-importing.jpg" alt="" width="318" height="132" /></a></p>
<p>So I used Project-&gt;Import Raw Data, and selected my WAV file. I set up the import with the following parameters:</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/07/import-raw-data.jpg"><img class="aligncenter size-full wp-image-597" title="Import Raw Data" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/07/import-raw-data.jpg" alt="" width="393" height="292" /></a></p>
<p>I knew this would work, because the WAV file format consists of a header, followed by PCM data, in this case 8 kHz unsigned samples. So the result in the audio editor would be an audio file with the WAV header as a noisy sound at the start, followed by the data I wanted. The selected (darker) portion of the WAV file below is the header. I used Edit-&gt;Cut to remove it.</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/07/import.jpg"><img class="aligncenter size-full wp-image-598" title="Import" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/07/import.jpg" alt="" width="500" height="197" /></a></p>
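<p>For readers who would rather locate the header programmatically than by eye, here is a minimal Python sketch of the same idea. It assumes a conventional RIFF/WAVE layout (a 12-byte preamble followed by chunks, one of which is named <code>data</code>); the camera's files may deviate from that, which could be why Audacity refused them. The helper names <code>find_pcm_data</code> and <code>build_test_wav</code> are made up for this illustration, and the test file is synthesized in memory rather than taken from the camera.</p>

```python
def find_pcm_data(wav_bytes):
    """Return (offset, size) of the 'data' chunk payload in a RIFF/WAVE file."""
    if wav_bytes[0:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # the first chunk starts right after the 12-byte RIFF preamble
    while pos + 8 <= len(wav_bytes):
        chunk_id = wav_bytes[pos:pos + 4]
        chunk_size = int.from_bytes(wav_bytes[pos + 4:pos + 8], "little")
        if chunk_id == b"data":
            return pos + 8, chunk_size
        # Skip this chunk; chunks are padded to an even number of bytes.
        pos += 8 + chunk_size + (chunk_size % 2)
    raise ValueError("no data chunk found")


def build_test_wav(samples):
    """Build a minimal 8 kHz, mono, 8-bit unsigned WAV in memory (demo only)."""
    fmt = (b"fmt " + (16).to_bytes(4, "little")
           + (1).to_bytes(2, "little")      # format tag 1 = PCM
           + (1).to_bytes(2, "little")      # one channel (mono)
           + (8000).to_bytes(4, "little")   # sample rate in Hz
           + (8000).to_bytes(4, "little")   # bytes per second
           + (1).to_bytes(2, "little")      # block align
           + (8).to_bytes(2, "little"))     # bits per sample
    data = b"data" + len(samples).to_bytes(4, "little") + bytes(samples)
    body = b"WAVE" + fmt + data
    return b"RIFF" + len(body).to_bytes(4, "little") + body


wav = build_test_wav([128, 130, 126, 128])
offset, size = find_pcm_data(wav)
print(offset, size)  # prints: 44 4
```

<p>Everything before the returned offset is header, which is the part that shows up as a short burst of noise when the whole file is imported as raw data.</p>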
<p>Finally, I tried to save the audio at a different sample rate. The audio track has a pulldown menu that lets you change the sample rate, but it doesn’t do what I wanted: it merely plays the existing samples back at a different rate, with aliasing.</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/07/no-no-no.jpg"><img class="aligncenter size-full wp-image-599" title="no-no-no" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/07/no-no-no.jpg" alt="" width="274" height="598" /></a></p>
<p>Instead, after consulting the Audacity documentation, I discovered that you use the menu at the lower left corner of the main Audacity window to set the sample rate.</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/07/project-rate.jpg"><img class="aligncenter size-full wp-image-600" title="project-rate" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/07/project-rate.jpg" alt="" width="168" height="208" /></a></p>
<p>Change this to 48000, and choose File-&gt;Export as WAV to save at the new sample rate. I re-ran ffmpeg, and the resulting AVI file would play in <a href="http://www.apple.com/quicktime/download/">QuickTime</a> and <a href="http://www.videolan.org/vlc/">VLC player</a> (although VLC crashes afterwards), but it would not work in <a href="http://www.microsoft.com/windows/windowsmedia/default.mspx">Windows Media Player</a> (audio played, no video), <a href="http://www.divx.com/">DivX</a>, <a href="http://www.real.com/realplayer/search">RealPlayer</a>, or Flickr. So, I decided to try encoding to mp4 instead with the following command:</p>
<pre class="last">ffmpeg.exe -loop_input -shortest -f image2 -r 0.25 -i P910033.jpg -i P910033.wav bar.mp4</pre>
<p>The resulting <a href="http://cardinalpeak.com/blog/wp-content/uploads/2010/07/haena-beach.mp4">mp4 file</a> plays in all the media players (although, again, VLC crashes after playing it), and Flickr can read it successfully as well.  <a href="http://www.Flickr.com/photos/benmesander/4754133744/">Here</a> is what it looks like on Flickr:</p>
<p><object type="application/x-shockwave-flash" width="500" height="375" data="http://www.flickr.com/apps/video/stewart.swf?v=71377" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000"><param name="flashvars" value="intl_lang=en-us&#038;photo_secret=9aa302fdb7&#038;photo_id=4754133744&#038;flickr_show_info_box=true&#038;hd_default=false"></param><param name="movie" value="http://www.flickr.com/apps/video/stewart.swf?v=71377"></param><param name="bgcolor" value="#000000"></param><param name="allowFullScreen" value="true"></param><embed type="application/x-shockwave-flash" src="http://www.flickr.com/apps/video/stewart.swf?v=71377" bgcolor="#000000" allowfullscreen="true" flashvars="intl_lang=en-us&#038;photo_secret=9aa302fdb7&#038;photo_id=4754133744&#038;flickr_show_info_box=true&#038;hd_default=false" height="375" width="500"></embed></object></p>
<p>Using size as a proxy for quality, however, the encoded video is much smaller than the input JPEG file. Can someone suggest additional flags to ffmpeg to improve the encoding quality?</p>
<p><em>Ben Mesander has more than 18 years of experience leading software development teams and implementing software. His strengths include Linux, C, C++, numerical methods, control systems and </em><a href="../../expertise/signalprocessing.php"><em>digital signal processing</em></a><em>. His experience includes </em><a href="../../expertise/embeddedsoftware.php"><em>embedded software</em></a><em>, scientific software and enterprise software development environments.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cardinalpeak.com/blog/?feed=rss2&amp;p=594</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Creating the Orton Effect in Gimp</title>
		<link>http://www.cardinalpeak.com/blog/?p=569</link>
		<comments>http://www.cardinalpeak.com/blog/?p=569#comments</comments>
		<pubDate>Thu, 20 May 2010 14:15:45 +0000</pubDate>
		<dc:creator>Ben Mesander Partner</dc:creator>
				<category><![CDATA[Ben]]></category>
		<category><![CDATA[Image Processing]]></category>
		<category><![CDATA[Gimp]]></category>
		<category><![CDATA[Orton effect]]></category>

		<guid isPermaLink="false">http://www.cardinalpeak.com/blog/?p=569</guid>
		<description><![CDATA[Recently I decided to learn how to write scripts in the Gimp image editing program to automate certain tasks. The first task I wanted to automate was the Orton effect. This is an effect invented by Michael Orton in the 1990’s, which consists of taking two copies of an image, one blurred, and one sharp, [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I decided to learn how to write scripts in the <a href="http://www.gimp.org/">Gimp image editing program</a> to automate certain tasks. The first task I wanted to automate was the <a href="http://www.naturephotographers.net/articles0106/dw0106-1.html">Orton effect</a>. This is an effect invented by <a href="http://www.amazon.com/Michael-Orton/e/B001K8E2J6/ref=ntt_athr_dp_pel_1">Michael Orton</a> in the 1990’s, which consists of taking two copies of an image, one blurred, and one sharp, and mixing them to produce an image with a dreamy quality. It is especially well suited to landscape and flower photography.</p>
<p>The Orton effect was originally achieved by taking two photos: a well-focused image that was overexposed by two stops, and an out-of-focus image of the same scene that was overexposed by one stop. These were then printed as slides and sandwiched together to produce the final image.</p>
<p>With digital photography, one way to achieve this effect is to shoot a single raw image of a scene. The raw image can be developed to two JPEGs, one at +1 EV (Exposure Value), and the other at +2. My script blurs the +1 EV image with a two dimensional <a href="http://en.wikipedia.org/wiki/Gaussian_blur">Gaussian filter</a> with a standard deviation of 40 pixels, loads the second +2 EV image, sharpens it with an <a href="http://en.wikipedia.org/wiki/Unsharp_masking">unsharp mask</a>, and then overlays the two images. There are a variety of ways the images can be overlaid, but I prefer to multiply them, which enhances the color saturation in light areas. This is done by the Gimp by calculating (blur layer × sharp layer) / 255, which results in the image darkening, and an increase in color saturation.</p>
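<p>The multiply step is easy to try outside the Gimp. The sketch below is not the plugin's code; it is a plain-Python illustration of the (blur layer × sharp layer) / 255 formula on a tiny hand-made "image". The function name <code>multiply_blend</code> and the use of truncating integer division are assumptions made for the example, and the Gimp may round differently.</p>

```python
def multiply_blend(blur, sharp):
    """Multiply-blend two equally sized 8-bit RGB images.

    Images are nested lists of rows; each pixel is an (R, G, B) tuple
    with channel values in 0..255. Integer division truncates, which is
    an assumption of this sketch rather than documented Gimp behavior.
    """
    return [
        [tuple(b * s // 255 for b, s in zip(bp, sp)) for bp, sp in zip(brow, srow)]
        for brow, srow in zip(blur, sharp)
    ]


blur = [[(255, 255, 255), (200, 100, 50)]]
sharp = [[(180, 90, 30), (200, 100, 50)]]
print(multiply_blend(blur, sharp))
# prints: [[(180, 90, 30), (156, 39, 9)]]
```

<p>The first pixel shows that multiplying by white leaves a color unchanged. The second shows both effects mentioned above: the pixel gets darker, and the ratio between its channels widens (from 4:2:1 to roughly 17:4:1), which reads as stronger saturation.</p>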
<div id="attachment_570" class="wp-caption aligncenter" style="width: 510px"><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/05/or-original-lantana.jpg"><img class="size-full wp-image-570" title="Original Lantana" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/05/or-original-lantana.jpg" alt="" width="500" height="375" /></a><p class="wp-caption-text">Original</p></div>
<div id="attachment_571" class="wp-caption aligncenter" style="width: 510px"><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/05/or-orton-lantana.jpg"><img class="size-full wp-image-571" title="Orton Lantana" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/05/or-orton-lantana.jpg" alt="" width="500" height="375" /></a><p class="wp-caption-text">Orton</p></div>
<p>My Gimp script to do this is available on the <a href="http://registry.gimp.org/node/24441">Gimp plugin registry</a>.</p>
<p>The soft focus of the colors and the sharpness of the image got me thinking: Is the Orton effect really equivalent to heavily subsampling the chroma channels of the image, and sharpening the luma channel? <a href="http://www.jpeg.org/">JPEG</a> and <a href="http://mpeg.chiariglione.org/">MPEG</a> compression both make use of the fact that the human eye is not as sensitive to chroma (<a href="http://en.wikipedia.org/wiki/Chrominance">color</a>) as it is to brightness (<a href="http://en.wikipedia.org/wiki/Luma_%28video%29">luma</a>). Typically, both still and video compression use <a href="http://en.wikipedia.org/wiki/4:2:0">4:2:0 chroma subsampling</a> to reduce the number of bits used to represent color information in compressed images without a perceptible quality difference to the human visual system.</p>
<p>I decided to test my theory. It turns out the Gimp has the ability to decompose an image into its <a href="http://en.wikipedia.org/wiki/YCbCr">YCbCr</a> luma and chroma components used in the JPEG and MPEG compression process.</p>
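<p>As a rough idea of what such a decomposition computes per pixel, here is a small Python sketch using the full-range YCbCr conversion from the JFIF (JPEG file) convention. The Gimp offers more than one YCbCr variant, so treat the choice of coefficients here as an assumption; the function name <code>rgb_to_ycbcr</code> is also just for illustration.</p>

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (JFIF convention)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Round to integers and clamp to the 8-bit range.
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))


print(rgb_to_ycbcr(128, 128, 128))  # prints: (128, 128, 128)
print(rgb_to_ycbcr(255, 0, 0))      # prints: (76, 85, 255)
```

<p>A neutral gray lands at the midpoint (128) of both chroma planes, which is why the Cb and Cr images of a mostly neutral scene look like flat gray with faint detail: nearly all of the fine structure lives in Y.</p>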
<div id="attachment_579" class="wp-caption aligncenter" style="width: 510px"><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/05/or-squid.jpg"><img class="size-full wp-image-579" title="Squid Original" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/05/or-squid.jpg" alt="" width="500" height="375" /></a><p class="wp-caption-text">Original</p></div>
<table border="0" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td width="163" valign="top">
			<div id="attachment_585" class="wp-caption aligncenter" style="width: 173px"><img class="size-full wp-image-585" title="Squid Y" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/05/or-squid-Y.jpg" alt="" width="163" height="122" /><p class="wp-caption-text">Y</p></div>
		</td>
<td width="163" valign="top">
			<div id="attachment_583" class="wp-caption aligncenter" style="width: 173px"><img class="size-full wp-image-583" title="Squid Cb" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/05/or-squid-Cb.jpg" alt="" width="163" height="122" /><p class="wp-caption-text">Cb</p></div>
		</td>
<td width="163" valign="top">
			<div id="attachment_584" class="wp-caption aligncenter" style="width: 173px"><img class="size-full wp-image-584" title="Squid Cr" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/05/or-squid-Cr.jpg" alt="" width="163" height="122" /><p class="wp-caption-text">Cr</p></div>
		</td>
</tr>
</tbody>
</table>
<p>Once I had the image split into its separate components, the Gimp allowed me to apply my Gaussian filter to just the Cb and Cr components, and then regenerate a new color image from the components.</p>
<div id="attachment_574" class="wp-caption aligncenter" style="width: 510px"><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/05/or-squid-new-40.jpg"><img class="size-full wp-image-574 " title="Squid New 40" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/05/or-squid-new-40.jpg" alt="" width="500" height="375" /></a><p class="wp-caption-text">A Gaussian filter applied to just the chroma planes</p></div>
<div id="attachment_576" class="wp-caption aligncenter" style="width: 510px"><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/05/or-squid-orton.jpg"><img class="size-full wp-image-576" title="Squid Orton" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/05/or-squid-orton.jpg" alt="" width="500" height="375" /></a><p class="wp-caption-text">The same picture using the Orton effect plugin above</p></div>
<p>Unfortunately, as you can see, the image is nothing like the image that underwent Orton processing—my intuition was wrong. However, it is interesting to see just how much one can low-pass filter an image without a huge impact on the image. I increased the standard deviation of my Gaussian filter from 40 pixels to 100 with the following result—the image is still recognizable and doesn’t look too bad, although the color bleeds outside the lines. It’s interesting to note that the resulting JPEG is also smaller because the low-pass filtered chroma information is easier to compress.</p>
<div id="attachment_575" class="wp-caption aligncenter" style="width: 510px"><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/05/or-squid-new-100.jpg"><img class="size-full wp-image-575 " title="Squid New 100" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/05/or-squid-new-100.jpg" alt="" width="500" height="375" /></a><p class="wp-caption-text">A 100-pixel Gaussian filter applied to the chroma planes</p></div>
<p>Additionally, it is interesting to see what happens if we decompose our squid into <a href="http://en.wikipedia.org/wiki/RGB_color_space">RGB</a> components instead of YCbCr and filter two of them with a Gaussian filter that has a standard deviation of 100 pixels.</p>
<div id="attachment_577" class="wp-caption aligncenter" style="width: 510px"><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/05/or-squid-rgb-100.jpg"><img class="size-full wp-image-577" title="Squid RGB 100" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/05/or-squid-rgb-100.jpg" alt="" width="500" height="375" /></a><p class="wp-caption-text">A 100-pixel Gaussian filter applied to two of the three RGB planes</p></div>
<p>Yuck. We can clearly see the advantage of chroma subsampling here over RGB subsampling.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cardinalpeak.com/blog/?feed=rss2&amp;p=569</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The Basics of 3D Image Acquisition</title>
		<link>http://www.cardinalpeak.com/blog/?p=550</link>
		<comments>http://www.cardinalpeak.com/blog/?p=550#comments</comments>
		<pubDate>Mon, 26 Apr 2010 22:31:01 +0000</pubDate>
		<dc:creator>Mike Perkins Managing Partner</dc:creator>
				<category><![CDATA[Perk]]></category>
		<category><![CDATA[Theory]]></category>
		<category><![CDATA[Video]]></category>
		<category><![CDATA[3D]]></category>
		<category><![CDATA[Image Processing]]></category>

		<guid isPermaLink="false">http://www.cardinalpeak.com/blog/?p=550</guid>
		<description><![CDATA[One of our clients is heavily involved in 3D video and has been for several years. However, several others are just now starting to think about it because of the uptick of interest in the consumer electronics world. Enough questions have been posed to us recently that it seemed worthwhile to me to pull together a [...]]]></description>
			<content:encoded><![CDATA[<p>One of our clients is heavily involved in 3D video and has been for several years. However, several others are just now starting to think about it because of the <a href="http://www.cardinalpeak.com/blog/?p=544">uptick of interest in the consumer electronics world</a>. Enough questions have been posed to us recently that it seemed worthwhile to me to pull together a few basic facts regarding 3D stereopair imaging and stereo disparity.</p>
<p>First, we need a simple model of a lens. Consider the diagram below:</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/3d_basics_lens_overview.png"><img class="aligncenter size-full wp-image-558" title="3d_basics_lens_overview" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/3d_basics_lens_overview.png" alt="" width="500" height="246" /></a></p>
<p>In this picture, the long horizontal line that passes through the center of the lens is called the lens axis. The lens has the property that rays that pass through the center of the lens are undeviated. Therefore, the ray from the top of the tree, at a distance <em>l</em> to the left of the lens, passes straight through the center of the lens. (The tree has a height of <em>h</em>.) The lens also has the property that rays that arrive parallel to the lens axis (that is, perpendicular to the lens) are refracted to pass through the focal point of the lens. The focal point lies on the lens axis and is a distance <em>f</em> from the center of the lens. The intersection of these two rays shows where the image of the tree will be formed. You can see that the image of the tree is upside down, and has a new height <em>h’</em>. The image is formed a distance <em>d</em> to the right of the focal point.</p>
<p>By using similar triangles we see first that</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/3d_basics_eq_1.png"><img class="aligncenter size-full wp-image-551" title="3d_basics_eq_1" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/3d_basics_eq_1.png" alt="" width="36" height="31" /></a></p>
<p>Using a different pair of similar triangles we also see that</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/3d_basics_eq_2.png"><img class="aligncenter size-full wp-image-552" title="3d_basics_eq_2" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/3d_basics_eq_2.png" alt="" width="51" height="31" /></a></p>
<p>Solving the first equation above for <em>h’</em>, substituting the result into the second equation and simplifying, we derive the following relationship:</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/3d_basics_eq_3.png"><img class="aligncenter size-full wp-image-553" title="3d_basics_eq_3" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/3d_basics_eq_3.png" alt="" width="69" height="31" /></a></p>
<p>This is the fundamental equation of a simple lens. It shows that as the object gets further and further from the lens, i.e. as <em>l</em> increases, the distance of the image of the object from the focal plane decreases, i.e. <em>d</em> gets smaller. We can assume that the camera’s image sensor is located at a distance <em>f</em> from the lens, is perpendicular to the lens axis, and that all objects more than a certain distance away from the lens will be in focus. In other words, the image of all sufficiently distant objects will appear on the focal plane where the image sensor is located.</p>
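<p>To put numbers on this, here is a small Python sketch. It assumes the lens relationship takes the form <em>d</em> = <em>f</em>² / (<em>l</em> − <em>f</em>), which is what the two similar-triangle relations implied by the diagram (<em>h’</em>/<em>h</em> = (<em>f</em> + <em>d</em>)/<em>l</em> for the ray through the lens center, and <em>h’</em>/<em>h</em> = <em>d</em>/<em>f</em> for the ray through the focal point) give when combined. The 50 mm focal length and the object distances are made-up example values.</p>

```python
def image_offset_from_focal_plane(f, l):
    """Distance d behind the focal plane at which an object at distance l focuses.

    Assumes the thin-lens relation d = f^2 / (l - f); f and l share one unit.
    """
    if l <= f:
        raise ValueError("object must be farther from the lens than the focal length")
    return f * f / (l - f)


f_mm = 50.0  # hypothetical focal length
for l_mm in (500.0, 5000.0, 50000.0):
    d_mm = image_offset_from_focal_plane(f_mm, l_mm)
    print(f"l = {l_mm:8.0f} mm  ->  d = {d_mm:.3f} mm")
```

<p>With these example values, <em>d</em> shrinks from about 5.6 mm for an object half a meter away to about 0.05 mm at 50 meters, which is why it is reasonable to treat distant objects as focusing on the focal plane itself.</p>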
<p>In the case of 3D video, two cameras are used to acquire a sequence of stereopair images, one from the left camera and one from the right. Different stereo geometries are possible, but the most common one is to place the two cameras horizontally apart from each other by a distance <em>i</em>, and to keep their focal planes coplanar. The diagram below illustrates this configuration:</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/3d_basics_stereo.png"><img class="aligncenter size-full wp-image-559" title="3d_basics_stereo" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/3d_basics_stereo.png" alt="" width="500" height="270" /></a></p>
<p>The horizontal line at the bottom is the focal plane; it is clear from the diagram that the focal planes are coplanar. The lenses are a distance <em>f</em> from the focal plane and are separated by a distance of <em>i</em> from each other. We assume that a small object (or a point on a larger object) is located a distance <em>l</em> from the lens plane and a distance <em>m</em> to the right of the axis of the right lens. We want to know where the image of that object appears in the left and the right camera. In particular, we want to know if we overlaid the left image on top of the right image, how far apart would the images appear? Mathematically, we want to know the disparity, which we define to be</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/3d_basics_eq_4.png"><img class="aligncenter size-full wp-image-554" title="3d_basics_eq_4" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/3d_basics_eq_4.png" alt="" width="53" height="12" /></a></p>
<p>where <em>s1</em> and <em>s2</em> are the distances from the image point to the intersection of the lens axis with the focal plane for the left and the right cameras respectively. Note that we are assuming that the object being imaged is far enough away that its image forms on the focal plane.</p>
<p>Using our favorite trick of similar triangles we have the following two equations:</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/3d_basics_eq_5.png"><img class="aligncenter size-full wp-image-555" title="3d_basics_eq_5" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/3d_basics_eq_5.png" alt="" width="70" height="28" /></a></p>
<p>and</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/3d_basics_eq_6.png"><img class="aligncenter size-full wp-image-556" title="3d_basics_eq_6" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/3d_basics_eq_6.png" alt="" width="60" height="28" /></a></p>
<p>Solving the first equation for <em>s1</em>, the second equation for <em>s2</em>, taking the difference and simplifying yields</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/3d_basics_eq_7.png"><img class="aligncenter size-full wp-image-557" title="3d_basics_eq_7" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/3d_basics_eq_7.png" alt="" width="33" height="28" /></a></p>
<p>Although this expression was derived for an object to the right of the axis of the right camera, it is easy to show in a similar manner that it is also true for an object between the axes of the two cameras as well as for an object to the left of the axis of the left camera.</p>
<p>So what does this equation tell us? First, it says that for this particular camera geometry and a given focal length, the disparity is a function only of the separation between the two cameras, <em>i</em>, and the distance of the object from the lens plane, <em>l</em>. Second, the equation tells us that the disparity increases as we increase the separation between the cameras. Finally, it tells us that the disparity decreases as the object gets further away from the cameras, approaching zero for objects an infinite distance away. (You can see this when you watch 3D content without wearing the special 3D glasses: the “distant” objects look sharp to the naked eye, whereas the near objects appear blurry, because the value of ρ is greater.)</p>
<p>It should be clear from this equation that if a stereopair is available, and corresponding points can be found in the left and right pictures, then the disparity between those points can be measured and the distance to the point can be computed.</p>
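<p>As a concrete illustration of that last step, here is a short Python sketch. It assumes the disparity relationship works out to ρ = <em>f</em> · <em>i</em> / <em>l</em>, which follows from the similar-triangle setup above (<em>s1</em> = <em>f</em>(<em>m</em> + <em>i</em>)/<em>l</em> and <em>s2</em> = <em>f</em> · <em>m</em>/<em>l</em>). The focal length, camera separation and object distance below are invented example numbers, and finding corresponding points in real images, which is the hard part, is not covered.</p>

```python
def disparity(f, i, l):
    """Disparity on the focal plane for an object at distance l (same units as f, i)."""
    return f * i / l


def distance_from_disparity(f, i, rho):
    """Invert rho = f * i / l to recover the distance l to the object."""
    if rho <= 0:
        raise ValueError("disparity must be positive for an object at finite distance")
    return f * i / rho


f_mm = 50.0   # hypothetical focal length
i_mm = 65.0   # hypothetical separation between the two cameras
rho_mm = disparity(f_mm, i_mm, 2000.0)
print(rho_mm)                                       # prints: 1.625
print(distance_from_disparity(f_mm, i_mm, rho_mm))  # prints: 2000.0
```

<p>Because distance varies as 1/ρ, a fixed measurement error in the disparity matters much more for far objects (small ρ) than for near ones.</p>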
<p><em>Mike Perkins, Ph.D., is a managing partner of Cardinal Peak and an expert in algorithm development for <a href="../../expertise/digitalvideo.php">video</a> and <a href="../../expertise/signalprocessing.php">signal processing</a> applications.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cardinalpeak.com/blog/?feed=rss2&amp;p=550</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Minimizing Development Costs on Low-to-Mid Volume Products</title>
		<link>http://www.cardinalpeak.com/blog/?p=536</link>
		<comments>http://www.cardinalpeak.com/blog/?p=536#comments</comments>
		<pubDate>Thu, 22 Apr 2010 18:36:10 +0000</pubDate>
		<dc:creator>Mike Deeds Partner</dc:creator>
				<category><![CDATA[Engineering Management]]></category>
		<category><![CDATA[Mike_Deeds]]></category>
		<category><![CDATA[hardware engineering]]></category>

		<guid isPermaLink="false">http://www.cardinalpeak.com/blog/?p=536</guid>
		<description><![CDATA[My last post suggested ways to reduce parts costs in a low-to-mid volume product. This post explores ways to keep development costs low while still creating a cost-effective product.
You can’t escape the fact that it takes money to create a low cost product. It is estimated that the first version of the iPhone had COGS [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cardinalpeak.com/blog/?p=508">My last post</a> suggested ways to reduce parts costs in a low-to-mid volume product. This post explores ways to keep development costs low while still creating a cost-effective product.</p>
<p>You can’t escape the fact that it takes money to create a low cost product. It is estimated that the first version of the iPhone had COGS (cost-of-goods-sold) of <a href="http://financial-alchemist.blogspot.com/2009/07/apple-inc-aapl-iphones-substantial.html">around $200</a>. That is very impressive given all of the included features. However, it is also estimated that Apple <a href="http://www.wired.com/gadgets/wireless/magazine/16-02/ff_iphone?currentPage=all">spent around $150M</a> over 30 months to design the iPhone. In addition to that direct investment, Apple was also able to entice their component vendors to spend huge amounts of money to design custom ICs for the device.</p>
<p>Our clients typically do not have that kind of money to invest in a new product. (Although please give us a call if you do!) And even if they did, it often doesn’t make business sense to invest a lot of money in upfront engineering in order to reduce COGS, unless you have high volumes. So it’s more typical that we are performing a balancing act between the budget available for engineering, and meeting the target COGS that is necessary for a successful product.</p>
<p>Here are some guidelines that we use to keep development costs low and still create cost-competitive products.</p>
<ul>
<li>Start with solid requirements definition.</li>
</ul>
<p style="padding-left: 55px;">The most successful design projects start with clear product requirements. Starting a project with known requirements (and not changing them during the product design phase) is the best way to minimize development costs. Of course, most projects aren’t so lucky as to have perfectly defined requirements. It usually takes the management team some time to balance the requirements, development cost targets, product cost targets, and project time-lines. Still, the more quickly you can define what the product has to do, the less the engineering team will thrash—and thrashing costs money.</p>
<ul>
<li>Keep the schedule short.</li>
</ul>
<p style="padding-left: 55px;">This is pretty obvious, but the longer a project takes, the more it costs. We’re big believers in rapidly getting a solid, but perhaps less-fully-featured, product to market. In addition to keeping development costs low, this strategy also allows you to quickly gain feedback from the market and focus investment for an enhancement release on the most important areas.</p>
<ul>
<li>Hire an experienced design team.</li>
</ul>
<p style="padding-left: 55px;">An experienced design team will put together a more accurate project schedule and budget, is more likely to meet the project deadlines, and is better prepared to overcome the inevitable hurdles along the way.</p>
<ul>
<li>Explore the trade-offs between custom hardware design and using off-the-shelf components.</li>
</ul>
<p style="padding-left: 55px;">For low to mid-volume products, using off-the-shelf subsystems and components can be a very tempting way to reduce development costs. However, off-the-shelf subsystems (such as single board computers, power supplies, cases, etc.) can be quite a bit more expensive than custom designed hardware. It is worth a thorough investigation of what solutions might exist to reduce your design effort yet still meet the product cost targets. Small hardware vendors may be willing to modify their products to fit your application and costs, and may not charge NRE to do this. As with all vendors, negotiating goes a long way towards minimizing your costs.</p>
<ul>
<li>Put together a hardware prototype early in the project.</li>
</ul>
<p style="padding-left: 55px;">In most product development projects, the software effort is larger than the hardware effort. As such, we structure projects to give the software engineers as much time as possible to work on the target hardware. This maximizes the time available for low level hardware and software bugs to be found and resolved.</p>
<ul>
<li>Start with reference designs and evaluation boards where possible.</li>
</ul>
<p style="padding-left: 55px;">Most semiconductor products have reference designs and evaluation boards available to give you a head start. For a low-volume product, these designs can be especially helpful to minimize development time.</p>
<ul>
<li>Use the vendors’ Field Application Engineers (FAEs).</li>
</ul>
<p style="padding-left: 55px;">IC vendors usually have FAEs that will help integrate their products into your design. FAEs are usually willing to do schematic reviews, help with software drivers, and even help debug parts of the system if necessary. These folks increase the chance of a successful first revision design and can reduce the development time of a product.</p>
<ul>
<li>Use open source software where possible.</li>
</ul>
<p style="padding-left: 55px;">We are big advocates of open source software here at Cardinal Peak. There is huge opportunity for reducing the development time and costs in a project by using open source modules. However, it is not easy to properly integrate open source software (especially real-time or embedded modules) into a complex product. To take full advantage of open source, it is imperative to have a team with experience in this kind of integration.</p>
<p><em>Mike Deeds is an expert in embedded systems </em><a href="http://www.cardinalpeak.com/expertise/hardwaredesigns.php"><em>hardware</em></a><em> and </em><a href="http://www.cardinalpeak.com/expertise/embeddedsoftware.php"><em>software engineering</em></a><em>, including FPGA design and computer architecture.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cardinalpeak.com/blog/?feed=rss2&amp;p=536</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Thoughts on 3D after NAB</title>
		<link>http://www.cardinalpeak.com/blog/?p=544</link>
		<comments>http://www.cardinalpeak.com/blog/?p=544#comments</comments>
		<pubDate>Mon, 19 Apr 2010 16:24:33 +0000</pubDate>
		<dc:creator>Mike Perkins Managing Partner</dc:creator>
				<category><![CDATA[Perk]]></category>
		<category><![CDATA[Video]]></category>
		<category><![CDATA[3D]]></category>

		<guid isPermaLink="false">http://www.cardinalpeak.com/blog/?p=544</guid>
		<description><![CDATA[I just returned from this year’s NAB show, where I was bombarded with 3D demos in virtually every booth. Most of the factors driving this 3D superabundance originate outside of the broadcast industry itself. First, TV manufacturers are hot on 3D as a way to get everyone who just bought an HDTV to upgrade to [...]]]></description>
			<content:encoded><![CDATA[<p>I just returned from this year’s <a href="http://www.nabshow.com/2010/default.asp">NAB show</a>, where I was bombarded with 3D demos in virtually every booth. Most of the factors driving this 3D superabundance originate outside of the broadcast industry itself. First, TV manufacturers are hot on 3D as a way to get everyone who just bought an HDTV to upgrade to a new 3D-enabled display. Cinema owners like 3D because they can charge more for the tickets. The Blu-ray consortium recently <a href="http://www.blu-ray.com/news/?id=3924">standardized</a> a method for storing 3D video on Blu-ray discs, and hopes to enable and piggyback on the efforts of the TV manufacturers as a way <a href="http://www.pcworld.com/article/172747/will_3d_tv_be_blurays_savior.html">to drive sales of 3D Blu-ray players</a> and finally displace DVDs. Hollywood has begun producing more 3D movies too, including the phenomenally successful <a href="http://www.avatarmovie.com/">Avatar</a> (which will be released on 3D Blu-ray very soon). So naturally the broadcast industry needs to be prepared to author and carry 3D signals.</p>
<p>Most of the demos I saw were nothing more than “here, put on these glasses”…in other words, “me too” type demos. (Although after seeing all this 3D interest, I did wonder whether Philips regretted terminating its autostereoscopic display effort last year!)</p>
<p>Nevertheless, I did see two problems addressed that I found technically interesting. First, real-time 2D to 3D conversion (see <a href="http://newsroom.jvc.com/2010/01/jvc-introduces-if-2d3d1-stereoscopic-image-processor-to-help-3d-content-creators-improve-workflow/#more-801">here</a> and <a href="http://www.crc.gc.ca/en/html/crc/home/mediazone/whatsnew/apr12-16_10_1">here</a>), and second, <a href="http://www.qoesystems.com/QMaster3D.html">automatic 3D quality monitoring</a>.</p>
<p>I worked a lot on the problem of <a href="http://adsabs.harvard.edu/abs/1992ITCom..40..684P">compressing stereopairs</a> as part of my Ph.D. research, and I also spent time thinking about 3D video quality assessment. However, I had never considered the problem of real-time 2D to 3D conversion, so the show got me thinking. It’s a pretty tricky problem!</p>
<p>Converting a 2D video stream to 3D can be partitioned into two fundamental steps. First, creating a depth map for each video image, and second, using the depth map to construct a second viewpoint. Although both steps are challenging, the first step feels substantially harder to me.</p>
<p>With regard to the first step, a sequence of 2D video images must be analyzed to extract a depth map. Several special cases are worth discussing, but I’ll only mention two. First, consider the case where the camera is stationary and a 3D object moves through the field of view. The closer points on that object will have frame-to-frame pixel displacements that are larger than those for object points that are farther away. Therefore, one useful approach for deriving information for a depth map would be the following: a) segment the image into two regions: moving and stationary; b) segment the moving areas into distinct objects using various clues such as color and proximity; c) find distinct matching points on the moving objects in two different frames; d) determine depths for those matching points based on the measured point displacements; e) interpolate the depth map for non-matched moving object points.</p>
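<p>As a rough illustration of steps d) and e), here is a minimal Python sketch that turns measured displacements at matched points into relative depths and then linearly interpolates between them along a single scanline. It assumes depth is inversely proportional to displacement, which is only reasonable for a stationary camera and an object translating parallel to the image plane. The function names and sample numbers are hypothetical, and segmentation and point matching (steps a through c) are taken as already done.</p>

```python
def relative_depths(displacements):
    """Map pixel displacements to relative depths (1.0 = nearest matched point).

    Assumption: depth is inversely proportional to displacement.
    """
    largest = max(displacements)
    if largest <= 0:
        raise ValueError("need at least one moving point")
    return [largest / d if d > 0 else float("inf") for d in displacements]


def interpolate_depths(columns, depths, width):
    """Linearly interpolate depth across one scanline from sparse matched points.

    `columns` must be sorted; pixels outside the matched range take the
    depth of the nearest matched point.
    """
    line = []
    for x in range(width):
        if x <= columns[0]:
            line.append(depths[0])
        elif x >= columns[-1]:
            line.append(depths[-1])
        else:
            # Find the pair of matched points that brackets this pixel.
            for i in range(len(columns) - 1):
                left, right = columns[i], columns[i + 1]
                if left <= x <= right:
                    t = (x - left) / (right - left)
                    line.append(depths[i] + t * (depths[i + 1] - depths[i]))
                    break
    return line


# Hypothetical matched points on one scanline: column index and displacement in pixels.
columns = [0, 4, 8]
displacements = [8.0, 4.0, 2.0]
depths = relative_depths(displacements)
scanline = interpolate_depths(columns, depths, width=9)
```

<p>The point with the largest displacement gets relative depth 1.0, and points that move less are placed proportionally farther away. A real system would interpolate in two dimensions and respect object boundaries.</p>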
<p>As a second special case, consider the situation where nothing is moving in the video sequence for many frames in a row. In this case, occlusion becomes a major depth cue. If one object is in front of another, then it will occlude the background object, and it must be closer. If an image can be segmented into objects, and an occlusion map can be deduced, then different depths can be assigned to different objects based on where they lie in the occlusion map. Other clues that may be algorithmically exploitable could stem from perspective considerations applied to the edges of identified objects.</p>
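<p>One way to picture the occlusion map is as a directed graph in which an edge means “this object is in front of that one.” The sketch below, which assumes the segmentation and the pairwise occlusion relationships have already been found, orders objects from nearest to farthest with a topological sort. The object names are made up, and <code>graphlib</code> requires Python 3.9 or later.</p>

```python
from graphlib import TopologicalSorter


def depth_order(occlusions):
    """Return object names ordered nearest-first.

    `occlusions` is a list of (front, behind) pairs, each meaning that
    `front` occludes `behind` and must therefore be closer to the camera.
    """
    sorter = TopologicalSorter()
    for front, behind in occlusions:
        # Register `front` as a predecessor of `behind`, so it is emitted first.
        sorter.add(behind, front)
    # Raises graphlib.CycleError if the occlusion pairs contradict each other.
    return list(sorter.static_order())


# Hypothetical scene: a person stands in front of a car, which is in front of a building.
order = depth_order([("person", "car"), ("car", "building")])
```

<p>Each object’s position in the returned list can then serve as a coarse depth level. Occlusion alone only gives an ordering, not actual distances.</p>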
<p>Many powerful depth clues will be hard to take advantage of algorithmically, although humans can exploit them easily, because they involve recognizing objects. For example, we can easily recognize two humans in a picture, and determine whether they are adults or children. We know that if two adult males appear in the picture, and one appears substantially taller than the other (and isn’t holding a basketball), then the shorter one is farther away. I suspect that taking advantage of this sort of knowledge is beyond the capability of today’s real-time (and non-real-time!) processing. Nevertheless, I was amazed at how well the systems I saw at the show appeared to work.</p>
<p>With regard to the second step, given an image and a depth map, a second view can be created from the first by displacing each pixel of the original image by a disparity value corresponding to its depth. In practice it won’t be that easy. Why? Because after displacing pixels by their appropriate disparities, gaps will appear in the new image. These gaps result from image detail that is visible in one image but not in the other, so the gaps will need to be interpolated or otherwise synthesized in some reasonable way.</p>
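<p>The gap problem is easy to reproduce on a single scanline. The following Python sketch shifts each pixel left by an integer disparity, lets nearer pixels (larger disparity) win when two land on the same spot, and then fills the resulting holes by copying the nearest valid pixel on the right. The fill rule is just one naive choice, and the pixel values and disparities are invented for illustration.</p>

```python
def synthesize_view(row, disparities):
    """Shift each pixel of one scanline left by its integer disparity.

    Returns the new scanline, with None marking gaps that no source pixel
    landed on. When two pixels collide, the one with the larger disparity
    (assumed nearer) is kept.
    """
    width = len(row)
    out = [None] * width
    best = [-1] * width  # disparity of the pixel currently stored at each position
    for x in range(width):
        d = disparities[x]
        target = x - d
        if 0 <= target < width and d > best[target]:
            out[target] = row[x]
            best[target] = d
    return out


def fill_gaps(row):
    """Fill gaps from the nearest valid pixel on the right, else on the left."""
    filled = list(row)
    # Right-to-left pass: copy from the right-hand neighbor.
    for x in range(len(filled) - 2, -1, -1):
        if filled[x] is None:
            filled[x] = filled[x + 1]
    # Left-to-right pass: handles gaps that touch the right edge.
    for x in range(1, len(filled)):
        if filled[x] is None:
            filled[x] = filled[x - 1]
    return filled


# Hypothetical scanline: a two-pixel foreground object (values 30, 40) over background.
row = [10, 20, 30, 40, 50, 60]
disparities = [0, 0, 2, 2, 0, 0]
shifted = synthesize_view(row, disparities)
filled = fill_gaps(shifted)
```

<p>Here the foreground object moves two pixels to the left and leaves a two-pixel hole where it used to be. Filling from the right copies background rather than foreground into the hole, which is usually the less objectionable guess for a leftward shift.</p>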
<p>With regard to 3D video quality assessment, I just want to interject a note of caution. I was encouraged to see that several vendors have made progress in developing systems that automatically approximate the “mean opinion scores” that subjective human evaluation tests would assign to various image sequences. However, when dealing with 3D video, the whole is greater than the sum of its parts. If the algorithmic approach implemented is to naively apply 2D image quality assessment to the left and right pictures independently, and then average the scores together, the result is likely not to correspond at all to a human’s subjective viewing experience. If you wear glasses, as I do, you can experience this directly if one of your eyes is better than the other. Take off your glasses, look at the world around you, and you will see it with the resolution of your better eye; but you will still have stereoscopic vision. This effect will ultimately need to be taken into account in automatic systems that purport to algorithmically assess the quality of a 3D image sequence.</p>
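<p>To make the caution concrete, here is a toy Python comparison using made-up per-eye scores on a 1 to 5 opinion scale. Neither function is a validated metric; <code>better_eye_score</code> is only a hypothetical stand-in for the glasses observation above, meant to show how far a naive average can drift from it.</p>

```python
def naive_stereo_score(left, right):
    """Average two independent 2D quality scores (the naive approach)."""
    return (left + right) / 2


def better_eye_score(left, right):
    """Hypothetical stand-in: perceived quality follows the better view."""
    return max(left, right)


# Made-up scores: the left view is sharp, the right view is heavily blurred.
left_score, right_score = 4.5, 2.5
naive = naive_stereo_score(left_score, right_score)
better = better_eye_score(left_score, right_score)
```

<p>With these numbers the naive average penalizes the pair a full point more than the better-eye rule does. Real perception is surely more complicated than either, which is the point: the combination rule matters.</p>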
]]></content:encoded>
			<wfw:commentRss>http://www.cardinalpeak.com/blog/?feed=rss2&amp;p=544</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sniffing iPad Traffic</title>
		<link>http://www.cardinalpeak.com/blog/?p=519</link>
		<comments>http://www.cardinalpeak.com/blog/?p=519#comments</comments>
		<pubDate>Wed, 07 Apr 2010 23:34:53 +0000</pubDate>
		<dc:creator>Howdy Pierce Managing Partner</dc:creator>
				<category><![CDATA[Howdy]]></category>
		<category><![CDATA[Networking]]></category>
		<category><![CDATA[iPad]]></category>
		<category><![CDATA[network capture]]></category>
		<category><![CDATA[Wireshark]]></category>

		<guid isPermaLink="false">http://www.cardinalpeak.com/blog/?p=519</guid>
		<description><![CDATA[For a project I’m working on, I was wondering how a particular video-related feature on Apple’s new iPad works. In order to figure that out, I thought it would be interesting to connect a network sniffer in-line with my shiny new iPad, so I could capture and analyze all the network traffic flowing to and [...]]]></description>
			<content:encoded><![CDATA[<p>For a project I’m working on, I was wondering how a particular video-related feature on Apple’s new iPad works. In order to figure that out, I thought it would be interesting to connect a network sniffer in-line with my shiny new iPad, so I could capture and analyze all the network traffic flowing to and from the device.</p>
<p>Although I did this with the iPad, the technique is not specific to it; you could use the same approach to capture network traffic for any Wi-Fi-enabled mobile device, such as an iPod Touch or a Palm Pre.</p>
<p>An easy way to do this is to configure a computer to serve as a bridge between an Ethernet network and an ad-hoc Wi-Fi network. Then, by running <a href="http://www.wireshark.org/">Wireshark</a> or another network sniffer on the computer, you can capture the packets as they flow through to the mobile device on Wi-Fi.</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/sniff_ipad_system_overview.png"><img class="aligncenter size-full wp-image-524" title="sniff_ipad_system_overview" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/sniff_ipad_system_overview.png" alt="Sniffing iPad Traffic" width="500" height="197" /></a></p>
<p>My computer is a MacBook Pro running OS X 10.6 “Snow Leopard”, but the same concept should work on Windows or on earlier OS X versions, although the dialogs might look a little different. There are three steps:</p>
<ul>
<li>Configure the computer to act as a Wi-Fi bridge</li>
<li>Connect the iPad to the computer’s ad-hoc Wi-Fi network</li>
<li>Capture the packets</li>
</ul>
<h2>Step 1: Configure OS X as a Wi-Fi Bridge</h2>
<p>First, we need to configure OS X as a Wi-Fi bridge. To do this, select “Create Network…” from the AirPort drop-down menu. This dialog appears:</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/osx_configure_adhoc_wifi.png"><img class="aligncenter size-full wp-image-522" title="osx_configure_adhoc_wifi" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/osx_configure_adhoc_wifi.png" alt="" width="427" height="396" /></a></p>
<p>Type a network name, and, if you like, assign a password. I assigned a password just so I could ensure that only one device was connecting to my bridged Mac. We are nerds here at Cardinal Peak, so we tend to have a lot of devices floating around our office!</p>
<p>At this point, the iPad would be able to connect to the computer, but the computer is not yet configured to bridge the packets from the 802.11 network onto the Ethernet network. To configure bridging on OS X, you need to turn on what Apple calls “Internet Sharing”. Go to System Preferences and select the “Sharing” option. Turn on Internet Sharing, and set it up to “Share your connection from” “Ethernet”, “To computers using” “AirPort”:</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/osx_configure_sharing.png"><img class="aligncenter size-full wp-image-523" title="osx_configure_sharing" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/osx_configure_sharing.png" alt="" width="500" height="350" /></a></p>
<h2>Step 2: Connect the iPad to the ad-hoc Wi-Fi network</h2>
<p>Next, you’ll need to configure the iPad to connect to the ad-hoc Wi-Fi network you just created. This is pretty easy: Go to Settings, and then Wi-Fi. You should see your new ad-hoc network in the list—in my case, I’m looking for “HowdysNetwork”:</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/ipad_network_config_1.png"><img class="aligncenter size-full wp-image-520" title="ipad_network_config_1" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/ipad_network_config_1.png" alt="" width="500" height="372" /></a></p>
<p>Just tap on the ad-hoc network. If you elected to use a password, you’ll be prompted for it.</p>
<p>You can confirm your iPad’s network configuration by tapping the right arrow next to the network name:</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/ipad_network_config_2.png"><img class="aligncenter size-full wp-image-521" title="ipad_network_config_2" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/ipad_network_config_2.png" alt="" width="500" height="440" /></a></p>
<p>Good—we have an IP address, but more importantly we have reasonable entries for Router and DNS server, as well.</p>
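<p>One quick way to judge whether the Router entry is “reasonable” is to check that it sits in the same subnet as the iPad’s own address. Here is a small Python sketch of that check using the standard <code>ipaddress</code> module; the addresses are hypothetical placeholders, so substitute the values shown on your own iPad.</p>

```python
import ipaddress


def router_is_reachable(ip, netmask, router):
    """Return True when the router address lies in the same subnet as the device."""
    network = ipaddress.ip_interface(f"{ip}/{netmask}").network
    return ipaddress.ip_address(router) in network


# Hypothetical values copied from the iPad's Wi-Fi settings screen.
ok = router_is_reachable("192.168.2.2", "255.255.255.0", "192.168.2.1")
```

<p>If this returns <code>False</code>, the iPad probably did not get its configuration from the bridging computer, and it is worth rechecking the Internet Sharing settings from Step 1.</p>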
<p>Next, you should test out your bridged network connection by bringing up Safari on the iPad and proving you can visit a web site.</p>
<h2>Step 3: Capture the Packets</h2>
<p>The final step is to start up Wireshark on your computer and attach to the Wi-Fi interface. You normally need to start Wireshark as the super-user in order to have enough rights to capture traffic. There’s probably a cool way to do this graphically, but being an old-school Unix guy, I always bring up a Terminal window and type <code>sudo wireshark &amp;</code>.</p>
<p>We want to capture packets on the Wi-Fi interface, which on my Mac is device en1. Click the leftmost button on the Wireshark toolbar, and then click “Start” next to device en1:</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/wireshark_begin_capture.png"><img class="aligncenter size-full wp-image-525" title="wireshark_begin_capture" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/wireshark_begin_capture.png" alt="" width="500" height="169" /></a></p>
<p>Now you should be all set—do something on your iPad to cause network traffic, and confirm that you see it showing up in the Wireshark window!</p>
<p><a href="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/working.jpg"><img class="aligncenter size-full wp-image-527" title="working" src="http://www.cardinalpeak.com/blog/wp-content/uploads/2010/04/working.jpg" alt="" width="500" height="305" /></a></p>
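<p>If you save the capture to a file, a short script can also confirm that the iPad’s traffic is in there. The Python sketch below counts packets to or from a given IPv4 address. It assumes the file is in the classic pcap format (not pcapng) with microsecond timestamps and an Ethernet link layer; the file name and IP address in the comment are hypothetical.</p>

```python
import socket
import struct


def count_packets_for(data, ip):
    """Count IPv4-over-Ethernet packets in classic pcap bytes sent to or from `ip`."""
    wanted = socket.inet_aton(ip)
    # The magic number tells us the byte order the file was written in.
    magic = struct.unpack("<I", data[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"
    elif magic == 0xD4C3B2A1:
        endian = ">"
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    # The link-layer type is the last field of the 24-byte global header.
    linktype = struct.unpack(endian + "I", data[20:24])[0]
    if linktype != 1:
        raise ValueError("this sketch only handles Ethernet captures")
    offset = 24
    count = 0
    while offset + 16 <= len(data):
        # Per-packet record header: ts_sec, ts_usec, captured length, original length.
        _sec, _usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        frame = data[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        # Ethernet header is 14 bytes; EtherType 0x0800 means IPv4.
        if len(frame) >= 34 and frame[12:14] == b"\x08\x00":
            # IPv4 source and destination sit at bytes 12-19 of the IP header.
            src, dst = frame[26:30], frame[30:34]
            if wanted in (src, dst):
                count += 1
    return count


def count_in_file(path, ip):
    """Convenience wrapper, e.g. count_in_file("ipad_capture.pcap", "192.168.2.2")."""
    with open(path, "rb") as f:
        return count_packets_for(f.read(), ip)
```

<p>A count of zero usually means the capture was taken on the wrong interface or the iPad is not actually routing through the bridged Mac.</p>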
]]></content:encoded>
			<wfw:commentRss>http://www.cardinalpeak.com/blog/?feed=rss2&amp;p=519</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
