View from the Peak
A blog on engineering topics including hardware and software/firmware design; video, mobile application and signal processing technologies; and engineering applications in industries including security, cable/satellite, enterprise video, oil and gas, law enforcement, smartphone, defense and communications, among others. View From The Peak is written by Cardinal Peak's partners: Howdy Pierce, Mike Perkins, Chad Scates, Ben Mesander and Mike Deeds.
Creating Single Frame Movies
My camera (an Olympus SP-570UZ) allows me to optionally record a four-second audio clip with each photo I take. I haven’t used this feature much, because I typically upload my photos to Flickr, and there’s been no good way to associate the audio with the photo. Ideally, I would like an audio player to appear below the photo, but there aren’t really any public audio sharing websites with much longevity. And, in any case, Flickr won’t allow me to embed an audio player in my photo description.
Recently, it occurred to me that since Flickr allows short movies (up to 1:30 long), maybe I could create a single-frame movie with the still picture as the frame and the audio as the sound track. Then the Flickr movie player would serve as the control for the audio, and the audio and the video would stay associated with each other.
I decided to try to use ffmpeg to create the movie, since it seems to be able to do almost anything with video and audio. The command line for ffmpeg is a bit obscure, so this blog post documents about two hours of my time spent getting it to work.
My camera produces 3648×2736 JPEG images, and the audio files are 8 kHz sample rate, mono, 8 bit unsigned PCM samples in WAV file format. I decided my goal would be to create a motion JPEG (MJPEG) encoded AVI file with maximum quality.
I started by searching the web to see if anyone had done this before. By studying those examples and experimenting, I came up with the following ffmpeg command line:
ffmpeg.exe -loop_input -shortest -f image2 -r 0.25 -i P910033.jpg -i P910033.wav -vcodec mjpeg -qscale 1 -t 4 foo.avi
Most of my attempts caused ffmpeg to hang. But eventually, I got the error message below:
Duration: 00:00:04.00, start: 0.000000, bitrate: N/A
Stream #0.0: Video: mjpeg, yuvj422p, 3648x2736, 0.25 tbr, 0.25 tbn, 0.25 tbc
[wav @ 01a80050]Estimating duration from bitrate, this may be inaccurate
Input #1, wav, from 'P6060033.wav':
Duration: 00:00:04.02, bitrate: 64 kb/s
Stream #1.0: Audio: pcm_u8, 8000 Hz, 1 channels, u8, 64 kb/s
[mp2 @ 01ac6310]Sampling rate 8000 is not allowed in mp2
Output #0, avi, to 'foo.avi':
Stream #0.0: Video: mjpeg, yuvj422p, 3648x2736, q=2-31, 200 kb/s, 90k tbn, 0.25 tbc
Stream #0.1: Audio: mp2, 8000 Hz, 1 channels, s16, 64 kb/s
Stream mapping:
Stream #0.0 -> #0.0
Stream #1.0 -> #0.1
Error while opening encoder for output stream #0.1 - maybe incorrect parameters such as bit_rate, rate, width or height
At last I understood the problem: the MP2 audio encoder that ffmpeg selected for the AVI output does not accept an 8 kHz sample rate, so the audio needed to be at some other rate. So I decided to use Audacity, another open source application, to upsample the sound. However, Audacity in turn was unhappy with this audio format.
So I used Project->Import Raw Data, and selected my WAV file. I set up the import with the following parameters:
I knew this would work, because the WAV file format consists of a header, followed by PCM data, in this case 8 kHz unsigned samples. So the result in the audio editor would be an audio file with the WAV header as a noisy sound at the start, followed by the data I wanted. The selected (darker) portion of the WAV file below is the header. I used Edit->Cut to remove it.
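The header-then-samples layout described above can also be checked programmatically. What follows is a minimal Python sketch, not the procedure used for this post. The function names split_wav and make_test_wav are hypothetical, the sample values are made up, and the sketch assumes a simple RIFF/WAVE file whose chunks can be walked in order until the data chunk is found.

```python
import io
import struct
import wave


def split_wav(raw: bytes):
    """Return (header_bytes, pcm_bytes) for a RIFF/WAVE byte string."""
    if raw[0:4] != b"RIFF" or raw[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # the first chunk starts after 'RIFF', the file size, and 'WAVE'
    while pos + 8 <= len(raw):
        chunk_id = raw[pos:pos + 4]
        (chunk_size,) = struct.unpack("<I", raw[pos + 4:pos + 8])
        if chunk_id == b"data":
            start = pos + 8
            return raw[:start], raw[start:start + chunk_size]
        # Chunks are padded to an even number of bytes.
        pos += 8 + chunk_size + (chunk_size & 1)
    raise ValueError("no data chunk found")


def make_test_wav(samples: bytes, rate: int = 8000) -> bytes:
    """Build an in-memory mono, 8-bit unsigned WAV (hypothetical test data)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(1)  # 1 byte per sample, i.e. 8-bit unsigned PCM
        w.setframerate(rate)
        w.writeframes(samples)
    return buf.getvalue()


raw = make_test_wav(bytes([128, 130, 126, 128]))
header, pcm = split_wav(raw)
print(len(header), list(pcm))
```

Importing a file as raw data treats the header bytes as if they were samples, which is why they show up as a short burst of noise before the real audio.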
Finally, I tried to save the audio at a different sample rate. The audio file has a pulldown menu that lets you change the sample rate, but it doesn’t do what I wanted—what it does is play the audio file back at a different rate with aliasing.
Instead, after consulting the Audacity documentation, I discovered you use the menu at the lower left corner of the main Audacity window to set the sample rate.
Change this to 48000, and choose File->Export as WAV to save at the new sample rate. I re-ran ffmpeg, and the resulting AVI file would play in QuickTime and VLC player (although VLC crashes afterwards), but it would not work in Windows Media Player (audio played, no video), DivX, RealPlayer, or Flickr. So, I decided to try encoding to mp4 instead with the following command:
ffmpeg.exe -loop_input -shortest -f image2 -r 0.25 -i P910033.jpg -i P910033.wav bar.mp4
The resulting mp4 file plays in all the media players (although, again, VLC crashes after playing it), and Flickr can read it successfully as well. Here is what it looks like on Flickr:
Using size as a proxy for quality, however, the encoded video is much smaller than the input JPEG file. Can someone suggest additional flags to ffmpeg to improve the encoding quality?
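One possible direction, offered as an untested sketch rather than a verified answer: newer ffmpeg builds replace -loop_input with -loop 1, can resample the audio themselves with -ar, and expose a quality setting for H.264 through -crf. The Python helper below only assembles such a command line. The function name single_frame_movie_cmd and the default values (48000 Hz, CRF 18) are assumptions, and the command presumes an ffmpeg build that includes libx264.

```python
import shlex


def single_frame_movie_cmd(image, audio, output, audio_rate=48000, crf=18):
    """Build an ffmpeg argument list for a still image plus an audio clip."""
    return [
        "ffmpeg",
        "-loop", "1",            # repeat the still image (replaces -loop_input)
        "-i", image,
        "-i", audio,
        "-c:v", "libx264",       # H.264 video; needs a build with libx264
        "-tune", "stillimage",   # x264 tuning intended for static content
        "-crf", str(crf),        # lower CRF means higher quality, larger file
        "-pix_fmt", "yuv420p",   # widely supported pixel format
        "-c:a", "aac",
        "-ar", str(audio_rate),  # resample the 8 kHz audio during encoding
        "-shortest",             # stop when the shorter input (the audio) ends
        output,
    ]


cmd = single_frame_movie_cmd("P910033.jpg", "P910033.wav", "bar.mp4")
print(shlex.join(cmd))
```

If this works on a given build, the -ar option would also make the Audacity resampling step unnecessary, since ffmpeg would convert the sample rate while encoding.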
Ben Mesander has more than 18 years of experience leading software development teams and implementing software. His strengths include Linux, C, C++, numerical methods, control systems and digital signal processing. His experience includes embedded software, scientific software and enterprise software development environments.
Creating the Orton Effect in Gimp
Recently I decided to learn how to write scripts in the Gimp image editing program to automate certain tasks. The first task I wanted to automate was the Orton effect. This is an effect invented by Michael Orton in the 1990s, which consists of taking two copies of an image, one blurred, and one sharp, and mixing them to produce an image with a dreamy quality. It is especially well suited to landscape and flower photography.
The Orton effect was originally achieved by taking two photos: a well-focused image that was overexposed by two stops, and an out-of-focus image of the same scene that was overexposed by one stop. These were then printed as slides and sandwiched together to produce the final image.
With digital photography, one way to achieve this effect is to shoot a single raw image of a scene. The raw image can be developed to two JPEGs, one at +1 EV (Exposure Value), and the other at +2. My script blurs the +1 EV image with a two dimensional Gaussian filter with a standard deviation of 40 pixels, loads the second +2 EV image, sharpens it with an unsharp mask, and then overlays the two images. There are a variety of ways the images can be overlaid, but I prefer to multiply them, which enhances the color saturation in light areas. This is done by the Gimp by calculating (blur layer × sharp layer) / 255, which results in the image darkening, and an increase in color saturation.
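To make the multiply step concrete, here is a small Python sketch of the per-channel arithmetic (blur layer × sharp layer) / 255 on 8-bit values. It is not the Gimp script itself. The function names are hypothetical, and the use of integer division is an assumption, since the Gimp may round differently.

```python
def multiply_blend(blur, sharp):
    """Multiply-blend two 8-bit channel values: (blur * sharp) / 255."""
    return (blur * sharp) // 255


def multiply_layers(blur_layer, sharp_layer):
    """Apply the blend to two equally sized lists of 8-bit values."""
    return [multiply_blend(b, s) for b, s in zip(blur_layer, sharp_layer)]


# Made-up channel values, from full white down to black.
print(multiply_layers([255, 200, 128, 0], [255, 200, 128, 90]))
```

Every result is less than or equal to the smaller of its two inputs, which is the darkening described above and the reason both source images are overexposed.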
My Gimp script to do this is available on the Gimp plugin registry.
The soft focus of the colors and the sharpness of the image got me thinking: Is the Orton effect really equivalent to heavily subsampling the chroma channels of the image, and sharpening the luma channel? JPEG and MPEG compression both make use of the fact that the human eye is not as sensitive to chroma (color) as it is to brightness (luma). Typically, both still and video compression use 4:2:0 chroma subsampling to reduce the number of bits used to represent color information in compressed images without a perceptible quality difference to the human visual system.
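As a quick illustration of the savings, the sketch below counts samples per frame with and without 4:2:0 subsampling, where each chroma plane is stored at half resolution both horizontally and vertically. The function name sample_counts is hypothetical, and rounding odd dimensions up is an assumption.

```python
def sample_counts(width, height):
    """Return (full_resolution_samples, samples_with_420) for one frame."""
    luma = width * height
    # In 4:2:0 each chroma plane is halved in both directions (rounded up).
    chroma_w = (width + 1) // 2
    chroma_h = (height + 1) // 2
    full = 3 * luma                       # Y, Cb, Cr all at full resolution
    subsampled = luma + 2 * chroma_w * chroma_h
    return full, subsampled


full, subsampled = sample_counts(1920, 1080)
print(full, subsampled, subsampled / full)
```

For even dimensions the subsampled frame carries exactly half as many samples before any further compression is applied.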
I decided to test my theory. It turns out the Gimp has the ability to decompose an image into its YCbCr luma and chroma components used in the JPEG and MPEG compression process.
[Figure: the image decomposed into its Y, Cb, and Cr components]
Once I had the image split into its separate components, the Gimp allowed me to apply my Gaussian filter to just the Cb and Cr components, and then regenerate a new color image from the components.
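For reference, the decomposition of a single pixel can be sketched in Python. This assumes the full-range conversion used by JPEG (JFIF) with BT.601 coefficients; the Gimp offers several YCbCr variants, so its numbers may differ slightly. The function names are hypothetical.

```python
def clamp8(x):
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, int(round(x))))


def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (JFIF convention)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return clamp8(y), clamp8(cb), clamp8(cr)


for name, pixel in [("white", (255, 255, 255)), ("red", (255, 0, 0))]:
    print(name, rgb_to_ycbcr(*pixel))
```

Neutral pixels land at Cb = Cr = 128, so blurring only the chroma planes leaves the brightness detail in Y untouched.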
Unfortunately, as you can see, the image is nothing like the image that underwent Orton processing—my intuition was wrong. However, it is interesting to see just how much one can low-pass filter an image without a huge impact on the image. I increased the standard deviation of my Gaussian filter from 40 pixels to 100 with the following result—the image is still recognizable and doesn’t look too bad, although the color bleeds outside the lines. It’s interesting to note that the resulting JPEG is also smaller because the low-pass filtered chroma information is easier to compress.
Additionally, it is interesting to see what happens if we decompose our squid into RGB components instead of YCbCr and filter two of them with a Gaussian filter that has a standard deviation of 100 pixels.
Yuck. We can clearly see the advantage of chroma subsampling here over RGB subsampling.
The Basics of 3D Image Acquisition
One of our clients is heavily involved in 3D video and has been for several years. However, several others are just now starting to think about it because of the uptick of interest in the consumer electronics world. Enough questions have been posed to us recently that it seemed worthwhile to me to pull together a few basic facts regarding 3D stereopair imaging and stereo disparity.
First, we need a simple model of a lens. Consider the diagram below:
In this picture, the long horizontal line that passes through the center of the lens is called the lens axis. The lens has the property that rays that pass through the center of the lens are undeviated. Therefore, the ray from the top of the tree, at a distance l to the left of the lens, passes straight through the center of the lens. (The tree has a height of h.) The lens also has the property that rays that arrive perpendicular to the lens are refracted to pass through the focal point of the lens. The focal point lies on the lens axis and is a distance f from the center of the lens. The intersection of these two rays shows where the image of the tree will be formed. You can see that the image of the tree is upside down, and has a new height h’. The image is formed a distance d to the right of the focal point.
By using similar triangles we see first that
Using a different pair of similar triangles we also see that
Solving the first equation above for h’, substituting the result into the second equation and simplifying, we derive the following relationship:
This is the fundamental equation of a simple lens. It shows that as the object gets further and further from the lens, i.e. as l increases, the distance of the image of the object from the focal plane decreases, i.e. d gets smaller. We can assume that the camera’s image sensor is located at a distance f from the lens, is perpendicular to the lens axis, and that all objects more than a certain distance away from the lens will be in focus. In other words, the image of all sufficiently distant objects will appear on the focal plane where the image sensor is located.
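For readers following along without the figures, the derivation can be reconstructed from the geometry described above. This is a sketch under one assumption: that the first pair of similar triangles comes from the ray through the center of the lens and the second from the ray that passes through the focal point. The original equations may have been written in a different but equivalent form.

```latex
\begin{align}
\frac{h}{l} &= \frac{h'}{f + d}
  && \text{(ray through the center of the lens)} \\
\frac{h}{f} &= \frac{h'}{d}
  && \text{(ray through the focal point)} \\
d &= \frac{f^{2}}{l - f}
  && \text{(substituting } h' = h\,(f + d)/l \text{ and simplifying)}
\end{align}
```

The last line is equivalent to the familiar form 1/l + 1/(f + d) = 1/f, and it shows directly that d shrinks toward zero as l grows.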
In the case of 3D video, two cameras are used to acquire a sequence of stereopair images, one from the left camera and one from the right. Different stereo geometries are possible, but the most common one is to place the two cameras horizontally apart from each other by a distance i, and to keep their focal planes coplanar. The diagram below illustrates this configuration:
The horizontal line at the bottom is the focal plane; it is clear from the diagram that the focal planes are coplanar. The lenses are a distance f from the focal plane and are separated by a distance of i from each other. We assume that a small object (or a point on a larger object) is located a distance l from the lens plane and a distance m to the right of the axis of the right lens. We want to know where the image of that object appears in the left and the right camera. In particular, we want to know if we overlaid the left image on top of the right image, how far apart would the images appear? Mathematically, we want to know the disparity, which we define to be
where s1 and s2 are the distances from the image point to the intersection of the lens axis with the focal plane for the left and the right cameras respectively. Note that we are assuming that the object being imaged is far enough away that its image forms on the focal plane.
Using our favorite trick of similar triangles we have the following two equations:
and
Solving the first equation for s1, the second equation for s2, taking the difference and simplifying yields
Although this expression was derived for an object to the right of the axis of the right camera, it is easy to show in a similar manner that it is also true for an object between the axes of the two cameras as well as for an object to the left of the axis of the left camera.
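The disparity relations can likewise be reconstructed from the description of the two-camera geometry. This is a sketch that assumes s1 and s2 are measured in the same direction from each lens axis and uses the symbols f, i, l, and m exactly as defined above; the original equations may have been arranged differently.

```latex
\begin{align}
\rho &= s_1 - s_2
  && \text{(definition of disparity)} \\
\frac{s_1}{f} &= \frac{i + m}{l}
  && \text{(left camera)} \\
\frac{s_2}{f} &= \frac{m}{l}
  && \text{(right camera)} \\
\rho &= \frac{f\,i}{l}
  && \text{(taking the difference)}
\end{align}
```

The offset m cancels in the difference, which is why the result holds wherever the object sits relative to the two lens axes. The focal length f is a fixed property of the cameras.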
So what does this equation tell us? First, it says that for this particular camera geometry, the disparity is only a function of the separation between the two cameras, i, and the distance of the object from the lens plane, l. Second, the equation tells us that the disparity increases as we increase the separation between the cameras. Finally, it tells us that the disparity decreases as the object gets further away from the cameras, approaching zero for objects an infinite distance away. (You can see this when you watch 3D content without wearing the special 3D glasses: The “distant” objects can be seen by the naked eye, whereas the near objects appear blurry to the naked eye, because the value of ρ is greater.)
It should be clear from this equation that if a stereopair is available and corresponding points can be found in the left and right pictures, the disparity between those points can be measured and the distance to the point can be computed.
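A short Python sketch of that computation follows. It assumes the disparity relation takes the form ρ = f·i / l for the parallel-camera geometry described above, with f, i, l, and ρ all in the same length units. The function names and the numbers are hypothetical.

```python
def disparity(f, i, l):
    """Disparity for focal distance f, camera separation i, object distance l."""
    return f * i / l


def distance_from_disparity(f, i, rho):
    """Invert the relation to recover the object distance from a disparity."""
    if rho <= 0:
        raise ValueError("disparity must be positive for a finite distance")
    return f * i / rho


# Made-up values: f = 2, i = 3, object at l = 6 (arbitrary but equal units).
rho = disparity(2.0, 3.0, 6.0)
print(rho, distance_from_disparity(2.0, 3.0, rho))
```

In practice the hard part is finding the corresponding points in the two pictures; once a disparity is measured, the distance is a single division.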
Mike Perkins, Ph.D., is a managing partner of Cardinal Peak and an expert in algorithm development for video and signal processing applications.
Minimizing Development Costs on Low-to-Mid Volume Products
My last post suggested ways to reduce parts costs in a low-to-mid volume product. This post explores ways to keep development costs low while still creating a cost-effective product.
You can’t escape the fact that it takes money to create a low cost product. It is estimated that the first version of the iPhone had COGS (cost-of-goods-sold) of around $200. That is very impressive given all of the included features. However, it is also estimated that Apple spent around $150M over 30 months to design the iPhone. In addition to that direct investment, Apple was also able to entice their component vendors to spend huge amounts of money to design custom ICs for the device.
Our clients typically do not have that kind of money to invest in a new product. (Although please give us a call if you do!) And even if they did, it often doesn’t make business sense to invest a lot of money in upfront engineering in order to reduce COGS, unless you have high volumes. So it’s more typical that we are performing a balancing act between the budget available for engineering, and meeting the target COGS that is necessary for a successful product.
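The balancing act can be put in rough numbers. The sketch below computes how many units must ship before extra up-front engineering pays for itself through lower COGS. The function name and the dollar figures are hypothetical and are not taken from any real project.

```python
import math


def break_even_units(extra_engineering_cost, cogs_saving_per_unit):
    """Units needed before a COGS reduction repays its engineering cost."""
    if cogs_saving_per_unit <= 0:
        raise ValueError("the design change must reduce per-unit cost")
    return math.ceil(extra_engineering_cost / cogs_saving_per_unit)


# Hypothetical: $100,000 of extra design work that saves $20 per unit.
units = break_even_units(100_000, 20)
print(units, "units, or", units / 1_000, "years at 1,000 units per year")
```

With these made-up figures, a product selling 1,000 units per year would need five years to recover the investment, while a high-volume product would recover it almost immediately.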
Here are some guidelines that we use to keep development costs low and still create cost-competitive products.
- Start with solid requirements definition.
The most successful design projects start with clear product requirements. Starting a project with known requirements (and not changing them during the product design phase) is the best way to minimize development costs. Of course, most projects aren’t so lucky as to have perfectly defined requirements. It usually takes the management team some time to balance the requirements, development cost targets, product cost targets, and project time-lines. Still, the more quickly you can define what the product has to do, the less the engineering team will thrash—and thrashing costs money.
- Keep the schedule short.
This is pretty obvious, but the longer a project takes, the more it costs. We’re big believers in rapidly getting a solid, but perhaps less-fully-featured, product to market. In addition to keeping development costs low, this strategy also allows you to quickly gain feedback from the market and focus investment for an enhancement release on the most important areas.
- Hire an experienced design team.
An experienced design team will put together a more accurate project schedule and budget, is more likely to meet the project deadlines, and is better prepared to overcome the inevitable hurdles along the way.
- Explore the trade-offs between custom hardware design and using off-the-shelf components.
For low to mid-volume products, using off-the-shelf subsystems and components can be a very tempting way to reduce development costs. However, off-the-shelf subsystems (such as single board computers, power supplies, cases, etc.) can be quite a bit more expensive than custom designed hardware. It is worth a thorough investigation of what solutions might exist to reduce your design effort yet still meet the product cost targets. Small hardware vendors may be willing to modify their products to fit your application and costs, and may not charge NRE to do this. As with all vendors, negotiating goes a long way towards minimizing your costs.
- Put together a hardware prototype early in the project.
In most product development projects, the software effort is larger than the hardware effort. As such, we structure projects to give the software engineers as much time as possible to work on the target hardware. This maximizes the time available for low level hardware and software bugs to be found and resolved.
- Start with reference designs and evaluation boards where possible.
Most semiconductor products have reference designs and evaluation boards available to give you a head start. For a low-volume product, these designs can be especially helpful to minimize development time.
- Use the vendor’s Field Application Engineers.
IC vendors usually have FAEs that will help integrate their products into your design. FAEs are usually willing to do schematic reviews, help with software drivers, and even help debug parts of the system if necessary. These folks increase the chance of a successful first revision design and can reduce the development time of a product.
- Use open source software where possible.
We are big advocates of open source software here at Cardinal Peak. There is huge opportunity for reducing the development time and costs in a project by using open source modules. However, it is not easy to properly integrate open source software (especially real-time or embedded modules) into a complex product. To properly take advantage of the open-source benefits, a team that has experience with this is imperative.
Mike Deeds is an expert in embedded systems hardware and software engineering, including FPGA design and computer architecture.
Thoughts on 3D after NAB
I just returned from this year’s NAB show, where I was bombarded with 3D demos in virtually every booth. Most of the factors driving this 3D superabundance originate outside of the broadcast industry itself. First, TV manufacturers are hot on 3D as a way to get everyone who just bought an HDTV to upgrade to a new 3D enabled display. Cinema owners like 3D because they can charge more for the tickets. The Blu-ray consortium recently standardized a method for storing 3D video on BD discs, and hopes to enable and piggyback on the efforts of the TV manufacturers as a way to drive sales of 3D Blu-ray players and finally displace DVDs. Hollywood has begun producing more 3D movies too, including the phenomenally successful Avatar (which will be released on 3D Blu-ray very soon). So naturally the broadcast industry needs to be prepared to author and carry 3D signals.
Most of the demos I saw were nothing more than “here, put on these glasses”…in other words, “me too” type demos. (Although after seeing all this 3D interest I did wonder if Philips regretted terminating their autostereoscopic display effort last year!)
Nevertheless, I did see two problems addressed that I found technically interesting. First, real-time 2D to 3D conversion (see here and here), and second, automatic 3D quality monitoring.
I worked a lot on the problem of compressing stereopairs as part of my Ph.D. research, and I also spent time thinking about 3D video quality assessment. However, I had never considered the problem of real-time 2D to 3D conversion, so the show got me thinking. It’s a pretty tricky problem!
Converting a 2D video stream to 3D can be partitioned into two fundamental steps. First, creating a depth map for each video image, and second, using the depth map to construct a second viewpoint. Although both steps are challenging, the first step feels substantially harder to me.
With regards to the first step, a sequence of 2D video images must be analyzed to extract a depth map. Several special cases are worth discussing, but I’ll only mention two. First, consider the case where the camera is stationary and a 3D object moves through the field of view. The closer points on that object will have frame-to-frame pixel displacements that are larger than those for object points that are further away. Therefore, one useful approach for deriving information for a depth map would be the following: a) segment the image into two regions: moving and stationary; b) segment the moving areas into distinct objects using various clues such as color and proximity; c) find distinct matching points on the moving objects in two different frames; d) determine depths for those matching points based on the measured point displacements; e) interpolate the depth map for non-matched moving object points.
As a second special case, consider the situation where nothing is moving in the video sequence for many frames in a row. In this case, occlusion becomes a major depth cue. If one object is in front of another, then it will occlude the background object, and it must be closer. If an image can be segmented into objects, and an occlusion map can be deduced, then different depths can be assigned to different objects based on where they lie in the occlusion map. Other clues that may be algorithmically exploitable could stem from perspective considerations applied to the edges of identified objects.
Many powerful depth clues will be hard to take advantage of algorithmically—although humans can exploit them easily—because they involve recognizing objects. For example, we can easily recognize two humans in a picture, and determine whether or not they are adults or children. We know that if two adult males appear in the picture, and one appears substantially taller than the other (and isn’t holding a basketball), then the shorter one is further away. I suspect that taking advantage of this sort of knowledge is beyond the capability of today’s real-time (and non real-time!) processing. Nevertheless, I was amazed at how well the systems I saw at the show appeared to work.
With regards to the second step, given an image and a depth map, a second view can be created from the first by displacing each pixel of the original image with a disparity value corresponding to its depth. In practice it won’t be that easy. Why? Because after displacing pixels with their appropriate disparities, gaps will appear in the new image. These gaps result from image detail that is visible in one image but not in the other, so the gaps will need to be interpolated or otherwise synthesized in some reasonable way.
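The displacement and the resulting gaps can be illustrated on a single scanline. This Python sketch is an illustration only, not how any of the systems at the show work. It assumes integer disparities, lets the nearer pixel (larger disparity) win when two pixels land on the same spot, and fills gaps by repeating the nearest known pixel to the left. All names and values are hypothetical.

```python
def synthesize_row(row, disparity):
    """Shift each pixel by its disparity; unfilled positions stay None."""
    width = len(row)
    out = [None] * width
    out_disp = [-1] * width  # disparity of the pixel currently at each spot
    for x, (value, d) in enumerate(zip(row, disparity)):
        target = x + d
        # Nearer pixels (larger disparity) occlude farther ones.
        if 0 <= target < width and d > out_disp[target]:
            out[target] = value
            out_disp[target] = d
    return out


def fill_holes(row):
    """Fill gaps from the left neighbor; leading gaps use the first pixel."""
    filled = list(row)
    last = None
    for x, value in enumerate(filled):
        if value is None:
            filled[x] = last
        else:
            last = value
    first = next((v for v in filled if v is not None), None)
    return [first if v is None else v for v in filled]


shifted = synthesize_row([10, 20, 30, 40], [0, 0, 1, 1])
print(shifted, fill_holes(shifted))
```

The None entry in the shifted row is exactly the kind of gap described above: background that the original view never saw. Repeating a neighboring pixel is the crudest possible way to synthesize it.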
With regards to 3D video quality assessment, I just want to interject a note of caution. I was encouraged to see that several vendors have made progress in developing systems that automatically approximate the “mean opinion scores” that subjective human evaluation tests would assign to various image sequences. However, when dealing with 3D video, the whole is greater than the sum of its parts. If the algorithmic approach implemented is to naively apply 2D image quality assessment to the left and right pictures independently, and then average the scores together, the result is likely not to correspond at all to a human’s subjective viewing experience. For those of you who wear glasses, like me, you can experience this directly if one of your eyes is better than the other. Take off your glasses, look at the world around you, and you will see it with the resolution of your better eye; but you will still have stereoscopic vision. This effect will ultimately need to be taken into account in automatic systems that purport to algorithmically assess the quality of a 3D image sequence.
Sniffing iPad Traffic
For a project I’m working on, I was wondering how a particular video-related feature on Apple’s new iPad works. In order to figure that out, I thought it would be interesting to connect a network sniffer in-line with my shiny new iPad, so I could capture and analyze all the network traffic flowing to and from the device.
Although I did this with the iPad, the technique below is not specific to it; you could use the approach below to capture network traffic to any Wi-Fi-enabled mobile device, like an iPod Touch or a Palm Pre.
An easy way to do this is to configure a computer to serve as a bridge between an Ethernet network and an ad-hoc Wi-Fi network. Then, by running Wireshark or another network sniffer on the computer, you can capture the packets as they flow through to the mobile device on Wi-Fi.
My computer is a MacBook Pro running OS/X 10.6 “Snow Leopard”, but the same concept should work on Windows or on earlier OS/X versions, although the dialogs might look a little different. There are three steps:
- Configure the computer to act as a Wi-Fi Bridge
- Connect the iPad to the computer’s ad-hoc Wi-Fi network
- Capture the packets
Step 1: Configure OS/X as a Wi-Fi Bridge
First, we need to configure OS/X as a Wi-Fi bridge. To do this, select “Create Network…” from the Airport drop-down menu. This dialog appears:
Type a network name, and, if you like, assign a password. I assigned a password just so I could ensure that only one device was connecting to my bridged Mac. We are nerds here at Cardinal Peak, so we tend to have a lot of devices floating around our office!
At this point, the iPad would be able to connect to the computer, but the computer is not yet configured to bridge the packets from the 802.11 network onto the Ethernet network. To configure bridging on OS/X, you need to turn on what Apple calls “Internet Sharing”. Go to System Preferences and select the “Sharing” option. Turn on Internet Sharing, and set it up to “Share your connection from” “Ethernet”, “To computers using” “AirPort”:
Step 2: Connect the iPad to the ad-hoc Wi-Fi network
Next, you’ll need to configure the iPad to connect to the ad-hoc Wi-Fi network you just created. This is pretty easy: Go to Settings, and then Wi-Fi. You should see your new ad-hoc network in the list—in my case, I’m looking for “HowdysNetwork”:
Just tap on the ad-hoc network. If you elected to use a password, you’ll be prompted for it.
You can confirm your iPad’s network configuration by tapping the right arrow next to the network name:
Good—we have an IP address, but more importantly we have reasonable entries for Router and DNS server, as well.
Next, you should test out your bridged network connection by bringing up Safari on the iPad and proving you can visit a web site.
Step 3: Capture the Packets
The final step is to start up Wireshark on your computer and attach to the Wi-Fi interface. You normally need to start Wireshark as the super-user in order to have enough rights to capture traffic. There’s probably a cool way to do this graphically, but being an old-school Unix guy, I always bring up a Terminal window and type sudo wireshark &.
We want to capture packets on the Wi-Fi interface, which on my Mac is device en1. Click the leftmost button on the Wireshark toolbar, and then click “Start” next to device en1:
Now you should be all set—do something on your iPad to cause network traffic, and confirm that you see it showing up in the Wireshark window!
Designing Low-to-Mid Volume Embedded Products Cost-Effectively
I take it as a given that when a client approaches us with a new embedded product idea, they will require a very demanding set of features and a minimal price tag. The “minimal price tag” part always applies to the development effort required. For products with a hardware component, it also applies to the product’s cost of goods sold (COGS).
Companies developing high volume consumer products can afford to spend quite a bit of engineering effort to reduce their COGS, since the development costs will be amortized over so many units. However, many of our clients sell low-to-medium volume products—in the range of 1,000 to 10,000 units per year. This volume is not high enough to leverage massive economies of scale, yet a clever design team can still create a very successful and cost effective product if some care is taken during the design phase.
There’s no magic bullet to reduce COGS in the low-to-mid volume range; it’s mostly common sense, coupled with the experience that comes from having built products in this volume range before. Here are some ideas for minimizing COGS:
- Use stocking distributors, but negotiate.
Stocking distributors can help reduce both product costs and development costs. Distributors have much higher leverage with IC suppliers than a small company, which can help reduce lead times and solve supply problems. One big advantage is that distributors often give forward pricing or high volume pricing (even at low volumes) that can significantly lower your product’s cost. The big ones (Avnet and Arrow) also have dedicated resources that can suggest technology solutions you might not be aware of. They can also facilitate deals with small 3rd party subsystem vendors, such as display manufacturers. As with any vendor, you must negotiate with these companies in order to see the best prices. They are in business to make a profit, but they can also act as a facilitator in negotiating with the end suppliers.
- Talk with vendors to learn about new products.
IC vendors always have new products in the pipeline. These are usually lower cost and higher performance than existing products. Of course there is risk in using a new product, especially if there is a lot of software support required. You need a close vendor relationship in order to successfully integrate a new product.
- Use ICs that are derivatives of high volume consumer products.
Many high volume consumer electronics ICs are custom designed for a certain customer or retail market. Large IC vendors such as Texas Instruments and Intel frequently offer embedded products that are derived from their consumer products. These are usually low cost, but very high performance products that are guaranteed to be available for many years. One example is the Intel Atom family. The consumer versions of the Atom chipset can disappear at a moment’s notice, but the embedded versions will be around for at least 7 years after release.
- Try to use stocked parts.
Many component suppliers have wonderful parts that would be a perfect fit for your product, but not all of these parts are actually available in small purchase quantities. Your best chance of a sustainable supply chain is to use parts that are currently stocked. Maxim is a company that is notorious for having lots of great parts in theory, but a much smaller subset of parts that are actually regularly manufactured and available for small purchase quantities. My colleague Todd refers to these types of parts as being made out of “Unobtainium”.
- Negotiate with your vendors!
This is a no-brainer, but one that engineering teams are not always comfortable with. Competition between suppliers is one of the best ways to reduce product costs. A lower cost competitive bid is your best negotiating position.
- Second-source your highest cost components, if possible.
Second sourcing components or subsystem modules creates a permanent negotiating position with your suppliers. A great example is to use a single board computer in an open form factor, such as COM Express. Many vendors supply these modules, with lower cost versions coming out in the future.
- Continue to work the supply chain to drive down costs over time.
Concentrate your efforts on the highest cost components. Unlike high volume products, it may not be worth the overhead cost to squeeze out the pennies.
- Pick a Contract Manufacturer that is a good fit for your product and volume.
If you don’t manufacture your own products, you will need to select a CM; this is an area where “bigger” definitely doesn’t mean “better”. It may be difficult to get and keep the attention of a large CM. It is more likely that a large CM will slip your deadlines in favor of their larger customers. You will be able to push on a small CM when you need to.
These are some of the guidelines we use to minimize costs for low to mid volume products. Part 2 of this article will cover minimizing development costs while still producing a cost-effective product.
Mike Deeds is an expert in embedded systems hardware and software engineering, including FPGA design and computer architecture.
Encoders Aren’t Commodities
My partner Ben Mesander had a really cool post the other day: An h.264 encoder written in 30 lines of C code.
Ben’s encoder outputs completely valid h.264, but it doesn’t actually compress anything. (What do you expect from 30 lines!) In fact, because of the necessary h.264 headers, the output of Ben’s encoder is larger than the input.
This is a dramatic example of something that I find interesting about the codec marketplace: Decoders are commodities, but encoders are highly differentiated. People often misunderstand this dynamic, however.
A video decoder, if it works, has to follow the relevant specification. There are hundreds of “tricks” that a baseline profile h.264 encoder could use, and so a baseline-profile decoder must be able to handle all of them. So there’s really not room for a lot of differentiation between h.264 decoders. Sure, one decoder might use less CPU than another. But mostly, if you’re looking to buy a decoder, you should shop based on price.
Another way to say the same thing is that a codec specification details how to write a decoder. The spec lays out what a compliant bitstream looks like, and specifies how you turn that bitstream into video or audio.
Encoders, as Ben showed, are completely different beasts. An encoder author can pick which of the tools provided by the standard he or she will use. In the extreme case, as Ben did, he can choose to use almost none of the tools. Therefore, there can be a huge difference in compression efficiency—and thus video quality—between two encoders.
You might think this is obvious, but if so you should walk around the security industry’s ISC West trade show this week. You will find all sorts of vendors claiming that their h.264 DVR is the same as their competitor’s DVR, or claiming that their h.264 IP camera is better than an MPEG-4 IP camera. Maybe so, and maybe not: Just because h.264 is a more modern and complex codec than MPEG-4 part 2, it doesn’t automatically follow that a particular h.264 encoder is better than a particular MPEG-4 encoder.
Ultimately, the only way to compare two encoders is a head-to-head bakeoff, where each encoder is set to the same data rate and fed the same content, and you view decoded video from the two at the same time.
Howdy Pierce is a managing partner of Cardinal Peak with a technical background in multimedia systems, software engineering and operating systems.
World’s Smallest h.264 Encoder
Recently I have been studying the h.264 video codec and reading the ISO spec. h.264 is a much more sophisticated codec than MPEG-2, which means that a well-implemented h.264 encoder has more compression tools at its disposal than the equivalent MPEG-2 encoder. But all that sophistication comes at a price: h.264 also has a big, complicated specification with a plethora of options, many of which are not commonly used, and it takes expertise to understand which parts are important to solve a given problem.
As a bit of a parlor trick, I decided to write the simplest possible h.264 encoder. I was able to do it in about 30 lines of code—although truth in advertising compels me to admit that it doesn’t actually compress the video at all!
While I don’t want to balloon this blog post with a detailed description of h.264, a little background is in order. An h.264 stream contains the encoded video data along with various parameters needed by a decoder in order to decode the video data. To structure this data, the bitstream consists of a sequence of Network Abstraction Layer (NAL) units.
Previous MPEG specifications allowed pictures to be coded as I-frames, P-frames, or B-frames. h.264 is more complex and wonderful. It allows individual frames to be coded as multiple slices, each of which can be of type I, P, or B, or even more esoteric types. This feature can be used in creative ways to achieve different video coding goals. In our encoder we will use one slice per frame for simplicity, and we will use all I-frames.
As with previous MPEG specifications, in h.264 each slice consists of one or more 16×16 macroblocks. Each macroblock in our 4:2:0 sampling scheme contains 16×16 luma samples, and two 8×8 blocks of chroma samples. For this simple encoder, I won’t be compressing the video data at all, so the samples will be directly copied into the h.264 output.
With that background in mind, for our simplest possible encoder, there are three NALs we have to emit:
- Sequence Parameter Set (SPS): Once per stream
- Picture Parameter Set (PPS): Once per stream
- Slice Header: Once per video frame
  - Slice Header information
  - Macroblock Header: Once per macroblock
  - Coded Macroblock Data: The actual coded video for the macroblock
Since the SPS, the PPS, and the slice header are static for this application, I was able to hand-code them and include them in my encoder as a sequence of magic bits.
Putting it all together, I came up with the following code for what I call “hello264”:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>   /* uint8_t */
/* SQCIF */
#define LUMA_WIDTH 128
#define LUMA_HEIGHT 96
#define CHROMA_WIDTH (LUMA_WIDTH / 2)
#define CHROMA_HEIGHT (LUMA_HEIGHT / 2)
/* YUV planar data, as written by ffmpeg */
typedef struct
{
    uint8_t Y[LUMA_HEIGHT][LUMA_WIDTH];
    uint8_t Cb[CHROMA_HEIGHT][CHROMA_WIDTH];
    uint8_t Cr[CHROMA_HEIGHT][CHROMA_WIDTH];
} __attribute__((__packed__)) frame_t;
frame_t frame;
/* H.264 bitstreams */
const uint8_t sps[] = { 0x00, 0x00, 0x00, 0x01, 0x67, 0x42, 0x00,
                        0x0a, 0xf8, 0x41, 0xa2 };
const uint8_t pps[] = { 0x00, 0x00, 0x00, 0x01, 0x68, 0xce,
                        0x38, 0x80 };
const uint8_t slice_header[] = { 0x00, 0x00, 0x00, 0x01, 0x05, 0x88,
                                 0x84, 0x21, 0xa0 };
const uint8_t macroblock_header[] = { 0x0d, 0x00 };
/* Write a macroblock's worth of YUV data in I_PCM mode.
   i is the macroblock row, j is the macroblock column. */
void macroblock(const int i, const int j)
{
    int x, y;
    /* The hand-coded slice header already ends with the header
       for the first macroblock, so skip it for (0, 0). */
    if (! ((i == 0) && (j == 0)))
    {
        fwrite(macroblock_header, 1, sizeof(macroblock_header),
               stdout);
    }
    /* x walks sample rows, y walks sample columns */
    for (x = i*16; x < (i+1)*16; x++)
        for (y = j*16; y < (j+1)*16; y++)
            fwrite(&frame.Y[x][y], 1, 1, stdout);
    for (x = i*8; x < (i+1)*8; x++)
        for (y = j*8; y < (j+1)*8; y++)
            fwrite(&frame.Cb[x][y], 1, 1, stdout);
    for (x = i*8; x < (i+1)*8; x++)
        for (y = j*8; y < (j+1)*8; y++)
            fwrite(&frame.Cr[x][y], 1, 1, stdout);
}
/* Write out SPS, PPS, and loop over input, writing out I slices */
int main(int argc, char **argv)
{
    int i, j;
    (void)argc;
    (void)argv;
    fwrite(sps, 1, sizeof(sps), stdout);
    fwrite(pps, 1, sizeof(pps), stdout);
    /* Stop on a short read so that no partial or repeated
       frame is emitted at end of input. */
    while (fread(&frame, 1, sizeof(frame), stdin) == sizeof(frame))
    {
        fwrite(slice_header, 1, sizeof(slice_header), stdout);
        for (i = 0; i < LUMA_HEIGHT/16; i++)
            for (j = 0; j < LUMA_WIDTH/16; j++)
                macroblock(i, j);
        fputc(0x80, stdout); /* slice stop bit */
    }
    return 0;
}
(This source code is available as a single file here.)
In main(), the encoder writes out the SPS and PPS. Then it reads YUV data from standard input, stores it in a frame buffer, and then writes out an h.264 slice header. It then loops over each macroblock in the frame and calls the macroblock() function to output a macroblock header indicating the macroblock is coded as I_PCM, and inserts the YUV data.
To use the code, you will need some uncompressed video. To generate this, I used the ffmpeg package to convert a QuickTime movie from my Kodak Zi8 video camera from h.264 to SQCIF (128×96) planar YUV format sampled at 4:2:0:
ffmpeg.exe -i angel.mov -s sqcif -pix_fmt yuv420p angel.yuv
I compile the h.264 encoder:
gcc -Wall -ansi hello264.c -o hello264
And run it:
hello264 <angel.yuv >angel.264
Finally, I use ffmpeg to copy the raw h.264 NAL units into an MP4 file:
ffmpeg.exe -f h264 -i angel.264 -vcodec copy angel.mp4
Here is the resulting output:
There you have it—a complete h.264 encoder that uses minimal CPU cycles, with output larger than its input!
The next thing to add to this encoder would be CAVLC coding of macroblocks and intra prediction. The encoder would still be lossless at this point, but there would start to be compression of data. After that, the next logical step would be quantization to allow lossy compression, and then I would add P slices. As a development methodology, I prefer to bring up a simplistic version of an application, get it running, and then add refinements iteratively.
Ben Mesander has more than 18 years of experience leading software development teams and implementing software. His strengths include Linux, C, C++, numerical methods, control systems and digital signal processing. His experience includes embedded software, scientific software and enterprise software development environments.
More on Patents
I had intended to give the indemnification issue a rest. But then the following caught my attention this morning:
One big difference between patents and other kinds of intellectual property, like copyrights and trademarks, is that patent-holders who want to sue someone for infringement don’t have to show that their patents or their products were actually copied by the defendant. In fact, the issue of copying is legally irrelevant when determining whether or not someone infringed a patent. (It is relevant to willfulness—more on that below.) The flip side of that rule is that a defendant company can have a really nice story about how they did their own research, invention, and development—but it doesn’t matter one bit, legally speaking. Such “independent invention” stories are no defense.
“No one seems to know whether patent infringement defendants are in fact unscrupulous copyists or independent developers,” writes Lemley. So he and his partner went on a hunt looking for copycats in patent disputes. How much copying did they find? Not much at all.
(Joe Mullin’s whole post is excellent; thanks to Brad Feld for calling attention to it.)
Which underscores my earlier point: Patent lawsuits don’t usually arise because of unethical behavior on the part of the engineering team. And therefore offering indemnity protection against these kinds of cases is not a financial risk that we can or should bear.
I’m not primarily out to agitate for reform of the patent system, but I agree with calls for adding an independent innovation defense. Such a reform would help swing the effect of the patent system back toward its original intention, which was to encourage innovation.
Howdy Pierce is a managing partner of Cardinal Peak with a technical background in multimedia systems, software engineering and operating systems.