Posts Tagged ‘Image Processing’

Ben Mesander Partner

Creating the Orton Effect in Gimp

May 20th, 2010 by Ben Mesander

Recently I decided to learn how to write scripts for the Gimp image editing program to automate certain tasks. The first task I wanted to automate was the Orton effect. This effect, originated by photographer Michael Orton in the mid-1980s, consists of taking two copies of an image, one blurred and one sharp, and mixing them to produce an image with a dreamy quality. It is especially well suited to landscape and flower photography.

The Orton effect was originally achieved by taking two photos: a well-focused image that was overexposed by two stops, and an out-of-focus image of the same scene that was overexposed by one stop. These were then developed as slides and sandwiched together to produce the final image.

With digital photography, one way to achieve this effect is to shoot a single raw image of a scene. The raw image can be developed into two JPEGs, one at +1 EV (Exposure Value) and the other at +2 EV. My script blurs the +1 EV image with a two-dimensional Gaussian filter with a standard deviation of 40 pixels, loads the +2 EV image, sharpens it with an unsharp mask, and then overlays the two images. There are a variety of ways the images can be overlaid, but I prefer to multiply them, which enhances the color saturation in light areas. The Gimp does this by calculating (blur layer × sharp layer) / 255 for each channel of each pixel, which darkens the image and increases its color saturation.
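As a rough illustration of the multiply step, here is a minimal Python sketch. It assumes 8-bit RGB values stored as nested lists of tuples and uses integer division by 255; the function name `multiply_blend` is made up for this example, and the Gimp's own rounding may differ.

```python
def multiply_blend(blur_layer, sharp_layer):
    """Multiply two same-sized 8-bit RGB images channel by channel.

    Each image is a list of rows; each row is a list of (r, g, b) tuples.
    """
    blended = []
    for blur_row, sharp_row in zip(blur_layer, sharp_layer):
        blended.append([
            tuple((b * s) // 255 for b, s in zip(blur_px, sharp_px))
            for blur_px, sharp_px in zip(blur_row, sharp_row)
        ])
    return blended


# One warm-toned pixel multiplied with itself.
blur = [[(255, 200, 100)]]
sharp = [[(255, 200, 100)]]
print(multiply_blend(blur, sharp))  # [[(255, 156, 39)]]
```

The brightest channel is unchanged while the weaker channels fall, so the pixel gets darker and the gap between its channels widens, which is the saturation boost described above.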

Original

Orton

My Gimp script to do this is available on the Gimp plugin registry.

The soft focus of the colors and the sharpness of the image got me thinking: Is the Orton effect really equivalent to heavily subsampling the chroma channels of the image, and sharpening the luma channel? JPEG and MPEG compression both make use of the fact that the human eye is not as sensitive to chroma (color) as it is to brightness (luma). Typically, both still and video compression use 4:2:0 chroma subsampling to reduce the number of bits used to represent color information in compressed images without a perceptible quality difference to the human visual system.
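To put a number on the savings from 4:2:0, the sketch below counts luma and chroma samples per frame. In 4:2:0 each chroma plane is halved both horizontally and vertically. The 1920 × 1080 frame size and the function name are assumptions chosen only for illustration.

```python
def samples_per_frame(width, height, scheme):
    """Count luma plus chroma samples for one frame under a subsampling scheme."""
    # (horizontal, vertical) decimation factors applied to each chroma plane
    factors = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}
    h_div, v_div = factors[scheme]
    luma = width * height
    chroma = 2 * (width // h_div) * (height // v_div)  # Cb and Cr planes
    return luma + chroma


full = samples_per_frame(1920, 1080, "4:4:4")
sub = samples_per_frame(1920, 1080, "4:2:0")
print(sub / full)  # 0.5
```

So before any entropy coding, 4:2:0 already halves the number of samples that have to be represented.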

I decided to test my theory. It turns out the Gimp has the ability to decompose an image into its YCbCr luma and chroma components used in the JPEG and MPEG compression process.

Original

Y

Cb

Cr

Once I had the image split into its separate components, the Gimp allowed me to apply my Gaussian filter to just the Cb and Cr components, and then regenerate a new color image from the components.
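The experiment can be sketched in a few lines of Python. This is not the Gimp's implementation: it works on a single row of pixels, uses the full-range JFIF conversion coefficients (the Gimp offers several YCbCr variants, so its numbers may differ), and all function names are invented for the example.

```python
import math


def rgb_to_ycbcr(r, g, b):
    """Full-range JFIF conversion (assumed here, not taken from the Gimp)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return tuple(min(255, max(0, round(v))) for v in (r, g, b))


def gaussian_blur_1d(values, sigma):
    """Blur a list of samples with a Gaussian kernel, clamping at the edges."""
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    total = sum(kernel)
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k, w in zip(offsets, kernel):
            j = min(len(values) - 1, max(0, i + k))
            acc += w * values[j]
        out.append(acc / total)
    return out


def blur_chroma_only(row, sigma):
    """Decompose a row of RGB pixels, blur Cb and Cr, keep Y, recompose."""
    ys, cbs, crs = zip(*(rgb_to_ycbcr(*px) for px in row))
    cbs = gaussian_blur_1d(cbs, sigma)
    crs = gaussian_blur_1d(crs, sigma)
    return [ycbcr_to_rgb(y, cb, cr) for y, cb, cr in zip(ys, cbs, crs)]


# A hard red-to-blue edge: the colors smear, the luma step stays sharp.
row = [(255, 0, 0)] * 4 + [(0, 0, 255)] * 4
print(blur_chroma_only(row, sigma=2.0))
```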

A Gaussian filter applied to just the chroma planes

The same picture using the Orton effect plugin above

Unfortunately, as you can see, the image is nothing like the image that underwent Orton processing; my intuition was wrong. However, it is interesting to see just how heavily one can low-pass filter the chroma planes without a huge impact on the image. I increased the standard deviation of my Gaussian filter from 40 pixels to 100, with the following result: the image is still recognizable and doesn’t look too bad, although the color bleeds outside the lines. It’s interesting to note that the resulting JPEG is also smaller, because the low-pass filtered chroma information is easier to compress.

A 100-pixel Gaussian filter applied to the chroma planes

Additionally, it is interesting to see what happens if we decompose our squid into RGB components instead of YCbCr and filter two of them with a Gaussian filter that has a standard deviation of 100 pixels.

A 100-pixel Gaussian filter applied to two of the three RGB planes

Yuck. We can clearly see the advantage of chroma subsampling here over RGB subsampling.

Mike Perkins Managing Partner

The Basics of 3D Image Acquisition

April 26th, 2010 by Mike Perkins

One of our clients is heavily involved in 3D video and has been for several years. However, several others are just now starting to think about it because of the uptick of interest in the consumer electronics world. Enough questions have been posed to us recently that it seemed worthwhile to pull together a few basic facts regarding 3D stereopair imaging and stereo disparity.

First, we need a simple model of a lens. Consider the diagram below:

In this picture, the long horizontal line that passes through the center of the lens is called the lens axis. The lens has the property that rays that pass through the center of the lens are undeviated. Therefore, the ray from the top of the tree, at a distance l to the left of the lens, passes straight through the center of the lens. (The tree has a height of h.) The lens also has the property that rays that arrive parallel to the lens axis (that is, perpendicular to the plane of the lens) are refracted to pass through the focal point of the lens. The focal point lies on the lens axis and is a distance f from the center of the lens. The intersection of these two rays shows where the image of the tree will be formed. You can see that the image of the tree is upside down, and has a new height h’. The image is formed a distance d to the right of the focal point.

By using similar triangles we see first that

$$\frac{h}{l} = \frac{h'}{f + d}$$

Using a different pair of similar triangles we also see that

$$\frac{h}{f} = \frac{h'}{d}$$

Solving the first equation above for h’, substituting the result into the second equation and simplifying, we derive the following relationship:

$$d = \frac{f^2}{l - f}$$

This is the fundamental equation of a simple lens. It shows that as the object gets further and further from the lens, i.e. as l increases, the distance of the image of the object from the focal plane decreases, i.e. d gets smaller. We can assume that the camera’s image sensor is located at a distance f from the lens, is perpendicular to the lens axis, and that all objects more than a certain distance away from the lens will be in focus. In other words, the image of all sufficiently distant objects will appear on the focal plane where the image sensor is located.
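A quick numeric check makes the trend concrete. The sketch below assumes the two similar-triangle relations are h/l = h’/(f + d) and h/f = h’/d, which combine to d = f²/(l − f). The 50 mm focal length and the object distances are illustrative values, not figures from the discussion above.

```python
def image_offset_beyond_focal_point(l, f):
    """Distance d of the image beyond the focal point, from d = f**2 / (l - f)."""
    if l <= f:
        raise ValueError("object must be farther from the lens than f")
    return f * f / (l - f)


f = 0.05  # focal length in metres (illustrative)
for l in (0.5, 5.0, 50.0):  # object distances in metres (illustrative)
    print(l, image_offset_beyond_focal_point(l, f))
```

Each tenfold increase in l shrinks d by roughly a factor of ten, which is why a sensor placed at distance f is a reasonable approximation for distant objects.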

In the case of 3D video, two cameras are used to acquire a sequence of stereopair images, one from the left camera and one from the right. Different stereo geometries are possible, but the most common one is to place the two cameras horizontally apart from each other by a distance i, and to keep their focal planes coplanar. The diagram below illustrates this configuration:

The horizontal line at the bottom is the focal plane; it is clear from the diagram that the focal planes are coplanar. The lenses are a distance f from the focal plane and are separated by a distance of i from each other. We assume that a small object (or a point on a larger object) is located a distance l from the lens plane and a distance m to the right of the axis of the right lens. We want to know where the image of that object appears in the left and the right camera. In particular, we want to know if we overlaid the left image on top of the right image, how far apart would the images appear? Mathematically, we want to know the disparity, which we define to be

$$\rho = s_1 - s_2$$

where s1 and s2 are the distances from the image point to the intersection of the lens axis with the focal plane for the left and the right cameras respectively. Note that we are assuming that the object being imaged is far enough away that its image forms on the focal plane.

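Using our favorite trick of similar triangles we have the following two equations:

$$\frac{s_1}{f} = \frac{m + i}{l}$$

and

$$\frac{s_2}{f} = \frac{m}{l}$$

Solving the first equation for s1, the second equation for s2, taking the difference and simplifying yields

$$\rho = s_1 - s_2 = \frac{f\,i}{l}$$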

Although this expression was derived for an object to the right of the axis of the right camera, it is easy to show in a similar manner that it is also true for an object between the axes of the two cameras as well as for an object to the left of the axis of the left camera.

So what does this equation tell us? First, it says that for this particular camera geometry and a given focal length f, the disparity is only a function of the separation between the two cameras, i, and the distance of the object from the lens plane, l. In particular, it does not depend on the sideways offset m. Second, the equation tells us that the disparity increases as we increase the separation between the cameras. Finally, it tells us that the disparity decreases as the object gets further away from the cameras, approaching zero for objects an infinite distance away. (You can see this when you watch 3D content without wearing the special 3D glasses: the “distant” objects look sharp to the naked eye, whereas the near objects appear blurry, because the value of ρ is greater for them.)

It should be clear from this equation that if a stereopair is available and corresponding points can be found in the left and right pictures, then the disparity between those points can be measured and the distance to the point can be computed.
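As a sketch of that last step, assume the geometry described above gives ρ = f·i/l, so that l = f·i/ρ. The focal length (expressed in pixel units so that the disparity comes out in pixels), the camera separation and the function names below are all hypothetical.

```python
def disparity(f, i, l):
    """Disparity for cameras separated by i viewing a point at distance l."""
    return f * i / l


def distance_from_disparity(f, i, rho):
    """Invert the relation above to recover the distance to the point."""
    if rho <= 0:
        raise ValueError("disparity must be positive for a finite distance")
    return f * i / rho


f = 1000.0  # focal length in pixel units (hypothetical)
i = 0.065   # camera separation in metres (hypothetical)
for l in (1.0, 5.0, 50.0):
    rho = disparity(f, i, l)
    print(l, rho, distance_from_disparity(f, i, rho))
```

Because ρ falls off as 1/l, a fixed measurement error in the disparity translates into a much larger distance error for far objects than for near ones.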

Mike Perkins, Ph.D., is a managing partner of Cardinal Peak and an expert in algorithm development for video and signal processing applications.

Ben Mesander Partner

Detecting Well-Focused Images

November 10th, 2009 by Ben Mesander

Recently, one of my colleagues mentioned to me that he takes large numbers of pictures and wanted to write a program to automatically determine which was in the best focus, out of a group of pictures that were taken of the same scene. He mentioned that he expected the algorithm to be computationally intensive.

My initial reaction was “it can’t be that hard,” since my digital camera does it, and it has a small, low power processor in it. So I checked out the Wikipedia article on autofocus. Two general algorithms are described, active autofocus and passive autofocus. Since I’m examining images that have already been taken, active autofocus is not an option. Within the category of passive autofocus algorithms, Wikipedia describes two methods, phase detection and contrast measurement. Again, since my images have already been captured, I will use contrast measurement. Since I don’t have a DSLR, this is the method my cameras actually use to autofocus, so that’s kind of cool, too.

So what does an autofocus system using contrast measurement consist of? I imagine a lens system, a sensor, and a processor that reads the sensor which can then adjust either the positions of the optics in the lens system or the position of the sensor relative to the lens system. When the user wants to take a picture, the processor will read the sensor, make a contrast measurement, and repeat this process, sweeping the variable position part of the system through its range. Then it will select the position that gave the best contrast measurement. This matches with my experience of the view through my camera’s viewfinder, especially in low-light conditions: I see the autofocus sweep through a range until the image is sharp, and then the camera takes the picture.
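The sweep I have in mind can be sketched as follows. Everything here is a stand-in: `simulated_capture` fakes the sensor by blurring a fine stripe pattern more the farther the lens is from a made-up best position, and a real camera would read hardware instead.

```python
def contrast(row):
    """Sum of absolute differences between neighbouring samples."""
    return sum(abs(a - b) for a, b in zip(row, row[1:]))


def simulated_capture(lens_pos, true_focus=7, width=64):
    """Fake sensor read-out: one-pixel stripes, box-blurred by the focus error."""
    sharp = [255 * (x % 2) for x in range(width)]
    radius = abs(lens_pos - true_focus)
    blurred = []
    for x in range(width):
        window = [sharp[min(width - 1, max(0, x + k))]
                  for k in range(-radius, radius + 1)]
        blurred.append(sum(window) / len(window))
    return blurred


def autofocus_sweep(positions, capture, measure):
    """Try every lens position and return the one with the best contrast."""
    best_pos, best_score = None, float("-inf")
    for pos in positions:
        score = measure(capture(pos))
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos


print(autofocus_sweep(range(15), simulated_capture, contrast))  # 7
```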

Thinking about this, I imagine a picture with many black and white stripes. If the stripes are in perfect focus, there will be many large transitions in brightness (high contrast), and if the stripes are blurred, then we would expect the transitions in brightness to be less dramatic. So I made some 100 × 100 pixel test images with horizontal and vertical lines in the GIMP to test my algorithm on.

horiz.jpg vert.jpg
horizblur.jpg vertblur.jpg

The blurred images are the result of running Filters->Blur->Blur in the GIMP over the original images.

So how to measure the contrast in a picture in a way that works quickly on a tiny low-power processor? The simplest method that came to mind was to sum the absolute differences between adjacent pixel values across the image. This is a nice O(N) algorithm, where N is the number of pixels. Other methods, such as computing FFTs and looking at the high-frequency bins, are more computationally intensive, and they don’t seem likely to be used in a consumer-grade camera.

Ideally, we’d want to look at the difference in luminance between neighboring pixels. Conveniently, during the JPEG compression process, you normally need to perform the linear algebra to convert the R, G, and B planes into Y (luminance), Cb and Cr (chrominance) planes. So if we inserted the autofocus algorithm after this conversion in our low-powered camera, we could run it on just the luma plane. In addition to being less computationally intensive, that might be more accurate, because the human eye has different sensitivities to different wavelengths of light, and using the luminance values would come closer to what we really want.
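For reference, the luma part of that conversion is a single weighted sum per pixel. The sketch below uses the weights from the JFIF flavor of YCbCr, which I am assuming here; a given camera pipeline may use different coefficients, and the function names are made up.

```python
def luma(r, g, b):
    """Weighted sum of R, G and B using the JFIF luma weights (assumed)."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def luma_plane(image):
    """Collapse an image of (r, g, b) tuples into a single plane of Y values."""
    return [[luma(*px) for px in row] for row in image]


image = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
for row in luma_plane(image):
    print(row)
```

Green carries the most weight and blue the least, which reflects the eye's differing sensitivity, and the contrast measurement then has one plane to scan instead of three.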

However, my algorithm operates in the RGB space, because that’s what my tools support easily. I decided to use GNU Octave, a free clone of MATLAB, to develop my algorithm. While I am mostly a C/C++/Java programmer, I sometimes use Octave to prototype algorithms because it is so concise and doesn’t require recompilation. In this case, I want to concentrate on contrast detection, and not the mechanics of reading and decoding JPEG files. We sometimes get algorithms from customers expressed as MATLAB or Octave code, and translate them to other languages for efficiency’s sake—for instance, we recently took some MATLAB code and implemented it in CUDA so that it would run at 24 frames per second. So here’s my contrast measurement program in Octave:

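img = imread("test.jpg");   % M x N x 3 array of uint8 RGB values
c = 0;                      % running contrast total
for i = img                 % each i is one column vector of pixel values
    for j = 1:rows(i)-1
        % Note: uint8 subtraction saturates at 0, so only decreases in value contribute.
        c += uint64(abs(i(j) - i(j+1)));   % widen so the running sum cannot saturate at 255
    endfor
endfor
c                           % print the accumulated contrast measurement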

As you can see, Octave let me express this algorithm very concisely. First, I read in the image into the variable img. For an M × N image, this results in a 3 dimensional array of M × N × 3 RGB pixel values in the range 0-255.

I then iterate over the image, and on each pass i contains a vector of pixel values. Octave iteration and subscripting are a bit mysterious to me, and I always have to read the manual, but it turns out that each of these vectors is a column vector of the image.

Then, for each of these column vectors, I accumulate the sum of the absolute value of the first differences in pixel values.

At the very end, I print out the value accumulated over the whole image. I should be able to use this program to evaluate the focus of two images, where the one with the larger result at the end should be more in focus (because it has higher contrast).

My first attempt at this program left out the uint64(), and I got results of 255 for the first images I tried. It turns out that Octave does saturation arithmetic on integer types, so I need to cast the values to a wider integer. My camera takes full-resolution images at 3648 × 2736, so if we look at the worst case of a black and white checkerboard, we would accumulate a value of 3648 × 2736 × 3 × 255 = 0x1C71B1C00 (7,635,409,920), a bit too large to hold in 32 bits. If we computed a combined brightness value for each pixel, we could use a 32-bit unsigned accumulator instead.
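The arithmetic is easy to double-check. This sketch treats every sample as contributing the maximum step of 255, which is an upper bound rather than an exact count; the function name is invented for the example.

```python
def worst_case_sum(width, height, planes, max_step=255):
    """Upper bound on the accumulated metric: every sample contributes max_step."""
    return width * height * planes * max_step


UINT32_MAX = 2**32 - 1
rgb_bound = worst_case_sum(3648, 2736, planes=3)
luma_bound = worst_case_sum(3648, 2736, planes=1)
print(hex(rgb_bound), rgb_bound > UINT32_MAX)    # 0x1c71b1c00 True
print(hex(luma_bound), luma_bound > UINT32_MAX)  # 0x97b3b400 False
```

The single-plane bound fits in an unsigned 32-bit accumulator but not in a signed one, since it is larger than 2^31 − 1.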

Running it over my test images above, I got the following results:

Image          Contrast Measurement
horiz.jpg      1248200
horizblur.jpg  407800
vert.jpg       0
vertblur.jpg   0

The horizontal stripe image works as one would expect, with the in-focus image generating a higher metric than the blurred one. But what’s up with the vertical stripes? Well, it reminded me that my camera’s manual has an interesting section called “subjects that are difficult to focus on.” One of the examples given is “Subject without vertical lines.” I never understood this until now. My camera must be looking at contrast across rows, so it has the same behavior as my Octave code, albeit in a different direction. This is further confirmation that my digital camera actually uses something very similar to this algorithm to autofocus. Let’s try the program on a real image, and a blurred copy:

tree.jpg treeblur.jpg

Here are the results:

Image         Contrast Measurement
tree.jpg      880167
treeblur.jpg  338791

As we expect, the unblurred image gives a higher contrast measurement, so the algorithm is detecting that it is more in focus.
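Returning to the striped test images, the direction dependence is easy to reproduce. The Python sketch below measures contrast down columns (as the Octave program does) and across rows (as my camera presumably does) on small synthetic stripe patterns. The 8 × 8 patterns and function names are made up, and because Python integers do not saturate, both rising and falling transitions are counted here.

```python
def column_contrast(plane):
    """Sum |difference| between vertically adjacent pixels."""
    return sum(abs(a - b)
               for upper, lower in zip(plane, plane[1:])
               for a, b in zip(upper, lower))


def row_contrast(plane):
    """Sum |difference| between horizontally adjacent pixels."""
    return sum(abs(a - b) for row in plane for a, b in zip(row, row[1:]))


vertical_stripes = [[255 * (x % 2) for x in range(8)] for _ in range(8)]
horizontal_stripes = [[255 * (y % 2)] * 8 for y in range(8)]

print(column_contrast(vertical_stripes), row_contrast(vertical_stripes))      # 0 14280
print(column_contrast(horizontal_stripes), row_contrast(horizontal_stripes))  # 14280 0
```

Summing both directions would make the metric respond to either kind of stripe, at the cost of a second pass over the image.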

Finally, we should note that in actual digital cameras, depth of field is variable, so we might want to focus on just a particular portion of the frame. The rest of the frame might intentionally be out of focus, in order to draw the viewer’s attention to the subject.

finch

This would involve just running the algorithm over a subset of the image, such as the center of the frame. Many cameras also offer the option of automatically detecting human faces and autofocusing there. Wikipedia has an article on algorithms for detecting human faces in an image as well—but maybe that’s a topic for the future.
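Restricting the measurement to the middle of the frame might look like the sketch below. The crop fraction, the synthetic 8 × 8 test plane and the function names are assumptions for illustration only.

```python
def column_contrast(plane):
    """Sum |difference| between vertically adjacent pixels."""
    return sum(abs(a - b)
               for upper, lower in zip(plane, plane[1:])
               for a, b in zip(upper, lower))


def center_crop(plane, fraction=0.5):
    """Return the central part of a plane, keeping `fraction` of each dimension."""
    rows, cols = len(plane), len(plane[0])
    crop_rows = max(1, int(rows * fraction))
    crop_cols = max(1, int(cols * fraction))
    top = (rows - crop_rows) // 2
    left = (cols - crop_cols) // 2
    return [row[left:left + crop_cols] for row in plane[top:top + crop_rows]]


# Sharp horizontal stripes in the middle, featureless gray everywhere else.
plane = [[255 * (y % 2) if 2 <= y < 6 and 2 <= x < 6 else 128
          for x in range(8)] for y in range(8)]

print(column_contrast(center_crop(plane)))  # 3060
```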

Ben Mesander has more than 18 years of experience leading software development teams and implementing software. His strengths include Linux, C, C++, numerical methods, control systems and digital signal processing. His experience includes embedded software, scientific software and enterprise software development environments.

 
 
