Regression Testing, the Lazy Way

Suppose you were writing a video transcoder as a command-line tool for a customer. Further suppose you had allowed 20 different options that, in various combinations, could make the tool do hundreds of different things. After weeks of work, you are at long last done, but the customer would like a simple regression test script to be delivered with the tool. Let’s look at a simple way to achieve regression testing.

Regression Testing

Regression testing is a simple (and wonderful) idea. The idea is that once you have working software, you should easily be able to ensure it keeps working after further changes are made. This is achieved by constructing a set of tests along with their known-correct outputs, and then re-running these tests after any modifications to ensure the known-correct outputs are still obtained. Writing regression tests can occur as the last step before product delivery (as in our story above) or it can occur in parallel with development (as often practiced in Agile Dev shops).

Using bash to Script a Regression Test

Let’s look at how we might build a simple regression test script using bash and a few common Linux tools. Suppose we want to change the well-known Linux utility called sort. Before we do so, we develop a regression test script:

#!/bin/bash

sort -a

In a real regression script there would be hundreds of different invocations of sort, not just one. Writing regression test scripts is tedious, painstaking, and often under-appreciated work, but it is highly valuable.

Now it turns out sort doesn’t take -a, so running the script above produces an error:

$ ./sort_test.sh
sort: invalid option -- 'a'
Try `sort --help' for more information.
$

So was it a mistake to include this in our regression testing? Absolutely not. Error messages should remain intact just as correct usage does. OK, so next let’s save the current output of our script. Note that we capture both stderr and stdout.

$ ./sort_test.sh &> sort_test_correct.out
$

So sort_test_correct.out will contain our known-correct output for future testing. Now suppose we make some changes to sort and recompile. In order to ensure our regression tests still pass we use diff.

$ ./sort_test.sh &> sort_test.out
$ diff sort_test.out sort_test_correct.out
$

If diff produces no output, the test passed. If not, it’s time to dig into the file and see why. There are two reasons the test might fail: (1) You made a change that broke something, or (2) you made a change that altered what the correct output should be. In case (1) you need to fix your bug, and in case (2) you rename your sort_test.out to sort_test_correct.out.
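
The capture-and-compare steps above can be wrapped into a single pass/fail check. What follows is a minimal sketch, not part of the original script: the function name run_regression is hypothetical, and it assumes the known-correct file was captured the same way, with stdout and stderr together.

```shell
#!/bin/bash

# Hypothetical helper: run a command, capture stdout and stderr together,
# and compare the result with a known-correct baseline file.
# Usage: run_regression BASELINE COMMAND [ARGS...]
run_regression() {
    local baseline="$1"
    shift
    local actual
    actual="$(mktemp)"

    # Both streams go to one file. A non-zero exit status is tolerated,
    # because error messages are part of the output being compared.
    "$@" > "$actual" 2>&1 || true

    if diff "$baseline" "$actual" > /dev/null; then
        echo "PASS"
        rm -f "$actual"
        return 0
    fi

    # Keep the new output so it can be inspected, or renamed to become
    # the new baseline if the change was intentional.
    echo "FAIL: output differs from $baseline; new output kept in $actual"
    return 1
}
```

For the sort example this might be called as run_regression sort_test_correct.out ./sort_test.sh. Returning a status rather than only printing makes the check usable from a Makefile or a build server.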

We could stop here and have a reasonable set of tests, but in the case of a video transcoder (or even in the case of sort) the output of the command on stdout isn’t even half the story. The command output is often both huge and written to a file. We need to ensure these output files are correct.

Testing Contents of Transcoder Output Files

There are a variety of ways to approach our goal. The obvious way would be as follows: prepare a set of test inputs, mov1.mp4, mov2.mp4, mov3.ts, and transcode them with your tool to produce mov1_correct.mp4, mov2_correct.mp4, mov3_correct.ts. Then ship all inputs and outputs with your tool.

The regression test script now runs each conversion and then diffs the result against the known-correct file:

#!/bin/bash

set -x

./tc -i mov1.mp4 -o mov1_out.mp4
diff mov1_out.mp4 mov1_correct.mp4
./tc -i mov2.mp4 -o mov2_out.mp4
diff mov2_out.mp4 mov2_correct.mp4
./tc -i mov3.ts -o mov3_out.ts
diff mov3_out.ts mov3_correct.ts

Notice we added set -x so bash will output the command it’s running as the script executes. This makes it a little easier for a human to read the script output. And we are using diff between transcoder invocations to ensure we have the right outputs. This is fine, but we start to get a big mess if our transcoder takes a lot of conversion options. For example suppose we want to invoke

./tc -i mov1.mp4 -o mov1_out.mp4 -x
./tc -i mov1.mp4 -o mov1_out.mp4 -y
./tc -i mov1.mp4 -o mov1_out.mp4 -z
./tc -i mov1.mp4 -o mov1_out.mp4 -x -y
./tc -i mov1.mp4 -o mov1_out.mp4 -x -z
./tc -i mov1.mp4 -o mov1_out.mp4 -y -z

And maybe 50 other similar option configurations. This means you will get a (presumably) different mov1_out.mp4 each time, which in turn means you will have to bundle all 50 versions of the output file with your tool. That gets unmanageable in a hurry. A nifty way to solve this problem uses cryptographic hashing.

Crypto Hash Functions

Cryptographic hash functions are functions like MD5, SHA1, or (very recently) SHA3. Linux distros usually supply them as commands such as md5sum and sha1sum. They can be thought of as a high-powered checksum: the chance of two different inputs accidentally producing the same output is exceedingly small. (Deliberately constructed collisions are known for MD5 and SHA1, but that does not matter for catching unintended changes in a test output.) They are also quick to compute, making them ideal for many applications such as fingerprinting large images, fingerprinting your SSH public key, and use in various network protocols like BitTorrent and rsync.

So let’s use them in our setting. Instead of lugging around 50 different output files, we will have none. We simply compute the MD5 hash of each output file and print it; the hash becomes part of the script output, so diff will pick it up if the output file is incorrect.
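
To see why a short hash can stand in for a whole file, consider this small sketch. The two scratch files are made up for illustration and differ in a single character, and the helper name digest_of is hypothetical. Here md5sum reads from stdin so that no file name appears next to the digest.

```shell
#!/bin/bash

# Print just the 32-hex-digit MD5 digest of a file.
# md5sum prints "digest  name"; cut keeps only the first field.
digest_of() {
    md5sum < "$1" | cut -d' ' -f1
}

work="$(mktemp -d)"

# Two stand-in "output files" that differ in one character.
printf 'frame data 0001\n' > "$work/good.out"
printf 'frame data 0002\n' > "$work/changed.out"

good="$(digest_of "$work/good.out")"
changed="$(digest_of "$work/changed.out")"

echo "good:    $good"
echo "changed: $changed"

if [ "$good" != "$changed" ]; then
    echo "digests differ, so a diff of the test log would flag the change"
fi
```

Each digest is a single line of text, so recording it costs a few dozen bytes in the known-correct file regardless of how large the output file is.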

Our script now looks like this:

#!/bin/bash

# Make sure md5sum is installed before running any tests.
if ! command -v md5sum > /dev/null; then
    echo " ********** ERROR ***********"
    echo "md5sum missing; you need to install md5sum"
    exit 1
fi

# Make sure the transcoder is present and executable.
if [ ! -x ./tc ]; then
    echo " ********** ERROR ***********"
    echo "tc must exist and be executable in the current directory"
    exit 1
fi

set -x

./tc -i mov1.mp4 -o mov1_out.mp4 && md5sum mov1_out.mp4
./tc -i mov2.mp4 -o mov2_out.mp4 && md5sum mov2_out.mp4
./tc -i mov3.ts -o mov3_out.ts && md5sum mov3_out.ts

Note how we make sure the commands we need are present at the head of the script so we can give a meaningful error message instead of making the test-user figure out that something’s missing.
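
With hashing in place, covering many option combinations no longer requires one reference file per combination. Below is a sketch of how the -x, -y, -z combinations listed earlier might be looped over. The function name hash_matrix and its arguments are hypothetical; the transcoder is passed in as a parameter so the loop is not tied to one path.

```shell
#!/bin/bash

# Hypothetical helper: run a transcoder once per option set and print the
# MD5 of each output, so one small log covers every combination.
# Usage: hash_matrix TC_COMMAND INPUT OUTPUT
hash_matrix() {
    local tc="$1"
    local input="$2"
    local output="$3"
    local opts

    for opts in "" "-x" "-y" "-z" "-x -y" "-x -z" "-y -z"; do
        echo "options: [$opts]"
        # $opts is deliberately unquoted so "-x -y" becomes two arguments.
        if "$tc" -i "$input" -o "$output" $opts; then
            # Reading from stdin keeps the file name out of the log line.
            md5sum < "$output"
        else
            echo "transcoder exited with status $?"
        fi
    done
}
```

Calling hash_matrix ./tc mov1.mp4 mov1_out.mp4 from the test script would print seven option/digest pairs. The output file is overwritten on each pass, so nothing accumulates on disk.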

Patching with xxd

We now have a basic regression test built on simple ideas from bash and MD5. However, there are still cases where it will not work. One common stumbling block arises if, say, our transcoder inserts a timestamp into its output. Then every run could produce a slightly different output file, and even a one-byte difference yields a completely different MD5 sum. If the varying bytes lie at a known offset in the output file, we can overwrite that part of the file with a constant string in various ways. Probably the simplest is to use xxd. For example, to overwrite location 0x55 with 5 zero bytes:

$ echo "0000055: 0000 0000 00" | xxd -r - mov1_out.mp4
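
As a worked check of this idea, the sketch below zeroes the same five bytes and then hashes the result. It uses the example offset 0x55 and length 5 from above, which are illustrative and not a real MP4 layout, and the function names are hypothetical. Since xxd typically ships with vim and may be absent on minimal systems, the sketch falls back to dd, a different tool performing the same overwrite.

```shell
#!/bin/bash

# Overwrite the 5 bytes at offset 0x55 with zeros, in place.
# Offset and length are the example values from the text.
zero_timestamp() {
    local file="$1"
    if command -v xxd > /dev/null; then
        # xxd -r patches the output file at the given offset
        # without truncating it.
        echo "0000055: 0000 0000 00" | xxd -r - "$file"
    else
        # Fallback: 0x55 is 85 decimal; conv=notrunc leaves the
        # rest of the file untouched.
        dd if=/dev/zero of="$file" bs=1 seek=85 count=5 conv=notrunc 2> /dev/null
    fi
}

# Patch the file, then print only its MD5 digest.
patched_md5() {
    zero_timestamp "$1" && md5sum < "$1" | cut -d' ' -f1
}
```

In the test script, a line such as ./tc -i mov1.mp4 -o mov1_out.mp4 && patched_md5 mov1_out.mp4 would then produce the same digest on every run, provided the timestamp is the only thing that varies.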

See the xxd man page for more examples. Of course if your timestamp is not in a predictable place, you may need more sophisticated tools. For example, some transcoders will put a timestamp in the MP4 box called moov.trak.mdia.hdlr.name. In this case, you might have to write your own patch program (using MP4v2, say) or use another command-line utility that allows you to set that box value, or consider adding an undocumented option to your transcoder that suppresses any fields that will vary from run-to-run.

But don’t give up on xxd. It’s more powerful than it looks. In fact, if you want to try a wider variety of input files you might be able to patch one to produce many using xxd. Your imagination is the only limit here.

John Black is an Associate Professor in the Department of Computer Science at the University of Colorado and an occasional member of the Cardinal Peak team.