Yes, You Do Need to Document Your Open Source Code

I’ve recently had the experience of working with two pieces of open source code. Both implement a standard wireless communication stack in Linux. Names are omitted to protect the guilty.

In one case, the code has been around since 2001 and has been part of the Linux kernel since version 2.4.6. It has now been through five major revisions and has grown into quite a complex piece of software. This was inevitable because the specification for the protocol that it implements has also grown and matured in that period. What was not inevitable, however, is the difficulty that application programmers face in trying to use the stack. That’s because, in 16 years, its developers have never seen fit to write any tutorial documentation.

To be fair, the open-source code for the stack does include a “doc” directory that contains some text files. These files provide reference-manual-style information on some very specific aspects of the stack. However, nowhere is there a document that gives first-time users a big-picture “How to get started” overview of the code.

This problem is compounded by the fact that in the fourth major release, the stack developers chose to use a particular interprocess communication (IPC) mechanism for applications to interface with the stack. This IPC mechanism is also complex and not well documented. In fact, the webpage for the library that provides its interface explicitly warns that using it is painful. Agreed. This IPC code has a reference manual, but not much in the way of high-level documentation.

The other stack mentioned above is worse, although it has the excuse of being new to the Linux environment, having been ported from another OS. But its documentation deficiency seems more egregious because Cardinal Peak’s customer is paying for hardware from its vendor; you’d think the vendor would put some effort into making its software usable. This stack’s documentation consists entirely of two Doxygen (that is, machine-generated) files describing two particular interfaces. Doxygen is a tool that searches through source code for specially marked comments, then extracts these comments and formats them into something that looks like a reference manual. Of course, the resulting manual is only as correct and complete as the code comments it contains. If the code author chose to describe the function frobnicate() solely with a comment saying “This function frobnicates,” that’s all you’ll get in the document.
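To make the Doxygen point concrete, here is a minimal sketch in C that contrasts the useless comment with one that actually tells a caller something. The behavior given to frobnicate() (XOR-ing each byte with 0x5A), its signature, and its return codes are all invented for illustration; they are not taken from either stack.

```c
#include <stddef.h>
#include <stdint.h>

/* Unhelpful: the generated manual would contain this one sentence and
 * nothing else. (Hypothetical prototype, shown only for contrast.) */
/** This function frobnicates. */
int frobnicate_undocumented(uint8_t *buf, size_t len);

/**
 * @brief Obfuscate a buffer in place by XOR-ing every byte with 0x5A.
 *
 * Calling the function twice on the same buffer restores the original
 * contents, so the same call both encodes and decodes.
 *
 * @param[in,out] buf  Buffer to transform. Must not be NULL.
 * @param[in]     len  Number of bytes in @p buf. Zero is allowed.
 *
 * @retval  0  Success; @p buf now holds the transformed bytes.
 * @retval -1  @p buf was NULL; nothing was modified.
 */
int frobnicate(uint8_t *buf, size_t len)
{
    if (buf == NULL) {
        return -1;
    }
    for (size_t i = 0; i < len; i++) {
        buf[i] ^= 0x5A;  /* illustrative transformation, assumed for this sketch */
    }
    return 0;
}
```

Doxygen renders both comments equally neatly. Only the second tells the caller what the parameters mean, what comes back, and what happens on bad input. Even so, it remains reference material; it does not replace a getting-started guide.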

Another problem posed by lack of documentation is that application programmers don’t know whether to trust that an interface won’t change. Normally, documenting a software interface implies a commitment that the interface won’t change, even if the code underneath it does. If the interface is not documented at all, what should the programmer using the interface assume about its stability? In the case of the first stack, I hope its users assumed very little because the IPC transplant in its fourth release certainly broke every application out there.

Apparently, open-source authors assume that their customers are other programmers who can read the code to learn how to use it. Sure, that works, but anyone who has ever been put in that situation can tell you that it is a painful and inefficient process. I have spent many days doing web searches and reverse engineering each of these protocol stacks to learn how to use them. Multiply that by all the other application programmers who have had the same experience, and you certainly have a productivity loss measured in person-years, maybe person-decades. An investment of one or two person-weeks by the open-source authors in writing good tutorial documents could have obviated all this inefficiency.