Real-Time Ethernet

A recent project for a customer was to implement a transport for a real-time signal processing application over Gigabit Ethernet (GbE). The project was especially interesting because our customer’s requirement was for extremely low latency: The transport needed to be able to send replies to incoming packets with a custom EtherType within 74.4 μsec. This exceeded the capabilities of existing Ethernet protocols.

There are several approaches possible for real-time networking under Linux, including RTAI with the RTnet hard real-time network stack.

However, the customer was already using Red Hat Linux servers to perform the signal processing, so we decided to try the less intrusive Red Hat MRG kernel to see if it could be modified to incorporate the desired networking protocol. MRG incorporates many kernel patches to improve real-time performance and lower latency.

Since the timing requirements were so tight, I decided that only an in-kernel implementation was likely to succeed. If the code had to switch between kernel mode and user mode, it would incur additional overhead: interactions with the Linux scheduler and copying the data between the kernel and user address spaces.

I started development by building two server-class Linux machines running Red Hat Linux with the MRG kernel, connected back-to-back with a GbE cable. Then I wrote a user-mode test program to send an Ethernet packet of the appropriate EtherType via a raw socket, which gave me a way to send a packet from one machine to the other. I set up Wireshark to monitor the traffic between the two machines.
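A minimal sketch of such a raw-socket sender is shown below. The EtherType 0x88B5 (an IEEE "local experimental" value), the interface name eth0 and the destination MAC address are placeholders for illustration, not the customer's actual values:

    /* Sketch of a user-mode raw-socket sender; EtherType, interface name
     * and destination MAC are placeholders. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <net/ethernet.h>
    #include <linux/if_ether.h>
    #include <linux/if_packet.h>

    #define CUSTOM_ETHERTYPE 0x88B5    /* placeholder, not the customer's EtherType */

    int main(void)
    {
        const char *ifname = "eth0";   /* assumed interface name */
        unsigned char dst[ETH_ALEN] = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55 }; /* peer MAC */
        unsigned char frame[ETH_ZLEN];             /* minimum-length frame, zero payload */
        struct ether_header *eh = (struct ether_header *)frame;
        struct sockaddr_ll sll;
        struct ifreq ifr;
        int fd;

        fd = socket(AF_PACKET, SOCK_RAW, htons(CUSTOM_ETHERTYPE));
        if (fd < 0) {
            perror("socket");
            return 1;
        }

        /* Find the interface index and MAC address of the sending interface. */
        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
        if (ioctl(fd, SIOCGIFINDEX, &ifr) < 0) {
            perror("SIOCGIFINDEX");
            return 1;
        }
        memset(&sll, 0, sizeof(sll));
        sll.sll_family  = AF_PACKET;
        sll.sll_ifindex = ifr.ifr_ifindex;

        if (ioctl(fd, SIOCGIFHWADDR, &ifr) < 0) {
            perror("SIOCGIFHWADDR");
            return 1;
        }

        /* With SOCK_RAW the application supplies the entire Ethernet header. */
        memset(frame, 0, sizeof(frame));
        memcpy(eh->ether_dhost, dst, ETH_ALEN);
        memcpy(eh->ether_shost, ifr.ifr_hwaddr.sa_data, ETH_ALEN);
        eh->ether_type = htons(CUSTOM_ETHERTYPE);

        if (sendto(fd, frame, sizeof(frame), 0,
                   (struct sockaddr *)&sll, sizeof(sll)) < 0) {
            perror("sendto");
            return 1;
        }

        close(fd);
        return 0;
    }

A program like this has to run as root (or with the CAP_NET_RAW capability) in order to open a raw packet socket.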

Initially, I attempted to modify the device-independent portion of the Ethernet stack within the Linux kernel to detect the incoming packet and respond to it. Unfortunately, I had to discard this approach when I found it could not meet the customer’s latency requirement.
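The natural hook at that layer is a packet_type handler registered with dev_add_pack(), which the generic receive path calls for every frame carrying the matching EtherType. A skeleton of that kind of module is sketched below, again with the placeholder EtherType and with the actual reply logic omitted:

    #include <linux/module.h>
    #include <linux/netdevice.h>
    #include <linux/skbuff.h>
    #include <linux/if_ether.h>

    #define CUSTOM_ETHERTYPE 0x88B5         /* placeholder EtherType */

    /* Called from the generic receive path for every frame whose EtherType
     * matches.  The reply would be built and sent from here; that part is
     * omitted. */
    static int custom_rcv(struct sk_buff *skb, struct net_device *dev,
                          struct packet_type *pt, struct net_device *orig_dev)
    {
        /* build_and_send_reply(skb, dev);  -- hypothetical reply routine */
        kfree_skb(skb);
        return NET_RX_SUCCESS;
    }

    static struct packet_type custom_pt = {
        .type = cpu_to_be16(CUSTOM_ETHERTYPE),
        .func = custom_rcv,
    };

    static int __init custom_init(void)
    {
        dev_add_pack(&custom_pt);           /* hook into the receive path */
        return 0;
    }

    static void __exit custom_exit(void)
    {
        dev_remove_pack(&custom_pt);
    }

    module_init(custom_init);
    module_exit(custom_exit);
    MODULE_LICENSE("GPL");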

So I experimented with modifying the Ethernet drivers for various GbE PCIe Ethernet cards in the MRG kernel. The basic approach I took was to invoke the outgoing packet interrupt service routine from the incoming packet interrupt service routine. The driver has to be in just the right state for this to work correctly, so it required some study of the driver source code. Fortunately, most Linux Ethernet drivers share a common framework, so trying different cards did not require a complete re-engineering effort, but there are definitely some differences.
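In outline, the idea is this: when the receive completion handler sees a frame with the custom EtherType, it builds the reply itself and pushes it directly into the driver's transmit routine under the TX queue lock, bypassing the qdisc and protocol layers entirely. The sketch below is only illustrative and is not the actual driver patch; my_driver_reply_from_rx() is a hypothetical helper, and the ring-state and locking details glossed over here are exactly the driver-specific parts that required study:

    #include <linux/netdevice.h>
    #include <linux/etherdevice.h>
    #include <linux/skbuff.h>
    #include <linux/if_ether.h>
    #include <linux/string.h>

    #define CUSTOM_ETHERTYPE 0x88B5         /* placeholder EtherType */

    /* Hypothetical helper, called from a driver's RX completion path after
     * eth_type_trans() has been run on rx_skb.  It builds a minimal reply
     * frame and feeds it straight to the driver's transmit routine. */
    static void my_driver_reply_from_rx(struct net_device *dev,
                                        const struct sk_buff *rx_skb)
    {
        const struct ethhdr *rx_eth = eth_hdr(rx_skb);
        struct netdev_queue *txq = netdev_get_tx_queue(dev, 0);
        struct sk_buff *skb;
        struct ethhdr *eth;

        /* Only react to frames carrying our custom EtherType. */
        if (rx_skb->protocol != htons(CUSTOM_ETHERTYPE))
            return;

        skb = netdev_alloc_skb(dev, ETH_ZLEN);
        if (!skb)
            return;

        /* Minimum-length reply frame with source and destination swapped. */
        eth = (struct ethhdr *)skb_put(skb, ETH_ZLEN);
        memset(eth, 0, ETH_ZLEN);
        ether_addr_copy(eth->h_dest, rx_eth->h_source);
        ether_addr_copy(eth->h_source, dev->dev_addr);
        eth->h_proto = htons(CUSTOM_ETHERTYPE);

        /* Hand the frame directly to the driver's transmit routine under the
         * TX queue lock, bypassing the qdisc and protocol layers. */
        __netif_tx_lock_bh(txq);
        if (!netif_xmit_frozen_or_stopped(txq))
            dev->netdev_ops->ndo_start_xmit(skb, dev);
        else
            kfree_skb(skb);
        __netif_tx_unlock_bh(txq);
    }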

I initially started with a Realtek card and driver and had some success. I installed the kernel driver on both machines so that once I sent a single packet from one machine to another via my user-space program, the two machines would continuously exchange Ethernet frames with each other at the maximum speed achievable.

This made measurement challenging: my modifications to the driver were at a low enough level that the outgoing packets did not go through the device-independent Ethernet stack in Linux, and thus were not captured by Wireshark; in addition, Wireshark was too slow to keep up and often dropped packets. I was, however, able to use tcpdump to capture packet headers to a file and then examine them later with Wireshark to make measurements. The latency could be estimated by looking at the inter-packet timing of packets sent by the other machine and dividing by two. Using tcpdump eliminated most of the packet losses, but it would still occasionally drop some, so I resorted to putting a serial number in each packet, which I could then examine to determine whether any packets had been lost.
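The serial-number check itself is simple. A sketch of the idea follows, assuming a 32-bit big-endian sequence number at a fixed offset in the captured frame; the offset and width here are illustrative rather than the actual packet format:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <arpa/inet.h>

    #define SERIAL_OFFSET 14   /* just past the 14-byte Ethernet header; illustrative */

    /* Returns the number of frames missing between the previously seen packet
     * and this one; 0 means no gap was detected.  *prev_serial should start
     * at 0 and is updated on each call. */
    static uint32_t check_serial(const unsigned char *frame, uint32_t *prev_serial)
    {
        uint32_t serial, gap = 0;

        memcpy(&serial, frame + SERIAL_OFFSET, sizeof(serial));
        serial = ntohl(serial);

        if (*prev_serial != 0 && serial != *prev_serial + 1) {
            gap = serial - *prev_serial - 1;
            fprintf(stderr, "lost %u packet(s) between %u and %u\n",
                    gap, *prev_serial, serial);
        }
        *prev_serial = serial;
        return gap;
    }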

I also tried inserting a GbE switch between the two machines and monitoring packets from a third machine. However, I found the switch introduced 20 μsec of additional latency on average and greatly increased the deviation of the measurements. And still, sometimes packets were lost. So I discarded this approach and used tcpdump running on one of the machines being tested.

I was unable to reliably meet the 74.4 μsec spec under load with the Realtek card, so the next card I tried was an Intel GbE card. This card used the Intel e1000e driver, which is supported directly by Intel rather than being reverse-engineered like the Realtek driver. Unfortunately, I was unable to find a card supported by the version of the e1000e driver contained in the MRG kernel, so I downloaded the latest driver from Intel. This driver was significantly more complex than the MRG version, but after making the analogous modifications I was able to take measurements.

Finally, I modified the Broadcom bnx2 driver in the MRG kernel. This driver is directly supported by Broadcom. I was able to use the MRG kernel driver with my card, and so I made appropriate modifications.

I found that this card, like the Intel card, was able to keep up with the desired data rate. The Broadcom card ended up having slightly lower latency measurements than the Intel card. I suspect this is due to the MRG kernel driver for bnx2 having less locking overhead and shorter code paths than the stock Intel e1000e driver.

In the end, with the Broadcom bnx2 driver with my modifications, we achieved an average latency measurement of 58 μsec, which was comfortably under the 74.4 μsec requirement. Additionally, we tested continuously over a several-day period, monitoring for missing packet serial numbers, and none were detected.

The customer’s initial protocol required sending several smaller packets in either direction, but ultimately an additional speedup could have been realized by using jumbo frames. RTAI/RTnet could also have been used, but I do not think it would have been significantly faster, although it may have reduced the variability of the latency.

Ben Mesander has more than 18 years of experience leading software development teams and implementing software. His strengths include Linux, C, C++, numerical methods, control systems and digital signal processing. His experience includes embedded software, scientific software and enterprise software development environments.