The Dreaded Do-over

It can be hard to discard work that came at a high cost. But sometimes fixing poorly written software is the costliest path, and the right move is simply to start over.

 

The System

A major aerospace OEM was performing maintenance updates on an aircraft navigation system it had developed. Alongside several software enhancements, the update had to accommodate a hardware change: the system’s Ethernet IC had been discontinued, and a replacement had to be integrated for future orders. Because the software package would be deployed on both pre-existing and new systems, every software update now had to be compatible with either Ethernet device.
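One common way to keep a single software package compatible with two Ethernet devices is to select a per-chip operations table at startup. The sketch below illustrates that idea only; it is not taken from the program described here. The chip ID values, the eth_ops_t structure, and the function names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chip IDs; real values would come from each chip's datasheet. */
#define ETH_ID_LEGACY 0x1001u
#define ETH_ID_NEW    0x2002u

/* Per-chip operations table (hypothetical layout). */
typedef struct {
    const char *name;
    int (*init)(void);   /* returns 0 on success */
} eth_ops_t;

/* Placeholder init routines standing in for real chip bring-up code. */
static int legacy_init(void) { return 0; }
static int new_init(void)    { return 0; }

static const eth_ops_t legacy_ops = { "legacy", legacy_init };
static const eth_ops_t new_ops    = { "new",    new_init    };

/* Map a chip ID read from hardware to the matching operations table. */
const eth_ops_t *eth_select(uint32_t chip_id)
{
    switch (chip_id) {
    case ETH_ID_LEGACY: return &legacy_ops;
    case ETH_ID_NEW:    return &new_ops;
    default:            return NULL;   /* unknown device */
    }
}

/* Bring up whichever device is present; -1 if the chip is not recognized. */
int eth_start(uint32_t chip_id)
{
    const eth_ops_t *ops = eth_select(chip_id);
    if (ops == NULL) {
        return -1;
    }
    assert(ops->init != NULL);
    return ops->init();
}
```

With this shape, the rest of the software calls eth_start() without knowing which device is installed, and support for another chip becomes one more table entry.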

Although the system was DAL A (the highest level of safety-criticality under DO-178C), the Ethernet subsystem was permitted for use only in the maintenance shop or while the plane was on the ground. The driver itself could therefore be developed to DAL D, but the platform code required to configure and enable it remained DAL A.

Driver development was outsourced to a vendor group considered expert in the target RTOS. They were the specialists. It seemed like the right call.

The Challenge

After more than a year of development and an investment of $250,000, the vendor delivered a driver that would run briefly and then fail unpredictably. The vendor's lead developer subsequently left the organization, leaving the vendor unable to debug or otherwise support the driver. The client team spent months attempting to debug the delivered code. The issues seemed endless: each fix caused a new failure to manifest. SafeCode's consultant, engaged on the refresh program, recommended a clean rewrite on multiple occasions. Each time, management declined. They had authorized the original investment, and they weren't ready to walk away from it. So the debugging continued, and the driver kept failing.

The Approach

About ten months in, with program scrutiny escalating into senior leadership reviews, the consultant raised the rewrite recommendation again — this time in a forum where a program director who knew him from prior work was present. The director asked why the rewrite had never been proposed before. The consultant produced documentation showing it had been raised repeatedly. Authorization was granted.

He started with a Linux driver provided by the chip manufacturer, working from chip documentation and RTOS specifications. The previous implementation had simply removed the memory management and thread synchronization functions because they relied on functionality the RTOS did not natively support. The consultant instead wrote equivalent substitute functions that worked within the RTOS constraints, creating OS-level mechanisms such as memory pools, atomic locks, spinlocks, and tasklets. The prior team had treated these problems as obstacles to be worked around; the consultant solved them head-on.
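To give a sense of what such substitute mechanisms can look like, here is a minimal sketch of a fixed-block memory pool guarded by a spinlock, written with C11 atomics. It is an illustration under stated assumptions, not the consultant's actual code: the block size, block count, and every type and function name are hypothetical, and a real RTOS port would typically also mask interrupts while the lock is held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes; a real driver would derive these from buffer needs. */
#define POOL_BLOCK_SIZE  256u
#define POOL_BLOCK_COUNT 8u

/* Minimal spinlock built on a C11 atomic flag. */
typedef struct {
    atomic_flag locked;
} spinlock_t;

static void spin_init(spinlock_t *l)
{
    atomic_flag_clear(&l->locked);
}

static void spin_lock(spinlock_t *l)
{
    /* Busy-wait until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire)) {
    }
}

static void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* A free block stores the free-list link in its own storage. */
typedef union pool_block {
    union pool_block *next;
    uint8_t payload[POOL_BLOCK_SIZE];
} pool_block_t;

typedef struct {
    pool_block_t blocks[POOL_BLOCK_COUNT];
    pool_block_t *free_list;
    spinlock_t lock;
} mem_pool_t;

/* Thread every block onto the free list. */
void pool_init(mem_pool_t *p)
{
    assert(p != NULL);
    spin_init(&p->lock);
    p->free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        p->blocks[i].next = p->free_list;
        p->free_list = &p->blocks[i];
    }
}

/* Pop one block, or return NULL when the pool is exhausted. */
void *pool_alloc(mem_pool_t *p)
{
    spin_lock(&p->lock);
    pool_block_t *b = p->free_list;
    if (b != NULL) {
        p->free_list = b->next;
    }
    spin_unlock(&p->lock);
    return b;
}

/* Push a block previously returned by pool_alloc back onto the free list. */
void pool_free(mem_pool_t *p, void *ptr)
{
    pool_block_t *b = ptr;
    spin_lock(&p->lock);
    b->next = p->free_list;
    p->free_list = b;
    spin_unlock(&p->lock);
}
```

A pool like this stands in for the dynamic allocator a Linux driver would normally call: allocation time is bounded, nothing fragments, and exhaustion shows up as an explicit NULL that the caller can handle.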

The Outcome

The rewritten driver was proven on the system simulation platform, and the consultant then integrated it with the DAL A platform. Only then could it be tested in a real system context, which in turn allowed the hardware team to debug their own issues. The new driver allowed the remaining work to be completed, and the updated software was accepted for DO-178C certification on both the new hardware and the legacy hardware. Total implementation time for the new driver was eleven weeks, one week beyond the consultant's original commitment because of a gap in the chip datasheet. The $250,000 investment and the months of unsuccessful debugging that followed were ultimately superseded by a targeted engineering effort that could have been authorized far earlier. This was the sixth time SafeCode's consultant had been engaged by the same OEM over the course of a long working relationship.