The Implementation can be Right, and Still be Wrong
Sometimes the key to a solution lies in discovering the questions that have not yet been asked. A large multinational corporation developing a crucial piece of a new DoD system was stymied. Even with their huge team of experts, they had been unable to resolve a technical issue. They had the skills and technical acumen, but needed a fresh perspective.
The System
A major aerospace and defense OEM was developing a prototype CREW (Counter RC-IED Electronic Warfare) system, designed to protect soldiers in the field by detecting and jamming signals intended to detonate improvised explosive devices. The system relied on encrypted communications between two devices: one built by the prime vendor, who also developed the communications protocol, and the other built by this OEM team.
The Challenge
The engineering team had been stuck for weeks, and the program hung in the balance. The team had built its communications module exactly to the protocol specification. In the lab, the module appeared to work, but when the two units tried to talk to each other, the encryption failed: consistently, and without explanation. Reviewing the data stream was of no use, because an encrypted message looks like random data, and a single wrong bit in the source changes the entire pattern. They had reviewed the source code against the specification repeatedly and found no discrepancies. The team was frustrated and unsure what to try next.
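To illustrate why inspecting the data stream could not localize the fault, here is a minimal sketch of that single-bit sensitivity. It is an assumption-laden stand-in: SHA-256 from Python's standard library substitutes for the system's encryption, which the case study does not describe, and the message contents and the helper names `flip_bit` and `differing_bits` are hypothetical.

```python
import hashlib

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Hypothetical plaintext; the real message format is not given in the source.
message = b"hypothetical status frame 0001"
tampered = flip_bit(message, 0)

# SHA-256 is only a stand-in for the (undisclosed) cipher. Like a good cipher,
# it spreads a one-bit input change across the whole output.
digest_a = hashlib.sha256(message).digest()
digest_b = hashlib.sha256(tampered).digest()

changed = differing_bits(digest_a, digest_b)
print(f"input bits changed: {differing_bits(message, tampered)}")
print(f"output bits changed: {changed} of {len(digest_a) * 8}")
```

Roughly half of the output bits change, so two mismatched streams give no hint as to where in the source the inputs diverged.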
The stakes were straightforward and severe: the OEM could not complete the evaluation phase or bid for the follow-on program phase unless this issue was resolved. Conversations with the prime vendor had been fruitless. They insisted that the protocol specification had been tested and verified as correct. DoD classification restrictions meant neither vendor could share implementation details with the other. That’s when a colleague who had previously worked with SafeCode’s consultant reached out personally and brought him in on a six-week fixed-scope engagement. The message was simple: this is a show-stopper. Figure it out.
The Approach
The consultant first examined the specification in detail, then reviewed the source code for compliance. Like everyone before him, he found nothing wrong, and he accepted that there had to be a deeper issue. One possibility was unaccounted-for behavior in the hardware, so he began investigating that path with oscilloscopes and spectrum analyzers. He also opened a second line of inquiry: regular, friendly phone calls with the developer on the vendor's side. These were conversations, not interrogations. He was curious, methodical, and patient.
In the weeks that followed, the hardware investigation yielded nothing surprising. Over several calls, the consultant kept asking questions of his counterpart at the vendor organization, most of which drew a predictable response: “You’ll have to look at the spec. We can’t share that information.” Then one question changed everything: how did the vendor's team know that their own implementation was correct? They had a standalone test program, the developer explained, built purely to test their device against the spec. The next question followed naturally: since the program was an implementation of the specification and not of any proprietary or classified design, could they share it? The developer asked his management and his security people. They evaluated the request and approved it the same day, and the developer sent the test program over. Within a day of receiving it, the consultant had found the issue.
The answer: The specification had never stated whether message bit-padding should use ones or zeros. Both teams had made an assumption — opposite assumptions, as it turned out — and each assumption was so consistent with their own reading of the document that neither team had ever noticed the ambiguity. Neither implementation was wrong in isolation. Both matched every detail in the spec. The integration failure was caused by a tiny detail that the spec didn’t contain. The find took weeks. The key question took seconds. The fix took minutes.
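The effect of that ambiguity can be sketched in a few lines. This is a hedged illustration, not the program's actual code: the block size, key, message, and function names are hypothetical, padding is done at byte granularity for simplicity, and HMAC-SHA256 stands in for the undisclosed encryption. The source does not say which team chose which fill value, so the two sides are labeled A and B.

```python
import hashlib
import hmac

BLOCK_SIZE = 16  # bytes; hypothetical, the real block size is not given

def pad(message: bytes, fill_bit: int) -> bytes:
    """Pad message to a whole number of blocks using all-zero or all-one bits."""
    fill = b"\xff" if fill_bit else b"\x00"
    remainder = len(message) % BLOCK_SIZE
    if remainder == 0:
        return message
    return message + fill * (BLOCK_SIZE - remainder)

def stand_in_transform(key: bytes, block: bytes) -> bytes:
    """Keyed stand-in for encryption (HMAC-SHA256); not the real cipher."""
    return hmac.new(key, block, hashlib.sha256).digest()

key = b"shared-demo-key"   # hypothetical shared key
message = b"status:ok"     # hypothetical payload, shorter than one block

side_a_block = pad(message, fill_bit=0)  # one reading of the spec
side_b_block = pad(message, fill_bit=1)  # the opposite reading

out_a = stand_in_transform(key, side_a_block)
out_b = stand_in_transform(key, side_b_block)

print("payload bytes identical:", side_a_block[:len(message)] == side_b_block[:len(message)])
print("padded blocks identical:", side_a_block == side_b_block)
print("outputs identical:      ", hmac.compare_digest(out_a, out_b))
```

Each side is self-consistent, and both carry the same payload, yet the padded blocks differ and so the transformed outputs never match. A shared reference test program exposes this kind of mismatch quickly because it supplies one concrete answer to compare against.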
The Outcome
The defect was resolved during week four of the six-week engagement, and with that work complete, the consultant accepted a challenge to bring another area of the program back on track. The OEM completed the phase successfully. What had stopped an experienced engineering team for weeks wasn't a hard technical problem. It was an unasked question and a gap in a document that everyone had already read too many times to see clearly. Sometimes, the most valuable thing an outside perspective brings is the ability to pose questions that no one thought to ask.
A Corollary Lesson
Sometimes, requirements deficiencies can go unnoticed, no matter how many eyes have been on them. Requirements validation using formal methods might have avoided this situation completely.