I read an EE Times article titled "Software Won’t Fix Boeing’s 'Faulty' Airframe", which discusses the recent situation with the Boeing 737 Max. The article made several very good points that I thought warranted comment, but I found that I had more to say than would fit in a standard post. I think the lessons are valid for a broader audience than the aviation industry, so for those who don't have an aviation background, I will try to avoid and/or clarify the jargon. I am working with information limited to public reports and articles from those in the know, so it is possible that my information or interpretations are incorrect; beyond that, this is all just my opinion. That's my disclaimer... Moving on. [the original version of this article was published on LinkedIn]
I know only what I have read from the reports about the 737 Max. However, I agree with Travis about the engineering principles. It is not that hard to deduce that flying an inherently unstable airframe increases risk. By unstable, I mean an airframe that requires reactive effort to maintain level flight, either under power or in glide. The engine position on the 737 Max made the pitch of the plane unstable. The MCAS system was introduced to mitigate some of that risk. A software-controlled system can certainly react much more quickly than a human pilot, so at first blush this seems reasonable. The problem with this idea is two-fold -- first, software-based systems are always complex relative to nearly any other type of system, and software complexity introduces risk; second, it sets up a chain of additional dependencies on sensors and electronics that also introduces risk. So, there are risks in the systems that are supposed to reduce the risk -- meaning that the risks introduced by the unstable configuration can never be fully eliminated. Because of this, I think it has to be concluded that, by design, the 737 Max was less safe than other aircraft in its class*, or at least less safe than it could have been (Strike 1).
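To make the dependency-chain point concrete, here is a back-of-the-envelope sketch. The component failure rates below are invented placeholders, not real avionics figures: the point is only that each component a mitigation depends on contributes its own failure probability, so the mitigation's net benefit is capped by the reliability of its own chain.

```python
# Back-of-the-envelope illustration: every component a mitigation
# depends on adds its own failure probability to the chain.
# The failure rates below are invented placeholders, NOT real avionics data.

def chain_failure_probability(component_failure_rates):
    """Probability that at least one component in a serial chain fails."""
    p_all_ok = 1.0
    for p_fail in component_failure_rates:
        p_all_ok *= (1.0 - p_fail)
    return 1.0 - p_all_ok

# Hypothetical serial chain behind a software mitigation:
# sensor -> wiring -> flight computer -> software -> actuator
chain = [1e-4, 1e-5, 1e-5, 1e-4, 1e-5]

print(f"P(mitigation chain fails) ~= {chain_failure_probability(chain):.2e}")
# The mitigation is a net win only if the risk it removes exceeds the risk
# this new chain introduces -- and the residual risk never reaches zero.
```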
For those who haven't followed the story, the preliminary reports indicate that the most likely cause of the crashes was how the software-based MCAS system reacted to faulty input from an angle-of-attack (AOA) sensor. Angle-of-attack is aviation-speak for the angle between the oncoming air and the wing; generally we envision this as the pitch/slope of the plane, but the two aren't always the same. An AOA that climbs past the wing's critical angle results in a stall -- which means that the wings lose lift, and may as well be anchors.
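For non-aviators, the concept boils down to a simple comparison. Here is a toy illustration (the critical angle below is a made-up placeholder, and real stall behavior is far more nuanced -- this is not flight-certified logic):

```python
# Simplified illustration of the stall concept described above.
# CRITICAL_AOA_DEG is a made-up placeholder; real values depend on the wing.
CRITICAL_AOA_DEG = 15.0

def is_stalled(aoa_deg: float) -> bool:
    """Past the critical angle, the wing loses lift regardless of attitude."""
    return aoa_deg > CRITICAL_AOA_DEG

# Pitch and AOA are not the same thing: a plane can be pitched level yet
# descending fast, giving a high AOA -- which is why the sensor measures
# the airflow over the airframe, not the airframe's attitude.
print(is_stalled(12.0))  # False: below the critical angle
print(is_stalled(18.0))  # True: the wings-as-anchors territory
```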
It seems that the MCAS was designed to react by quickly adjusting the horizontal stabilizer trim (pushing the nose down) in response to input from the AOA sensor. Note that I said "the" AOA sensor. This was a non-redundant input, which is a big no-no in a system this critical (Strike 2). When the MCAS applied its corrections, the faulty sensor did not respond as expected (because it was faulty), so the MCAS applied the correction again... and again... (fighting against the human pilots) until the plane was flying towards the ground. So the software-based MCAS apparently did not do enough to anticipate the possibility that a sensor input might be incorrect, and this points to insufficient requirements & design (Strike 3).
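To show how cheap these defenses are in principle, here is a minimal sketch of my own invention -- not actual MCAS code, and the thresholds are made-up placeholders. It demonstrates the two safeguards the reports suggest were missing: cross-check redundant sensors before acting, and hard-limit the total authority the automation can accumulate.

```python
# Minimal sketch of two defenses discussed above.
# This is illustrative logic of my own, NOT actual MCAS code;
# both constants are invented placeholders.

MAX_DISAGREEMENT_DEG = 5.0   # placeholder threshold for sensor cross-check
MAX_TOTAL_CORRECTION = 2.5   # placeholder cap on cumulative automatic trim

def corrected_command(aoa_left, aoa_right, correction_so_far, step):
    """Return the trim step to apply, or None to disengage and alert the crew."""
    # Defense 1: redundant inputs. If the two AOA sensors disagree,
    # assume a sensor fault rather than trusting either value.
    if abs(aoa_left - aoa_right) > MAX_DISAGREEMENT_DEG:
        return None  # disengage and annunciate the disagreement to the pilots

    # Defense 2: bounded authority. Never let repeated activations
    # accumulate past a hard limit, so the system cannot trim the
    # aircraft into the ground one increment at a time.
    if correction_so_far + step > MAX_TOTAL_CORRECTION:
        return MAX_TOTAL_CORRECTION - correction_so_far

    return step

print(corrected_command(10.0, 24.0, 0.0, 0.6))  # None: sensors disagree, disengage
print(corrected_command(10.0, 11.0, 2.2, 0.6))  # capped so the total never exceeds 2.5
```

Neither defense is exotic; the cross-check fails safe to the humans, and the cap bounds the worst case even when everything else goes wrong.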
Boeing is a great company with a long history of safe airplanes. They are industry innovators, and I am convinced that they have done more to advance safe commercial aviation than any other company. So -- on all of the above, they should have known better. Somebody (actually -- everybody) in the engineering organization should have known better. Somebody in their management organization should have known better. Somebody (again -- everybody) among their FAA regulators should have known better. This was a multi-point failure for safety management. But wait -- it gets better.
We have one more issue that should never have happened. After the FAA had reviewed and approved the parameters of the MCAS, Boeing made changes, increasing the amount of stabilizer correction that could be applied by about 3x, and deployed the modified system purportedly without notifying the FAA (Strike 4?). To me this suggests a ridiculous amount of organizational hubris ("because we're the real experts" -- that really happens), a complete breakdown in their change management & approval processes, or both.
I have a concern here that maybe this was just the first time that Boeing, the FAA, and the industry "got caught". I don't think that big multi-faceted failures like this just happen. I suspect that they are typically the result of little shortcuts taken here and there... things worked out, so they get used again. Then over time, they compound until things don't work out. Did these shortcuts start with this aircraft, or was it just the first model whose issues were numerous and severe enough to surface? It seems to me that there should be an audit of previous designs. How far back? Maybe until they find two consecutive models where no risk-inducing shortcuts were taken. I've developed software for aerospace systems in the past, and I've worked on the quality assurance side as well, but I don't consider myself an industry expert -- just a software engineer and a concerned citizen who will feel a little less secure whenever I climb aboard a recent Boeing model.
I have much more to say about the "cultural laziness" described in the article, as well as other aspects of culture in regulated engineering environments, but I will save that for another day.
Here is a link to the EE Times article that inspired this diatribe: "Software Won’t Fix Boeing’s ‘Faulty’ Airframe"
* - "in its class" because there are other aircraft where unstable airframes are stabilized by software systems in order to enhance safety. These are high-performance military aircraft, where the instability of the airframe was created for the purpose of achieving enhanced maneuverability, which in combat flight, reduces pilot risk greatly. In straight-and-level flight, they are still more risky, but their design-purpose is not straight and level flight.
#Boeing #737max #safetyculture #aviationsafety #regulatedenvironments #safetycritical #highassurance