9
$\begingroup$

Relevant article https://www.bbc.com/news/articles/c8e9d13x2z7o

The plane pitched down for "4-5 seconds" before the autopilot somehow corrected it. I'm asking why that happened in the first place, since fly-by-wire systems are meant to be redundant, with check systems that read results from multiple computers; if one computer gives a weird result due to malfunction or corruption, it is "voted out" and ignored.

$\endgroup$
2
  • 1
    $\begingroup$ Related Accident: Jetblue A320 near Tampa on Oct 30th 2025, inflight upset causes injuries $\endgroup$ Commented 20 hours ago
  • 1
    $\begingroup$ The main factual and public document so far is the Airbus AOT. Previous software versions of the ELAC have no vulnerability; the problem is with the latest update. The BBC article contains approximations, such as "there is no direct mechanical link between the controls in the cockpit and the parts of the aircraft that actually govern flight", while in this specific case there is a mechanical linkage between the cockpit and the THS, in case the elevators cannot be controlled using the ELACs or the backup computers. $\endgroup$ Commented 6 hours ago

1 Answer

7
$\begingroup$

I'm strictly answering your question about voting principles, which is a general one:

I'm asking why that happened in the first place, since fly-by-wire systems are meant to be redundant, with check systems that read results from multiple computers; if one computer gives a weird result due to malfunction or corruption, it is "voted out" and ignored.

I'm providing an example of how this can fail on an A330. This example is valid for any aircraft, but there is no reason to relate it to the current incident. We know nothing about the A320 case, except what Airbus communicates: the faulty ELAC software must be updated.


Redundancy and vote to discard a faulty computer

Consider this principle:

  • There is a master computer providing orders to a flight-surface actuator, and a second computer, designed differently but with identical capabilities, monitoring the master.

  • There are three of these master-monitor systems to control different surfaces.

  • The three masters do the same calculations; the monitoring computers are aware of all three results and can compare them.

  • Only reliable computers can be elected master computers by the other masters.

  • When a monitoring computer detects abnormal master behavior, it informs the other masters; they discard the faulty master, declare it unreliable, and promote one of the (reliable) monitoring computers to master status.
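
As a rough illustration, the comparison-and-vote step above can be sketched in a few lines of Python. All names, the control law, and the tolerance threshold are my own illustrative choices, not anything from a real ELAC/FCC design:

```python
# Toy sketch of the master/monitor voting described above.
# Names, control law, and tolerance are illustrative, not from any real design.

def compute_order(computer_id, sensor_data):
    """Each master runs the same control law on its own copy of the data."""
    return sum(sensor_data) / len(sensor_data)  # stand-in for a real control law

def vote(results, tolerance=0.01):
    """Monitors compare the three masters' results; a master whose result
    deviates from both of the others is reported as faulty."""
    faulty = []
    for cid, value in results.items():
        others = [v for c, v in results.items() if c != cid]
        if all(abs(value - v) > tolerance for v in others):
            faulty.append(cid)
    return faulty

# Three master computers, each with its own copy of the sensor data.
# Here M3's copy has been corrupted somewhere in the chain.
data = {"M1": [1.0, 1.0], "M2": [1.0, 1.0], "M3": [1.0, 5.0]}
results = {cid: compute_order(cid, d) for cid, d in data.items()}
print(vote(results))  # -> ['M3']: M3 is voted out, a monitor is promoted
```

The key assumption baked into this sketch is the one the answer relies on: the majority is right, so whichever computer disagrees with the other two is the one discarded.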

Unanticipated circumstances

As demonstrated every day, the mechanism described above works correctly. There is no reason it wouldn't, assuming the majority is always right (sometimes it is not, and that creates problems, but that's not the point here).

For this to work, two conditions are required: all masters must receive similar data, and they must use algorithms delivering similar results.

A problem arises when, under very unlikely circumstances, one master is provided different data because of data corruption somewhere in the chain. The monitoring computers detect the differing result, and the related master is declared unreliable. A new master is then elected. If the faulty data sequence continues, another master is declared unreliable, and so on: the pool of reliable computers is quickly exhausted.

After all computers have been declared unreliable, the system fails. Of course this possibility is anticipated by the designers: the aircraft is reconfigured to a more basic "alternate law" of control focused on the most critical aspects, offering less automatic protection, e.g. the protection against excessive angle of attack is lost. This can go on until all automation is removed and the aircraft is essentially piloted manually.
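
A self-contained toy sketch of that exhaustion-and-downgrade sequence (again my own illustration, not an Airbus design): each round the current master sees corrupted data, is outvoted, and is replaced, until no reliable computer is left and the control law downgrades:

```python
# Toy sketch (illustrative, not a real reconfiguration logic) of how a
# persistent corrupted-data stream disqualifies every redundant computer.

def run_election_rounds(pool, corrupted_rounds):
    """Each round, the current master sees corrupted data, is outvoted by
    the monitors, declared unreliable, and replaced by the next computer.
    Returns the control law in force at the end."""
    law = "normal"
    for _ in range(corrupted_rounds):
        if not pool:
            break
        faulty_master = pool.pop(0)  # master voted out and discarded
        print(f"{faulty_master} declared unreliable, {len(pool)} reliable left")
    if not pool:
        law = "alternate"  # all computers disqualified: downgrade,
                           # e.g. angle-of-attack protection lost
    return law

computers = ["M1", "M2", "M3"]
print(run_election_rounds(computers, corrupted_rounds=4))  # -> alternate
```

One faulty round leaves the system in normal law with a spare computer; a corruption stream that outlasts the pool forces the downgrade, which is exactly the failure mode the voting scheme cannot defend against.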

This scenario did happen to a Qantas A330 in 2008: Qantas Flight 72. I already wrote an answer on this occurrence, which shows the specific sequence of events that led to wrongly discarding all valid computers:

Qantas Flight 72 events

I'm not reproducing the technical details, which are all available in the linked answer. The table above shows how the system switched from the normal law of control to the alternate one after the master computers were declared unreliable one after the other.

The aircraft wasn't the same, the computers weren't the same, and we don't know about the technical circumstances of the current incident, so there is no conclusion to draw on the present case from this other case.

However, this is a perfect example to answer your question about how redundant computers can fail to reject the faulty one. In both cases the initial event (not the cause) was corrupted data sent to the computers. The way the data were corrupted was not anticipated in the flight computer software, because it had an almost-zero probability of occurring.

$\endgroup$
2
  • 1
    $\begingroup$ An extreme example of this issue happened during the space shuttle STS-124 mission, where a cracked diode on one of the sensor data buses caused it to enter a state where each computer was reading different values, which in turn caused each of the quad-redundant computers to start producing different results in sequence. $\endgroup$ Commented 7 hours ago
  • 1
    $\begingroup$ @user1937198: This incident and Qantas 72 are actually listed as #33 and #34 in this NASA list of software failures in aerospace. $\endgroup$ Commented 6 hours ago
