
I'm strictly answering your question about voting principles, which is general:

I'm asking why that happened in the first place since fly by wire systems are meant to be redundant with check systems that read results from multiple computers and if one computer is giving a weird result due to malfunction/corruption it is "voted out" and ignored.

I'm providing an example of how this can fail on an A330. This example is valid for any aircraft, but there is no reason to relate it to the current incident. We know nothing about the A320 case, except what Airbus communicates: a faulty ELAC software must be updated.


Redundancy and vote to discard a faulty computer

Consider this principle:

  • There is a master computer providing orders to a flight surface actuator, and a computer, designed differently but with identical capabilities, monitoring the master.

  • There are three of these master-monitor systems to control different surfaces.

  • The three masters do the same calculations, the monitoring computers are aware of the three results and can compare them.

  • Only reliable computers can be elected master computers by the other masters.

  • When one monitoring computer detects abnormal master behavior, it informs the other masters, which discard the faulty master, declare it unreliable, and promote one of the (reliable) monitoring computers to master status.
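
The voting step in the list above can be sketched in a few lines of Python. This is a toy model with hypothetical names, not the actual Airbus architecture: each monitor compares its master's result against the majority, and a deviating master is voted out.

```python
# Toy model of the master/monitor vote described above (hypothetical
# names, not the real ELAC/SEC implementation).
from collections import Counter

def majority(results):
    """Value computed by most masters: assumed to be the correct command."""
    value, _count = Counter(results).most_common(1)[0]
    return value

def vote(pairs, results):
    """Compare each master's result with the majority; a deviating master
    is declared unreliable and its monitoring computer is promoted."""
    good = majority(results)
    new_pairs = []
    for (master, monitor), result in zip(pairs, results):
        if result == good:
            new_pairs.append((master, monitor))
        else:
            # faulty master discarded; the reliable monitor becomes master
            new_pairs.append((monitor, None))
    return new_pairs

pairs = [("M1", "S1"), ("M2", "S2"), ("M3", "S3")]
# M2 computes a deviant surface command, e.g. from corrupted input data:
print(vote(pairs, [5.0, 17.3, 5.0]))  # M2 is voted out, S2 promoted
```

In a real system the comparison would of course tolerate small numerical differences between dissimilar implementations rather than require exact equality.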

Unanticipated circumstances

As demonstrated every day, the mechanism described above works correctly. We don't see why it wouldn't, assuming the majority is always right (sometimes it is not, and that creates problems, but that's not the point here).

For this to work, two conditions are required: All masters must receive similar data and must use algorithms delivering similar results.

A problem arises when, under very unlikely circumstances, one master is provided different data because of data corruption somewhere in the chain. The monitoring computers detect the different result, and the related master is declared unreliable. A new master is elected. If the faulty data sequence continues, another master is declared unreliable, and so on. The pool of reliable computers is quickly exhausted.

After all computers have been declared unreliable, the system fails. Of course, this possibility is anticipated by the designers: the aircraft is reconfigured to a more basic "alternate law of control" focused on the most critical aspects, offering less automatic protection, e.g. the protection against excessive angle of attack is lost. This can go on until all automation is removed and the aircraft is basically flown manually.
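
Continuing the toy sketch, the exhaustion-and-degradation sequence can be illustrated as follows (again purely illustrative, with hypothetical names):

```python
# Illustrative only: repeated corrupted data votes out one master after
# another until the reliable pool is empty and the control law degrades.
def degrade(pool, frames):
    """pool: reliable computers; frames: True where input data is corrupted."""
    law = "normal"
    for corrupted in frames:
        if corrupted and pool:
            pool.pop(0)        # current master declared unreliable
        if not pool:
            law = "alternate"  # fewer automatic protections available
    return law

print(degrade(["C1", "C2", "C3"], [True, True, True]))    # -> alternate
print(degrade(["C1", "C2", "C3"], [True, False, False]))  # -> normal
```

A single corrupted frame only costs one computer; it is the persistence of the faulty data that empties the pool and forces the reconfiguration.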

This scenario did happen to a Qantas A330 in 2008: Qantas Flight 72. I already wrote an answer on that occurrence which shows the specific sequence of events that led to all valid computers being wrongly discarded:

Qantas Flight 72 events

I'm not reproducing the technical details, which are all available in the linked answer. The table above shows how the system switched from the normal law of control to the alternate one after the master computers were declared unreliable one after the other.

The aircraft wasn't the same, the computers weren't the same, and we don't know the technical circumstances of the current incident, so no conclusion about the present case can be drawn from this other one.

However, this is a perfect example to answer your question about how redundant computers can fail to reject the faulty one. In both cases the initial event (not the cause) was corrupted data sent to the computers. The way the data were corrupted was not anticipated in the flight computer software, because it had an almost zero probability of occurring.
