
I'm strictly answering your question about voting principles, which is general:

I'm asking why that happened in the first place since fly by wire systems are meant to be redundant with check systems that read results from multiple computers and if one computer is giving a weird result due to malfunction/corruption it is "voted out" and ignored.

I'm providing an example of how this can fail on an A330. This example is valid for any aircraft, but there is no reason to relate it to the current incident. We know nothing about the A320 case, except what Airbus communicates: a faulty ELAC software must be updated.


Redundancy and vote to discard a faulty computer

Consider this principle:

  • There is a master computer providing orders to a flight surface actuator, and a computer, designed differently but with identical capabilities, monitoring the master.

  • There are three of these master-monitor systems to control different surfaces.

  • The three masters do the same calculations, the monitoring computers are aware of the three results and can compare them.

  • Only reliable computers can be elected master computers by the other masters.

  • When one monitoring computer detects abnormal master behavior, it informs the other masters, which discard the faulty master, declare it unreliable, and promote one of the (reliable) monitoring computers to master status.
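
The voting step in the list above can be sketched in a few lines of Python. This is a toy model with hypothetical names, not the actual Airbus architecture: each monitor compares its master's result against the majority, and a deviating master is voted out.

```python
# Toy model of the master/monitor vote described above (hypothetical
# names, not the real ELAC/SEC implementation).
from collections import Counter

def majority(results):
    """Value computed by most masters: assumed to be the correct command."""
    value, _count = Counter(results).most_common(1)[0]
    return value

def vote(pairs, results):
    """Compare each master's result with the majority; a deviating master
    is declared unreliable and its monitoring computer is promoted."""
    good = majority(results)
    new_pairs = []
    for (master, monitor), result in zip(pairs, results):
        if result == good:
            new_pairs.append((master, monitor))
        else:
            # faulty master discarded; the reliable monitor becomes master
            new_pairs.append((monitor, None))
    return new_pairs

pairs = [("M1", "S1"), ("M2", "S2"), ("M3", "S3")]
# M2 computes a deviant surface command, e.g. from corrupted input data:
print(vote(pairs, [5.0, 17.3, 5.0]))  # M2 is voted out, S2 promoted
```

In a real system the comparison would of course tolerate small numerical differences between dissimilar implementations rather than require exact equality.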

Unanticipated circumstances

As demonstrated every day, the mechanism described above works correctly. We don't see why it wouldn't, assuming the majority is always right (sometimes it is not, and that creates problems, but that's not the point here).

For this to work, two conditions are required: All masters must receive similar data and must use algorithms delivering similar results.

A problem arises when, under very unlikely circumstances, one master is provided different data because of data corruption somewhere in the chain. The monitoring computers detect the different result, and the related master is declared unreliable. A new master is elected. If the faulty data sequence continues, another master is declared unreliable, and so on. The pool of reliable computers is quickly exhausted.

After all computers have been declared unreliable, the system fails. Of course, this possibility is anticipated by the designers: the aircraft is reconfigured to a more basic "alternate law of control" focused on the most critical aspects, offering less automatic protection, e.g. the protection against excessive angle of attack is lost. This can go on until all automation is removed and the aircraft is basically flown manually.
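
Continuing the toy sketch, the exhaustion-and-degradation sequence can be illustrated as follows (again purely illustrative, with hypothetical names):

```python
# Illustrative only: repeated corrupted data votes out one master after
# another until the reliable pool is empty and the control law degrades.
def degrade(pool, frames):
    """pool: reliable computers; frames: True where input data is corrupted."""
    law = "normal"
    for corrupted in frames:
        if corrupted and pool:
            pool.pop(0)        # current master declared unreliable
        if not pool:
            law = "alternate"  # fewer automatic protections available
    return law

print(degrade(["C1", "C2", "C3"], [True, True, True]))    # -> alternate
print(degrade(["C1", "C2", "C3"], [True, False, False]))  # -> normal
```

A single corrupted frame only costs one computer; it is the persistence of the faulty data that empties the pool and forces the reconfiguration.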

This scenario did happen to a Qantas A330 in 2008: Qantas Flight 72. I already wrote an answer on that occurrence which shows the specific sequence of events that led to all valid computers being wrongly discarded:

Qantas Flight 72 events

I'm not reproducing the technical details, which are all available in the linked answer. The table above shows how the system switched from the normal law of control to the alternate one after the master computers were declared unreliable one after the other.

The aircraft wasn't the same, the computers weren't the same, and we don't know the technical circumstances of the current incident, so no conclusion about the present case can be drawn from this other one.

However, this is a perfect example to answer your question about how redundant computers can fail to reject the faulty one. In both cases the initial event (not the cause) was corrupted data sent to the computers. The way the data were corrupted was not anticipated in the flight computer software, because it had an almost zero probability of occurring.
