Are computers reliable?

PcTek9

In general yes, but honestly, perhaps not.

You can have a pc with no hardware problems and no software problems and it can crash.

In my hardware design classes I learned that CPUs, memory, and I/O buses are all subject to crashing from ... APD. What's APD? It's alpha particle decay. In fact, one of the things engineers learned when designing computers is that CPUs have bit-error rates that occur because of particles striking the devices.

CPU bit-error rates are usually around 10^-18, but for RAM they are a 'bit' higher at 10^-8 to 10^-12.

To overcome this, engineers use a variety of algorithms to ensure that when data is communicated from the CPU to RAM, or from the CPU to the motherboard, the data that arrives is the same data the CPU originally sent.

Originally, simple parity checks were used in early computers, but this quickly became more complex as Hamming codes and SECDED coding methods came into use. Cyclic redundancy checks (CRCs) are also used.

So for example, if the CPU gets ready to talk to memory or to the I/O bus, it must encode everything it says, so that when the I/O bus gets it, it can decode it and check that the bits are the original bits the CPU sent.
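
As a rough software illustration of that encode, send, decode-and-check cycle, here is a minimal Python sketch using the CRC-32 from the standard zlib module. Real buses do this in hardware with their own polynomials and framing; the function names encode, decode and flip_bit are made up for this example.

```python
import zlib

def encode(payload: bytes) -> bytes:
    """Sender side: append a 4-byte CRC-32 of the payload."""
    crc = zlib.crc32(payload)
    return payload + crc.to_bytes(4, "big")

def decode(frame: bytes) -> bytes:
    """Receiver side: recompute the CRC and raise if it does not match."""
    payload, received = frame[:-4], int.from_bytes(frame[-4:], "big")
    if zlib.crc32(payload) != received:
        raise ValueError("CRC mismatch: data was altered in transit")
    return payload

def flip_bit(frame: bytes, bit_index: int) -> bytes:
    """Simulate a particle strike by flipping one bit of the frame."""
    damaged = bytearray(frame)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)

frame = encode(b"hello")
print(decode(frame))                 # clean frame passes the check
try:
    decode(flip_bit(frame, 3))       # a single flipped bit is caught
except ValueError as err:
    print(err)
```

Note that a CRC like this only detects the damage. It cannot say which bit flipped, which is where the correcting codes come in.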

Parity checks could find 1-bit errors. Multiple parity checks were combined together to form 'Hamming codes', which can correct single-bit errors (or, if used for detection only, detect double-bit errors). An extension of this basic Hamming code called SECDED adds one more bit to the data, which once again adds another level of data integrity protection by simply generating a parity over the entire group (byte, double byte, or word).
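
To make that concrete, here is a small Python sketch of an extended Hamming code on a 4-bit value: three Hamming parity bits plus one overall parity bit, 8 bits total. It is a toy version of the idea, not how any particular memory controller lays out its bits, and the names secded_encode and secded_decode are just for illustration.

```python
def secded_encode(nibble: int) -> list:
    """Encode 4 data bits into 8 bits: Hamming(7,4) plus one overall parity bit."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                       # index 0 = overall parity, 1..7 = Hamming positions
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # parity over positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # parity over positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # parity over positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes the whole 8-bit word even parity
    return code

def secded_decode(code: list) -> tuple:
    """Return (status, nibble); status is 'ok', 'corrected' or 'double_error'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos              # XOR of set positions is 0 for a valid word
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd overall parity: treat it as a single flip and repair it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return "double_error", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble

word = secded_encode(0b1011)
word[5] ^= 1                             # one flipped bit
print(secded_decode(word))               # corrected back to 11
word[2] ^= 1                             # a second flipped bit
print(secded_decode(word))               # detected, not corrected
```

Three flips in the same word would look like a single error and get "corrected" to the wrong value, which is why the three-bit case is the one that matters.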

The single error correct, double error detect method is highly efficient for large words. On a 64-bit CPU, the Hamming part needs 7 check bits (the smallest r where 2^r >= 64 + r + 1), plus 1 overall parity bit, so only 8 SECDED bits are required. When used in memory, an entire pulse train can be SECDED-secured, allowing even higher efficiency ratios.
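
For anyone who wants to check the bit count, here is a short Python sketch. It uses the standard Hamming condition (the smallest r with 2^r >= data bits + r + 1) and adds one overall parity bit for SECDED; the function name secded_check_bits is only for this example.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits needed for SECDED over a word of the given width."""
    r = 0
    while 2 ** r < data_bits + r + 1:    # Hamming condition for single-error correction
        r += 1
    return r + 1                         # plus one overall parity bit for double-error detection

for width in (8, 16, 32, 64):
    print(width, secded_check_bits(width))
# 8 5
# 16 6
# 32 7
# 64 8
```

So the overhead drops from 5 bits on 8 (over 60%) to 8 bits on 64 (12.5%), which is why wider words are cheaper to protect.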

SECDED pushes the number of altered bits it takes to sneak a bad word past the check up to 3 or more, which is extremely rare. What are the odds of that much alpha decay flipping the value of 3 bits in a word?

It turns out that if P is the probability of an alpha particle error in a data transmission from CPU to memory, then as you increase the number of bit errors the encoding can handle, the probability of beating it DECREASES exponentially, to the power of that number.

That means the probability of a word having a three-bit transmission error is effectively P to the power of 3 (assuming the flips are independent), which transported out of theoretical computer science into real-world data lands somewhere between 10^-24 and 10^-36 for the RAM rates above, so on the order of 10^-32.
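
The arithmetic is easy to check with a few lines of Python. This sketch assumes each flip is independent with probability P = 10^e and ignores the (comparatively small) factor for which bits in the word get hit; the function name is made up for the example.

```python
def multi_bit_error_exponent(single_bit_exponent: int, flipped_bits: int) -> int:
    """If P = 10**e per bit and flips are independent, then P**k = 10**(e * k)."""
    return single_bit_exponent * flipped_bits

for e in (-8, -12):                      # the RAM bit-error range quoted earlier
    print(f"P = 10^{e}: three flips ~ 10^{multi_bit_error_exponent(e, 3)}")
# P = 10^-8: three flips ~ 10^-24
# P = 10^-12: three flips ~ 10^-36
```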

So to sum up, no PC is 100% reliable. No matter how well designed it is, there is always the possibility in this infinite universe that a group of alpha particles will decide to take aim at your data.

Additionally, variations in the throughput of data, in both latency and bandwidth, give rise to three methods of data communication: PIO, DIO, and DMA (direct memory access). In PIO (programmed I/O), data transmission is achieved in software, with loops that check for receipt and integrity of the data. In DIO (driven I/O, i.e. interrupt-driven), a combination of hardware and software protocols handles arbitration and response to interrupt requests. On the original IBM PC XT motherboard this was accomplished by the Intel 8259 IRQ chip, which had basically 8 hardware IRQs. Later they took another 8259 and cascaded it through IRQ 2 of the first one (the old IRQ 2 line was rerouted to IRQ 9 on the second one), which generated the 16 interrupts for the IBM PC AT.

Of course we all know DMA is just direct memory access, which is a completely hardware solution for moving the data: a controller does the transfer itself instead of the CPU. That means the software only has to set up the transfer and worry about what to do with the data afterwards.

All 3 methods of data communication employ some form of encoding, b/c anywhere data is being transmitted in the computer, there is the possibility that a bit can flip due to alpha particle decay. :)
 
Really? That statement contradicts itself.

As for the rest of your post, very nice

The rest of the post explains it. Random particles from space screw with the bits while they are in transit and corrupt the data, which can cause a BSOD.
 
Though very rare, bit flipping and bit rot are a perfect reason to have backups of your data. If dealing with a server, it's great to keep a spare barebones copy of the system. Computers are as reliable as they can be if properly set up and maintained. I'm not sure anyone could build a bulletproof system or expect to account for every possible weakness of the system.
 