A small technical error in an algorithm or program can cost millions of dollars. This article walks through seven of the most costly programming errors in history so far.

NASA Mars Climate Orbiter: $193 million


In 1999, NASA's Mars Climate Orbiter entered an orbit that was too low, causing the spacecraft to burn up in the atmosphere. The failure was ultimately traced to a unit conversion error: values in the imperial unit pound-force second were never converted to the metric newton-second.

The Mars Climate Orbiter was the second probe in NASA's Mars Surveyor program, which also included the Mars Global Surveyor, launched in November 1996, and the Mars Polar Lander, launched in January 1999. The orbiter and the lander were designed to arrive at roughly the same time to study Mars' surface, climate and atmosphere. The orbiter was scheduled to reach orbit on September 23, 1999. NASA scientists hoped that once it reached Mars, it would help them reconstruct the planet's climate history and find signs of water on the surface. After that mission, it was also to serve as a communications relay for future Mars missions.

On September 23, 1999, the Mars Climate Orbiter began its orbit insertion burn as planned. The spacecraft was supposed to re-establish contact after passing behind Mars and then send a signal, but no signal was ever received. During the week between TCM-4 (the fourth trajectory correction maneuver) and orbit insertion, the navigation team had estimated that the probe's altitude might be much lower than planned, somewhere between 150 and 170 kilometers.

The root cause of the failure was a human error in handling units. Ground software supplied thruster impulse data in imperial pound-force seconds, while the navigation software that consumed the data expected metric newton-seconds. As a result, the probe entered the atmosphere at the wrong altitude and eventually broke apart.
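The mismatch is easy to reproduce in miniature. The following is only a minimal sketch, not the mission software: the function name and the sample impulse value are hypothetical, and only the conversion factor (1 lbf = 4.4482216152605 N) is a fixed constant.

```python
# Hypothetical illustration of a unit mismatch; not actual mission code.
LBF_TO_NEWTON = 4.4482216152605  # newtons per pound-force


def impulse_lbf_s_to_n_s(impulse_lbf_s: float) -> float:
    """Convert an impulse from pound-force seconds to newton-seconds."""
    return impulse_lbf_s * LBF_TO_NEWTON


# A producer reports a thruster impulse in lbf*s (made-up sample value).
reported = 10.0

# A consumer that assumes the number is already in N*s uses it unchanged...
assumed_n_s = reported
# ...while the correct value requires an explicit conversion.
actual_n_s = impulse_lbf_s_to_n_s(reported)

underestimate_factor = actual_n_s / assumed_n_s
print(f"assumed {assumed_n_s:.2f} N*s, actual {actual_n_s:.2f} N*s")
print(f"the thruster effect is underestimated by a factor of {underestimate_factor:.2f}")
```

Carrying the unit in the function name, or in a dedicated type, is one simple way to make this kind of silent mismatch harder to write.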

Mariner 1: $18.5 million


The Mariner 1 incident, often called the most expensive hyphen in history, was another NASA misstep. The error was tiny, but it cost the agency millions of dollars.

The Mariner program launched a series of unmanned spacecraft designed to explore Mars, Venus, and Mercury. The program achieved several firsts, including the first planetary flyby, the first planetary orbiter, and the first gravity-assist maneuver.

Mariner 1 launched at 9:21 a.m. on July 22, 1962. Less than five minutes after liftoff, the mission had to be aborted. What was meant to be a historic flight ended in destruction, all because of a small mistake in the mathematical code.

According to NASA's website, the booster performed satisfactorily until the range safety officer detected an unscheduled yaw-lift maneuver. Faulty application of the guidance commands made steering impossible and would have sent the vehicle crashing, possibly into the North Atlantic shipping lanes or an inhabited area, so the range safety officer ordered a destructive abort.

A few days after the accident, The New York Times published an article explaining the cause of the crash. It said the error was the result of "a missing hyphen in some math data". A NASA programmer allegedly left out the symbol when entering "a lot of coded information" into the computer system.

A few days later, NASA official Richard Morrison explained the decision to destroy the rocket to Congress, emphasizing the importance of the small omission: "The hyphen tells the spacecraft to ignore computer-provided data until radar contact is restored. When that hyphen is omitted, false information is fed into the spacecraft control system. In this case, the computer turns the rocket to the left, nose down, the rocket obeys the order and crashes."

Ariane 5 Flight 501: $370 million


On June 4, 1996, an unmanned Ariane 5 rocket launched by the European Space Agency exploded just 40 seconds after liftoff from Kourou, French Guiana. It was the rocket's maiden flight, after a decade of costly development.

A committee of inquiry investigated the cause of the explosion and issued a report within two weeks. The cause turned out to be a software bug in the inertial reference system. The software had originally been developed for Ariane 4 and was reused on Ariane 5. Ariane 5 is a more powerful rocket, which exposed a fault that could never have occurred on its predecessor.

During flight, a 64-bit floating-point number related to the rocket's horizontal velocity relative to the platform was converted to a 16-bit signed integer. The value was greater than 32,767, the largest number a 16-bit signed integer can hold, so the conversion failed. As a result, in the 39th second the rocket began to break up under aerodynamic forces and self-destructed.
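The conversion problem can be illustrated in a few lines. This is a minimal sketch in Python rather than the rocket's actual flight code; the function names and the sample values are hypothetical and chosen only to show three possible behaviors when a float does not fit into 16 bits: wrapping silently, raising an error, or clamping.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_unchecked(value: float) -> int:
    """Truncate to an integer and keep only the low 16 bits (wraps silently)."""
    raw = int(value) & 0xFFFF
    return raw - 0x10000 if raw > INT16_MAX else raw


def to_int16_checked(value: float) -> int:
    """Convert to a 16-bit signed integer, raising if the value does not fit."""
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return result


def to_int16_saturating(value: float) -> int:
    """Clamp to the representable range instead of failing."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


small = 12345.6  # made-up value that fits
large = 40000.0  # made-up value above 32767

print(to_int16_checked(small))     # 12345
print(to_int16_unchecked(large))   # wraps around to -25536
print(to_int16_saturating(large))  # 32767
try:
    to_int16_checked(large)
except OverflowError as exc:
    print("conversion failed:", exc)
```

Which behavior is right depends on the system. The point is that the out-of-range case needs a deliberate decision and a handler, especially when code is reused in an environment with a different value range.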

Pentium processor bug: $475 million


The Pentium FDIV bug is the most famous, or perhaps most notorious, flaw in an Intel microprocessor. The Pentium's floating-point divider was designed to be faster than its predecessor's, but it shipped with errors in the lookup table that is part of Intel's SRT division algorithm.

To execute floating-point scalar code three times faster and vector code five times faster than the 486DX chip, Intel adopted the SRT algorithm, which generates two quotient bits per clock cycle, whereas the traditional shift-and-subtract algorithm used in the 486 generates only one. The SRT algorithm uses a lookup table to determine the intermediate quotient digits required for floating-point division. Intel's lookup table consists of 1066 entries, five of which were not loaded into the programmable logic array (PLA) because of a programming error. When the floating-point unit (FPU) accesses any of these five cells, it reads zero instead of the +2 that the "missing" cell should contain.

In the worst case, the error can appear as early as the fourth significant decimal digit, but the chance of that happening is 1 in 360 billion. Errors most commonly appear in the 9th or 10th decimal digit, with a probability of about 1 in 9 billion. Nevertheless, disgruntled customers argued that every user was entitled to working hardware and demanded replacements. Intel eventually agreed to replace the chips, which accounts for the $475 million cost.
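A widely circulated check for the flaw divides 4195835 by 3145727 and multiplies the quotient back. The sketch below reproduces that check; the function name is hypothetical. On a correct FPU the residual is exactly 0, whereas affected Pentium chips were widely reported to return a quotient of about 1.333739 instead of 1.333820, a difference in the fourth significant digit, leaving a residual of 256.

```python
def fdiv_residual(x: float, y: float) -> float:
    """Return x - (x / y) * y, which should be exactly 0 for this operand pair."""
    return x - (x / y) * y


# Operand pair commonly used to expose the FDIV flaw.
x, y = 4195835.0, 3145727.0

residual = fdiv_residual(x, y)
print(f"{x:.0f} / {y:.0f} = {x / y:.9f}")
print("residual:", residual)
print("division looks correct" if residual == 0.0 else "division looks wrong")
```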

Morris Worm: $100 million


Would you believe that a student trying to solve a problem accidentally created malware that caused $100 million worth of damage? That is exactly what happened on November 2, 1988, when Cornell graduate student Robert Tappan Morris released a program that was meant as a harmless experiment but contained a small bug. The malware spread rapidly and went on to cripple thousands of computers.

Robert Morris was convicted of a computer crime and fined $10,000. However, the total cost of repairing the affected computers was put at $100 million.

Morris' attorneys claimed that the worm actually helped improve cybersecurity, because it spurred the development of antivirus software and made users aware of such malware. Morris later co-founded Y Combinator and joined the faculty at MIT. A floppy disk with the malware's source code is stored at Boston University. Let's just hope it doesn't mutate.

More far-reaching than the incident itself was its effect on hacker culture. From then on, hacking took a darker turn: the hacker ethic lost its hold, and the old hacker tradition began to break down. The public image of hackers never recovered. Computer viruses and worms have since entered the mainstream.

Knight bankruptcy: $440 million


What if a major player in the U.S. stock market started buying high and selling low? It doesn't sound like a good trading strategy, does it? Yet that is exactly what happened to Knight Capital, and it nearly bankrupted the firm.

On the morning of August 1, 2012, something happened that would be a nightmare for any CEO: what had taken 17 years to build nearly collapsed within a few hours. Some new trading software contained a bug that was only triggered when the New York Stock Exchange opened that day. The faulty software sent Knight on a buying spree, and within the first hour of trading the company had bought shares in about 150 different companies, worth about $7 billion.

Knight tried to have the trades cancelled, but SEC Chairman Mary Schapiro refused. Trades in only six stocks were reversed. The rest of the buying spree did not meet the cancellation threshold, which required the purchases to have pushed a stock's price up by more than 30%. All the other trades stood.

That was bad news for Knight, which had no choice but to sell the stock it had bought once it became clear that the trades would stand. Just as the morning buying frenzy had driven up the prices of these stocks, a massive sell-off was likely to force prices down, possibly to the point where Knight could not cover its losses.

Goldman Sachs stepped in and bought Knight's entire unwanted position, at a cost to Knight of $440 million.

Millennium bug: $500 billion


What harm can two missing digits do? Back in 1999, the answer was $500 billion. The Y2K bug, also known as the Millennium Bug, is a class of computer bug. Because the year was represented by only two decimal digits, date calculations that crossed the century boundary (for example, involving dates after December 31, 1999) produced incorrect results, which could cause systems to malfunction or even crash.

When writing complex computer programs from the 1960s through the 1980s, computer engineers used two-digit codes to represent years, leaving out the "19". The year was stored not as 1970 but as 70. Engineers shortened the date because data storage was expensive in those days and every digit took up valuable space.

As the year 2000 approached, programmers realized that computers might interpret 00 not as 2000 but as 1900. Any activity programmed on a daily or yearly basis would then be corrupted or flawed. When December 31, 1999 rolled over to January 1, 2000, a computer would treat the new date as January 1, 1900.

Banks and other financial institutions that calculate interest on a daily basis faced a real problem. Their computers would calculate interest for minus 100 years rather than for one day. Power plants, transportation and many other sectors would also have been affected.
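The interest problem can be shown with a short calculation. This is a minimal sketch with hypothetical function names, a made-up interest rate, and an assumed pivot year of 50 for the "windowing" repair that many systems adopted; it does not represent any particular bank's code.

```python
from datetime import date


def expand_naive(yy: int) -> int:
    """Legacy assumption: every two-digit year belongs to the 1900s."""
    return 1900 + yy


def expand_windowed(yy: int, pivot: int = 50) -> int:
    """Windowing fix: years below the pivot are read as 20xx, the rest as 19xx."""
    return (2000 if yy < pivot else 1900) + yy


def days_between(start, end, expand) -> int:
    """Days from start to end, each given as (two-digit year, month, day)."""
    (y1, m1, d1), (y2, m2, d2) = start, end
    return (date(expand(y2), m2, d2) - date(expand(y1), m1, d1)).days


new_years_eve = (99, 12, 31)
new_years_day = (0, 1, 1)

naive_days = days_between(new_years_eve, new_years_day, expand_naive)
fixed_days = days_between(new_years_eve, new_years_day, expand_windowed)
print(naive_days)  # -36523: about minus 100 years
print(fixed_days)  # 1

# Simple daily interest on a made-up principal and rate.
principal, daily_rate = 1000.0, 0.0001
print("naive interest:", principal * daily_rate * naive_days)
print("fixed interest:", principal * daily_rate * fixed_days)
```

Windowing only postpones the ambiguity to the pivot year, which is why storing four-digit years is the more durable fix.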

The U.S. government passed the Year 2000 Information and Readiness Disclosure Act to prepare for the event and created a presidential council of senior administration officials and representatives of agencies such as the Federal Emergency Management Agency (FEMA) to monitor how private companies were preparing their systems. Research firm Gartner estimated that the global cost of avoiding the millennium bug could be as high as $600 billion.

Having learned the seven lessons above, remember to test your software early and often to avoid the high cost of failure and repair.


陈哥聊测试

Senior agile testing consultant and member of the team behind ZenTao (禅道), a well-known Chinese project management tool.