How a typo made the Amazon cloud go dark for scores of internet users

The outage was an abrupt reminder that the internet is not as invincible as its near seamless fusion with our lives suggests.

By Josh Kenworthy, EqualEd Fellow

How big is Amazon's cloud? Big. So big, in fact, that its cloud computing arm, Amazon Web Services (AWS), is larger than the equivalent services offered by the next three players (Microsoft, Google, and IBM) combined.

That is why it was such a big deal when an Amazon team member accidentally entered a couple of wrong bits of code during routine maintenance on Tuesday and knocked out large portions of the internet for around four hours.

AWS hosts a number of high-profile, heavily trafficked websites and services, including Airbnb, Netflix, Reddit, and Quora, many of whose pages were not loading during the outage. And although the internet giant moved quickly to fix the problem, the mishap was one of the periodic reminders we get that the internet is not as invincible as its near seamless fusion with our lives suggests.

In a public apology, Amazon explained that the fat-finger incident occurred while an employee on the Amazon Simple Storage Service (S3) team was working to speed up the S3 billing process. Using "an established playbook," as Amazon put it, the worker executed a command intended to take a small number of servers in one of the S3 subsystems offline temporarily, but the error took down a lot more.

"In this instance, the tool used allowed too much capacity to be removed too quickly," Amazon said. "We have modified this tool to remove capacity more slowly and added safeguards to prevent capacity from being removed when it will take any subsystem below its minimum required capacity level."

Or, as The Washington Post's Brian Fung put it: "Translation: Employees will no longer be able to unplug whole parts of the Internet by mistake."
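For readers curious what such a safeguard might look like in practice, here is a minimal sketch in Python. It is not Amazon's tool: the function name, the parameters, and the numbers are all hypothetical, chosen only to illustrate the two changes Amazon described, namely removing capacity in small batches and refusing any request that would leave a subsystem below its minimum required capacity.

```python
class CapacityError(ValueError):
    """Raised when a removal would leave a subsystem under its minimum."""


def plan_removal(current, requested, minimum, max_per_step):
    """Split a removal request into small batches, refusing unsafe requests.

    Hypothetical illustration only; names and logic are assumptions,
    not Amazon's actual tooling.
    """
    if requested < 0 or max_per_step <= 0:
        raise ValueError("requested must be >= 0 and max_per_step > 0")

    # Safeguard: never take the subsystem below its minimum capacity.
    if current - requested < minimum:
        raise CapacityError(
            f"removing {requested} of {current} servers would leave fewer "
            f"than the required minimum of {minimum}"
        )

    # Rate limit: remove at most max_per_step servers per batch.
    batches = []
    remaining = requested
    while remaining > 0:
        step = min(max_per_step, remaining)
        batches.append(step)
        remaining -= step
    return batches


# A small, intended removal is split into slow batches.
print(plan_removal(current=100, requested=7, minimum=80, max_per_step=3))
# prints [3, 3, 1]

# A mistyped request (70 instead of 7) is rejected outright.
try:
    plan_removal(current=100, requested=70, minimum=80, max_per_step=3)
except CapacityError as err:
    print("refused:", err)
```

In this sketch, the typo that asks for 70 servers instead of 7 never reaches the removal step, because the minimum-capacity check runs before any batch is planned.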

Amazon's rise to the top of the so-called Infrastructure as a Service (IaaS) tree began in 2006, when the company, in all its frugality, started buying up or leasing existing data centers dotted across northern Virginia, "a central region for internet backbone," according to The Atlantic.

However, the fact that Amazon didn't build new servers from scratch also means they're old, potentially making them more susceptible to crashing.

The timing of the crash couldn't have been worse. It came on the same day that Amazon was holding one of its AWSome Days, where it promotes the advantages of AWS and teaches people how to use it. BGR.com's Mike Wehner wrote about the unfortunate timing from Edinburgh, Scotland: