Whoops! The Internet Broke.
Massive outages caused by a cloud-computing bug are the new normal.
Overnight, much of the world ground to a halt: Tens of thousands of flights and trains were canceled or delayed, hospitals stopped elective surgeries, doctors couldn’t book appointments, banks struggled to process transactions, television networks stopped broadcasting. The culprit was not a war, an earthquake, a mounting heat wave, or a terrorist attack, but some faulty computer code. It was likely the largest IT failure in history.
An update to a piece of software provided by CrowdStrike, a popular cloud-based cybersecurity platform, introduced a bug that caused outages in millions of Windows devices. (A spokesperson for the firm, which controls some 15 percent of the market for security software, wrote in an email that “the issue has been identified, isolated and a fix has been deployed.”) This wasn’t just a matter of large numbers of people not being able to log in to their email; industries and government agencies dependent on software for their most basic operations have been upended. A single company erred a single time, the web buckled, and the globe shuddered. It is as though the Y2K apocalypse has finally arrived, 24 and a half years later than expected.
Today’s outages demonstrate the extent to which the world has become dependent on the cloud. Yes, the cloud provides much of what you see on any given screen—social-media feeds, online health portals, digital shopping carts—but it is also responsible for many functions in the “real” world. The cloud stores your information in physical buildings; runs the software that hospitals depend on; facilitates the supply-chain and manufacturing logistics that produce and deliver everything in those carts to your door; connects the network of employees who write, edit, and illustrate this magazine.
Software has instantaneous effects—which can prove disastrous given the slower pace of the physical world. In this case, fixing the bug that CrowdStrike introduced is not necessarily as simple as downloading a new update. The faulty code prevents affected computers from working properly, which means they likely must be manually reconfigured by IT professionals. Fully resolving the problem could be a lengthy process.
CrowdStrike is only one of many tech companies that provide a cloud-based software service to much of the world. Google and Microsoft dominate email and office-work software; Workday takes care of accounting; Okta provides online sign-in services; Cloudflare supports the data centers littered across the globe. Underneath all that, just three firms—Amazon, Google, and Microsoft—account for roughly two-thirds of the cloud market. Despite their immense power, these companies are beholden to shareholders, not the public whose lives they shape. And because of their sprawl, it’s all the more probable that outages like today’s are not isolated, but global. With older, more traditional infrastructure, there may have been reason to fear a single bridge or tunnel collapsing because of faulty cement—the equivalent of one hospital or airline system needing to reboot today. The CrowdStrike situation shows how easily a million digital bridges, all built with the same company’s concrete, can crumble at once.
The reach and importance of cloud-based software will define the century to come. Such tech outages have happened before, at smaller scales, and will happen again. Software and data servers have become a site of geopolitical contest. Protesters target cloud providers the way they would have previously blocked highways. It is clearer than ever that the internet is not some cumulus floating above modern civilization, but the ground it is built upon.