As cloud adoption grows at an inexorable rate, so too does the impact of a cloud outage. Many of the services used by businesses and consumers on a daily basis rely on cloud platforms, and an outage can cripple them in one fell swoop.
There have been several noteworthy outages. Facebook was recently brought to its virtual knees twice, and Apple's iCloud suffered a major fault that knocked out the iTunes Store and iTunes Connect service.
V3 has a rundown of the top 10 cloud outages that have hampered some of the world's most popular services. But while the outage of an internet-delivered service is problematic, the real impact is felt when a public cloud platform goes down.
The worldwide footprint of such services means there are often backup data centres ready to take up the work of those suffering problems, but the impact of such outages is still not to be sniffed at.
David Jones, a digital performance expert at application analytics firm Dynatrace, explained that the loss of cloud-based services can have a nasty impact on a company's brand, particularly when its customers start to vent their frustration on social networks.
"In a world where seconds of delay can translate into millions of dollars in lost revenue and affect reputation and loyalty, an outage of this breadth can wreak havoc, and that's just what we've seen from the immediate reaction on social media," he said in reference to Facebook's problems.
"Organisations must have the ability to isolate the cause of performance issues in real time and use this information to prevent users from being affected, or businesses and the customers who rely on them can be thrown back to the dark ages."
These outages raise questions about the precautions platform providers and cloud-reliant services can take to mitigate the impact.
Clive Longbottom, founder of analyst house Quocirca, told V3 that cloud outages are a common occurrence, more so with private than public clouds, but can be addressed if vendors and service users are willing to learn from downtime and its impact.
"The providers need to do something about the number of outages that seem to be occurring. Hopefully, they will learn from each one and put in place new policies, procedures and systems as required to make sure that a repeat doesn't happen," he said.
"There is not much a single customer can do about it, except to request a fully mirrored service, if there is one available, and if it is affordable."
Some companies are already trying to learn from the effects of an outage or problem in their cloud services and infrastructure.
Netflix, which uses AWS to power its video streaming services across the globe, built a tool called Chaos Monkey that deliberately kills AWS servers allocated to the company so that engineers can see how such a failure affects its services.
The team behind Chaos Monkey explained that the use of such a tool gives Netflix engineers a way to uncover weaknesses in their AWS infrastructure and take action before they cause bigger problems.
"Since we knew that server failures are guaranteed to happen, we wanted those failures to happen during business hours when we were on hand to fix any fallout," the team explained.
"We knew that we could rely on engineers to build resilient solutions if we gave them the context to 'expect' servers to fail. If we could align our engineers to build services that survive a server failure as a matter of course, when it accidentally happened it wouldn't be a big deal. In fact, our members wouldn't even notice. This proved to be the case."
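The idea the team describes can be illustrated with a toy simulation. The sketch below is not Netflix's Chaos Monkey and makes no AWS calls; the `Cluster` class, its server names and the `chaos_experiment` helper are hypothetical, and it assumes a service counts as having survived if at least one server can still answer a request after a random one is killed.

```python
import random


class Cluster:
    """Toy stand-in for a pool of interchangeable servers (hypothetical)."""

    def __init__(self, names):
        self.healthy = set(names)

    def kill_random(self, rng):
        # Pick one healthy server at random and take it out of service.
        victim = rng.choice(sorted(self.healthy))
        self.healthy.discard(victim)
        return victim

    def handle_request(self):
        # Any remaining healthy server can answer; fail only if none is left.
        if not self.healthy:
            raise RuntimeError("no healthy servers left")
        return f"served by {min(self.healthy)}"


def chaos_experiment(cluster, rng):
    """Kill one server on purpose, then check the service still responds."""
    victim = cluster.kill_random(rng)
    try:
        cluster.handle_request()
        survived = True
    except RuntimeError:
        survived = False
    return victim, survived
```

Run against a three-server pool, the experiment reports that requests are still served; run against a single server, it exposes the weakness that the team wanted to find during business hours rather than by accident.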
Laurent Lachal, senior analyst for infrastructure solutions at analyst house Ovum, agreed that companies providing and using cloud services need a way to address and bypass cloud outages.
"It is increasingly complex, and complex systems fail from time to time," he told V3. "The key is to design them to flow around infrastructure failures, and if that is not possible, to fail gracefully, to degrade rather than disappear."
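As a rough illustration of what "degrade rather than disappear" can mean in practice, the sketch below wraps a call to a dependency and falls back to a simpler answer when that dependency fails. The function name `recommendations`, the `fetch_personalised` callable and the default fallback items are all assumptions made for the example, not part of any service mentioned here.

```python
def recommendations(user_id, fetch_personalised, fallback=("top-1", "top-2")):
    """Return personalised items, or a generic list if the backend is down.

    fetch_personalised is a hypothetical callable that may raise when the
    underlying service is unavailable.
    """
    try:
        items = list(fetch_personalised(user_id))
        return {"items": items, "degraded": False}
    except Exception:
        # The page still renders with generic content instead of an error.
        return {"items": list(fallback), "degraded": True}
```

The `degraded` flag is one possible design choice: it lets the caller log or monitor how often the fallback path is taken, which would otherwise be invisible to users and operators alike.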
Lachal also agreed with Longbottom's advice about learning from previous cloud outages. "It is key to manage the outage by explaining what is happening during the event and what lessons have been learned after it," he said.
Given that cloud outages can be caused by numerous factors, such as power disruption to data centres and broken broadband connections, it is not surprising to see IT companies offering specialist services to cloud vendors and data centre providers to assess the health of the infrastructure that supports their cloud offerings.
IT infrastructure analytics firm Sumerian recently launched a capacity planning tool aimed at giving companies insight into the capacity of their IT estates and servers so they know how their infrastructure handles high demand, and can identify weaknesses before they cause a service outage.
There appears to be no silver bullet to slay the threat of cloud outages, but careful planning and rigorous testing could lessen the impact of any disruption. In turn, this may just save a company from brand-wrecking customer vitriol when that inevitable server failure kicks in.