Amazon Web Services outages raise serious cloud questions

by Dave Neal

21 Mar 2011

The unreliability of Amazon Web Services is causing some cloud-dependent firms to reconsider their options and move their infrastructure onto other platforms.

Problems with the firm's Elastic Block Store (EBS) service occurred at an Amazon datacentre in Virginia on 17 and 18 March. The issues were fixed relatively quickly, but still had a significant impact.

"From 7:28pm PDT to 9:56pm PDT a networking issue affected connectivity to a significant number of instances in the US-EAST-1 region. Affected instances experienced degraded network connectivity to the internet and to instances in other availability zones," said the firm on its Service Health Dashboard.

"The root cause of last night's issue was when a core network routing device experienced a partial failure. While the router was causing packet loss, the failure was not detected by surrounding network devices and therefore they did not automatically fail traffic over to redundant network paths as intended."

However, while the problems were apparently solved relatively easily in-house, the same cannot be said for companies that were using the services.

Heroku, a Ruby-based cloud platform-as-a-service provider, said that it was still experiencing problems on Friday.

"Network connectivity has improved substantially, but we are still seeing brief periods of instability as additional networking changes are applied to mitigate the problems. We will continue to provide updates as soon as we have anything to share," the company wrote on 18 March.

Reddit, the news and link-sharing site, also suffered from the outage and, like Heroku, was forced to use its blog to explain the problem to its users.

"As most of you are probably aware, we had some serious downtime with the site today," wrote the firm.

"As you will see, the blame was partly ours and partly Amazon's (our hosting provider). But you probably don't care who is to blame, and we aren't here to assign blame. We just want to tell you what happened."

Reddit said that, despite Amazon's relatively swift reaction to the problems, and the usefulness of EBS, the company had decided to move some of its systems away from the Amazon cloud server offering.

"Even before the serious outage last night, we suffered random disks degrading multiple times a week. While we do have protections in place to mitigate latency on a small set of disks by using raid-0 stripes, the frequency of degradation has become highly unpalatable," Reddit said.

"Over the course of the past few weeks, we have been working to completely move Cassandra [servers] off EBS and onto the local storage which is directly attached to the EC2 instances."

In a move that could be mirrored by other companies affected by the outage, Reddit argued that, although local storage offers much less functionality than EBS, its greater reliability outweighs the features given up.

An ex-Reddit employee posting on the firm's discussion board on Friday was more outspoken, claiming that EBS alone accounts for more than 80 per cent of Reddit's downtime.

"Amazon EBS is a barrel of laughs in terms of performance and reliability and a constant (and the single largest) source of failure across Reddit," he wrote.

"Reddit's been in talks with Amazon all the way up to CIOs about ways to fix them for nearly a year and they've constantly been making promises that they haven't been keeping, passing us to new people that 'will finally be able to fix it', and variously otherwise desperately trying to keep Reddit while not actually earning it."
