21 Mar 2011
The unreliability of Amazon Web Services is causing some cloud-dependent firms to reconsider their options and move their infrastructure onto other platforms.
Problems with the firm's Elastic Block Store (EBS) service occurred at an Amazon datacentre in Virginia on 17 and 18 March. The issues were fixed relatively quickly, but still had a significant impact.
"From 7:28pm PDT to 9:56pm PDT a networking issue affected connectivity to a significant number of instances in the US-EAST-1 region. Affected instances experienced degraded network connectivity to the internet and to instances in other availability zones," said the firm on its Service Health Dashboard.
"The root cause of last night's issue was when a core network routing device experienced a partial failure. While the router was causing packet loss, the failure was not detected by surrounding network devices and therefore they did not automatically fail traffic over to redundant network paths as intended."
However, while the problems were apparently solved relatively easily in-house, the same cannot be said for companies that were using the services.
Heroku, a Ruby-based cloud platform-as-a-service provider, said that it was still experiencing problems on Friday.
"Network connectivity has improved substantially, but we are still seeing brief periods of instability as additional networking changes are applied to mitigate the problems. We will continue to provide updates as soon as we have anything to share," the company wrote on 18 March.
Reddit, the news and link-sharing site, also suffered from the outage and was, like Heroku, forced to use its blog to explain the problem to customers.
"As most of you are probably aware, we had some serious downtime with the site today," wrote the firm.
"As you will see, the blame was partly ours and partly Amazon's (our hosting provider). But you probably don't care who is to blame, and we aren't here to assign blame. We just want to tell you what happened."
Reddit said that, despite Amazon's relatively swift reaction to the problems and the usefulness of EBS, the company had decided to move some of its systems off the storage service.
"Even before the serious outage last night, we suffered random disks degrading multiple times a week. While we do have protections in place to mitigate latency on a small set of disks by using raid-0 stripes, the frequency of degradation has become highly unpalatable," Reddit said.
"Over the course of the past few weeks, we have been working to completely move Cassandra [servers] off EBS and onto the local storage which is directly attached to the EC2 instances."
In a move which could be mirrored by other companies affected by the outage, Reddit argued that, although local storage has much less functionality than EBS, its reliability outweighs the benefits of EBS.
An ex-Reddit employee posting on the firm's discussion board on Friday was more outspoken, claiming that EBS alone accounts for more than 80 per cent of Reddit's downtime.
"Amazon EBS is a barrel of laughs in terms of performance and reliability and a constant (and the single largest) source of failure across Reddit," he wrote.
"Reddit's been in talks with Amazon all the way up to CIOs about ways to fix them for nearly a year and they've constantly been making promises that they haven't been keeping, passing us to new people that 'will finally be able to fix it', and variously otherwise desperately trying to keep Reddit while not actually earning it."
Do you agree?
Dear oh Dear
Cloud computing is still in the hype stage. Of course it is unreliable - it will take several more years to mature and reliability will come at a price.
Posted by: ethel the frog 22 Mar 2011
Reddit is scapegoating Amazon
Reddit is blaming Amazon for their own failures. Reddit put all their eggs in one basket and didn't performance test their site. For them to now blame Amazon is absurd. Amazon should sue Reddit for libel.
Posted by: LouF 22 Mar 2011
Really?
I don't know about 'unreliable.' I am an independent tech consultant and author. I have never seen anything completely reliable, and Amazon doesn't claim as much. Hosting your data and apps internally is hardly the equivalent of 100% uptime. There will always be naysayers... Phil
Posted by: Phil Simon 21 Mar 2011