Amazon Web Services (AWS) has apologised for an outage that left several major web firms, such as Netflix, without service on Christmas Eve.
Amazon blamed human error for the outage, explaining in a detailed blog post on Monday that it was caused when part of the state data for its Elastic Load Balancing (ELB) service was accidentally deleted.
"The data was deleted by a maintenance process that was inadvertently run against the production ELB state data," it said.
"This process was run by one of a very small number of developers who have access to this production environment. Unfortunately, the developer did not realise the mistake at the time."
As a result, it took longer than expected for staff to identify the cause of the issues and resolve them. Amazon finally implemented the fix before lunch on Christmas Day, and AWS apologised for the incident.
"We know how critical our services are to our customers' businesses, and we know this disruption came at an inopportune time for some of our customers," it said.
"We will do everything we can to learn from this event and use it to drive further improvement in the ELB service."
Netflix was one of the firms affected by the outage. In a blog post, its cloud architect Adrian Cockcroft said the firm was now considering how to ensure it could cope with similar incidents in the future, given its reliance on cloud tools to run its movie services.
"Netflix is designed to handle failure of all or part of a single availability zone in a region as we run across three zones and operate with no loss of functionality on two. We are working on ways of extending our resiliency to handle partial or complete regional outages," he wrote.
He noted, though, that while "Christmas Eve is traditionally a slow Netflix night as many members celebrate with families", the length of the outage did affect its customers.
"We see significantly higher usage on Christmas Day and increased streaming rates continue until customers go back to work or school."
The incident is a timely reminder of the potential pitfalls of moving to a fully hosted cloud environment, as many IT managers reassess their infrastructure and IT costs for the year ahead.
Dan Worth is the news editor for V3, having first joined the site as a reporter in November 2009. He specialises in a raft of areas including fixed and mobile telecoms, data protection, social media and government IT. Before joining V3, Dan covered communications technology, data handling and resilience in the emergency services sector on the BAPCO Journal.