Google has apologised for yet another mass outage of its Gmail service yesterday evening, which left users unable to access emails for over 90 minutes.
Ben Treynor, vice president of engineering at Google, wrote in a blog post that the outage was caused by recent changes which were, ironically, designed to improve service availability.
Treynor admitted that the Google team "slightly underestimated" the load which these changes placed on the request routers during a routine period of upgrade work in which some of the Gmail servers were taken offline.
"At about 12:30 pm PST [8.30pm BST] a few of the request routers became overloaded and in effect told the rest of the system 'stop sending us traffic, we're too slow!'," he explained.
"This transferred the load onto the remaining request routers, causing a few more to become overloaded, and within minutes nearly all of the request routers were overloaded. As a result, people were unable to access Gmail via the web interface because their requests could not be routed to a Gmail server."
According to Treynor, the Google team has taken several actions to ensure that the problem does not happen again, including increasing request router capacity and ensuring that the routers degrade "gracefully", in other words that they "get slower instead of refusing to accept traffic and shifting their load".
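The cascade Treynor describes can be illustrated with a toy model. The sketch below is not Google's routing logic: the function name `simulate`, the router capacities and the load figures are all hypothetical, and the only assumption is that traffic is split evenly across whichever routers are still accepting it.

```python
def simulate(total_load, capacities, shed_when_overloaded=True):
    """Return how many routers are still accepting traffic once things settle.

    Hypothetical model: load is shared equally among active routers. A router
    whose share exceeds its capacity either refuses traffic (shedding its load
    onto the others) or, if shed_when_overloaded is False, stays in service
    and simply runs slower.
    """
    active = list(capacities)
    while active:
        share = total_load / len(active)
        overloaded = [c for c in active if c < share]
        if not overloaded or not shed_when_overloaded:
            break  # stable: nobody is overloaded, or routers degrade in place
        # Overloaded routers drop out, pushing their share onto the rest.
        active = [c for c in active if c >= share]
    return len(active)


# Illustrative numbers only: ten routers with slightly different capacities.
capacities = [95, 95, 100, 100, 105, 105, 110, 110, 120, 120]

print(simulate(900, capacities))                              # 10: everyone copes
print(simulate(960, capacities))                              # 0: full cascade
print(simulate(960, capacities, shed_when_overloaded=False))  # 10: slower, but up
```

In this model a small rise in load is enough to knock out the two weakest routers, and each round of refusals raises the share on the survivors until none remain. With shedding switched off, every router stays in service, which is the "gracefully" degrading behaviour described in the blog post.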
This is not the first time this year that Gmail has suffered a major outage. Google was forced to apologise in February for a two-and-a-half-hour outage which the firm put down to datacentre maintenance.