Google has been blogging about its Seti research project, discussing how the machine-learning system works and, more interestingly, revealing that the web giant doesn't get everything right all the time.
Seti, a nod to the search for extraterrestrial intelligence project, is a highly scalable learning system designed to try to solve hard 'prediction problems', according to the firm.
"Several years ago we began developing a large scale machine learning system, and have been refining it over time," wrote Simon Tong of Google's research team.
"We gave it the codename 'Seti' because it searches for signals in a large space. It scales to massive data sets and has become one of the most broadly used classification systems at Google."
So far, so good. Seti is fairly accurate, according to Tong, and in tests has proved to be comparable to other methods of classification. However, it appears to perform much better than the rest when used in the right places.
"Seti has the flexibility to be used on a broad range of training set sizes and feature sets. These sizes are substantially larger than those typically used in academia (e.g., the largest UCI dataset has 4 million instances)," he added.
However, the crux of the post lies in the mistakes that the firm made, which Tong has detailed in a bid to help other organisations make the most of such systems.
"We saw very early on that, despite its many significant benefits, machine learning typically adds complexity, opacity, and unpredictability to a system. In reality, simpler techniques are sometimes good enough for the task at hand," he admitted.
07 Apr 2010