
Researchers promise to boost big data performance

29 Nov 2012

LAS VEGAS: A project from the University of California, Berkeley is aiming to revolutionise the big data analysis space.

A team of university researchers has developed a set of platforms and systems that aim to dramatically cut the time needed to search and analyse large stores of unstructured data.

Speaking at the 2012 re:Invent conference, UC Berkeley researchers Michael Franklin and Matei Zaharia showcased "Spark" and "Shark", a pair of software platforms designed to one day take over from the current Hadoop big data structure.

The researchers said that the platforms, which make better use of machine learning and low-latency search functions, can perform tasks up to 100 times faster than current Hadoop implementations.

Based on the Apache Hive platform, the new tools have been designed to offer real-time analysis of databases.

In a live demonstration, the Shark tool was able to extract 30,000 results from a 50GB database of Wikipedia entries in less than seven seconds. By comparison, a modern Hadoop deployment took upwards of 40 seconds to return the same data set.

The researchers believe that the improved performance will help to revolutionise compute-intensive tasks such as genome sequencing.

Franklin said that in addition to faster processing times, the aim of the platform was to solve new issues that have arisen when handling large-scale databases with multiple categories. As new parameters and larger data sets have emerged, administrators are finding that the quality of analysis has deteriorated.

With the lowered quality of analysis, administrators will have less confidence in the accuracy of their business intelligence.

"If you are not very careful, the quality of your inference can really degrade," he said.

"It is a big scalability problem in dealing with the amount of data, but we are also focused on this inference problem."

Developers and administrators can download the software and documentation from the UC Berkeley computer science site.

Shaun Nichols

Shaun Nichols is the US correspondent. He has been with the company since 2006, originally joining as a news intern at the site's San Francisco offices.
