Despite the success in recent years of the open source Hadoop big data analytics platform, the project's founder Doug Cutting has said he believes there is still room for improvement.
Speaking to V3, Cutting, who now works for Cloudera having previously worked at Yahoo, said that Hadoop should become just as renowned for fast data processing as it is now for big data number crunching.
"Hadoop started out as a batch computing environment built around the MapReduce computing metaphor," said Cutting.
"People are storing more data and doing more batch analysis, but I think there will soon be a move to interactive online computing, where queries take seconds to run."
Hadoop is a collection of software that includes the Hadoop Distributed File System (HDFS), which handles large-scale data storage; MapReduce, which processes the data; and Common, the shared infrastructure that supports the project.
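To make the MapReduce metaphor concrete, here is a minimal word-count sketch against the standard Hadoop Java API; the WordCount class name and the input and output paths taken from the command line are illustrative, not part of any cited project.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map phase: emit (word, 1) for every token in the input split.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reduce phase: sum the counts emitted for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Paths are illustrative placeholders.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Even for a job this small, the work runs as a scheduled batch across the cluster, which is the latency Cutting wants to see reduced.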
Companies can use Hadoop for the types of analyses that business intelligence tools and big data SQL analysis tools are not designed to handle.
Hadoop is fundamentally a batch processing system: data is collected and then processed batch by batch rather than as it arrives.
This means that while Hadoop is highly scalable and allows users to query petabytes of data, the high latency that comes with batch processing slows down data analysis.
Cutting said that to improve big data analytics, a common data format for Hadoop needs to be developed so that data can interoperate between different systems. He is currently working on a project to do this, [Apache] Avro.
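Avro describes data with language-neutral JSON schemas that are embedded in the files it writes, which is what lets records written by one system be read by another. The sketch below, with a hypothetical User schema and file path invented for illustration, shows a basic write-then-read round trip using Avro's generic Java API.

```java
import java.io.File;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroExample {
    // Hypothetical schema for illustration; Avro schemas are plain JSON,
    // so any system with an Avro library can read records written here.
    private static final String USER_SCHEMA =
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"clicks\",\"type\":\"long\"}]}";

    public static void main(String[] args) throws IOException {
        Schema schema = new Schema.Parser().parse(USER_SCHEMA);

        // Write a record to an Avro container file; the schema travels
        // with the file, making the data self-describing.
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "alice");
        user.put("clicks", 42L);

        File file = new File("users.avro"); // illustrative path
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<>(schema))) {
            writer.create(schema, file);
            writer.append(user);
        }

        // Read it back without any generated classes; the reader picks up
        // the writer's schema from the file itself.
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<>())) {
            for (GenericRecord rec : reader) {
                System.out.println(rec.get("name") + " -> " + rec.get("clicks"));
            }
        }
    }
}
```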
Additionally, Hadoop processing needs to be pushed online and become less batch oriented, so that queries take seconds rather than minutes or hours, said Cutting.
"I think search technology will play more of a role in big data to make it more interactive, such as that of [Apache] Lucene," he said. "Hadoop certainly has a way to go in terms of improvements."