Users of the popular big data analytics platform Hadoop can now run queries on data in real time, thanks to a breakthrough from Cloudera.
Cloudera, which sells a commercial distribution of Hadoop, has launched Impala, an open source, Apache-licensed, real-time query engine that works on data stored in the Hadoop Distributed File System and HBase.
Impala will allow organisations to query petabytes of data at once. The product is still in beta and can be downloaded from the Cloudera website.
"The motivation for Hadoop was always to process large amounts of data, but the question was whether Hadoop could go beyond batch-processing," Cloudera chief operating officer Kirk Dunn told V3.
"Two years ago Cloudera launched the project Impala to try and speed up the analytics process so Hadoop could return data fast enough to inform business decisions in real time. You can now use Impala to get results in seconds. It's a large and fundamental advancement in the platform."
Cloudera Enterprise will also soon be available to Hadoop users, as an optional management and support subscription module. Dunn said the offering will be available from the start of next year.
Only last week, Hadoop founder Doug Cutting told V3 the platform was about to get faster and more interactive.
Hadoop is a collection of software that includes a distributed file system, which can store large amounts of data; MapReduce, which processes the data; and Common, the shared infrastructure that supports the project.
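As a rough illustration of the MapReduce model mentioned above, the sketch below counts words across a small batch of records in plain Python. It does not use Hadoop's actual API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample records are hypothetical, and a real job would spread these steps across many machines rather than run them in one process.

```python
from collections import defaultdict


def map_phase(records):
    """Map step: emit a (word, 1) pair for every word in every record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records):
    """Run the whole batch through map, shuffle and reduce in sequence."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    # Hypothetical sample batch; a real job would read its input from HDFS.
    batch = ["Hadoop stores data", "Hadoop processes data in batches"]
    print(word_count(batch))
```

No result is available until the entire batch has passed through every step, which is one way to see where the latency of batch processing comes from.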
Companies can use Hadoop for the types of analyses that business intelligence tools and big data SQL analysis tools are not designed to handle.
Hadoop's MapReduce framework is a batch processing system, in which data is collected and processed one batch at a time.
This has meant that while Hadoop is highly scalable and allows users to query petabytes of data, the high latency that comes with batch processing has until now slowed down data analysis.
Cloudera is celebrating the launch of Impala as the first management solution that allows batch and real-time operations to be performed on large amounts of data at the same time.