Hadoop deployments in the cloud are increasing, and the Internet of Things (IoT) will drive the next growth spurt for the 10-year-old big data platform, according to co-creator Doug Cutting.
Most Hadoop deployments have been in large organisations in the telecoms and finance sectors, and at internet companies such as Yahoo, where the big data platform was born.
But the range of use cases is expanding as it evolves from experimental software to the de facto way to store and process big data, and Cutting sees the IoT as a natural fit, citing companies that are already using it for sensor data.
"Caterpillar collects data from all of its machines. Tesla is able to gather more information than anyone else in the self-driving business. They're collecting information on actual road conditions because they have cars sending all the data back," he said.
"And Airbus is loading all its sensor data from planes into Hadoop to understand and optimise their processes."
There has been much focus on driverless vehicles, but a quiet revolution has been going on in all cars, he pointed out.
"Almost every car these days has a cellular modem in it. I recently heard that more than 50 per cent of new cellular devices are not phones but other things that are connected," he said.
Cutting, who is chief architect at Cloudera, outlined the most important elements in the Hadoop ecosystem for the IoT.
"Lots of components are very relevant, things like Flume and Kafka helping events flow in, and streaming with Spark," he said. He also singled out Apache Kudu, a storage layer recently incorporated into Cloudera's distribution.
"What Kudu lets you do is update things in real time. It's possible to do these things using HDFS but it's much more convenient to use Kudu if you're trying to model the current state of the world."
Cutting explained that he tries not to be precious about the core elements of Hadoop that he was instrumental in creating.
"From Cloudera's perspective we don't want to get in a turf war in defending Hadoop against other projects, rather we are interested in finding the suite of technologies that serves our customers," he said.
"Hadoop's performance fulfils some valuable roles there, but as new things come along we are going to aggressively adopt the new projects. Kudu gives some valuable new functionality, as do Kafka and Spark."
Rival distribution vendors Hortonworks and MapR also see a huge potential market in the IoT, and both have cited the connected car in recent interviews and promotional literature.
Hortonworks has combined the Apache NiFi data flow system with Kafka and Storm to create Hortonworks DataFlow, a platform for collecting, transporting and analysing data from a multitude of sources.
Meanwhile, MapR Streams is a publish/subscribe event framework integrated into the MapR Converged Data Platform and designed to replicate event data across geographically dispersed clusters.
The majority of Hadoop deployments are still on-premises, but cloud deployments are growing twice as fast.
"We are spending a lot of time making our offerings work well in the cloud," Cutting said. "We're trying to provide really powerful high-level tools to make the lives of those delivering this tech a lot easier."