Apache Spark is gaining ground on Hadoop as a platform for big data processing at UK organisations, according to research by V3's sister site Computing.
More than 500 IT professionals at UK companies with 100 or more employees, drawn from a range of sectors, responded to the study. The replies came from CTOs, CIOs, COOs, CEOs, IT managers and developers.
The research found that Apache Hadoop has been the de facto big data storage engine for some time, and remains so, but that Apache Spark is slowly becoming more popular.
When asked which big data processing platform they believed their company would use as its primary tool in 18 months' time, 59 percent of respondents said Hadoop, followed by Spark (17 percent), Kinesis (seven percent), Storm (four percent) and Flink (two percent).
Perhaps more notably, the research found that 'advanced' organisations, those that lead in adopting and using technology to drive change, are relatively more likely to use Spark as their primary platform, at 20 percent against 32 percent for Hadoop, suggesting that Spark is catching up.
It is worth noting that Hadoop and Spark are commonly used in conjunction, but respondents were asked to pick only one processing platform to find out which are making their mark.
Spark is a storage-agnostic, general-purpose compute engine that can run on a wide range of back-ends, including Hadoop, the NoSQL database Cassandra, and cloud-based storage and data warehousing systems.
Hadoop has been around for several years, and organisations use the platform to distribute data across cheap hardware, but obtaining the promised analytical insight using some of the applications in the Hadoop ecosystem is not always straightforward.
In response to end-user feedback, Hadoop vendors have started talking up the use of Spark, which is designed to speed up and simplify many common data-crunching and analytics tasks by pulling them together under one interface and doing as much of the processing as possible in memory.
As part of the research, numerous high-profile IT decision makers were interviewed by phone, face to face and in a focus group, and Spark was a subject that kept coming up in conversation.
"Spark for its speed and simplicity. It's easy to get it up and running, it's very easy to code and it's blindingly fast compared to Hadoop," said one CTO from the technology sector.
Another CTO explained that it is easier to find people who have experience with Hadoop, but that Spark and Storm are "much more attractive and faster".
"They are both a generation ahead of Hadoop, but not as widely adopted," the CTO said.
A data scientist added that Spark is also replacing the MapReduce element in the Hadoop ecosystem.
"Spark actually does the real-time data streaming, whereas it used to be MapReduce but that could only do batch. Now Spark does batch and real-time. So everybody's actually, if they haven't [already], deployed it. They'll go straight to Spark if they're on MapReduce or they're in the process of migrating to Spark. Again, Spark is really the only game in town," the data scientist said.
To hear the full findings from Computing's big data research, sign up now for the Computing Big Data Summit on March 16 in London. It's free for end users to attend.