Big data analytics software is evolving at a heady pace given the increasing amount of information being harvested from ever more diverse sources, in particular the Internet of Things (IoT).
This evolution is driven by the desire to make sense of, and glean useful insights from, disparate datasets without needing a team of skilled and expensive data scientists.
To address this challenge, data warehouse and analytics firm Teradata has revealed two significant analytics software products designed to simplify the process of integrating complex datasets and carrying out analytical projects on top.
Piping big data
The first is Listener, an intelligent software tool which, as its name suggests, can listen to data streaming from IoT networks and related sources.
Listener uses self-service capabilities to ingest and distribute streaming data from individual or multiple sources and push it into data warehouses, database platforms and analytics systems, such as those offered by Teradata Aster Analytics and Hadoop.
Imad Birouty, director of technical products at Teradata, told V3 that Listener stands out from similar software tools in the market as it can collect data in its raw format.
"We are not doing any type of transformation of the data when it comes in. We are not doing any type of real-time analytics as the data flows. [Listener] is merely a high-volume capture and high-volume distribution of data from many sources to many targets," he said.
Collecting data in its raw form means it can be shaped to fit each customer's analytical projects, rather than being transformed on ingest into a format that may limit the types of analytics that can be carried out later.
Furthermore, data can flow in both directions between data sources, such as apps or sensors, and data warehouses and analytics systems, particularly those that analyse data in real time such as Spark and Tibco.
This allows 'actionable analytics' to be carried out, where event data is collected from a source, fed into an analytics system, processed and then fed back to the source to trigger a new event. This could be anything from turning on a light to shutting down a machine.
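The feedback loop described above can be illustrated with a minimal sketch: an event travels from a device to an analytics step, and the decision travels back to trigger an action at the source. All names here are hypothetical; this is an illustration of the pattern, not Teradata's API.

```python
# Toy "actionable analytics" loop: source -> analytics -> back to source.

def analyse(event):
    """Toy rule: flag any temperature reading above a threshold."""
    return {"action": "shut_down"} if event["temp_c"] > 90 else {"action": "none"}

def actuate(source_id, decision):
    """Feed the decision back to the originating device."""
    if decision["action"] == "shut_down":
        print(f"sending shutdown command to {source_id}")

event = {"source": "machine-7", "temp_c": 95}
decision = analyse(event)
actuate(event["source"], decision)  # closes the loop at the source
```

In a real deployment the `actuate` step would be a message sent back over the same streaming infrastructure that delivered the event.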
Birouty said that Listener is effectively a tool that hides the complexity of data collection and distribution from data scientists, business analysts and developers, and removes the need for IT departments to set up data collection and analytics platforms.
"You have complete control on where the data resides and goes," said Birouty, explaining that having this level of control can avoid some of the problems with collecting big data when it streams in unchecked.
"When you have this type of environment where there are high volumes of data coming in and going out, the flow control of the data is important. It would be very easy to overwhelm your target systems with all this data coming in."
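The flow-control problem Birouty describes is commonly handled with back-pressure: a bounded buffer that makes fast producers wait when the target cannot keep up. The sketch below shows the idea in miniature; it is an illustration of the general technique, not how Listener is implemented.

```python
# Back-pressure via a bounded queue: the producer blocks when the buffer
# is full, so the consumer (the "target system") is never overwhelmed.
import queue
import threading

buffer = queue.Queue(maxsize=100)  # bounded: holds at most 100 records

def producer(n):
    for i in range(n):
        buffer.put(i)   # blocks while the buffer is full -> back-pressure
    buffer.put(None)    # sentinel: no more data

def consumer(results):
    while True:
        item = buffer.get()
        if item is None:
            break
        results.append(item)

results = []
t = threading.Thread(target=consumer, args=(results,))
t.start()
producer(1000)  # 1,000 records flow through a 100-slot buffer
t.join()
```

Even though the producer emits ten times more records than the buffer can hold at once, nothing is dropped: the bounded queue paces the producer to the consumer's speed.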
Listener helps to remove the need for data analytics to be reliant on specialist technical and data science skills, and creates a situation whereby analysts can concentrate on finding business-related insights from large datasets rather than trying to wrangle data from the right systems.
In many ways Listener can be described as a big pipe for big data with lots of smaller pipes inside ensuring that the right data flows to the right place.
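That "big pipe with smaller pipes inside" amounts to a routing table: each source is mapped to the set of targets that should receive its raw records. The following sketch uses made-up source and target names to show the shape of the idea.

```python
# Many-sources-to-many-targets routing: each source's records are copied,
# untransformed, to every target registered for it. Names are illustrative.

ROUTES = {
    "web-clicks": ["hadoop_lake", "spark_stream"],
    "sensor-feed": ["hadoop_lake", "warehouse"],
}

targets = {"hadoop_lake": [], "spark_stream": [], "warehouse": []}

def dispatch(source, record):
    """Send one raw record to every target routed to this source."""
    for name in ROUTES.get(source, []):
        targets[name].append(record)

dispatch("web-clicks", {"url": "/home"})
dispatch("sensor-feed", {"temp_c": 21})
```

Because records are passed through untransformed, each target is free to apply whatever formatting its own analytics require, which matches the raw-capture design described above.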
Dipping into data lakes
Having a tool that pulls in a wealth of data adds the challenge of storing it. Companies are increasingly using open source frameworks such as Hadoop to create distributed file storage systems to act as an affordable and simple repository of data, often referred to as a data lake.
However, carrying out analytics tasks on huge datasets is complex, requiring data to be removed from the data lake, formatted and put into a separate analytics system. This takes time and often requires a dedicated analytics platform.
Birouty said that, beyond very technical users, companies are beginning to realise that they need analytics tools to work with Hadoop.
"It is a place to store data. It is a place where smart people like data scientists will go and write some custom code. But it is not a place where the business analyst or business user can do high-volume self-service analytics," he said.
"I think they've come to the realisation that Hadoop is important but it is not an analytics tool that you roll out to the masses in your organisation, because the tools and the skills just aren't there."
Aster for Hadoop was born from this situation. This version of Teradata's analytics software offers more than 100 analytics techniques for finding patterns in Hadoop data lakes without needing to extract the data first.
Birouty explained that having Aster as a software layer on top of Hadoop is a way to make analytics faster and more convenient for analysts and business users.
"If you do have data on Hadoop, high volumes of it, it's not practical to move that data to another system. So having Aster run right there [on Hadoop] you can analyse that data without moving it," he said.
He added that Aster on Hadoop also cuts out the need for businesses to set up and maintain dedicated analytics hardware and software.
Aster for Hadoop allows business analysts to carry out the type of analytics that is normally the domain of data scientists by combining machine learning, text, graph, pattern and statistical analytics in one process.
Analytics for all
The new Teradata tools indicate the company's ambitions to make big data analytics open for wide use across businesses rather than remaining the function of those with highly technical skills.
But this democratising of big data analytics is making the market very competitive. Numerous specialist and major IT companies are offering their own analytics tools that carry out the technical complexity for customers.
IBM's Watson Analytics offers a cloud-powered tool that makes use of the natural language and cognitive computing capabilities of the Watson supercomputer.
Splunk offers a platform that can pull operational intelligence data from enterprises and use a dashboard-led interface to ease the process of analytics.
Elsewhere, SAP's recently revealed Vora allows business-focused contextual analytics to be carried out on data stored in Hadoop clusters, repackaging the SQL engine from its HANA platform into an easy-to-deploy tool.
However, while Teradata may face stiff competition, this focus on software that takes care of complex and tedious data collection and preparation is a boon for businesses and end users, as it opens data analytics to a wider audience.
This is likely to boost the adoption and use of big data analysis, in turn creating an enterprise environment where decisions are based on clear insight rather than instinct and experience, potentially transforming the way businesses operate.