As a child, one of my favourite reads was Isaac Asimov's Foundation series. I was fascinated and enchanted by the possibility of predicting the future.
That possibility rested on the premise that at some point, when the amount of data generated by humans reaches a significant level, predictive algorithms would be able to model societal changes with a fairly high level of statistical significance.
I think we can safely say that we humans have reached the point where the amount of data we generate is 'significant', to say the least. At the time of writing this blog, according to publicly available statistics, Facebook alone generates about 500TB of data in a single day. I am willing to predict that this figure will very likely be higher by the time you read this.
If the amount of data needed to predict the future is no longer a constraint, and predictive analytics modelling keeps getting more and more sophisticated, what is stopping us from peering into the future?
Well, for one, Asimov may have assumed away, or simply not given much thought to, the fact that the data being generated today comes in different formats, lives in different systems, and often remains locked away as 'dark' data. And that may not be the only challenge.
Big data technologies have given us the power and the potential to store and process vast amounts of data, but sadly we still don't have a single silver bullet that can assimilate data from different sources, harmonise, cleanse, match and merge it, and then feed it to a sophisticated analytics engine for predictive modelling. These steps need people, process and technology, and the tooling around them is still woefully inadequate, which makes acting on those vast amounts of data a challenge.
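To give a feel for what those steps involve, here is a minimal sketch in Python. The source systems, field names and matching rules are all invented for illustration; a real pipeline would face far messier data, but even a toy version has to cleanse, harmonise and match records before anything can be fed to an analytics engine:

```python
# Toy records from two hypothetical source systems describing the same customer.
crm_record = {"cust_id": "C-100", "email": " ADA@EXAMPLE.COM ", "name": "Ada L."}
billing_record = {"customer": "c-100", "email": "ada@example.com", "balance": 120.5}


def cleanse(record: dict) -> dict:
    """Normalise the fields we match on (trim whitespace, lower-case emails)."""
    out = dict(record)
    if "email" in out:
        out["email"] = out["email"].strip().lower()
    return out


def harmonise_id(record: dict) -> str:
    """Map the different source keys onto one canonical identifier."""
    raw = record.get("cust_id") or record.get("customer") or ""
    return raw.upper()


def match_and_merge(*records: dict) -> dict:
    """Merge source records that resolve to the same canonical id into one golden record."""
    cleansed = [cleanse(r) for r in records]
    ids = {harmonise_id(r) for r in cleansed}
    if len(ids) != 1:
        raise ValueError(f"records do not match: {ids}")
    merged: dict = {"id": ids.pop()}
    for rec in cleansed:
        merged.update({k: v for k, v in rec.items()
                       if k not in ("cust_id", "customer")})
    return merged


print(match_and_merge(crm_record, billing_record))
```

Every one of these tiny functions hides a decision that, at enterprise scale, needs people and process behind it, which is exactly why the technology alone is not enough.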
More importantly, the analytics engine may need the data in different contexts depending on the questions you ask it to answer. For example, whether you want your predictive analytics tool to find the segment of patients with the highest likelihood of a particular disease, or the segment of patients least likely to be able to pay their medical bills, the 'context' of the data you feed to the engine could be very different.
In big data, what is known as the Kappa Architecture is all about addressing this central challenge of modelling the right context when it is needed. Within a Kappa Architecture, all data, whether new or an update to an existing value, is treated as an immutable log event. Everything is stored as a stack of raw data with event and meta data, which can later be replayed or materialised into a context, depending on what you want to pull from the stack and feed to your analytics engine.
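Here is a minimal sketch of that idea in Python. The record and view names are hypothetical, and a plain in-memory list stands in for a real log store such as Kafka; the point is simply that every change is appended as an immutable event and any context is built later by replaying the log:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable


@dataclass(frozen=True)
class Event:
    """An immutable log entry: the raw value plus event and meta data."""
    entity_id: str
    payload: dict
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))


class EventLog:
    """Append-only 'stack' of raw events; nothing is ever updated in place."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def replay(self, materialise: Callable[[dict, Event], dict]) -> dict:
        """Fold over the whole log to build one context (a materialised view)."""
        state: dict[str, Any] = {}
        for event in self._events:
            state = materialise(state, event)
        return state


# New data and updates are both just appended; the old values are never lost.
log = EventLog()
log.append(Event("patient-42", {"diagnosis": "asthma"}))
log.append(Event("patient-42", {"outstanding_balance": 1250}))
log.append(Event("patient-42", {"diagnosis": "asthma", "severity": "mild"}))


# One possible context: the latest clinical picture per patient.
def clinical_view(state: dict, event: Event) -> dict:
    state.setdefault(event.entity_id, {}).update(event.payload)
    return state


print(log.replay(clinical_view))
```

Because the raw events are never overwritten, a billing view, or any other context you have not thought of yet, can be replayed from the very same log at a later date.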
Another big advantage of this architecture is that data remains fluid and doesn't necessarily need to conform to a schema or model when it is persisted. The model can be applied to the data stack at consumption time, using a schema-on-read or late-binding schema methodology.
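As a small, hypothetical illustration of schema-on-read (field names invented for the example): records are persisted as raw JSON with no schema enforced on write, and a model is imposed only at the moment of consumption:

```python
import json
from dataclasses import dataclass

# Data is persisted as-is: raw JSON strings, no schema enforced on write.
raw_records = [
    '{"id": "p-1", "name": "Ada", "balance": "120.50"}',
    '{"id": "p-2", "name": "Grace", "balance": 75, "plan": "gold"}',
]


@dataclass
class BillingRecord:
    """The model we care about today; it is applied only when we read."""
    id: str
    balance: float


def read_as_billing(raw: str) -> BillingRecord:
    doc = json.loads(raw)
    # Late binding: select and coerce fields at consumption time.
    return BillingRecord(id=doc["id"], balance=float(doc["balance"]))


billing_view = [read_as_billing(r) for r in raw_records]
print(billing_view)
```

If tomorrow's question needs a different model, you simply read the same raw records through a different class; nothing about the stored data has to change.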
Kappa Architecture is definitely on track to take us one step closer to forecasting the future, by storing and replaying data in an intuitive and efficient fashion that is radically different from traditional data management approaches. But the fact remains that businesses still need a rock-solid process and strategy for acquiring data, ingesting it, cleansing and harmonising it, and applying business rules that are specific to their overall data goals.
Data-platform-as-a-service (dPaaS) is one solution that addresses this squarely with integration and big data technologies. If you are looking to mine valuable insights from your data (including predicting the future), I urge you to take a look at the ALLOY platform, our dPaaS solution, and how it may help you.
Data today is available everywhere, and it is only a matter of time before we start unlocking it at the right time to predict the future. There is one thing about predicting the future, though: as soon as you share a prediction with its subjects, you end up affecting it, because they may change their behaviour. For example, if I told you that, according to my predictive modelling, about 100,000 people will read this blog in the future, a number of people might purposely choose not to, and so affect both the prediction and the future. Kind of mind-bending, come to think of it.
If you are in the big data business, though, how you go about unlocking your dark data doesn't have to be mind-bending when you apply the context brokerage methodology on dPaaS.
As for me, I am going to predict that I will be reading Asimov's series one more time. That is one prediction whose future I can still control.
This blog was written by Madhukar Kumar, vice president of products at Liaison Technologies.