The CIO of the Met Office, Charles Ewen, has explained how the volume of data the organisation has to handle to put together its forecasts is already starting to drive demand for new types of big data models.
Ewen explained to V3 that a typical weather forecast weighs in at around 400GB every time information is fed through the system. The Met Office runs the forecast several times a day using a number of different model configurations to ensure the best possible outcome.
“We run weather forecasts using the 'ensemble technique'. Rather than running a forecast once, we run the same model a number of times and, every time we run it, we make slight variations in initial conditions to explore how much that might change the forecast. After all, the weather is a classic example of a chaotic system," said Ewen.
“If we get an 'ensemble' of forecasts that all say very different things, then we can't really have any confidence in any of them. But if we run that forecast a number of times with slightly different assumptions and get much the same answer, then we can have a high degree of confidence in all of them," said Ewen.
The approach generates useful information around probabilities and confidence, which can be delivered as part of the forecast.
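To make the ensemble idea concrete, the sketch below uses a toy chaotic system in place of a real weather model; it is an illustration only, not Met Office code. Each ensemble member starts from a slightly perturbed initial state, and the spread of the results indicates how much confidence the forecast warrants.

```python
# Illustrative sketch of the ensemble technique (not Met Office code).
# A toy chaotic system (the logistic map) stands in for a weather model:
# small changes to the initial state can produce very different outcomes,
# and the spread across ensemble members is a measure of confidence.
import random

def toy_model(x0, steps=50, r=3.9):
    """Run one 'forecast': iterate the logistic map from initial state x0."""
    x = x0
    for _ in range(steps):
        x = r * x * (1 - x)
    return x

def ensemble_forecast(x0, members=20, perturbation=1e-4):
    """Run the same model many times with slightly varied initial conditions."""
    outcomes = [toy_model(x0 + random.uniform(-perturbation, perturbation))
                for _ in range(members)]
    mean = sum(outcomes) / len(outcomes)
    spread = max(outcomes) - min(outcomes)
    return mean, spread

mean, spread = ensemble_forecast(0.4)
# A small spread suggests high confidence; a large spread means the forecast
# is highly sensitive to initial conditions and should be trusted less.
print(f"ensemble mean: {mean:.3f}, spread: {spread:.3f}")
```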
As noted, though, this creates a lot of data, and it's not just the Met Office that needs it.
"It's not just that the forecast is big, but also, no one wants yesterday's weather forecast. It's not just about moving 400GB of data, it's about moving that data quickly enough so that the Met Office and others can do something useful with it," he said.
"That's a growing challenge."
So, as weather forecasts become bigger and more complex, the Met Office is already considering new ways to deal with rising data loads. The goal is to pass on as much information as quickly as possible, while keeping the dataset as small as possible without losing any of the relevant information.
"There are two fundamental approaches to achieve this. The first is to be more selective about the data that is sent. The second is to bring problems - or algorithms - to data," he said.
"We are beginning to use geospatial standards. Instead of taking large datasets and extracting the information that you need from them at destination, increasingly we're subsetting data of interest at source. The subsetting domain could be geospatial or temporal," said Ewen.
However, even this approach is unlikely to hold up in the future.
"Over the course of the next decade or so, even the data required to do that will get too big and too unwieldy," he warned. "We're looking at emerging technologies that are about truly bringing problems to data.
"There will be applications in which the most efficient method will be to operate algorithms and smaller data against very big data, at source. We already have some early examples of these using emerging technologies," he said.
Different data scientist demands
With this huge rise in data, it would seem logical to assume the Met Office is hiring lots of data scientists to help. However, Ewen explained that it's not quite that simple.
"You can think about data at the Met Office in three 'domains'," said Ewen. "The first thing is data analysis to describe what has happened and what is happening. The Met Office's observations programme, for example, is all about establishing a picture of the atmosphere. That's one domain.
"The second is why did it happen? And that's a different question to 'what's happened'. Then, based on understanding what happened and why it happened, you stand a fighting chance of predicting what will happen. So, three clear domains: what happened, why did it happen and what will happen?
"If you break those domains down, and you ask the question, what would a data scientist add? Well, in the area of what's happened, potentially quite a lot because there's quite a lot of value to be added in the 'what happened?' area. And, because it's largely the realm of statistics, and big data and all the kinds of things that data scientists are about, that's good."
However, with regards to the question "why did it happen?", Ewen argued that's largely driven by the laws of physics and highly specialist knowledge: "So rather than have a data scientist answer that question, you're much more likely to get a better answer if you have a physicist answering that question,” he added.
"Consequently, in the domain of 'why did it happen?' for us - and this won't be the same for everybody - data scientists probably wouldn't be much use.
"In the domain of 'what will happen?', purely on weather forecasts, once again, they'd probably not be much help. But if you think more broadly that's not the question that people are often asking. People are asking questions like 'will my high street likely be busier tomorrow than it was today?'," said Ewen.
That might involve basic climate data, but mixed with the kind of business data that the average data scientist ought to be most comfortable working with.
"If you're looking for an answer to that kind of question, a data scientist potentially has tremendous use, because he or she can use their statistical, correlative, non-diagnostic techniques to make predictions - but in the day-to-day business of weather forecasting and climate projections? Not so much."
The Met Office's deputy director of applied science and scientific consultancies will be presenting at Computing's Big Data & Analytics Summit, along with a number of other big-name experts from industry and the private sector.
Attendance is free to qualifying end-users, so book your place now before they all go.