Dennis Howlett looks at how tools can help users slice and dice their way to information gold.
A report published in January by IT analyst Dataquest has found that many companies see data mining as a data warehouse's most important role. Of the 177 companies across the UK, Germany and France that had implemented a data warehouse, some 20% saw data mining as key.
This is not surprising, given how the respondents rated the importance of data warehousing. The most important reason stated was to provide better information for decision making support, followed by using the technology to get, or keep, a competitive edge.
"IT vendors need to wake up to the ongoing potential of data warehousing and start converting this into real revenue and market share," commented Arthur Hochberg, principal analyst at Dataquest Europe. "There has been a tendency within the last year to see data warehousing as old hat, but this simply isn't true. There are hundreds of large European companies that have not yet upgraded their IT architectures to include data warehousing."
Data mining is the logical extension of the data warehouse (or data mart), and seeks to go beyond complex analysis of the kind envisaged in OLAP (Online Analytical Processing) systems. However, it is both an exciting and a confusing area of computing technology. Based on statistical techniques that are one step removed from rocket science, data mining is really about trying to find the information that will make a significant impact on the business. This is achieved by asking direct questions of the data.
For instance, a motor lubricant distributor might ask: "What characteristics define the top 10% of lubricant X customers in the north west of England, compared with those in the north east?" This contrasts sharply with other forms of decision support, where you start with some kind of hypothesis and then build from there. For example, you might want to know who are the top 10% of lubricant X's customers. The difference may appear subtle, but is proving to be critical to those engaged in strategic planning and tactical marketing. A useful way of looking at data mining is as a process of knowledge discovery, or learning.
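The distinction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer records, the `sector` attribute and the helper names (`make_customers`, `top_fraction`, `shares`, `lift`) are all invented for the example, and "lift" is swapped in here as one simple way of expressing which characteristics are over-represented among the top spenders. It is not a description of how any product named in this article works.

```python
from collections import Counter


def make_customers():
    """Build hypothetical lubricant X customers; every value is invented."""
    rows = []
    for i in range(20):
        rows.append({"region": "north west", "spend": 1000 + 100 * i,
                     "sector": "haulage" if i >= 15 else "retail"})
        rows.append({"region": "north east", "spend": 1000 + 90 * i,
                     "sector": "marine" if i >= 15 else "retail"})
    return rows


def top_fraction(rows, fraction=0.10):
    """Conventional query: who are the top spenders?"""
    ranked = sorted(rows, key=lambda r: r["spend"], reverse=True)
    return ranked[:max(1, round(len(ranked) * fraction))]


def shares(rows, field):
    """Proportion of rows taking each value of a field."""
    counts = Counter(r[field] for r in rows)
    return {value: n / len(rows) for value, n in counts.items()}


def lift(rows, field, fraction=0.10):
    """Discovery-style question: which values of a field are
    over-represented among the top spenders relative to all customers?"""
    overall = shares(rows, field)
    top = shares(top_fraction(rows, fraction), field)
    return {value: share / overall[value] for value, share in top.items()}


customers = make_customers()
for region in ("north west", "north east"):
    regional = [r for r in customers if r["region"] == region]
    print(region, lift(regional, "sector"))
```

`top_fraction` answers the "who" question with a list of customers. `lift` answers the "what characterises them" question, and running it per region allows the north west and north east profiles to be compared side by side.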
According to Ovum Evaluates: Data Mining, the current market is divided into three broad segments: desktop, toolbox and enterprise. At the desktop, we have products like 4Thought from Cognos, which performs powerful statistical analysis on large amounts of data at the desktop level. Here, there is no requirement for a warehouse, but a need to perform complex analysis with a view to isolating specific factors affecting a business, with tools typically pitching in at the £1,000 per seat mark.
At the toolbox level there are products like Datamind and Pilot Discovery Server. These are a quantum leap above desktop products, and the prices reflect this, typically coming in anywhere between $10,000 (£6,061) and $100,000 (£60,610) - depending on the scope of the project. According to Ovum, this segment is broadly divided between those vendors that supply generic toolboxes and those that are working towards providing packaged applications.
At the enterprise end are those applications based on products from the likes of SAS Institute, where there are vast amounts of data to be mined and where the use of neural nets and artificial intelligence is the order of the day. This is where big business can make huge gains and projects could be any size. Costs vary accordingly, but you would typically expect to see a data mining project coming in somewhere between $100,000 (£60,610) and $250,000 (£151,515).
The reasons are obvious. If you are a large telco and are experiencing alarming churn rates amongst customers, then you are going to invest in finding out what is going on in order to stop the rot. This is because telcos - especially in the mobile market - need to hang onto their customers for long periods to achieve a payback. Data mining can provide clues as to what needs to be changed, or what type of offering is appropriate to customers that exhibit a profile associated with a risk of churn. In one case, such an exercise gave the telco the opportunity to reduce churn by an estimated 25% and so increase bottom line profits.
An extreme example of a company making significant use of a warehouse that incorporates a range of decision support tools, including data mining, is Capital One Financial Services, one of the world's fastest growing credit card companies. It decided some years ago that competitive edge could only realistically be gained by making itself an information-led business. It developed an IT strategy which concentrates on information retrieval. Today, the company claims it can quickly spot, from a series of statistically accurate criteria, those types of customer who are likely to default, and act accordingly. As a result, Capital One says the attrition and default rates it is able to offer customers are relatively low in comparison with other vendors in the marketplace.
Dave Buch, IT director at Capital One Financial Services, points out the benefits. "Back in 1987, we figured that our business is nothing to do with credit cards, it's about information," he said. "Over time, we've developed an information system that allows us to mass customise our products to get away from the 'one size fits all' idea that characterised the old credit card market."
Buch claims that as a result of operating the business out of an information-led strategy, the company saw customer growth of over 40% in 1995-96, faster than its nearest competitor by 13%. Revenue growth outstripped customer growth, turning in a 63% increase in 1995-96.
Buch acknowledges the technical challenges are enormous. In particular, he identifies juggling with keeping hundreds of ad hoc users happy, providing availability every hour of the week and massive storage, backup and recovery issues as key. Capital One found that technically, it had to build its own data structures rather than rely on traditional approaches. In addition, Buch isn't always happy about the quality of available tools with which his analysts have to work. However, that is hardly surprising when one considers that Buch is juggling terabytes of data daily.
However, Capital One is only one example, and once you get past the usual plethora of finance, retail and telco stories, there are other areas where data mining is proving useful. For instance, SPSS software is being used by The Commission for Racial Equality to analyse data relating to discrimination to discover causes. In another example, Thinking Machine, Syllogic and GK Intelligent Systems provide data mining features for Cabletron's advanced network management systems. Thinking Machine, for instance, allows customers to find network diagnostics information that would previously have been hidden due to the sheer size of the data in Cabletron's Spectrum. Syllogic, on the other hand, uses learning technologies to discover how applications respond to resources, and so identify potential bottlenecks that would otherwise have been difficult to locate.
The problem for the many decision takers who would like to take advantage of these mining techniques is not the availability of product, of which there is plenty, but the cost of putting together a usable warehouse in time to make it useful. In a companion publication, Ovum Evaluates: The Data Warehouse, the authors state: "Data warehouses must be regularly updated from operational information systems. Most data warehouses are still populated using custom-built 3GL or 4GL applications. Maintenance of this legacy code adds to the already considerable maintenance cost of the warehouse. If there is insufficient programming resource available to keep the programs in step with modifications to the warehouse or source databases, the availability and quality of the warehouse data will be reduced. This will cause a loss of faith in the warehouse and, ultimately, the failure of the project."
It is hardly surprising then, that taking all these factors into account, many companies are fighting shy of creating warehouses from which to mine data. However, this is set to change dramatically in the next year or so, following the introduction of Plato - Microsoft's OLAP server in the Enterprise edition of Microsoft SQL Server 7, due for release around the beginning of the second half of this year. As a freebie, it has shaken the traditional OLAP vendors and although they are putting a brave face on it, they are clearly worried about what this means for them.
Does this mean the end of data mining for the traditional OLAP vendors?
Probably not, because they occupy a special position in the market and will introduce new techniques to make their offerings more intelligent and so, more attractive. One company that's seeking to do this is Gentia, with its balanced scorecard approach. This is a technique that does not rely purely on financial data, but includes soft data like customer satisfaction levels to provide an overall picture. Gentia can mix data types to provide users with a richer set of information on which to make decisions. Whether this is taken up in a big way remains to be seen, but it is a valuable contribution because advanced analysis of the type found in data mining cannot be conducted in a vacuum. It has to be placed in the wider context of general and specific business knowledge. In turn, this should be made available from the system which already holds stores of additional non-financial data.
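A balanced scorecard of this kind can be illustrated with a short Python sketch. It is an assumption-laden example and not Gentia's method: the measure names, targets, weights and the `scorecard` function are all hypothetical, and scoring each measure as capped attainment against target is just one simple way of putting financial and soft data on a common scale.

```python
def scorecard(measures):
    """Combine weighted measures into a single score between 0 and 1.

    Each measure is (actual, target, weight). Attainment against
    target is capped at 100% so one measure cannot mask the others.
    """
    total_weight = sum(weight for _, _, weight in measures.values())
    score = 0.0
    for actual, target, weight in measures.values():
        attainment = min(actual / target, 1.0)
        score += weight * attainment
    return score / total_weight


# Hypothetical figures mixing financial and non-financial data.
measures = {
    "revenue_growth_pct": (12.0, 16.0, 2),     # financial
    "customer_satisfaction": (4.0, 5.0, 1),    # soft data, survey score out of 5
    "on_time_delivery_pct": (95.0, 95.0, 1),   # operational
}
print(round(scorecard(measures), 3))
```

The weights are where business knowledge enters: a decision taker who believes customer satisfaction leads revenue would weight it more heavily than this example does.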
Data mining has provided companies that can afford it with a significant advantage. The good news for those using technology to maintain competitive advantage is that regardless of how good a product is, it is still a tool and not a replacement for creative thinking which, for the foreseeable future, is the purview of human intuition and judgement.
TM1, the OLAP reporting tool from Applix, has been used by Reuters since 1991. Geoff Swettenham, manager of financial reporting for Reuters UK and Ireland, said he came across TM1 purely by chance, having been shown it by a friend at the end of a meeting.
It is now used for financial management such as producing profit/loss reports, balance sheets and forecasts and budget reports. Swettenham said that with TM1, reports which took two weeks to produce previously, could be run off in two hours. "We were able to reduce the staff in my department and also save an awful lot of accountant's time," he said.
Plato: set to change the OLAP market
The reason for the excitement is that Plato is free and will be relatively easy to set up compared with other OLAP engines. It will be the first time that the many lower-end customers who clamour for better information will have access to a low-cost server that offers anything like enterprise class scaling and performance.
According to OLAP guru, Nigel Pendse, Plato will radically change the shape of the OLAP market largely because it allows plenty of scope for new entrants in the front-end tools market. Already, and before the product is ready to ship, tools vendors like Cognos and Arbor are jumping on what looks like Microsoft's boldest step yet into the murky OLAP seas.
Strictly speaking, Microsoft is not in the data mining market at this time, but provided it persuades customers and software vendors to invest in what it is offering, then it will only be a matter of time before it starts picking off existing OLAP and data mining vendors. However, when speaking with others, there is an alarming silence on the Plato threat.
According to Stuart Holness, managing director of Microstrategy UK, "Plato is only providing hypercube technology and, from what we have seen, it is unlikely to affect the kind of business we do at the high-end which is relational OLAP based." John Watton, Oracle's decision support tools marketing manager, said: "Microsoft is giving users good news, but to us it looks like a lower-end product, falling short at the enterprise level."
However, according to Pendse, potential competitors ignore Plato at their peril: "There's a general feeling of unhappiness in the user community with the arrogance of the OLAP vendors," he said. "I am surprised at vendor complacency because there is no doubt in my mind that Plato will nibble into everyone's market."
Microsoft is not being gung-ho, and uncharacteristically is hedging its bets. Chris Brown, architectural analyst at Microsoft, said: "We will commoditise Plato but won't have the full story on day one. I guess that Cognos, Knossis and Arbor's Wired for OLAP will be among the top solutions providers, but there will be others over time."
In the meantime, Microsoft is planning on adding functionality to Excel for the next release to take advantage of the multi-dimensional capabilities of Plato. This will include some form of time intelligence - critical for any OLAP offering. Essentially, these are developments of the existing pivot tables features, and it will provide enough for smaller businesses to get a taste of what OLAP and data mining can be about.
Whether you believe Microsoft is about to launch another limp wristed performer or not, the industry view is that, for once, it has done enough to get people into a technology that has been denied to all but the select few. In a sense, the change being brought by Plato is almost as much of a quantum leap as that which came with Access.
Prior to Access, database development was largely in the hands of trained developers and it was hard work. Plato is breaking a few moulds, especially in the ease of use stakes. It will require the skills of people who understand the difference between transaction processing and analysis systems - it is a fair gap rather than a chasm - but it will be easier and cheaper to get a data mart and perform rudimentary mining operations than at present.
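For readers new to the idea, the pivot-table style of multi-dimensional analysis and the "time intelligence" mentioned above can be sketched in Python. This is a toy illustration, not how Plato or Excel is implemented: the fact rows, the three dimensions and the function names `pivot` and `year_to_date` are all invented for the example.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, month, sales)
FACTS = [
    ("lubricant X", "north west", 1, 100),
    ("lubricant X", "north west", 2, 120),
    ("lubricant X", "north east", 1, 80),
    ("lubricant Y", "north west", 1, 50),
    ("lubricant Y", "north east", 2, 70),
]

DIMENSIONS = ("product", "region", "month")


def pivot(facts, rows, cols):
    """Sum sales into a {row_value: {col_value: total}} table."""
    r, c = DIMENSIONS.index(rows), DIMENSIONS.index(cols)
    table = defaultdict(lambda: defaultdict(int))
    for fact in facts:
        table[fact[r]][fact[c]] += fact[3]
    return {key: dict(value) for key, value in table.items()}


def year_to_date(facts, product, month):
    """Simple time intelligence: cumulative sales for a product
    up to and including the given month."""
    return sum(f[3] for f in facts if f[0] == product and f[2] <= month)


print(pivot(FACTS, "product", "region"))
print(year_to_date(FACTS, "lubricant X", 2))
```

Swapping the `rows` and `cols` arguments, for instance to `"region"` and `"month"`, slices the same facts along different dimensions, which is the essence of the slice and dice analysis an OLAP server performs at far greater scale.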