According to the hype, the market for data-mining technology (a technique that reveals meaningful patterns in data) will be worth billions by the turn of the century. And in general, the predictions of market analysts that the technology will be extremely relevant to business in the future are correct.
The best data-mining software tackles the business problems the technology is supposed to solve, rather than simply incorporating the hottest technology.
In some ways, data mining takes statistical analysis a step further, with some intelligent agents thrown in. Like statistical analysis, data mining is not in itself a business solution; it is just the underlying technology. Statistical analysis on its own cannot solve business problems.
Data mining involves two main analytical methods for turning data into useful information. The first is verification, which is the art of traditional statistics. It is the "meat" of effective data mining, and is ideal for revealing facts in response to questions, and for monitoring changes.
The second is discovery. Discovery techniques are the "sizzle" of business intelligence. While these techniques have exciting possibilities, particularly as starting points for certain types of complex problems, they are not a substitute for statistical thinking and verification.
It's likely that software using discovery techniques belongs in your set of analytical tools, to complement software with verification methods.
However, software which uses discovery techniques probably does not have the full range of analytical capabilities your company needs. Nor does it alone offer the range of analytical flexibility required to find unique advantages for your company.
Good-quality software will present information in tables and graphs, and will allow users to export data for use within popular office suites. The software should also be flexible enough for you to quickly find answers to the inevitable follow-up questions resulting from your research.
Subject: The Whitbread Group
Activities: leisure and retail outlets
Uses: for exploratory analysis of its market research
The Whitbread Group has a turnover of nearly £3bn, 75,000 employees, several of the most successful beer brands on the market and 8,000 retail sites, ranging from Cafe Rouge to TGI Friday's, Wine Rack and David Lloyd Leisure. It uses data mining for exploratory analysis.
The group's market research department, managed by Martin Callingham, competes with outside suppliers to provide conventional market research services, spatial analysis using geographic information systems (GIS) and, most recently, a highly sophisticated service for planning, targeting and evaluating database marketing campaigns.
The company is using the Windows 95 version of SPSS, a desktop data mining and statistical software package, to drive its new areas of work.
Callingham says: "Our whole raison d'etre is to help clients in the Whitbread Group to optimise the quality of their decisions. If we can try out different analyses of any given set of data quickly and easily, without spending a lot of time or money, we are breaking important new ground."
Callingham's approach is to let his team use SPSS as a prospecting tool: install the software throughout the 20-strong department, and get the market researchers to play and experiment with it to discover its benefits.
Two years ago, those experiments consisted largely of importing a file from Lotus, for example, running a relatively simple statistical routine and exporting it back again. Now, SPSS has been completely absorbed into the work of the department.
Complex data manipulations that would previously have been considered too daunting are carried out easily on a daily basis. Now there is every incentive to look beyond the obvious relationship between variables, and mine for the gold that lies hidden in the corporate database.
For example, every branch of the Beefeater restaurant chain provides detailed figures on the mix of sales. Cluster analysis is used to group outlets showing similarities in their sales mix, which provides a better understanding of performance and profitability.
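As a rough illustration of how outlets might be grouped by sales mix, the sketch below runs a plain k-means clustering in Python. The outlet names, the three sales-mix shares (food, drink, dessert) and the choice of two clusters are all invented for the example; the article does not say which clustering method or settings Whitbread uses in SPSS.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: return one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


# Hypothetical sales mix per outlet: (food, drink, dessert) share of sales.
sales_mix = {
    "Outlet A": (0.70, 0.20, 0.10),
    "Outlet B": (0.68, 0.22, 0.10),
    "Outlet C": (0.40, 0.50, 0.10),
    "Outlet D": (0.42, 0.48, 0.10),
    "Outlet E": (0.72, 0.18, 0.10),
}

labels = kmeans(list(sales_mix.values()), k=2)
groups = dict(zip(sales_mix, labels))
print(groups)
```

With this made-up data, the food-led outlets and the drink-led outlets fall into separate groups, which is the kind of segmentation the cluster analysis is meant to reveal.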
Cluster analysis is also used in site location. By identifying clusters of population data that match the profiles of the most successful outlets, Whitbread has divided the old off-licence chain Threshers into a group of extremely profitable sub-brands, closely matched to their local population.
It is quite easy to carry out complex modelling exercises. For example, the company can now check the impact of various factors on the turnover of a given store, such as size, location and the number of customers.
Whitbread's market researchers can also drill down through customer survey data and look for connections between groups of questions, and therefore associations of ideas in customers' minds.
In the past, a survey might have been used to gain a picture of how different outlets of the same chain are rated by customers. But it is now possible to combine that information with data on how much each of those customers would be prepared to pay for a meal in various designated restaurants or pubs. An analysis called "curve fitting" is produced, which provides vital pricing information on each outlet.
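One simple form of curve fitting can be sketched as follows. The example fits a straight line, by ordinary least squares, to the share of survey respondents willing to pay each price, then reads off the price that would maximise revenue per respondent under that fitted line. The figures are invented, and the linear demand curve is an assumption made for illustration; the article does not describe the curve or the method Whitbread actually uses.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical survey result for one outlet: meal price in pounds, and the
# share of respondents who said they would pay at least that much.
prices = [8.0, 10.0, 12.0, 14.0, 16.0]
share_willing = [0.90, 0.80, 0.70, 0.60, 0.50]

intercept, slope = fit_line(prices, share_willing)

# Revenue per respondent is price * (intercept + slope * price). For a
# downward-sloping line this peaks at price = -intercept / (2 * slope).
best_price = -intercept / (2 * slope)
print(f"fitted share = {intercept:.2f} + ({slope:.3f} * price)")
print(f"revenue-maximising price: about {best_price:.2f} pounds")
```

Repeating the fit outlet by outlet would give each one its own curve, and so its own pricing guidance.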
Callingham concludes: "The biggest single factor that enables us to use these sophisticated techniques so readily and without spending weeks on training courses, is Windows."
Subject: Natwest UK
Activities: financial services
Uses: to analyse its customer records more easily so it can create new products
Natwest UK has recently divided into six key business units: credit card, retail banking, insurance, life and investment, mortgage, and corporate business services. To get the best out of the commonly held data in its customer information system, Natwest wanted easier access to its customer information, and to help the system's users analyse the data in a way that would fit into their working style.
Natwest UK has spent about £100,000 on a desktop data mining tool, Brann Viper. It wants its marketing department to use the tool to analyse its database of 50 million records of established customers and customer contacts more easily. Natwest hopes the desktop data analysis tool will help the marketing department build a "train-of-thought" analysis process so that it can plan and create new products and business.
Initially, Viper has been set up on workstations in nine offices. Sixteen people (campaign, brand strategy and business development managers) have been trained to use the system. They are checking the quality of the data held and are learning about what can be gained from it, and how to manipulate and visualise it.
In the past, Natwest's marketing staff had to write complex reports, submit them to the IT department and wait weeks for the results. Now, with this capability on their desktop, staff can quickly extract pertinent information from the data. For the first time, they have a customer-focused view of their data so that they can look for cross-selling opportunities.
By superimposing customer data from different business units on top of each other (a process whereby data "objects" are dragged by mouse on to other data "objects") they can immediately see where areas overlap. More importantly, they can identify customers who have bought one particular product in preference to others.
For example, Natwest can see certain loyal customers in one category and, say, their commitment to mortgages. But the same batch of customers may also be prospects for house contents insurance. Such parallels can be drawn instantly from the data that appears on screen.
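The overlay described above amounts to set operations on customer identifiers. The sketch below is a minimal illustration in Python, not a description of how Viper works internally; the customer IDs and the two product lists are invented.

```python
# Hypothetical customer IDs held by two business units.
mortgage_customers = {"C001", "C002", "C003", "C005"}
contents_insurance_customers = {"C002", "C004"}

# Overlap: customers who already hold both products.
holds_both = mortgage_customers & contents_insurance_customers

# Cross-selling prospects: mortgage customers with no contents insurance.
cross_sell_prospects = mortgage_customers - contents_insurance_customers

print("Hold both:", sorted(holds_both))
print("Prospects for contents insurance:", sorted(cross_sell_prospects))
```

Dragging one data object on to another corresponds to the intersection; the prospects list is what remains of the first set once that overlap is removed.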
Like many other commercial institutions, Natwest had spent a lot of money building a data warehouse, sold as the technological panacea to cure all ills. However, having created a monolith of 45 million accounts with 220 fields of data (theoretically providing access to 9,000 million pieces of information), the company found that its users had no means of visualising the results. Viper now provides a window on the warehouse.
Subject: Oxford Transplant Centre
Activities: transplant operations
Uses: to find out what factors affect transplant survival rates
The Oxford Transplant Centre uses data mining at its kidney transplant unit to find out what factors affect the survival rate of transplant patients.
Part of the Oxford Radcliffe Hospitals Trust, the transplant centre is analysing case-history data accumulated since the unit was set up 20 years ago. A variety of patient data is recorded by different laboratories and clinics in the centre, the most important of which are the HLA (human leucocyte antigen) tests carried out by the centre itself.
These tests quantify the tissue characteristics of donor and patient, and determine whether or not a patient should have a kidney transplant. A close match indicates that such an operation is likely to succeed.
Other data recorded includes details about a patient's skin, any tumour history, three sets of pathology measurements, as well as post-operative information gathered over the years following a transplant.
A core set of this data is kept in a proprietary database and is available to other transplant centres. Other information is collected in Excel spreadsheets or Xbase files, using an application written in Foxpro, by the centre's staff.
More than 5,000 separate measurements are recorded for each patient, so finding relationships between them is beyond any manual procedure.
Dr Ken Welsh, immunologist and head of the data mining project at the transplant centre, says: "In the past, we have either manually examined data in spreadsheets or used traditional reporting packages to extract results. We have then analysed them using a statistical package. This has often resulted in repetitive queries and analyses by different staff. Because reporting tools only give you the specific information you ask for, the main difficulty lies in the need to ask the right questions."
The transplant centre is solving this problem by using the statistical techniques of a product called Knowledge Seeker from Angoss Software to find relationships between seemingly unrelated data.
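The article does not describe the algorithm inside Knowledge Seeker, but the general idea of letting software search for the fields most closely related to an outcome can be sketched generically. The Python example below ranks fields by information gain, a measure commonly used in decision-tree methods, against a survival outcome. The field names and all eight records are invented purely for illustration and are not real patient data.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a list of outcome labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(records, field, outcome):
    """Reduction in outcome entropy after splitting the records on a field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[field]].append(record[outcome])
    remainder = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return entropy([r[outcome] for r in records]) - remainder


# Invented records; "survived" stands for a hypothetical five-year outcome.
records = [
    {"op_duration": "long", "blood_group": "A", "survived": False},
    {"op_duration": "long", "blood_group": "O", "survived": False},
    {"op_duration": "long", "blood_group": "A", "survived": False},
    {"op_duration": "long", "blood_group": "O", "survived": True},
    {"op_duration": "short", "blood_group": "A", "survived": True},
    {"op_duration": "short", "blood_group": "O", "survived": True},
    {"op_duration": "short", "blood_group": "A", "survived": True},
    {"op_duration": "short", "blood_group": "O", "survived": False},
]

fields = ["op_duration", "blood_group"]
ranked = sorted(fields,
                key=lambda f: information_gain(records, f, "survived"),
                reverse=True)
for field in ranked:
    print(field, round(information_gain(records, field, "survived"), 3))
```

The point of such a search is that nobody has to ask about a particular field in advance: every field is scored, and the ones that separate the outcomes most strongly rise to the top for researchers to investigate.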
The centre has already obtained some interesting information as a result of this automatic categorisation. For instance, it was not previously understood that the duration of a transplant operation has a direct impact on survival rates over five years.
Researchers found that one of the reasons why operations can take longer than normal is that blood vessels may be in poor condition and take longer to sew together. This, in turn, may result in a revision of surgical techniques, and lead to more transplant patients living longer.
Further insights have been gained into the way in which organs are handled after they have been removed from a donor. Typically, kidneys are chilled in ice to preserve them before a transplant operation. Knowledge Seeker has shown that this period, known as cold ischaemia, directly influences long-term survival. Before being stored in ice, the organ is warm. Knowledge Seeker has revealed that this period, warm ischaemia, affects survival rates, although the effect is not uniform. The reasons for this are being researched.
Knowledge Seeker has highlighted areas for research which would never have been identified using manual methods of analysis. More importantly, as a direct result of this information, transplant techniques have been updated, which, in turn, has improved long-term survival rates.
Data mining lessons:
1 Data mining software is useless if it does not start with an understanding of real-world business problems.
2 You need people with the appropriate skills to deliver it; technology is only as good as the people asking the questions.
3 Collect the right data; to analyse data in a certain way, you must collect it in a certain way.
4 You must have user-friendly, graphical user interfaces. These GUIs must integrate smoothly into the business user's overall decision support (DSS) application environment.