Blind man's buff

Heuristic analysis can be used to generate reasonable solutions where there is little theoretical understanding of the nature of the problem to be solved

Anthony Harrington

Heuristic analysis is a well-known method of problem solving in the computer world. It's used to generate reasonable solutions where there is little theoretical understanding of the nature of the problem to be solved.

Typically heuristic analysis is put to use in antivirus products to detect new viruses and variations of old ones. While there may be little theory on the release of new viruses, protection has to be offered.

The heuristic method can be described as finding the most appropriate of several solutions by choosing, at each successive stage of a program, which of the alternative methods to use in the next step.

Getting to grips with heuristic analysis is not particularly difficult. You just need to set aside a few entrenched ideas about how solutions should be generated.

Nicholas Radcliffe, CEO of data mining specialist Quadstone, defines heuristics as a basic trial and error strategy, rather than a theoretically well-grounded solution set.

In fact, as he points out, one often surprisingly powerful heuristic is to use random selection as a way of deciding between competing possibilities. Network professionals readily appreciate the importance of this idea.

Radcliffe points out that the intuitively obvious choice of opting for shortest path routes as the first choice to connect up multiple nodes often turns out in practice to be the worst way of designing a network. It produces massive bottlenecks in the shortest path routes while leaving much of the rest of the network under-used.

"Similarly," he adds, "when humans are asked to join up, say, a matrix of processors, they invariably opt to join them in a regular pattern, linking each processor with its four neighbours in a two-dimensional matrix, or with its six neighbours in a three-dimensional space."

Again, this turns out in practice to be a terrible way of linking all the processors since it creates the longest and most tortuous collection of paths between collections of processors. A much better approach is to join each processor randomly to four or six others. Surprisingly, this gives pretty well the optimum network design.
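
Radcliffe's comparison can be tried out on a small scale. The following is a minimal Python sketch, not anything Quadstone has published: it assumes a flat grid with no wrap-around at the edges, and a random network in which each processor picks two peers, so that both layouts end up with a similar number of links (about four per processor). The function names grid_links, random_links and mean_hops are made up for this illustration.

```python
import random
from collections import deque


def grid_links(side):
    """Link each processor to its up/down/left/right neighbours on a side x side grid."""
    links = {n: set() for n in range(side * side)}
    for row in range(side):
        for col in range(side):
            node = row * side + col
            if col + 1 < side:  # neighbour to the right
                links[node].add(node + 1)
                links[node + 1].add(node)
            if row + 1 < side:  # neighbour below
                links[node].add(node + side)
                links[node + side].add(node)
    return links


def random_links(count, picks, rng):
    """Each processor picks `picks` peers at random; every link works both ways."""
    links = {n: set() for n in range(count)}
    for node in range(count):
        others = [m for m in range(count) if m != node]
        for peer in rng.sample(others, picks):
            links[node].add(peer)
            links[peer].add(node)
    return links


def mean_hops(links):
    """Average shortest-path length over all reachable pairs (breadth-first search)."""
    total = pairs = 0
    for start in links:
        hops = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for peer in links[node]:
                if peer not in hops:
                    hops[peer] = hops[node] + 1
                    queue.append(peer)
        total += sum(hops.values())
        pairs += len(hops) - 1
    return total / pairs


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print("grid  :", round(mean_hops(grid_links(20)), 2))
    print("random:", round(mean_hops(random_links(400, 2, rng)), 2))
```

On a 20 by 20 layout the random wiring should come out with a much lower average hop count than the grid, which is the effect described above. Whether it is close to the true optimum is a separate question that this sketch does not test.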

Quick and dirty

Generally, what makes a heuristic solution good enough to inspire confidence is a second or third level of heuristics that improves on the initial solution. The big proviso is that these additional levels have to be calculable fast enough to keep the 'quick and dirty, but pretty good' characteristic that makes heuristic analysis valuable.

To summarise, heuristic problem solving involves finding a set of rules or procedures, often through trial and error, that generates satisfactory rather than perfect solutions to specific problems. The rules tend to be specific to a problem, or to a class of problems, so different situations can generate very different heuristic solutions. Different starting points within the same problem also produce solutions that sit closer to, or further from, the optimum. The user then has to devise a way of weighing the solutions that emerge to pick the best of the set.

Often the heuristic analysis will allow a second iteration, or even many subsequent iterations, which can improve the solution. In some cases, different solutions can be combined to generate an improved heuristic solution. This last approach, combining multiple heuristic solutions, is the route chosen by MessageLabs, an email virus detection ASP.
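
The article does not tie these ideas to a particular problem, so the sketch below assumes a toy route-planning task: visit every point once and return home by a short route. A quick first-level rule (always go to the nearest unvisited point) is run from every possible starting point, a second-level rule (reversing stretches of the route whenever that shortens it) refines each result, and the shortest refined route is kept. All names here are hypothetical and the two rules are standard textbook choices, not anything attributed to MessageLabs or Quadstone.

```python
import math
import random


def tour_length(points, order):
    """Total length of the closed route that visits points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def nearest_neighbour(points, start):
    """First-level heuristic: always walk to the closest unvisited point."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        here = points[order[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(here, points[i]))
        unvisited.remove(nxt)
        order.append(nxt)
    return order


def two_opt(points, order):
    """Second-level heuristic: reverse a stretch of the route whenever it helps."""
    best = list(order)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if tour_length(points, candidate) < tour_length(points, best) - 1e-12:
                    best, improved = candidate, True
    return best


def best_of_all_starts(points):
    """Run both levels from every starting point and keep the shortest route."""
    refined = [two_opt(points, nearest_neighbour(points, s)) for s in range(len(points))]
    return min(refined, key=lambda order: tour_length(points, order))


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is repeatable
    points = [(rng.random(), rng.random()) for _ in range(12)]
    for start in range(3):
        rough = nearest_neighbour(points, start)
        print(start, round(tour_length(points, rough), 3),
              round(tour_length(points, two_opt(points, rough)), 3))
    print("best:", round(tour_length(points, best_of_all_starts(points)), 3))
```

The refinement can never make a route longer, but nothing here guarantees that the final answer is the true optimum. That is the trade the heuristic approach accepts.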

"In many ways MessageLabs has an advantage over the antivirus vendors that aim to protect desktop PCs," says Mark Sunner, chief technical officer.

"Trying to identify viruses in the macro environment of the internet, where traffic flows are very visible as opposed to the 'one at a time' serial world of the desktop, is actually a relatively easier proposition."

As it turns out, the email gateway problem is tailor-made for the heuristic detection of viruses.

The problem that needs to be addressed is to discover viruses for which there is as yet no recognised signature. In other words, the search is for virus-like behaviour.

MessageLabs has a system called Sceptic™ (a complex collection of heuristic algorithms and processes), which is a way of weighting emails to deliver a conclusion that email A is almost certainly a virus.

False positives

"Part of the challenge in designing a detection system is that we have to avoid unacceptable levels of false positives. If our system keeps identifying perfectly good emails as potential viruses, and blocking those emails, our clients would get annoyed and that would damage our business," explains Sunner.

The one big failing in conventional antivirus technology is the window of opportunity the new virus enjoys between the time it first appears on the scene and the time when the industry gets an antidote together.

Typically, what happens is that shortly after a new vulnerability is discovered and publicised, a virus will appear that exploits that weakness. This is something that the antivirus suppliers have recognised and are working on.

"The current IIS-server antivirus filtration version of Kaspersky reliably defends computers against all known versions of the Code Red worm, and does not require the Microsoft patch," says Eugene Kaspersky, head of anti-virus research at Kaspersky Labs.

"Soon, the program will have built-in heuristic technology capable of detecting and neutralising the attack of even an unknown virus using the buffer overflow approach of Code Red."

MessageLabs uses a similar technique. "We caught the Nimda virus immediately on its first appearance, since we were looking for anything that tried to use the exploit which Nimda targets," says Sunner.

Heuristics also examine other features of a file to work out whether it is a virus. Double extensions on filenames, or long runs of white space in a filename (a technique for pushing a .exe or .vbs extension off the edge of the viewing screen), score very highly as virus identifiers. Attempts to hide the code content of the email from an emulator or a sandbox also signal viral activity.
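
A weighted check of this kind can be sketched in a few lines of Python. This is an illustration only and not how any supplier's engine works: the weights, the threshold, the list of risky extensions and the function names are all assumptions made up for the example, and it looks only at the attachment filename.

```python
import re

# Hypothetical weights and threshold, chosen only for illustration.
WEIGHTS = {"double_extension": 5, "padded_filename": 4, "script_extension": 2}
THRESHOLD = 6

# Assumed list of executable or script extensions.
RISKY = ("exe", "vbs", "scr", "pif")

# A harmless-looking extension followed by a risky one, e.g. "homepage.html.vbs".
DOUBLE_EXT = re.compile(r"\.[a-z0-9]{2,4}\s*\.(" + "|".join(RISKY) + r")$")
# A long run of white space, used to push the real extension out of view.
PADDING = re.compile(r"\s{5,}")


def indicators(filename):
    """Return the set of suspicious traits found in an attachment filename."""
    name = filename.lower()
    found = set()
    if DOUBLE_EXT.search(name):
        found.add("double_extension")
    if PADDING.search(name):
        found.add("padded_filename")
    if name.rsplit(".", 1)[-1] in RISKY:
        found.add("script_extension")
    return found


def score(filename):
    """Add up the weights of every trait that was found."""
    return sum(WEIGHTS[trait] for trait in indicators(filename))


def looks_viral(filename):
    """Flag the attachment when its combined score reaches the threshold."""
    return score(filename) >= THRESHOLD


if __name__ == "__main__":
    for name in ("report.doc", "setup.exe", "homepage.html.vbs",
                 "photo.jpg" + " " * 40 + ".exe"):
        print(repr(name), score(name), looks_viral(name))
```

No single trait is enough to cross the threshold except the double extension combined with a risky final extension. Combining several weak signals in this way is what lets a plain setup.exe through while a disguised script is stopped.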

But heuristic analysis is still in its infancy as far as scanners go. Sunner warns that although many antivirus suppliers claim their products contain heuristic elements, not all heuristics are equal.

The Magistr virus, released this year, reappeared in the top 10 virus list after it had been successfully detected. The code base had been substantially reworked so that even heuristic scanners didn't have a chance.

Virus epidemic

Denis Zenkin, head of corporate communications at Kaspersky Labs, warns against the danger of new variants. "Don't be in doubt that the latest Magistr modification has the potential to be as widespread as the original," he says. "This could lead to another global epidemic." Heuristic analysis has also been the cause of many problems, when innocent files are detected as viruses (see box below).

Heuristic detection has to be implemented carefully to avoid this kind of problem. If an engine is too sensitive it will generate false positives, but if it's not sensitive enough it won't detect new viruses. For this reason, MessageLabs firmly believes that heuristic scanners should be run as a managed service.
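
The trade-off can be made concrete with a toy example. The Python sketch below assumes an engine that gives each file a numeric score and blocks anything at or above a threshold. The scores and labels are invented purely for illustration and do not come from any real scanner.

```python
# Made-up (score, is_really_a_virus) pairs, for illustration only.
SAMPLES = [(1, False), (2, False), (3, False), (4, True), (5, False),
           (6, True), (7, False), (8, True), (9, True), (11, True)]


def error_counts(threshold, samples=SAMPLES):
    """Return (false positives, missed viruses) when blocking scores >= threshold."""
    false_positives = sum(1 for s, viral in samples if s >= threshold and not viral)
    missed = sum(1 for s, viral in samples if s < threshold and viral)
    return false_positives, missed


if __name__ == "__main__":
    for threshold in (1, 4, 6, 8, 12):
        fp, missed = error_counts(threshold)
        print(f"threshold {threshold:2d}: {fp} false alarms, {missed} viruses missed")
```

Lowering the threshold makes the engine more sensitive: fewer viruses slip through, but more clean files are blocked. Raising it does the reverse. Choosing where to sit between the two is the tuning job described above.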

Alex Shipp, chief virus technologist at MessageLabs, says: "One of our main weapons against the virus writer is that, unlike antivirus software which they can buy off the shelf and examine, they can't get to look at our system, so they have to guess at our defences and their virus only gets one shot at getting through us. When it's caught, it's caught forever."

Heuristics have a role in a wide range of products, including systems to analyse telephone fraud. Used correctly, with a multi-layered approach, the problem of false positives associated with heuristics can be demonstrably reduced, to the point where the good work done by the product far outweighs the rare false identification.

What the experts say

Graham Cluley, senior technology consultant, Sophos Anti-Virus:

With 1,000 new viruses written every month, it is attractive to have a solution that can protect against malware before it has been examined in a virus research lab.

The problem is that many heuristics are prone to false alarms - detecting a 'new' virus when one is not present - and causing more problems than a genuine virus infection. There are stories of companies shutting down their networks because their antivirus has goofed up.

Earlier this year, one antivirus product's heuristics went badly wrong when it false-alarmed on another supplier's email warning about the Homepage worm. Because the warning mentioned the filename homepage.html.vbs in its description, the antivirus product mistakenly accused it of being infected with that worm.

Some ISPs offering a virus scanning service use very sensitive heuristics to detect new and unknown viruses. For instance, they will look at any Word document containing a macro in case it is viral.

The issue here is confidentiality. Some companies are understandably reluctant to give permission to third parties to open their documents or spreadsheets.

Heuristics have a place in virus protection and most modern antivirus products contain them. However, it is important that their use is carefully balanced against the danger of false alarms.

Mark Sunner, chief technology officer, MessageLabs:

Surprise will always be virus writers' most potent asset in unleashing terrorism on the world.

Fortunately, in the business of virus detection, we have the element of surprise covered with heuristics. Conventional scanning can catch the back catalogue of viruses, but effective heuristics captures the new virus before it can wreak havoc.

That's why a rapidly increasing number of businesses are so concerned about the heuristic capabilities of their existing antivirus protection. They now recognise they need protection against both known and unknown threats.

An antivirus solution without a comprehensive heuristics capability is incomplete for two reasons. First, sophisticated new viruses are on the increase. Second, conventional scanning is not suited to catching script viruses.

If your scanner doesn't have a heuristics capability, you're wide open to new viruses and variants.

Heuristic hostilities

Heuristic antivirus detection has a history of causing conflict between antivirus suppliers.

As early as 1997, McAfee claimed that the heuristic scanner in Dr Solomon's was programmed to detect when it was being reviewed and then jump into action to give higher test results. The claims came after it was found that Dr Solomon's Advanced Heuristics Detection engine only kicked into life after 11 viruses were detected in a row. It was claimed that this was misleading as this kind of detection rate wouldn't happen in real life and gave a false impression of the power of heuristics.

Since then heuristics have been the cause of many a supplier feud. In May this year McAfee was back in the headlines when it thought it had found the Homepage virus in a newsletter sent by Sophos.

The mix-up was caused by McAfee's heuristic engine detecting the string 'VBSWG' (the toolkit used to create the Homepage virus) and a quoted filename with a double extension.

This was enough for McAfee to block the email. While McAfee was quick to point out that it detected the virus without the need for an update, it illustrates that heuristic engines need to be very carefully designed to avoid this kind of problem.

In October, Norton caused trouble when its heuristic engine falsely detected a trojan in the MSN web site. The suspect file (www.msn.co.uk/webinclude/mc.vbs) was in fact completely harmless and Norton had to release an update for its software so that the web site could be visited.

Similar problems can occur when antivirus suppliers try to write detection code. Norton got itself in trouble (again) with F-Prot and InstallShield in November, when a badly written definition file started falsely detecting the Nimda virus.

It claimed that executables used by both programs were infected and either quarantined or deleted the files, rendering them useless. Norton had to release an update that correctly detected the virus and release instructions on how to recover files affected by the mistake.
