The hunt is on for an alternative to the inadequate search engine

22 Oct 1998

Search engines are not doing their jobs. Consequently, new strains of the technology are being developed to challenge companies such as Yahoo, Excite and Altavista.

Internet users, according to a recent 'Business Week'-Harris poll, spend 50 per cent of their time online conducting research. The portal sites such as Netscape Netcenter know that a better search engine means more visitors - which is why companies are trying to improve their search capabilities, turning to projects such as Google, Clever, Direct Hit and Looksmart.

Following in the footsteps of its predecessors, Yahoo and Excite, a third search engine start-up has emerged from Silicon Valley's Stanford University. Google is based on three years' research by two graduate students, Larry Page and Sergey Brin, who take a new approach to relevance ranking.

Their product looks at the way Web sites link to one another and then orders matches based on their 'importance' - how frequently they are pointed to by other 'important' Web sites (see Newswire 20 October).

"You're the sum of the importance of the things that point to you," said Page. He added that other search engines use linking relationships to build indexes, but do not do a good job of prioritizing results based on link relationships.

Google currently has more than 25 million indexed pages and hopes to raise that to 100 million. Researchers estimate there are currently more than 300 million publicly indexable pages on the Web.
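
Page's description, that a page's importance is the sum of the importance of the pages pointing to it, can be illustrated with a short iterative calculation. The following is a minimal sketch and not Google's actual implementation: the article gives no formula, so the damping factor, the iteration count, the even sharing of a page's score among its links and the tiny four-page Web are all assumptions made for illustration.

```python
def rank_by_links(links, damping=0.85, iterations=50):
    """Score pages by the importance of the pages that link to them.

    `links` maps each page to the list of pages it links to.
    `damping` and `iterations` are illustrative assumptions.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_score = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its importance equally among its links.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * score[page] / n
                for other in pages:
                    new_score[other] += share
        score = new_score

    # Most important pages first.
    return sorted(score.items(), key=lambda item: item[1], reverse=True)


# Hypothetical Web of four pages: a, b and d all point to c.
web = {"a": ["c"], "b": ["c"], "c": ["d"], "d": ["c"]}
ranking = rank_by_links(web)
```

In this toy example page c ends up first because three pages point to it, and page d comes second because its single incoming link is from the important page c.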

Google is not the only project developing new ways to search the Web. Clever is being developed at IBM's Almaden Research Center. Like Google, it ranks pages by analysing the links between them and measuring importance, but it does not crawl the Web.

The system, based on an algorithm developed by Jon Kleinberg, a Cornell University researcher, looks at the links between pages and ranks the important pages at the top of the list.

"The principle of Clever is to exploit the work of millions of participants on the Web, who are all over the world, creating pages without any centrally directed motivation," said Prabhakar Raghavan, a researcher at the research centre. "The Clever system exploits these distributed judgments and aims to extract one consensus view on any given topic."

For example, people with soccer fan pages point to the same kinds of sites. Not only are they pointing to other sites, they are annotating what they point to. Clever follows these links, processes the text around the links and uses the information to draw out the communities around the topic in which the searcher is interested. The result is a page of relevant links separated into 'hubs' and 'authorities'.
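
The split into 'hubs' and 'authorities' can be sketched as a mutual-reinforcement loop: a good hub points to good authorities, and a good authority is pointed to by good hubs. This is a simplified illustration in the spirit of Kleinberg's algorithm, not IBM's Clever code. It ignores the text around the links, which the article says Clever also processes, and the page names and iteration count are assumptions.

```python
import math


def hubs_and_authorities(links, iterations=30):
    """Return (hub, authority) score dictionaries for a link graph.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {page: 1.0 for page in pages}
    authority = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking in.
        authority = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                authority[target] += hub[source]

        # Hub score: sum of the authority scores of pages linked to.
        hub = {
            page: sum(authority[t] for t in links.get(page, []))
            for page in pages
        }

        # Normalise so the scores stay bounded between rounds.
        for scores in (authority, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for page in scores:
                scores[page] /= norm

    return hub, authority


# Hypothetical soccer fan pages pointing to the same kinds of sites.
fan_links = {
    "fan1": ["league", "club"],
    "fan2": ["league", "club"],
    "fan3": ["club"],
}
hub, authority = hubs_and_authorities(fan_links)
top_authority = max(authority, key=authority.get)
```

Here the fan pages come out as hubs and the sites they point to come out as authorities, with "club" the strongest authority because all three fan pages link to it.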

A third contender comes from venture capital-backed Direct Hit, which has a system that records users' behaviour and documents popular results. The company has produced technology that tracks millions of Internet searches and records which pages users visit from a list of results. The data is then used to determine which pages are popular, and the pages are ranked accordingly.

Direct Hit also factors in the frequency with which Web sites have been visited by previous Internet searchers. By keeping track of the outcome of previous searches, the company plans to create a market for that information.
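
The popularity approach described above can be sketched as a click log: record which result each searcher visits for a query, then order later results for the same query by those counts. This is a guess at the general idea only. The `ClickLog` class, its method names and the example URLs are hypothetical, and the article does not say how Direct Hit stores or weights its data.

```python
from collections import Counter, defaultdict


class ClickLog:
    """Hypothetical record of which results searchers visit per query."""

    def __init__(self):
        # query -> Counter of visited URLs
        self._clicks = defaultdict(Counter)

    def record(self, query, url):
        """Note that a searcher visited `url` from the results for `query`."""
        self._clicks[query.lower()][url] += 1

    def rank(self, query, candidates):
        """Order candidate URLs by how often earlier searchers chose them.

        The sort is stable, so unvisited pages keep their original order.
        """
        counts = self._clicks.get(query.lower(), Counter())
        return sorted(candidates, key=lambda url: counts[url], reverse=True)


log = ClickLog()
for visited in ["scores.example.com", "scores.example.com", "fans.example.com"]:
    log.record("Soccer", visited)

results = log.rank(
    "soccer",
    ["shop.example.com", "fans.example.com", "scores.example.com"],
)
```

After three recorded visits, the twice-visited page moves to the top of the list for that query and the page nobody chose falls to the bottom.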

"The efforts of all these millions of searchers out there is actually a by-product that search engines aren't capturing," said Direct Hit inventor Gary Culliss, a former patent agent and graduate of Harvard Law School.

Culliss entered Direct Hit in the MIT business plan contest this year, and won. At the contest he met Mike Cassidy, a recently retired software company co-founder, and the two enlisted the help of systems designer Steven Yang, who created the Direct Hit prototype. Venture capital firm Draper Fisher Jurvetson provided the company with $4.1 million in funding.

Launched in 1996, Looksmart provides category-based navigation services for the World Wide Web. The company's Web-based directory guides users to more than 600,000 relevant Web sites in more than 23,000 subject categories.

Looksmart includes an integrated search engine that delivers keyword matches to reviews in the directory and uses Altavista's patented search technology. The company has exclusive alliances with Altavista, Hotbot, @Home and Comcast, and is a premier Netscape search provider.

Looksmart lets users explore through categories to help pinpoint Web destinations tailored to their interests.

The company received its initial investment from Reader's Digest Association. Recent investment rounds include Cox Enterprises, Australia Mezzanine Investments and Macquarie Bank.

Of the 320 million publicly indexable pages on the World Wide Web, only between three per cent (Lycos) and 34 per cent (Hotbot) are touched by the six main search engines.

And a recent study by 'Science' magazine reported that none of the six most popular full text engines - Hotbot, Altavista, Northern Light, Excite, Infoseek and Lycos - does a very good job of tracking the Web sites in existence.

After Hotbot's 34 per cent, Altavista came in second for Web coverage with a 28 per cent return and Northern Light third with 20 per cent. Lycos trailed the group by locating just three per cent of the indexable pages on the Web.
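
To put those percentages in terms of pages, the coverage figures can be multiplied out against the 320 million total. The short calculation below uses only the numbers quoted in the article; the variable and function names are illustrative, and the article gives no percentages for Excite or Infoseek, so they are left out.

```python
TOTAL_PAGES = 320_000_000  # publicly indexable pages, as quoted above

# Coverage percentages reported for four of the six engines.
coverage_percent = {
    "Hotbot": 34,
    "Altavista": 28,
    "Northern Light": 20,
    "Lycos": 3,
}


def pages_covered(percent, total=TOTAL_PAGES):
    """Convert a coverage percentage into an approximate page count."""
    return total * percent // 100


estimates = {
    engine: pages_covered(percent)
    for engine, percent in coverage_percent.items()
}
```

On these figures Hotbot reaches roughly 109 million pages and Lycos fewer than 10 million.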

"I'm shocked and disappointed," said Lycos engineer Sangam Pant, "because we didn't do as well as I thought we would."

Rajive Mathur, group product manager for Lycos, argued that the study showed that Lycos is the most accurate engine because it returns the fewest dead sites. He pointed out that Lycos specialises in general interest sites that people need most and that the study searched for obscure scientific computer terms.

The other vendors agreed that "the key is to give users the best hits, not the most". Graham Spencer, chief technology officer at Excite, explained: "You can list page after page of useless links, but most people only look at the top few choices."

'Science' author Steve Lawrence said: "Your best bet is to use multiple engines and combine the results." He and co-author Lee Giles found that combining the results of the six engines "yields about three and a half times as many documents as a single engine".
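
Lawrence's advice amounts to taking the union of several result lists and dropping duplicates. A minimal sketch follows, assuming each engine's results are already available as a list of URLs. The function name and the result lists are invented for illustration, and no real search engine API is being called.

```python
def combine_results(results_by_engine):
    """Merge result lists from several engines, keeping first-seen order.

    `results_by_engine` maps an engine name to its list of result URLs.
    """
    seen = set()
    merged = []
    for urls in results_by_engine.values():
        for url in urls:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Invented result lists for one query.
results_by_engine = {
    "Hotbot": ["a.example.com", "b.example.com", "c.example.com"],
    "Altavista": ["b.example.com", "d.example.com"],
    "Lycos": ["a.example.com", "e.example.com"],
}
merged = combine_results(results_by_engine)
```

In this example no single engine returns more than three documents, but the combined list holds five distinct ones, which is the effect the study's authors describe.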

A search engine expert with Delphi Consulting Group, Carl Frappaolo, said we can look for a new breed of search engines in the future.

"As the Web continues to grow, it'll become even harder for one search engine to cover everything," he said. "If you need medical information, you'll use one engine; if you need financial data, you'll use another."

And Julia Pickar, an analyst at Zona Research, added that although the major search companies are trying to imbue these searches with some human logic and help users get more relevant results, relevance will remain a hurdle because of the dynamic nature of the Web.

"The search process is extremely flawed," she said.

Yet the major search engines, including Yahoo, Excite, Infoseek, Lycos and Altavista, typically do a good job. Directory-type online search services such as Yahoo offer fewer but more concise listings. Yahoo offers roughly 500,000 listings divided into 25,000 categories, while Altavista, a crawler- or robot-based system, includes about 300 million listings.

Altavista, a subsidiary of Compaq, recently said its upgraded site now features the AV Full View Search, a combination of three search techniques: index, directory, and question and answer.

This allows users to ask questions in plain English and receive a short list of intelligently matched confirmation questions as well as Altavista's keyword search results.

Some engines, such as Lycos, have both an automatic component for locating sites and a subject-based Web guide, which is searched first.

And Infoseek recently introduced its Express tool, which claims to work as well with other portals and search sites as it does with Infoseek. The product lets users simultaneously search several sites and provides extended searching capabilities for Web sites within specific topic categories.

Behind some of the most popular search engines (Hotbot and Yahoo, for example) is a small software company called Inktomi. The company, which supplies the search technology behind these engines, has a blue-chip collection of clients and investors including Microsoft, Intel and Sun Microsystems.

Microsoft and Inktomi recently announced the availability of the Inktomi-powered Microsoft Web Search service. The launch of MSN Web Search includes a new design scheme, new services and tighter integration among properties.

While search engines continue to index tens to hundreds of millions of Web pages, the new technical challenges are being met by newcomers such as Google, Clever and Direct Hit, as well as by Yahoo and Altavista.

Although the amount of information on the Web continues to grow rapidly, the question remains: am I searching for information or for a needle in a haystack?
