'Robots.txt' files do not treat all search engines equally

Google bots get the red carpet treatment

Robots.txt files written to favour Google's web-crawlers

Robert Jaques

Webmasters who control automated web-crawler access to their sites using 'robots.txt' files have a bias that favours Google over other search engines, according to new research.

The claim was made by researchers at Penn State University based on the results of a study of more than 7,500 websites.


C. Lee Giles, David Reese professor of Information Sciences and Technology at Penn State, led the research team that developed the BotSeer search engine for the study. He described the pro-Google bias as "surprising".

"We expected that 'robots.txt' files would treat all search engines equally, or maybe disfavour certain obnoxious bots," he said.

"So we were surprised to discover a strong correlation between the favoured robots and search engine market share."

'Robots.txt' files are not an official standard but, by informal agreement, regulate web-crawlers, also known as 'spiders' and 'bots', which mine the web continuously.

Web policy makers place the files in a website's root directory to restrict crawler access to non-public information.

'Robots.txt' files are also used to reduce the server load caused by crawlers, which can otherwise result in denial of service and force a website to shut down. But some web policy makers and administrators are writing 'robots.txt' files that do not block access uniformly.

Instead, those files give access to Google, Yahoo and MSN while restricting other search engines, the researchers found.
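As a rough illustration of the kind of file the researchers describe, the sketch below parses a made-up 'robots.txt' with Python's standard urllib.robotparser module. The file contents, the helper name allowed_agents and the crawler name ExampleBot are assumptions for illustration only and are not taken from the study; Googlebot, Slurp and msnbot are the user-agent names used by the Google, Yahoo and MSN crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: three named crawlers get full access,
# every other crawler is blocked from the whole site.
BIASED_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: Slurp
Disallow:

User-agent: msnbot
Disallow:

User-agent: *
Disallow: /
"""


def allowed_agents(robots_txt, agents, url="/index.html"):
    """Return a dict mapping each crawler name to whether it may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    BIASED_ROBOTS_TXT, ["Googlebot", "Slurp", "msnbot", "ExampleBot"]
)
for agent, allowed in result.items():
    print(agent, "allowed" if allowed else "blocked")
```

An empty "Disallow:" line means nothing is off limits for that crawler, while "Disallow: /" under the wildcard entry shuts out every crawler not named earlier in the file.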

While the study does not explain why web policy makers have opted to favour Google, the researchers know that the choice was made consciously, because not using a 'robots.txt' file gives all robots equal access to a website.
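The point about equal access can be checked with a small sketch, again using Python's standard urllib.robotparser. Here an empty rule set stands in for a site that has no 'robots.txt' file, which is an assumption of this example rather than something the study describes; the helper name can_crawl and the crawler name ExampleBot are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, agent, url="/"):
    """Report whether the given crawler may fetch url under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# With no rules at all, a well-known crawler and an unknown one
# are treated identically: both may fetch the page.
agents = ["Googlebot", "ExampleBot"]
access = [can_crawl("", agent) for agent in agents]
print(dict(zip(agents, access)))
```

Any difference in treatment between crawlers therefore has to come from rules that someone deliberately wrote into the file.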

"'Robots.txt' files are written by web policy makers and administrators who have to intentionally specify Google as the favoured search engine," said Professor Giles.

Not every site has a 'robots.txt' file, although the number is growing. About four in 10 of the 7,500 sites analysed by the researchers had such a file, up from fewer than one in 10 in 1996.
