
Google re-releases open source OCR software

Tesseract code unearthed from the HP crypt

Matt Chapman

Google has re-released an open source version of optical character recognition (OCR) software originally produced by HP.

The Tesseract program was developed by HP between 1985 and 1995, and in its final year ranked among the top three OCR packages in a competition organised by the University of Nevada, Las Vegas (UNLV).


Google said in a statement that, although some people might wonder why the search giant was interested in OCR technology, it fitted in with the company's plans to make information available online.

"We are all about making information available to users, and when this information is in a paper document, OCR is the process by which we can convert the pages of this document into text that can then be used for indexing," said Eric Case on the official Google Code blog.

HP stopped working on Tesseract in 1995 and released the code to the Information Science Research Institute at UNLV a couple of years ago so that it could be developed for open source. 

"UNLV was happy to oblige, but they asked for our help in fixing a few bugs that had crept in since 1995 (ever heard of bit rot?)," wrote Case.

"We tracked down the most obvious ones and decided a couple of months ago that Tesseract OCR was stable enough to be re-released as open source."

Google originally chose to keep the launch low profile, but today's announcement includes an advert for engineers to work on the project.

The software currently supports only English, does not include a page layout analysis module, struggles with greyscale and colour documents, and will not match the accuracy of the best commercial OCR packages currently available.

"Yet, as far as we know, despite its shortcomings, Tesseract is far more accurate than any other open source OCR package out there," wrote Case.

