A Silicon Valley company plans to ship content indexing technology, which will automatically categorise structured and unstructured data to create directories for intranets and Web portals.
Semio has until now sold three-dimensional search engines, but hopes to differentiate itself in the area of content categorisation by automating the process.
Users must first identify the categories into which they want their information indexed; Semio's Taxonomy offering then creates an information directory to hold it. Taxonomy categorises the content, which can come from either the Web or enterprise databases, and presents it to customers via a searchable Web browser interface.
Claude Vogel, Semio's founder and chief executive, claims the technology can also perform real-time updates to the directories and provide thesaurus-like links that give users cross-referencing functionality.
Vogel said: "Once the initial high-level choices have been made, the creation of the directory involves no other manual labour."
Hadley Reynolds, director of research at the Delphi Group, added: "Creating taxonomies out of document collections in an automated way, without human invention, is a key element to the problem of delivering a corporate portal. There are several approaches to this type of solution and Semio's is among the strongest."
He went on to say that Semio Taxonomy also lets portal application developers steer the categorisation process by choosing the categories themselves.
The Web now holds more than 270 million documents and is growing by a million pages a day, which means that having humans catalogue and organise electronic summaries is becoming almost impossible.
However, the categorisation of Web sites and pages is still undertaken manually by the leading Internet directory service provider, Yahoo!, and by its rivals Excite and Infoseek.
But IBM's Almaden Research Center is also trying to solve the problem by working on an algorithm called "Clever." This automates the process of finding relevant Web pages by analysing both the actual links in the Web pages and the immediate words surrounding the link.
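The idea of scoring pages from both their links and the words around those links can be illustrated with a short sketch. This is not IBM's code: it is a minimal hub-and-authority iteration in Python, and the function names, the weighting rule (one plus the number of query terms found near the link) and the sample data are all assumptions made for illustration.

```python
import math


def anchor_weight(surrounding_text, query_terms):
    """Hypothetical rule: 1 plus the count of query terms near the link."""
    words = surrounding_text.lower().split()
    return 1.0 + sum(words.count(term.lower()) for term in query_terms)


def normalise(scores):
    """Scale scores to unit length so they stay bounded across iterations."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def rank_pages(links, query_terms, iterations=20):
    """links is a list of (source, target, surrounding_text) tuples.

    Returns (authority, hub) score dictionaries keyed by page name.
    """
    pages = {p for source, target, _ in links for p in (source, target)}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    weighted = [(s, t, anchor_weight(text, query_terms)) for s, t, text in links]

    for _ in range(iterations):
        # A page is a good authority if good hubs link to it with relevant text.
        new_auth = {p: 0.0 for p in pages}
        for s, t, w in weighted:
            new_auth[t] += w * hub[s]
        # A page is a good hub if it links to good authorities.
        new_hub = {p: 0.0 for p in pages}
        for s, t, w in weighted:
            new_hub[s] += w * new_auth[t]
        auth, hub = normalise(new_auth), normalise(new_hub)
    return auth, hub


# Illustrative data only: three links between four made-up pages.
sample_links = [
    ("a", "c", "great search engine guide"),
    ("b", "c", "search engine"),
    ("a", "d", "my holiday photos"),
]
authority, hubs = rank_pages(sample_links, ["search", "engine"])
print(max(authority, key=authority.get))
```

In this sketch a page that is linked to with relevant surrounding words collects a higher authority score than one linked to with unrelated text, even when both are linked from the same source page.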
Semio Taxonomy is currently in beta with customers such as Symantec and the US Postal Service, but will ship in June as a Web service targeted at vertical portal and Internet commerce providers. Semio will also sell it to enterprises for use with corporate intranets and extranets.