The structure of Harvest
Harvest consists of two parts, the so-called Gatherer and the so-called Broker. The Gatherer collects information, processes it, and delivers it to the (or any) Broker. The Broker stores this data. In addition, it provides an interface (in the form of an HTML page) through which a user can submit a search query to the system.
Functioning of the Gatherer:
The Gatherer collects the information that is available on the servers. Roughly outlined, it proceeds as follows:
By default, the Gatherer starts with a URL (that is, a WWW page where it should begin its run), works through this page, and indexes its contents.
The Gatherer can be configured precisely to prevent it from jumping randomly through the net; for example, one can set the search depth and specific domains, or exclude specific (non-public) directories. In this way the Gatherer 'scours' one or several servers. It is also able to index "non suitable WWW formats" such as PostScript or compressed data. Technically, a process called the "enumerator" tries to extract links from HTML files. A summarizer collects the objects from the net and tries to find information and structure which describe the object. For this, the summarizer (which contains all the information about how to decode the given file format) uses a process called "essence". When the Gatherer has finished its collection, it delivers the resulting data records to a Broker in packed form.
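The gathering steps described above can be illustrated with a minimal Python sketch. It is not Harvest's actual code: the names `gather`, `LinkAndTitleParser`, `max_depth`, `allowed_hosts` and `excluded_prefixes` are hypothetical, the "summary" is reduced to the page title, and a small in-memory dictionary stands in for the real network so that the example runs on its own.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkAndTitleParser(HTMLParser):
    """Collects the href targets and the <title> text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def gather(start_url, fetch, max_depth, allowed_hosts, excluded_prefixes=()):
    """Breadth-first walk from start_url; returns one summary record per page."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    records = []
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkAndTitleParser()
        parser.feed(html)
        # "Summarizer" step (assumption: the title is the whole summary).
        records.append({"url": url, "title": parser.title.strip()})
        if depth >= max_depth:
            continue  # depth limit reached, do not follow links further
        # "Enumerator" step: follow only links that pass the configured limits.
        for href in parser.links:
            target = urljoin(url, href)
            if urlparse(target).hostname not in allowed_hosts:
                continue
            if any(target.startswith(p) for p in excluded_prefixes):
                continue
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return records


# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://example.org/": '<title>Home</title><a href="/a.html">A</a>'
                           '<a href="http://other.example/">X</a>',
    "http://example.org/a.html": '<title>Page A</title>'
                                 '<a href="/private/b.html">B</a>'
                                 '<a href="/c.html">C</a>',
    "http://example.org/c.html": "<title>Page C</title>",
    "http://example.org/private/b.html": "<title>Secret</title>",
}

records = gather(
    "http://example.org/",
    fetch=PAGES.get,
    max_depth=2,
    allowed_hosts={"example.org"},
    excluded_prefixes=("http://example.org/private/",),
)
```

With these settings the walk stays on `example.org`, skips everything under `/private/`, and stops two links away from the start page. A real Gatherer would additionally choose a summarizer per file format, which is omitted here.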
Functioning of the Broker:
The Broker builds an index over the data records, which can then be searched. The Broker contains what the user perceives as the search engine of Harvest.
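How an index makes the delivered records searchable can be sketched as a simple inverted index in Python. This is only an illustration of the general idea, not the indexing engine Harvest actually uses; the record layout (`url`, `text`) and the function names `build_index` and `search` are assumptions made for the example.

```python
import re
from collections import defaultdict


def build_index(records):
    """Map each lowercase word to the set of record URLs whose text contains it."""
    index = defaultdict(set)
    for record in records:
        for word in re.findall(r"\w+", record["text"].lower()):
            index[word].add(record["url"])
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (AND search)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical records, as a Gatherer might have delivered them.
sample_records = [
    {"url": "http://example.org/a", "text": "Harvest Gatherer overview"},
    {"url": "http://example.org/b", "text": "Harvest Broker and its index"},
]

broker_index = build_index(sample_records)
hits = search(broker_index, "harvest broker")
```

Here `hits` contains only `http://example.org/b`, since that is the only record containing both query words. The HTML search form mentioned earlier would pass the user's input to something playing the role of `search`.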
Graphical representation of the construction of Harvest
with funds of the German Ministry of Education and Research (BMBF) and of the Government of Lower Saxony.
Last Update: 18. Feb. 2008 © 2001-2002, ISN Oldenburg GmbH