Introduction to metadata harvesting

What is harvesting and how does it work?

Harvesting is an automated, regular process of collecting metadata descriptions from different sources to create useful aggregations of metadata and related services. The harvesting process is controlled by Data Source Administrators.

ANDS prefers the OAI-PMH protocol for harvesting RIF-CS documents. Formats other than RIF-CS may be supported by arrangement.

OAI-PMH enables automated, regular harvesting, and is included in major repository software products. If you do not already have OAI-PMH capability as part of a repository solution, free OAI-PMH tools and development libraries are available online. ANDS also supports metadata exchange over direct HTTP, and the deletion of records as part of a harvest (OAI-PMH delete).
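
To make the protocol more concrete, the following minimal Python sketch shows how a harvester might build an OAI-PMH ListRecords request. The verb and argument names (verb, metadataPrefix, from, resumptionToken) come from the OAI-PMH 2.0 specification. The function name, the base URL https://example.org/oai and the default metadata prefix "rif" are assumptions made for illustration only; check the actual prefix and harvest point with your own provider.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="rif",
                           from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (illustrative sketch).

    base_url and the default metadata_prefix are hypothetical values,
    not settings confirmed by ANDS.
    """
    if resumption_token:
        # In OAI-PMH 2.0, resumptionToken is an exclusive argument:
        # it is sent with the verb only, to fetch the next batch.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            # "from" restricts the harvest to records changed since this date.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical harvest point; replace with a real provider address.
    print(build_list_records_url("https://example.org/oai",
                                 from_date="2011-07-13"))
```

Passing a from date is one way a regular harvest could ask only for records changed since the previous run, rather than retrieving everything each time.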

The harvest process


In simple terms, harvesting works as follows. An institution with a data store or metadata store (defined in OAI-PMH as a Data Provider) is ready to provide metadata to ANDS (defined in OAI-PMH as a Service Provider). RIF-CS XML documents are exposed at the harvest point (a web address), ready for harvesting. The Data Source Administrator then initiates the following steps.

  1. The Data Source Administrator logs in to the Data Source record in the ANDS Registry database, and configures the harvest by setting Provider Type, Harvest Method and dates. The Data Source Administrator, on behalf of the Data Source organisation (the OAI-PMH Data Provider), determines when and what to harvest.

  2. The Data Source record controls the actions of the ANDS Harvester application. The harvester itself is a software application operated by the OAI-PMH Service Provider (ANDS).

  3. The Harvester acts in accordance with the Data Source Administrator's instructions, and requests files from the Data Source server.

  4. The Data Source server provides files to the Harvester.

  5. The Harvester inserts the metadata it has retrieved into the ANDS Collections Registry. In some cases transformations are applied at this stage.

  6. The metadata is formatted for display and made discoverable through Research Data Australia using search or browse methods. The metadata is also exposed for search engine crawling so that records can be found using global search engines, and is made available for harvesting by other OAI-PMH registries.
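
Steps 3 to 5 above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the ANDS Harvester's actual implementation: the registry is a plain dictionary, the sample response and its identifiers are invented, and the function name apply_response is hypothetical. The XML element names, the namespace and the status="deleted" header attribute follow the OAI-PMH 2.0 specification.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Invented sample of what a Data Source server might return (step 4).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:collection-1</identifier>
        <datestamp>2011-07-13</datestamp>
      </header>
      <metadata><placeholder>RIF-CS payload would go here</placeholder></metadata>
    </record>
    <record>
      <header status="deleted">
        <identifier>oai:example.org:collection-2</identifier>
        <datestamp>2011-07-13</datestamp>
      </header>
    </record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def apply_response(xml_text, registry):
    """Insert or delete records in a dict-based registry (step 5).

    Returns the resumption token if more batches remain, else None.
    """
    root = ET.fromstring(xml_text)
    for record in root.findall(".//oai:record", OAI_NS):
        header = record.find("oai:header", OAI_NS)
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        if header.get("status") == "deleted":
            # OAI-PMH delete: the provider signals the record was removed.
            registry.pop(identifier, None)
        else:
            metadata = record.find("oai:metadata", OAI_NS)
            registry[identifier] = ET.tostring(metadata, encoding="unicode")
    token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
    return token or None


if __name__ == "__main__":
    registry = {"oai:example.org:collection-2": "<metadata/>"}
    next_token = apply_response(SAMPLE_RESPONSE, registry)
    print(sorted(registry), next_token)
```

A real harvester would repeat the request with the returned resumption token until none is returned, and would apply any transformations before storing the metadata.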

More information: How to configure harvests | Harvest troubleshooting

Technical references: Harvester Service Technical and User Guide PDF | Implementing an OAI-PMH RIF-CS Provider (includes information about using jOAI)

Date | Change history
26 October 2010 | First web publication
13 July 2011 | Added link to information about using OAI-PMH delete
30 September 2011 | Added link to harvest troubleshooting information in ANDS FAQ and technical references

Please send any feedback on this page to guides@ands.org.au