ANDS supports the development of institution-wide solutions for the discovery and reuse of research data collections. Part of this is ensuring that the metadata for these collections is properly managed so that it can be harvested and exposed to search engines as well as to researchers and research administrators. Metadata Stores are a key component of this infrastructure.
Types of metadata stores
Metadata stores can be distinguished by their coverage, by the granularity of the data they describe, and by how specialised their descriptions are.
Based on coverage, types of metadata stores include:
- local metadata store: covers data produced by a single instrument or research group
- institutional metadata store: covers data produced across the institution, typically by a variety of research groups and disciplines
- national metadata store: covers data produced across a country by a variety of institutions (Research Data Australia is an example of a national store)
- discipline-specific metadata store: covers data produced within a discipline, across a variety of research groups, institutions and, typically, countries
Metadata about research collections is best created and managed close to where the research data is created, in local metadata stores, tightly integrated with research groups and their activities. This metadata should be relevant to researcher needs and easily accessible.
However, metadata stores with broader coverage are essential if data collections are to be discovered, tracked and used outside the immediate context of the research, whether across a discipline or across an institution. Broader stores are likely to have more users than local stores, and institutional and national stores use more generic formats that apply across more domains. They typically act as metadata aggregators, gathering metadata (or appropriate distillations of it) from local systems.
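As a rough illustration of what a "distillation" might involve, the sketch below maps a few fields of a specialist local record onto generic fields and drops anything without a generic equivalent. The field names and the `CROSSWALK` mapping are hypothetical; a real crosswalk is defined by the aggregator's schema (for Research Data Australia, RIF-CS) and is usually much richer.

```python
from typing import Any

# Hypothetical mapping from field names in a local, specialist record to the
# generic field names an institutional or national aggregator might expect.
CROSSWALK = {
    "dataset_title": "title",
    "summary": "description",
    "keywords": "subject",
    "principal_investigator": "creator",
}


def distil(local_record: dict[str, Any]) -> dict[str, Any]:
    """Return a generic record containing only the fields the crosswalk maps.

    Specialist fields with no generic equivalent (e.g. instrument settings)
    are dropped rather than guessed at.
    """
    return {
        generic: local_record[local]
        for local, generic in CROSSWALK.items()
        if local in local_record
    }


local = {
    "dataset_title": "Soil moisture, Site A, 2011",
    "summary": "Hourly probe readings.",
    "keywords": ["soil moisture", "hydrology"],
    "probe_depth_cm": 30,  # specialist detail, not passed to the aggregator
}
print(distil(local))
```

Dropping unmapped fields, rather than inventing generic values for them, reflects the point that specialist metadata is hard to repurpose automatically.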
Based on granularity, types of metadata stores include:
- collection-level metadata store: describes data collections (collections, datasets, etc.)
- object-level metadata store: describes individual data objects (files, database rows, spreadsheets, physical objects)
- integrated metadata store: describes both individual data objects and the collections they belong to in a single system, and is typically coupled with storage for the data being described
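To make the distinction concrete, here is a minimal sketch of how an integrated store might model collections and their objects together. The class and field names are hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """Object-level metadata, e.g. for a single file."""
    identifier: str
    filename: str
    size_bytes: int


@dataclass
class Collection:
    """Collection-level metadata plus the objects the collection contains."""
    identifier: str
    title: str
    objects: list[DataObject] = field(default_factory=list)

    def add(self, obj: DataObject) -> None:
        self.objects.append(obj)

    def total_size(self) -> int:
        # Collection-level facts can be derived from object-level metadata
        # because both are held in the same store.
        return sum(o.size_bytes for o in self.objects)


survey = Collection("coll-001", "Field survey images")
survey.add(DataObject("obj-001", "img_0001.tif", 2_048_000))
survey.add(DataObject("obj-002", "img_0002.tif", 1_024_000))
print(len(survey.objects), survey.total_size())
```

A collection-level store would keep only the `Collection` fields, and an object-level store only the `DataObject` records; the integrated store keeps the link between them.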
Based on specialisation, types of metadata stores include:
- specialist metadata store: captures metadata of interest to a discipline specialist
- generic metadata store: captures only metadata of interest to a general audience, for example university administration, the university research office, the general public, or researchers in other fields
The specialisation of a metadata store depends on who will be using it. Both kinds are necessary: specialist metadata may be generated first (especially if its capture is automated), but it is usually difficult to repurpose it automatically into generic metadata.
Institutional solutions tend to be generic, since an institution-wide store cannot tailor its descriptions to a single discipline. However, an institutional solution can be configured differently for different disciplines.
Object-level stores are typically specialist, because discipline knowledge is needed to make sense of individual data objects. Data capture often produces specialist metadata automatically. If a specialist store is managing data objects and the discipline needs to organise those objects into a collection, it will usually do so as an integrated store, so that the management of objects and collections is co-located.
Institutions differ in their needs and approaches, and no single solution fits all. Nevertheless, ANDS encourages its partners to consider deploying one of the existing solutions rather than duplicating development effort internally.
Descriptions of data collections should not be seen as information islands. They need to be connected to other kinds of information, which may be stored and managed in different data stores. For example, ANDS requires information about related parties and activities to accompany collections. The authoritative sources of truth for information about people may be HR and Research Office systems, and a metadata store should reuse that information rather than create its own, potentially inaccurate, records. A characteristic of high-quality metadata is that it is created once and then reused as needed.
If contextual information is common across institutions, it is appropriate to have a common external authority for it. A common description of a grant or researcher across institutions lets users navigate between data collections that are held by different institutions but involve the same research team members.
This means that deploying a metadata store solution usually involves integrating multiple sources of truth, possibly including external ones. If such data has already been aggregated or centralised in the institution (e.g. in a data warehouse), institutional metadata stores can draw on it.
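One way to follow the "create once, reuse as needed" principle is for a collection record to hold only identifiers for related parties and to resolve them against the authoritative source when they are needed. In this sketch, `HR_DIRECTORY` is a hypothetical stand-in for an HR or Research Office system; a real deployment would query that system (or an external authority) rather than a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    identifier: str
    name: str
    email: str


# Hypothetical stand-in for an authoritative source such as an HR system.
HR_DIRECTORY = {
    "staff-0042": Party("staff-0042", "Dr Jane Citizen", "j.citizen@example.edu.au"),
}


@dataclass
class CollectionRecord:
    title: str
    # Only identifiers are stored; names and emails stay in the source of truth.
    party_ids: list[str]


def resolve_parties(record: CollectionRecord, directory: dict[str, Party]) -> list[Party]:
    """Look up each referenced party in the authoritative source.

    Raises KeyError for an identifier the source does not know, so a stale
    reference is noticed instead of being silently papered over.
    """
    return [directory[pid] for pid in record.party_ids]


record = CollectionRecord("Field survey images", ["staff-0042"])
print([p.name for p in resolve_parties(record, HR_DIRECTORY)])
```

Because the collection record never copies the name or email, a correction made in the HR system is picked up the next time the record is resolved.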
Local metadata stores
Local metadata stores are crucial to good research data management and, in turn, to populating broader-scope metadata stores. Researchers should consider the following requirements for their local stores.
The local metadata store should:
- Store metadata that supports discovery and evaluation of data (e.g. keywords).
- Store metadata in a format which is in common use in the discipline.
- Store metadata that supports reuse of data (e.g. experimental configuration, interpretation of dependent variables, access rights; these may simply be a link to a separate file or a paper).
- Export metadata to other metadata formats in common use, especially those used by metadata aggregators (note that OAI-PMH requires every repository to provide metadata in unqualified Dublin Core).
- Support aggregation of metadata (harvesting and/or syndication), especially for international discipline repositories.
- Support automated gathering of metadata from instruments (e.g. file headers), and of related metadata from other databases (e.g. instrument booking systems, HR systems, grants programs).
- Integrate into researcher workflows with minimal disruption (e.g. through web services and APIs).
- Allow error checking, validation, and use of constrained vocabularies.
- Allow metadata describing both collections and objects within collections, if that is appropriate to the discipline.
- Allow hierarchical organisation of metadata, where appropriate to the discipline (e.g. ordering metadata by project and/or experiment).
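The export and validation requirements can be sketched together. The function below serialises a simple record as an `oai_dc` fragment (the unqualified Dublin Core format OAI-PMH requires) and rejects element names outside the fifteen Dublin Core elements, as well as `dc:type` values outside the DCMI Type Vocabulary. The input record shape is an assumption; a production feed would also carry an `xsi:schemaLocation` attribute and sit inside an OAI-PMH response envelope.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen elements of unqualified Dublin Core.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}

# DCMI Type Vocabulary, used here as a constrained vocabulary for dc:type.
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}

# Serialise with the conventional "oai_dc" and "dc" prefixes.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def to_oai_dc(record: dict) -> str:
    """Validate a simple record and serialise it as an oai_dc XML fragment.

    `record` maps Dublin Core element names to a string or a list of strings.
    Raises ValueError if an element name or a dc:type value is not allowed.
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, value in record.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a Dublin Core element")
        values = value if isinstance(value, list) else [value]
        for v in values:
            if name == "type" and v not in DCMI_TYPES:
                raise ValueError(f"{v!r} is not a DCMI Type term")
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = v
    return ET.tostring(root, encoding="unicode")


print(to_oai_dc({
    "title": "Soil moisture, Site A, 2011",
    "type": "Dataset",
    "subject": ["soil moisture", "hydrology"],
}))
```

Validating at export time keeps invalid values out of the feed that aggregators harvest, rather than leaving each aggregator to discover them.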
Not all metadata store solutions will satisfy all requirements. Automated metadata gathering and integration in particular are not widespread, so their absence should not automatically disqualify a candidate store. All these features are worth considering when evaluating candidates, and researchers need to work out which features are priorities for them. The highest priorities are likely to be support for commonly used formats, hierarchical organisation and aggregation.
There are a number of different institution‐wide solutions for the discovery and reuse of research data collections, including: