Metadata elements are the lifeblood for finding and reusing research data. Data is only as valuable as the metadata which describes and connects it. In addition to selecting a metadata standard or schema, whenever possible you should also use a controlled vocabulary. A controlled vocabulary provides a consistent way to describe data.
Metadata: your new best friend
Metadata is structured information about a resource that describes characteristics such as content, quality, format, location and contact information. Creating metadata to describe research data is very similar to the process for descriptive cataloguing of library resources.
Metadata schema are sets of metadata elements (or fields) for describing a particular type of information resource. Numerous metadata schema exist for describing research data across different disciplines.
1. Start by watching this short (2:29 min) video about medical metadata
2. Read the short ANDS Introduction to Metadata to understand what metadata is and why is it the lifeblood of research data sharing!
3. Let’s revisit at least one of the good quality metadata records for medical data we met in previous Things. Why do you think these records are considered ‘good quality’? Hint: consider both the type and quality of information provided. What metadata included in this record help discovery and reuse of the data?
- The Australian Longitudinal Study of Ageing: Wave 1, 1992
- National Survey of Midlife Development in the United States (MIDUS)
- Population Health data collection for the City of Greater Bendigo
If you have time, explore the UK Digital Curation Center’s Directory of Disciplinary Metadata. You might find a schema that is applicable to your research!
Consider why, if metadata is the lifeblood of data discoverability and reuse, is it often neglected or not richly done when data is published.
Control your language, please!
Controlled vocabularies significantly improve data discovery. They make data more shareable with researchers in the same discipline because everyone is ‘talking the same language’ when searching for specific data e.g. medical conditions, plants, animals etc. There are hundreds of controlled vocabularies used in medical practice, many of which can be utilised in research.
- Explore one of the following:
- ICD-10 for classification of diseases
- RxNorm for clinical drugs (US based, so has US spelling and brand names)
- SNOMED-CT for clinical terminology, which has an Australian version, SNOMED-CT-AU
- MeSH for medical subject headings (particularly useful for making your research more findable in literature searches)
- Have a flick through these slides about health vocabularies
Consider: How would the use of a controlled vocabulary be helpful within your field?