This guide is intended for eResearch infrastructure support providers and researchers. It covers a range of issues concerning data citation and describes how data citation is developing as a scholarly practice.
What do we mean by data citation?
Data citation refers to the practice of providing a reference to data in the same way as researchers routinely provide a bibliographic reference to other scholarly resources. While data has often been shared in the past, it is seldom cited in the same way as journal articles or other publications. This culture is, however, changing. Data citation underpins the recognition of data as a primary research output rather than as a by-product of research. When datasets are cited, they achieve a validity and significance within the scholarly communications cycle. Citation of data enables recognition of scholarly effort in disciplines and organisations that want to acknowledge and reward data outputs.
Did you know...?
- The creation of data is increasingly being recognised as a primary research output.
- There is a global network of discipline and institutional data repositories whose metadata records for research data collections supply the elements of a full data citation, including a persistent identifier.
- Increasing accessibility to publicly funded data means that more research data will be discovered and reused, and thus needs to be correctly cited to be counted and tracked.
- Data use and reuse can be tracked and recorded in the same way as research publications.
- Data citation information may soon be incorporated into practices for research evaluation and reward.
- Some bibliographic management systems (e.g. EndNote) now include a template for research data citations.
The ANDS approach to data citation
An important ANDS goal is to enable more researchers to reuse research data more often. To achieve this, ANDS is engaged in activities that will make it easier to share data, to recognise the importance of making data available and to make data citation a standard practice. To this end:
- ANDS is engaging with research funding agencies to promote data publication as a primary research output and the inclusion of data in the research assessment process.
- ANDS is working with Thomson Reuters Data Citation Index to track and record data citations as part of research assessment activities.
Although Digital Object Identifiers (DOIs) are not essential for data citation, they are useful both for tracking data citation metrics and for linking datasets to journal articles and other related resources (e.g. software associated with the dataset). For this reason, ANDS has several connections with DOIs:
- ANDS is a member of DataCite, a group of leading research libraries and technical information providers that aims to make it easier for research datasets to be handled as independent, citable, unique scientific objects. This is done by using Digital Object Identifiers (DOI) as permanent identifiers for datasets.
- The ANDS Cite My Data Service provides datasets with a unique and traceable identifier (a DOI). Data that has a DOI and is discoverable through data portals such as Research Data Australia (RDA) can then be cited by researchers, whether the data is their own or others', in publications. The DOI assigned to a dataset can also be used for citation by other researchers when the data is reused.
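Because a DOI is a persistent identifier, a citation written as "doi:..." can be turned into a resolvable web link by prefixing it with the DOI resolver address, https://doi.org/. The following is a minimal sketch of that conversion, not part of any ANDS or DataCite tooling; the function name doi_to_url is hypothetical and chosen only for illustration.

```python
import re


def doi_to_url(doi: str) -> str:
    """Return a resolvable https://doi.org/ link for a DOI string.

    Hypothetical helper for illustration. Accepts forms such as
    'doi:10.4225/...', 'DOI:10.4225/...' or an existing doi.org URL.
    """
    # Strip a leading 'doi:' label or an existing resolver prefix.
    bare = re.sub(
        r"^\s*(?:doi:\s*|https?://(?:dx\.)?doi\.org/)",
        "",
        doi,
        flags=re.IGNORECASE,
    ).strip()
    # Every DOI name starts with the directory indicator '10.'.
    if not bare.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + bare


# Example using a DOI that appears in this guide.
print(doi_to_url("DOI:10.4225/13/50BBFCFE08A12"))
```

Storing the bare DOI and generating the link on display keeps records independent of any one resolver address.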
How do you cite data?
Data citation standards vary across disciplines and publishers. However, DataCite recommends the following format:
Creator (Publication Year): Title. Publisher. Identifier
Hanigan, Ivan. (2010): Meteorological Data for Australian Postal Areas. Australian Data Archive. DOI:10.4225/13/50BBFCFE08A12
It may also be appropriate to include one or both of two optional properties, Version and ResourceType:
Creator (Publication Year): Title. Version. Publisher. ResourceType. Identifier
Colley, Sarah. (2010) Archaeological Fish Bone Images Archive Tables. 1st edition. Sydney eScholarship Repository Sydney. http://ses.library.usyd.edu.au/handle/2123/6253
Abraham, G; Kowalczyk, A; Loi, S; Haviv, I; Zobel, J. (2011) Computational Model for Gene Set Analysis to predict breast cancer prognosis based on microarray gene expression data. Computer Science and Software Engineering, The University of Melbourne. Computational Model. doi:10.4225/02/4E9F69C011BC8
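The format above can be assembled mechanically from a metadata record. The sketch below is one possible way to do so, assuming the element order "Creator (Publication Year): Title. Version. Publisher. ResourceType. Identifier" given in this guide; the function name format_data_citation and its parameter names are hypothetical, and repositories or publishers may require a different style.

```python
from typing import Optional


def format_data_citation(
    creator: str,
    year: int,
    title: str,
    publisher: str,
    identifier: str,
    version: Optional[str] = None,
    resource_type: Optional[str] = None,
) -> str:
    """Build a citation string in the order used in this guide.

    Hypothetical helper for illustration. Version and ResourceType
    are optional and are left out when not supplied.
    """
    # Keep the guide's element order; drop any optional part that is empty.
    parts = [title, version, publisher, resource_type]
    # Normalise each part to end with exactly one full stop.
    body = " ".join(f"{p.strip().rstrip('.')}." for p in parts if p)
    return f"{creator.strip().rstrip('.')} ({year}): {body} {identifier.strip()}"


# Example using the first record cited in this guide.
print(
    format_data_citation(
        creator="Hanigan, Ivan",
        year=2010,
        title="Meteorological Data for Australian Postal Areas",
        publisher="Australian Data Archive",
        identifier="doi:10.4225/13/50BBFCFE08A12",
    )
)
```

Generating the citation from the metadata record, rather than typing it by hand, helps keep the cited identifier consistent with the one registered for the dataset.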
Other recommended formats
Various data repositories (e.g. ICPSR and other social science data centres) provide a recommended format for citing data from that repository. For example:
Kessler, Ronald C. National Comorbidity Survey: Baseline (NCS-1), 1990-1992 (Restricted Version) [Computer file]. ICPSR25381-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2009-05-11. doi:10.3886/ICPSR2538
How do you count data citations?
It is now possible for citation indices such as the Thomson Reuters Data Citation Index to measure the reuse of research data. This is analogous to the use of products such as Web of Science and Scopus to measure citations to journal articles and other types of scholarly publications. Such metrics are commonly used for performance appraisal and reporting. Benefits to your organisation of having your records tracked through data citation indices include:
- capitalising on investments to date in data management capability
- representation in globally recognised citation indexing services
- the ability to track reuse of institutional data assets
- increased visibility of institutional data assets.
Connect all your research output citations
The connection between data and publication is increasingly recognised. The following record comes from RDA. Figure 1 shows how the dataset should be cited. Figure 2 shows how the dataset is linked to several related research outputs, including original and derived datasets, software, and publications.
Figure 1: Users can click on the 'Cite' link in RDA to see how to cite the dataset described
Figure 2: Create linkages between related research outputs including data, software and publications