What is data citation?
Data citation refers to the practice of providing a reference to data in the same way as researchers routinely provide a bibliographic reference to outputs such as journal articles, reports and conference papers. Citing data is increasingly being recognised as one of the key practices leading to recognition of data as a primary research output. This is important because:
- when datasets are routinely cited they will achieve greater validity and significance within the scholarly communications cycle
- citation of data enables recognition of scholarly effort with the potential for reward based on data outputs
- the use of data should be appropriately attributed in scholarly outputs as with other types of publication.
Assigning a Digital Object Identifier (DOI) to data facilitates data citation and is considered best practice. A DOI is a type of persistent identifier that indicates a dataset will be well managed and accessible for long term use. It is now routine practice for publishers to assign DOIs to journal articles and for authors to include them in article citations. The ANDS DOI Service (Cite My Data) allows Australian research organisations to mint DOIs for datasets and collections so they can be included in data citations.
Data citation is international best practice:
- Joint Declaration of Data Citation Principles
- There are emerging approaches to the citation of dynamic data
- A video from Research Data Netherlands succinctly explains the concepts of data citation and DOIs.
A comprehensive list of data citation webinars is available via the ANDS YouTube channel.
- Mons, B., Haagen, H. van, Chichester, C., Hoen, P.-B. 'T, Dunnen, J. T. den, Ommen, G. van, et al. (2011). The value of data. Nature Genetics, 43(4), 281-3. DOI: 10.1038/ng0411-28
Data citation poster
Free printed copies of the poster in pamphlet form are available by emailing firstname.lastname@example.org.
Benefits of data citation
Why cite data?
Citation of data brings numerous benefits for researchers and institutions. For example:
- Evidence suggests that including citable data in related publications increases the citation rate of those publications.
- Routine citation of data will assist in gaining acknowledgement of data as a first class research output.
- Citations for published data can be included in CVs and biographical sketches along with journal articles, reports and conference papers.
- Only cited data can be counted and tracked (in a similar manner to journal articles) to measure impact.
Why include a DOI in data citations?
While data may be cited without a DOI, assigning a DOI to data and including it in a data citation is considered best practice. DOIs provide additional benefits such as:
- Easy and persistent access to research data available via the internet
- Enhanced discovery, retrieval and management of data to enable data reuse and verification of research results.
- Support for automated tracking of data outputs.
- Piwowar HA, Vision TJ. (2013) Data reuse and the open data citation advantage. PeerJ 1:e175 https://doi.org/10.7717/peerj.175
- Callaghan S, Donegan S, Pepler S. (2012) Making Data a First Class Scientific Output: Data Citation and Publication by NERC's Environmental Data Centres. The International Journal of Digital Curation, Vol 7, Issue 1. doi:10.2218/ijdc.v7i1.218
- National Information Standards Organization (2013) Recommended practices for online supplemental journal article materials. NISO RP-15-2013.
How to cite research data
Data citation styles continue to evolve and vary across disciplines and publishers. DataCite recommends using one of the following formats:
- Mandatory citation elements only
Hanigan, Ivan (2012): Monthly drought data for Australia 1890-2008 using the Hutchinson Drought Index. The Australian National University Australian Data Archive.
- Including version and resource type
Bradford, Matt; Murphy, Helen; Ford, Andrew; Hogan, Dominic; Metcalfe, Dan (2014): CSIRO Permanent Rainforest Plots of North Queensland. v2. CSIRO. Data Collection.
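The element order used in the two examples above can be sketched as a small formatting helper. This is only an illustration: the function name `format_datacite_citation` and its parameters are hypothetical, the ordering is inferred from the examples shown here, and placing an identifier at the end of the citation is an assumption, since neither example includes one.

```python
from typing import Optional, Sequence


def format_datacite_citation(
    creators: Sequence[str],
    year: int,
    title: str,
    publisher: str,
    version: Optional[str] = None,
    resource_type: Optional[str] = None,
    identifier: Optional[str] = None,
) -> str:
    """Build a citation string in the order used by the examples above.

    Hypothetical helper: creators are joined with "; ", optional
    elements are skipped when not supplied, and the identifier (an
    assumption, not shown in the examples) is appended last.
    """
    parts = [f"{'; '.join(creators)} ({year}): {title}."]
    if version:
        parts.append(f"{version}.")
    parts.append(f"{publisher}.")
    if resource_type:
        parts.append(f"{resource_type}.")
    if identifier:
        parts.append(identifier)
    return " ".join(parts)


# Mandatory elements only
print(format_datacite_citation(
    ["Hanigan, Ivan"],
    2012,
    "Monthly drought data for Australia 1890-2008 using the Hutchinson Drought Index",
    "The Australian National University Australian Data Archive",
))

# Including version and resource type
print(format_datacite_citation(
    ["Bradford, Matt", "Murphy, Helen", "Ford, Andrew", "Hogan, Dominic", "Metcalfe, Dan"],
    2014,
    "CSIRO Permanent Rainforest Plots of North Queensland",
    "CSIRO",
    version="v2",
    resource_type="Data Collection",
))
```

The two calls reproduce the example citations above. Keeping the elements as separate fields, rather than storing a pre-built string, makes it easier to re-render the same record in a publisher's own style.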
Nature Publishing’s Data Policy states:
Citations of datasets should include the minimum information recommended by DataCite and follow Nature Research style i.e. authors, title, publisher (repository name), identifier.
Dataset identifiers including DOIs should be expressed as full URLs. For example:
- Hao, Z., AghaKouchak, A., Nakhjiri, N. & Farahmand, A. Global Integrated Drought Monitoring and Prediction System (GIDMaPS) Data sets. Figshare http://dx.doi.org/10.6084/m9.figshare.853801 (2014)
- See a published example in Nature Communications (items 1 & 43 in the reference list).
- Hanigan, Ivan. (2010) Meteorological Data for Australian Postal Areas. Australian Data Archive. DOI: 10.4225/13/50BBFCFE08A12
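Because DOIs appear in several written forms on this page (a "DOI:" or "doi:" prefix, or an older dx.doi.org link), expressing them as full URLs can be done mechanically. The sketch below is a minimal illustration, not an official tool: the function name `doi_to_url` is hypothetical, the pattern used to sanity-check a DOI is a simplifying assumption rather than a complete validator, and it assumes the https://doi.org/ resolver form.

```python
import re

# Prefixes commonly seen in front of a DOI: resolver URLs or a "doi:" label.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)

# Simplified shape check (an assumption): "10.", a numeric registrant
# code, a slash, then a non-empty suffix without whitespace.
_DOI_SHAPE = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(raw: str) -> str:
    """Return the DOI written as a full https://doi.org/ URL."""
    doi = _DOI_PREFIX.sub("", raw.strip())
    if not _DOI_SHAPE.match(doi):
        raise ValueError(f"Does not look like a DOI: {raw!r}")
    return f"https://doi.org/{doi}"


print(doi_to_url("DOI: 10.4225/13/50BBFCFE08A12"))
print(doi_to_url("http://dx.doi.org/10.6084/m9.figshare.853801"))
print(doi_to_url("doi:10.2218/ijdc.v7i1.218"))
```

Input that does not resemble a DOI raises a `ValueError` rather than producing a link that would not resolve.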
Not all datasets have a DOI that can be referenced in a citation and alternatives to the DataCite citation style do exist. The basic APA style for citing data is shown in the example below:
- Pew Hispanic Center. (2008). 2007 Hispanic Healthcare Survey [Data file and code book]. Retrieved from http://www.pewhispanic.org/2007/09/23/2007-hispanic-healthcare-survey/
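The APA pattern in the example above (author, year, title, bracketed description, retrieval URL) can be expressed in the same way. The helper name `format_apa_data_citation` and its parameters are hypothetical, and the layout is taken only from the single example shown here, so treat it as a sketch rather than a full implementation of APA style.

```python
def format_apa_data_citation(
    author: str, year: int, title: str, description: str, url: str
) -> str:
    """Build a citation following the layout of the APA example above.

    Hypothetical helper: covers only the basic case of a dataset
    without a DOI that is retrieved from a web address.
    """
    return f"{author}. ({year}). {title} [{description}]. Retrieved from {url}"


print(format_apa_data_citation(
    "Pew Hispanic Center",
    2008,
    "2007 Hispanic Healthcare Survey",
    "Data file and code book",
    "http://www.pewhispanic.org/2007/09/23/2007-hispanic-healthcare-survey/",
))
```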