The UK Data Archive provides many reasons for sharing and enabling reuse of data, including:
- encouraging scientific enquiry and debate:
  - by encouraging the improvement and validation of research methods
  - by maximising transparency and accountability through scrutiny of research findings
- promoting innovation and potential new data uses:
  - leading to new collaborations between data users and data creators
- reducing the cost of duplicating data collection
- increasing the impact and visibility of research
- providing credit to the researcher as a research output in its own right
- providing great resources for education and training.
How do you make data available for reuse?
Good data management is the key for data (re)use:
- Planning for reuse and publication from the start.
- Recognition of others' data through appropriate citation.
- Clear rules of use through simple and explicit data licensing.
- Sufficient metadata describing how the data have been specified, collected, analysed and transformed.
- Standard vocabularies in the metadata, which make the data easier to find and interpret.
- Particular preparation before sharing data from research that required ethical permission and oversight.
The most effective way to get your data reused is to publish it.
Finding and accessing data for reuse
At present, data repositories are not exhaustive, and the size and quality of their data collections vary. Finding the data you want to reuse typically involves searching multiple sources, including:
1. General and discipline-specific data repositories and discovery portals
See also the lists of repositories at Nature and re3data (Registry of Research Data Repositories). Of special interest is re3data's graphical browser, which lets you explore repositories by subject. re3data classifies subjects using the scheme of the German Research Foundation (DFG).
2. Grey literature
Much data is not formally published in a repository or portal. A general web search may reveal project websites, publications pointing to data, and contact details of the primary researcher. Increasingly, datasets are being published with associated links to grants, software, scholarly publications, visualisations, project websites and other inputs and outputs.
3. Research publications
Research based on a dataset may contain a statement about where those data are stored and how access may be gained.
4. Peer networks
Through your own network of colleagues and peers.
Data access may be open (unrestricted or minimal restrictions), conditional or restricted (reusers must provide some information, pay a fee, and/or meet criteria before access is granted), or closed for reuse.
Data will have a metadata record in a repository. Like a catalogue record for a publication, this will contain information about the dataset, including how it can be accessed. The record might link to another repository, a project page, or contact details for the data custodian.
In Research Data Australia, for example, the access conditions and instructions for accessing the dataset appear in the dataset's metadata record.
When you get the data
A metadata record in a repository will likely be the first information about the dataset that you read. This record should contain enough information about the sample and/or subjects of the data, content areas, data collection methods, and format of the data for you to determine whether these data are what you need.
A data reuser is often not familiar with the secondary data. Take some time to:
- read user and technical manuals about how data collection was designed and carried out.
- find out about any instruments used to collect the data.
- read study protocols and interview/survey questions.
- understand the characteristics of the sample from which the data were drawn.
- find out if and how the data have been modified from their original form; e.g. have they been confidentialised, weighted, or treated for missing data?
- find out what variables are included in the dataset and how these were constructed.
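As a first pass at the last two items, a short script can list each variable in a tabular file and count its missing values. The sketch below is illustrative only: it assumes the data arrive as a CSV file with a header row, and the missing-value codes (empty string, `NA`, `-99`) are hypothetical; the real codes should come from the dataset's documentation.

```python
import csv
import io

# Hypothetical missing-value codes; check the dataset's documentation
# for the codes the original researchers actually used.
MISSING_CODES = {"", "NA", "-99"}


def summarise_variables(csv_file, missing_codes=MISSING_CODES):
    """Return {variable: {"n": rows, "missing": count, "distinct": count}}."""
    reader = csv.DictReader(csv_file)
    summary = {
        name: {"n": 0, "missing": 0, "values": set()}
        for name in (reader.fieldnames or [])
    }
    for row in reader:
        for name, stats in summary.items():
            # Short rows yield None for absent columns; treat that as empty.
            value = (row.get(name) or "").strip()
            stats["n"] += 1
            if value in missing_codes:
                stats["missing"] += 1
            else:
                stats["values"].add(value)
    return {
        name: {"n": s["n"], "missing": s["missing"], "distinct": len(s["values"])}
        for name, s in summary.items()
    }


# Small in-memory example standing in for a downloaded file.
sample = io.StringIO(
    "id,age,income\n"
    "1,34,52000\n"
    "2,NA,48000\n"
    "3,29,-99\n"
)
report = summarise_variables(sample)
for variable, stats in report.items():
    print(variable, stats)
```

A summary like this does not replace reading the documentation, but it quickly shows which variables need a closer look before analysis.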
Assessing secondary data is much like evaluating the quality of a research paper. Consider factors that relate to the reliability and validity of research results, such as whether:
- the source is trusted
- the sample characteristics, time of collection, and response rate (if relevant) of the data are appropriate
- the methods of data collection are appropriate and acceptable in your discipline
- the data were collected in a consistent way
- any data coding or modification is appropriate and sufficient
- the documentation of the original study in which the data were collected is detailed enough for you to assess its quality
- there is enough information in the metadata or data to properly cite the original source.
What am I allowed to do with the data?
All research data intended for reuse should have a licence: a legal document that states how the data should be attributed and for which purposes they may and may not be used.
Licensing information should appear in the metadata record or be provided by the data custodian when you seek access. If licensing information is not clear, contact the data custodian for more information and/or to negotiate licence terms.
For example, in Research Data Australia, the licence for the data is found with the other information about access conditions in the metadata record.
Licensing data that contains secondary or 'reused' data
As data reuse and the combining of new and secondary data become more common, reusers increasingly need to apply licences to these 'derived' datasets. The type of licence you choose for such a dataset will depend on (a) the amount or proportion of secondary data in the 'new' dataset, and (b) the terms of the original licence under which those secondary data were acquired, for example whether the original licence required that new versions of the data be licensed under the same terms.
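Condition (b) can be expressed as a simple check. The following is a minimal sketch, not legal advice: the `SourceLicence` class and `check_derived_licence` function are hypothetical names introduced for illustration, the check covers only "same terms" requirements, and it ignores condition (a) and every other licence term.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceLicence:
    """Licence attached to one secondary dataset (illustrative model)."""
    name: str
    share_alike: bool  # True if derived data must be released under the same terms


def check_derived_licence(sources, proposed):
    """Return (ok, message) for a proposed licence on a derived dataset."""
    # Collect every source licence that insists on "same terms" for derivatives.
    required = sorted({s.name for s in sources if s.share_alike})
    if len(required) > 1:
        return False, "Conflicting same-terms requirements: " + ", ".join(required)
    if required and proposed != required[0]:
        return False, f"Derived data must be licensed under {required[0]}"
    return True, f"No same-terms conflict found for {proposed}"


sources = [
    SourceLicence("CC BY 4.0", share_alike=False),
    SourceLicence("CC BY-SA 4.0", share_alike=True),
]
print(check_derived_licence(sources, "CC BY 4.0"))
print(check_derived_licence(sources, "CC BY-SA 4.0"))
```

In this example the CC BY-SA source forces the derived dataset to carry the same licence, so the first proposal is rejected and the second passes. A real decision should always be confirmed against the full licence texts or with the data custodian.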
For more information about using multiple datasets see the licensing and copyright for data page.
Attributing your data source
It is best practice, and typically a condition of the licence under which data are shared, that you reference the source of secondary data in your own research outputs that use these data. This is akin to providing a bibliographic reference to other research outputs, such as journal articles.
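If the metadata record is complete, a citation can be assembled directly from its fields. The sketch below assumes a hypothetical record held as a plain dictionary and uses a common pattern for data citations (creators, year, title, optional version, publisher, identifier); the field names, the example record and its DOI are invented for illustration, and the licence or repository may prescribe a different format.

```python
def format_data_citation(record):
    """Build a citation string from a metadata record (a plain dict)."""
    required = ["creators", "year", "title", "publisher", "identifier"]
    missing = [field for field in required if not record.get(field)]
    if missing:
        # Incomplete metadata: better to ask the custodian than to guess.
        raise ValueError("Cannot cite dataset; missing: " + ", ".join(missing))
    creators = "; ".join(record["creators"])
    parts = [f"{creators} ({record['year']})", record["title"]]
    if record.get("version"):
        parts.append(f"Version {record['version']}")
    parts.extend([record["publisher"], record["identifier"]])
    return ". ".join(parts)


# Entirely hypothetical record, for illustration only.
record = {
    "creators": ["Example, A.", "Sample, B."],
    "year": 2020,
    "title": "Hypothetical survey dataset",
    "version": "2",
    "publisher": "Example Repository",
    "identifier": "https://doi.org/10.0000/example",
}
citation = format_data_citation(record)
print(citation)
```

Raising an error when a field is missing mirrors the assessment step above: if the metadata are not sufficient to cite the source properly, that is worth resolving before the data are used.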
Data reuse stories
Issue 23 of Share, ANDS' quarterly newsletter, was themed around data reuse. It includes a number of stories about researchers reusing data, in some cases to make new discoveries.