What is 'open data'?
While numerous definitions of open data exist, most focus on similar characteristics. To summarise, open data can generally be defined as data that is:
- freely available to download in a reusable form. Large or complex data may be accessible via a service or facility that enables access in-situ or the compilation of sub-sets
- licensed with minimal restrictions to reuse
- well described with provenance and reuse information provided
- available in convenient, modifiable and open formats
- managed by the provider on an ongoing basis.
The Open Data Handbook provides an introduction to the legal, social and technical aspects of open data. It discusses what open data is as well as why and how to make data open.
Who benefits from open data?
Everyone! According to the Royal Society, open data supports:
- new research and new types of research
- the application of automated knowledge discovery tools online
- the verification of previous results
- a broader base set of data than any one researcher can hope to collect
- the exploration of topics not envisioned by the initial investigators
- the creation of new data sets, information and knowledge when data from multiple sources are combined
- the transfer of factual information to promote development and capacity building in developing countries
- interdisciplinary, inter-sectoral, inter-institutional and international research.
The diagram below illustrates the many ways open data benefits researchers, research organisations, funders, policy makers and the broader community.
For an economic assessment of open data, read the 2014 report Open Research Data: A report to the Australian National Data Service from the Victoria Institute for Strategic Economic Studies.
The website Why Open Research?, funded by the Shuttleworth Foundation, also has a set of accessible arguments for why research should be open, including increasing visibility, reducing publishing costs and getting more funding.
Overcoming barriers to opening data
While some people are reluctant to make their data open, there are often ways to overcome possible barriers. Let's look at a few commonly raised issues.
Someone might use my data to 'scoop' me
This can simply be a matter of timing. For example, you may choose to restrict access to your data until a key paper is published. You decide the appropriate time for making your data open.
However, there is little evidence to suggest 'scooping' is a high risk for open data. In an interview reported in Nature, Professor Isaac Kohane from Harvard Medical School stated:
"[we] need to convince people that the likelihood of being scooped if they put their data out there [is] not going to be high ... we need to do away with a culture of sitting on data until we have mined every useful scientific grain out of it".
In a similar vein, some researchers report that any possible loss of future potential papers is well offset by the more immediate rewards of data citations and collaborative opportunities.
In fact, many researchers find that opening up their data has greatly benefited their research. It has been reported that Professor Tim Gowers, Royal Society Research Professor at the University of Cambridge, opened up his data to crowd-source an unsolved mathematical problem. Twenty-seven people made 800 substantive contributions to solve the problem in a matter of days. Professor Gowers commented that this approach to research was "like driving a car whilst normal research is like pushing it".
My data are sensitive due to cultural, ethical, ecological or security considerations
There are circumstances where it may not be appropriate to make data open, for example where individuals may be identified, threatened species located or information affecting national security revealed. However, there may be ways to make sensitive data at least partially open. These are outlined in the ANDS Guide - Publishing and sharing sensitive data, which provides practical advice and examples for sharing sensitive data.
I won't get any recognition or reward for making my data open
It could be argued that few tangible rewards currently exist for those who make their data open. However, things are starting to change. Tools such as the Thomson Reuters Data Citation Index enable citation metrics to be captured for data in much the same way as they are for publications. This provides the opportunity for data citation metrics to be included in project proposals, promotion cases and CVs.
There are contractual or commercial interests associated with my data
In some cases, research data may underpin a commercialisation opportunity such as a patent, or IP arising from a project may be contractually owned by a third party. In other cases, though, data is not shared because of the uncertainty that arises when data is not explicitly addressed in contracts and project plans. Ideally, discussions around data ownership, ongoing management and access should start at the project proposal stage.
Start from a position of "why not make the data open?" and consider how any perceived risks associated with making the data open can be addressed.
Making data open
Let's look at the five characteristics of open data in a little more detail. It's worth noting that even if you can't meet all the criteria for 'open data', there are benefits in making data as open as possible. Fewer barriers mean more opportunities for data to be reused and cited.
A report by Knowledge Exchange (2014) discusses some of the motivations and incentives for researchers to make data open.
|Open data is||Which ideally means ...||So preferably not ...|
|Freely available to download||a) There is no cost to access the data||a) Costed at more than reproduction cost|
|Licensed||An open license such as CC-BY is applied.||A restrictive license, or worse, no license at all. If no license is applied, no reuse is permitted.|
|Well described||Standards-based metadata is used, with details of data elements and inclusion of data dictionaries. Describe the purpose of the collection, the characteristics of the sample and the method of data collection.||Metadata descriptions that are very brief or will not be widely understood. Avoid jargon and abbreviations and don't assume prior knowledge of the data or subject domain.|
|Provided in an open format||The data is in a convenient, modifiable and open format that can be readily retrieved, downloaded, indexed and searched. Where possible, formats should be machine-readable and non-proprietary formats are preferred. For example, prefer netCDF over .xls.||Obscure formats or formats that require proprietary software to open and reuse.|
|Well managed||The data is managed on an ongoing basis with a point of contact designated to assist with data use.||Data that is loaded on to a server and forgotten.|
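As a rough illustration of the 'well described' and 'open format' rows above, the following Python sketch writes a small dataset as CSV alongside a JSON file carrying its description, licence, contact point and data dictionary. The function name `write_open_dataset`, the file names and the metadata keys are all hypothetical choices made for this example; they are not an ANDS or RIF-CS schema, and a real deposit would follow the metadata standard of the target repository.

```python
import csv
import json
from pathlib import Path


def write_open_dataset(rows, columns, metadata, out_dir):
    """Write rows to data.csv and a descriptive metadata.json beside it.

    `columns` maps each column name to a plain-language description,
    so it doubles as a simple data dictionary (hypothetical layout).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # CSV is a non-proprietary, machine-readable format.
    data_path = out / "data.csv"
    with data_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(columns))
        writer.writeheader()
        writer.writerows(rows)

    # Keep the description, licence and contact with the data itself.
    meta_path = out / "metadata.json"
    record = dict(metadata, data_dictionary=columns, data_file=data_path.name)
    meta_path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return data_path, meta_path


if __name__ == "__main__":
    write_open_dataset(
        rows=[{"site": "A1", "temp_c": 21.5}],
        columns={
            "site": "Site identifier",
            "temp_c": "Air temperature in degrees Celsius",
        },
        metadata={
            "title": "Example temperature readings",
            "licence": "CC-BY-4.0",
            "contact": "data-manager@example.org",  # placeholder address
        },
        out_dir="open_dataset_example",
    )
```

Keeping the data dictionary in a sidecar file means the column descriptions travel with the download, so a reuser does not need prior knowledge of the project to interpret the values.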
Open data in Research Data Australia
In December 2014 it became possible for collection descriptions in Research Data Australia (RDA) to include information that highlights the 'openness' of the data being described. Collection records can be encoded as being openly accessible and openly licensed and include a link to download the data or access the data via a service. See Figure 1 below.
The new RDA interface released in April 2015 significantly raised the profile of data that has open characteristics. The interface provides strong visual indicators for data that is publicly accessible online and offers search and browse options that enable users to easily discover and access open data (see Figure 2). The goal is to maximise the reuse and citation of data described in RDA.
Take advantage of the opportunities these enhancements offer by ensuring your records provide the relevant RIF-CS encoding. Also, be sure to apply an open licence where possible. A CC-BY licence is an open licence that also requires the data provider to be attributed when the data is reused.
Figure 1: A sample collection record encoded to highlight accessibility of the data described. The open license is also highlighted.
Figure 2: RDA home page showing options for searching or browsing publicly accessible data.
Resources and further information
- Open Data Collections
- ANDS Guide: Sensitive data: Publishing and sharing
- Open Knowledge Foundation (Australia)
- Australian Government Open Data Toolkit
- RECODE: Policy recommendations for open access to research data: an EU funded project
- Sowing the seed: incentives and motivations for sharing research data: a researcher's perspective (study commissioned by Knowledge Exchange)