Data storage

This guide is for those who are interested in learning more about options for long term storage of high impact data for the benefit of the nation.

Why Data Storage matters

The goal of making your data shareable and reusable is met in part by storage solutions that make research data discoverable and accessible over the long term:

  • This involves integration with metadata maintenance to continue to provide access and curation
  • Different choices of data storage have implications for metadata management and data access

The storage solutions discussed here are not mutually exclusive, but complementary. Each solution is designed to address different needs. In designing a data management strategy, institutions need to harness all the available types of storage. Focusing on just one (such as institutional data stores or repositories) forgoes the advantages of the others, and is unlikely to satisfy all the requirements of researchers themselves.

For example, institutional solutions are more sustainable than discipline-based solutions, because they have concrete institutional commitments guaranteeing long-term storage. They therefore satisfy the requirement for long-term access. However, discipline solutions are where researchers search for data by default. Because they provide a more discipline-appropriate environment for discovery, they are a better fit for the discovery requirement.

Data storage solutions should enable a researcher's data to be made part of the scholarly record: discoverable, sustainably managed and well described in order to facilitate its reuse (where appropriate).

1. Obligations and Expectations

The Australian Code for the Responsible Conduct of Research (see the ANDS Guide Research data policy and the Australian Code for the Responsible Conduct of Research) was developed by the National Health and Medical Research Council (NHMRC), the Australian Research Council (ARC), and Universities Australia. Published in 2007, it is currently under review.

Section 2, Management of Research Data and Primary Materials, states:
"The central aim is that sufficient materials and data are retained to justify the outcomes of the research and to defend them if they are challenged. The potential value of the material for further research should also be considered, particularly where the research would be difficult or impossible to repeat."

In pursuit of this aim, two sections of the Code talk in particular about data storage.

  • Section 2.2: "Institutions must provide facilities for the safe and secure storage of research data and for maintaining records of where research data are stored".
  • Section 2.6: "Researchers must manage research data and primary materials in accordance with the policy of the institution [and] Retain research data, including electronic data, in a durable, indexed and retrievable form."

2. Types of Storage options/solutions

Individual / Project Data Store
Description: Local storage controlled by researchers and project teams
Example: USB, hard drive on individual laptop, local drives (c:, group drives)

Institutional Repository
Description: Through the ARROW program (2004-07), Australian universities have implemented institutional repositories. Although these repositories have been designed for document objects, e.g. publications, many of them are based on software that allows them to store a range of data objects
Example: Monash University Research Repository contains both cyclone tracking data and ethno-musicology fieldwork recordings

Institutional Data Store
Description: A number of Australian universities and large research institutions are putting in place specialized data stores, optimised for large numbers of large objects
Example: Monash University Large Research Data Store (LaRDS), and the University of Melbourne offers flexible solutions

National Data Store Infrastructure
Description: The Research Data Storage Infrastructure (RDSI) Project (2010-2014) was an Australian Government funded data storage project which aimed to:
  • enhance data centre development
  • support retention and integration of nationally significant data assets into the national collaboration and data fabric
The RDS (2015-) project supersedes RDSI and focuses on supporting nine research domains including astronomy, climate and weather science, marine science, minerals and exploration data
Example: NSW, Vic, Qld, SA, Tas, ACT and WA have facilities offering merit-allocated and commercial storage

Cloud Store (Corporate Data Store Infrastructure)
Description: Cloud storage is networked enterprise storage where data is stored in virtualized pools of storage which are generally hosted by third parties
Example: The typical cloud storage providers include:

Discipline Repository
Description: A number of disciplines have well-established locations for storing and sharing data, managed typically by a consortium of institutional members. This storing and sharing often occurs in combination with the publication process. A list of 1,200 data repositories including many discipline repositories is available at re3data.org
Examples are:
  • Australian Data Archive (for social science data)
  • Aboriginal and Torres Strait Islander Data Archive

3. Which solution is right for my data?

The various storage solutions can be compared against two main criteria:

  1. The value of the data and its potential for reuse
  2. The types of components which give value to data, such as its discoverability, curation, and whether the storage is reliable, large and sustainable

The only type of solution which does not satisfy the goals of reusing and sharing data is the individual or project level solution. Such solutions remain widely used because they are convenient, cheap and can be set up rapidly to answer some of the research needs during a research project. To steer researchers away from such quick-fix solutions, institutions need to encourage researchers to plan at the beginning of a project for how they will store data, and to budget for it. Such planning takes place in a data management plan, and is described in other ANDS guides; the Data management plans guide is one starting point.

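Individual / Project Data Store
  • Suitable for working data: Yes
  • Suitable for high value, high reuse data: No
  • Weaknesses: Hard for outsiders to discover, fragile sustainability
  • Strengths: Cheap, fast, easy, convenient

Institutional Repository
  • Suitable for working data: No
  • Suitable for high value, high reuse data: Yes
  • Weaknesses: May not be optimised for data
  • Strengths: Reliable, well curated, sustainable, supports bibliometrics

Institutional Data Store
  • Suitable for working data: Yes
  • Suitable for high value, high reuse data: Yes
  • Weaknesses: In some cases not supporting discovery
  • Strengths: Reliable, sustainable

Cloud Store
  • Suitable for working data: Yes
  • Suitable for high value, high reuse data: Depends on sufficient data description
  • Weaknesses: May be subject to internet bandwidth and file security concerns, less control over the hardware where your data is stored
  • Strengths: Automated backup, sharing and access to cheap compute to process stored data

National Data Store (RDS, etc)
  • Suitable for working data: Yes
  • Suitable for high value, high reuse data: Yes
  • Weaknesses: Over longer term funding required
  • Strengths: Highly reliable, supports large datasets

Discipline Repository
  • Suitable for working data: No
  • Suitable for high value, high reuse data: Yes
  • Weaknesses: Quality of long term funding support is highly variable
  • Strengths: Supports researcher discovery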
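
The comparison above can also be expressed as a small lookup, which may help when shortlisting options for a particular dataset. The following is a minimal sketch only: the class name, field names and the `candidates` function are illustrative assumptions, and the values are copied from the comparison rather than from any ANDS tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StorageOption:
    """One row of the comparison (field names are hypothetical)."""
    name: str
    working_data: str       # "Yes" or "No", as stated in the comparison
    high_value_reuse: str   # "Yes", "No", or a conditional note
    weaknesses: str
    strengths: str


OPTIONS = [
    StorageOption("Individual / Project Data Store", "Yes", "No",
                  "Hard for outsiders to discover, fragile sustainability",
                  "Cheap, fast, easy, convenient"),
    StorageOption("Institutional Repository", "No", "Yes",
                  "May not be optimised for data",
                  "Reliable, well curated, sustainable, supports bibliometrics"),
    StorageOption("Institutional Data Store", "Yes", "Yes",
                  "In some cases not supporting discovery",
                  "Reliable, sustainable"),
    StorageOption("Cloud Store", "Yes",
                  "Depends on sufficient data description",
                  "Bandwidth and file security concerns, less hardware control",
                  "Automated backup, sharing, access to cheap compute"),
    StorageOption("National Data Store (RDS, etc)", "Yes", "Yes",
                  "Over longer term funding required",
                  "Highly reliable, supports large datasets"),
    StorageOption("Discipline Repository", "No", "Yes",
                  "Quality of long term funding support is highly variable",
                  "Supports researcher discovery"),
]


def candidates(working_data: bool, high_value_reuse: bool) -> List[str]:
    """Return the options that are not ruled out ("No") for the requested uses.

    Conditional entries (such as the Cloud Store note) are kept, because the
    comparison does not rule them out; they still need a human judgement.
    """
    shortlist = []
    for option in OPTIONS:
        if working_data and option.working_data == "No":
            continue
        if high_value_reuse and option.high_value_reuse == "No":
            continue
        shortlist.append(option.name)
    return shortlist
```

Calling `candidates(working_data=True, high_value_reuse=True)` shortlists the options that serve both purposes. The shortlist is only a starting point: the weaknesses and strengths listed for each option still need to be weighed for the data in question.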

The University of South Australia has developed a one page storage options guide for their researchers.


4. Interaction between storage solutions

Discipline stores are often the repository of record for a discipline, and institutions should develop a position on how to interact with discipline stores. Institutions may agree to store experimental data, with the discipline store referring out to the data as a registry. Institutional repositories may store data which has also been registered in discipline stores; if they do, the associated metadata may need to be adjusted for a more generalist audience, and the relative priority of deposit will need to be addressed.

Two examples of models of discipline repositories interacting with other repositories are:

  • International Virtual Observatory Alliance is a federated data store: data policies and metadata are managed centrally, but storing the data itself is shared among members of the consortium.
  • Worldwide Protein Data Bank is a consortium whose members are themselves collaboratories, and is funded by a variety of sources, with a single archive mirrored between its members.
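
The federated model in the first example (metadata managed centrally, data held by consortium members) can be pictured with a short sketch. Everything in it is hypothetical: the class names, member names and storage locations are illustrative assumptions and do not reflect any actual IVOA or wwPDB interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatasetRecord:
    """Centrally managed metadata for one dataset (hypothetical fields)."""
    identifier: str
    title: str
    holder: str     # consortium member that stores the data itself
    location: str   # where that member keeps it (illustrative path)


@dataclass
class FederatedRegistry:
    """Central metadata registry; the data stays with the members."""
    records: Dict[str, DatasetRecord] = field(default_factory=dict)

    def register(self, record: DatasetRecord) -> None:
        # Only the description and a pointer are stored centrally.
        self.records[record.identifier] = record

    def search(self, term: str) -> List[DatasetRecord]:
        # Discovery happens against the central metadata.
        term = term.lower()
        return [r for r in self.records.values() if term in r.title.lower()]

    def resolve(self, identifier: str) -> str:
        # Access is referred out to whichever member holds the data.
        record = self.records[identifier]
        return f"{record.holder}:{record.location}"
```

In this sketch a researcher searches the central registry, and `resolve` refers them out to the member holding the data. The same pattern applies when a discipline store acts as a registry for data held in an institutional store.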