This guide is for those interested in learning more about options for the long-term storage of high-impact data for the benefit of the nation.
Why Data Storage matters
Storage solutions that keep research data discoverable and accessible over the long term go part of the way towards making your data shareable and reusable:
- Storage needs to be integrated with metadata maintenance so that access and curation can continue
- Different choices of data storage have implications for metadata management and data access
The storage solutions discussed here are not mutually exclusive but complementary. Each solution is designed to address different needs. In designing a data management strategy, institutions need to harness all the available types of storage. Focusing on just one (such as institutional data stores or repositories) forgoes the advantages of the others, and is unlikely to satisfy all the requirements of researchers themselves.
For example, institutional solutions are more sustainable than discipline-based solutions because they have concrete institutional commitments guaranteeing long-term storage, so they satisfy the requirement for long-term access. However, discipline solutions are where researchers search for data by default. Because they provide a more discipline-appropriate environment for discovery, they are a better fit for the discovery requirement.
Data storage solutions should enable a researcher's data to be made part of the scholarly record: discoverable, sustainably managed and well described in order to facilitate its reuse (where appropriate).
1. Obligations and Expectations
The Australian Code for the Responsible Conduct of Research (see the ANDS Guide Research data policy and the Australian Code for the Responsible Conduct of Research) was developed by the National Health and Medical Research Council (NHMRC), the Australian Research Council (ARC), and Universities Australia. Published in 2007, it is currently under review.
Section 2, Management of Research Data and Primary Materials, states:
"The central aim is that sufficient materials and data are retained to justify the outcomes of the research and to defend them if they are challenged. The potential value of the material for further research should also be considered, particularly where the research would be difficult or impossible to repeat."
In pursuit of this aim, two sections of the Code address data storage in particular.
- Section 2.2: "Institutions must provide facilities for the safe and secure storage of research data and for maintaining records of where research data are stored".
- Section 2.6: "Researchers must manage research data and primary materials in accordance with the policy of the institution [and] Retain research data, including electronic data, in a durable, indexed and retrievable form."
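One way to picture what "durable, indexed and retrievable" and "maintaining records of where research data are stored" can mean in practice is a simple file index with checksums. The sketch below is illustrative only and is not prescribed by the Code; the function names `build_index` and `find_changed` and the sample file are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(root: Path) -> list[dict]:
    """List every file under root with its relative path, size and checksum."""
    entries = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            entries.append({
                "path": path.relative_to(root).as_posix(),
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            })
    return entries


def find_changed(root: Path, index: list[dict]) -> list[str]:
    """Return indexed paths that are missing or whose checksum no longer matches."""
    changed = []
    for entry in index:
        target = root / entry["path"]
        if not target.is_file() or sha256_of(target) != entry["sha256"]:
            changed.append(entry["path"])
    return changed


if __name__ == "__main__":
    # Hypothetical project folder created only for this demonstration.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "readings.csv").write_text("site,value\nA,1.5\n")
        index = build_index(root)
        print(json.dumps(index, indent=2))
        print(find_changed(root, index))
```

Keeping the index outside the data folder, and re-running the check periodically, gives a basic record of what is stored where and whether it is still intact.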
2. Types of Storage options/solutions
|Storage Solution||Description||Examples|
|Individual / Project Data Store||Local storage controlled by researchers and project teams||USB drive, hard drive on an individual laptop, local drives (C:, group drives)|
|Institutional Repository||Through the ARROW program (2004-07), Australian universities have implemented institutional repositories. Although these repositories were designed for document objects, e.g. publications, many of them are based on software that allows them to store a range of data objects||Monash University Research Repository contains both cyclone tracking data and ethno-musicology fieldwork recordings|
|Institutional Data Store||A number of Australian universities and large research institutions are putting in place specialized data stores, optimised for large numbers of large objects||Monash University Large Research Data Store (LaRDS); the University of Melbourne offers flexible solutions|
|National Data Store Infrastructure||Research Data Storage Infrastructure (RDSI) Project (2010-2014) is an Australian Government funded data storage project which aimed to:||NSW, Vic, Qld, SA, Tas, ACT and WA provide merit-allocated and commercial storage|
|Cloud Storage (Corporate Data Store Infrastructure)||Cloud storage is networked enterprise storage where data is stored in virtualized pools of storage which are generally hosted by third parties||The typical cloud storage providers include:|
|Discipline Repository||A number of disciplines have well-established locations for storing and sharing data, managed typically by a consortium of institutional members. This storing and sharing often occurs in combination with the publication process||A list of 1,200 data repositories, including many discipline repositories, is available at re3data.org; Australian Data Archive (for social science data)|
3. Which solution is right for my data?
The various storage solutions can be compared against two main criteria:
- The value of the data and its potential for reuse
- The types of components which give value to data, such as its discoverability, curation, and whether the storage is reliable, large and sustainable
The only solutions which do not satisfy the goals of reusing and sharing data are those at the individual and project level. They remain widely used because they are convenient, cheap, and quick to set up to meet some of the needs that arise during a research project. To steer researchers away from such quick-fix solutions, institutions need to encourage researchers to plan at the beginning of a project for how they will store data, and to budget for it. Such planning takes place in a data management plan and is described in other ANDS guides; the Data management plans guide is one starting point.
|Storage Solution||Suitable for Working Data||Suitable for Sharing and Reuse||Disadvantages||Advantages|
|Individual / Project Data Store||Yes||No||Hard for outsiders to discover, fragile sustainability||Cheap, fast, easy, convenient|
|Institutional Repository||No||Yes||May not be optimised for data||Reliable, well curated, sustainable, supports bibliometrics|
|Institutional Data Store||Yes||Yes||In some cases does not support discovery||Reliable, sustainable|
|Cloud Store||Yes||Depends on sufficient data description||May be subject to internet bandwidth and file security concerns, less control over the hardware where your data is stored||Automated backup, sharing and access to cheap compute to process stored data|
|National Data Store (RDS, etc)||Yes||Yes||Funding required over the longer term||Highly reliable, supports large datasets|
|Discipline Repository||No||Yes||Quality of long term funding support is highly variable||Supports researcher discovery|
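The comparison above can also be expressed as a small lookup, which may help when drafting a data management plan. This is a minimal sketch, not an official decision tool: it reads the second yes/no column as suitability for sharing and reuse (an interpretation based on the surrounding text), and the names `StorageOption` and `shortlist` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageOption:
    name: str
    working_data: bool  # suitable for working data
    sharing: str        # assumed meaning: "yes", "no" or "depends" for sharing and reuse


# Values transcribed from the comparison table.
OPTIONS = [
    StorageOption("Individual / Project Data Store", True, "no"),
    StorageOption("Institutional Repository", False, "yes"),
    StorageOption("Institutional Data Store", True, "yes"),
    StorageOption("Cloud Store", True, "depends"),
    StorageOption("National Data Store", True, "yes"),
    StorageOption("Discipline Repository", False, "yes"),
]


def shortlist(need_working: bool, need_sharing: bool) -> list[str]:
    """Return the options that are not ruled out by the stated needs."""
    names = []
    for option in OPTIONS:
        if need_working and not option.working_data:
            continue
        # "depends" is kept: the cloud store qualifies if data is well described.
        if need_sharing and option.sharing == "no":
            continue
        names.append(option.name)
    return names


if __name__ == "__main__":
    print(shortlist(need_working=True, need_sharing=True))
```

Because the solutions are complementary, a project will often draw on more than one entry from such a shortlist, for example a data store for working data and a discipline repository for discovery.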
The University of South Australia has developed a one page storage options guide for their researchers.
4. Interaction between storage solutions
Discipline stores are often the repository of record for a discipline, and institutions should develop a position on how to interact with discipline stores. Institutions may agree to store experimental data, with the discipline store referring out to the data as a registry. Institutional repositories may store data which has also been registered in discipline stores; if they do, the associated metadata may need to be adjusted for a more generalist audience, and the relative priority of deposit will need to be addressed.
Two examples of models of discipline repositories interacting with other repositories are:
- The International Virtual Observatory Alliance is a federated data store: data policies and metadata are managed centrally, but storage of the data itself is shared among members of the consortium.
- The Worldwide Protein Data Bank is a consortium whose members are themselves collaboratories. It is funded by a variety of sources and maintains a single archive mirrored between its members.
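The federated model, in which metadata is held centrally while the data stays with member stores, can be sketched as a registry that answers searches itself and refers out for the data. This is a toy illustration under stated assumptions, not the protocol used by either consortium; the class `FederatedRegistry`, the member names and the locations are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FederatedRegistry:
    """Central metadata catalogue; the data itself stays with member stores."""
    records: dict[str, dict] = field(default_factory=dict)

    def register(self, dataset_id: str, title: str, member: str, location: str) -> None:
        """Record a description of a dataset and which member holds it."""
        self.records[dataset_id] = {"title": title, "member": member, "location": location}

    def search(self, term: str) -> list[str]:
        """Find dataset ids whose title contains the term (case-insensitive)."""
        term = term.lower()
        return sorted(ds for ds, rec in self.records.items() if term in rec["title"].lower())

    def resolve(self, dataset_id: str) -> str:
        """Refer out to the member store that actually holds the data."""
        rec = self.records[dataset_id]
        return f"{rec['member']}:{rec['location']}"


if __name__ == "__main__":
    registry = FederatedRegistry()
    registry.register("ds-1", "Survey of nearby galaxies", "member-a", "/archive/ds-1")
    registry.register("ds-2", "Solar spectra", "member-b", "/holdings/ds-2")
    for dataset_id in registry.search("galax"):
        print(dataset_id, "->", registry.resolve(dataset_id))
```

The same shape applies when an institution stores experimental data and a discipline store acts as the registry: discovery happens in one place, and storage responsibility sits in another.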