Digital preservation can be defined as a "series of managed activities necessary to ensure continued access to digital materials for as long as necessary" (Digital Preservation Handbook).
Ensuring access to, and use of, enduring data assets is a shared responsibility across a research institution. An ideal digital preservation environment combines policies, processes and resources, including staff and technologies. Actively managing research data reduces threats to their long-term research value and mitigates the risk of digital obsolescence, so that the data can be reused and re-analysed in the future.
Key concepts about data preservation
- Why preserve data? Digital Curation Centre, UK
- Principles and good practice for preserving data: ICPSR, USA
- Data archives and digital preservation: Consortium of European Social Science Data Archives, Norway
- ANDS Guide to File Formats
Institutional planning for research data preservation
The Data Preservation page of the International Federation of Data Organisations for Social Science lists and explains three broad areas for institutions to consider when addressing data preservation:
- Organisational infrastructure: policies, procedures, practices and people, including legal and regulatory frameworks, preservation skills and knowledge, and all aspects of funding and resource planning
- Technological concerns: equipment, software, hardware, and media monitoring and refreshment strategies
- Data curation: pre-ingest initiatives; ingest functions; archival storage and preservation; and disseminating and providing access to data for its designated community
Resources from ANDS partners
- Digital Preservation Strategy: Roadmap 2015-2025 (University of Melbourne)
- Research Data Management Policy 2014 (University of Sydney)
- UNSWorks Digital Preservation Policy (University of New South Wales)
- Guide to preservation (Curtin University Library)
- Guide to preservation (Deakin University)
- Guide to retention (Monash University)
Frameworks to help assess preservation needs
- Data Seal of Approval
- Audit and Certification of Trustworthy Digital Repositories, a Recommendation for Space Data System Practices (2011)
- Digital Preservation Environment Maturity Matrix (National and State Libraries Australasia, 2013)
- COPTR (Community Owned digital Preservation Tool Registry): a registry for finding and evaluating tools that help preserve digital data, e.g. BagIt, JHOVE, CINCH and FIDO
- Archivematica: web- and standards-based, open-source application which allows your institution to preserve long-term access to trustworthy, authentic and reliable digital content
- BitCurator Access: open-source software that supports the provision of access to disk images
- Preservica: comprehensive suite of OAIS (Open Archival Information System) compliant workflows for ingest, data management, storage, access and preservation, making digital preservation a natural part of your information life-cycle
Careful thought about files at the beginning of a research project can save a lot of time, money and heartache later on.
The file formats you choose govern whether data can be used and reused in the future, so their ongoing accessibility is an important consideration.
Formats that are more likely to remain accessible in the future are non-proprietary and open, have a documented standard, are commonly used by the research community, use a standard character representation (such as ASCII or Unicode), and are unencrypted and uncompressed.
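Two of these properties, compression and standard text encoding, can be screened mechanically. The Python sketch below (assuming Python 3.9 or later) uses a hypothetical helper, `check_format`, that flags files whose leading bytes match a few common compressed or packaged containers, and files whose content does not decode as UTF-8, of which ASCII is a subset. It cannot judge whether a format is open or well documented, and it does not detect encryption, so treat it as a first-pass screen rather than a format identification tool such as FIDO or JHOVE.

```python
from pathlib import Path

# Leading "magic" bytes of a few common compressed or packaged containers.
# Illustrative only; real format identification tools check far more.
COMPRESSED_SIGNATURES = {
    b"\x1f\x8b": "gzip",
    b"PK\x03\x04": "zip (also the container for .docx, .xlsx and .odt)",
    b"BZh": "bzip2",
    b"\xfd7zXZ\x00": "xz",
    b"7z\xbc\xaf\x27\x1c": "7-Zip",
}


def check_format(path: Path) -> list[str]:
    """Return warnings about properties that may hinder long-term access.

    Hypothetical helper: it only screens for compression and text encoding.
    It cannot judge openness or documentation, and does not detect encryption.
    """
    data = path.read_bytes()
    warnings = []
    for signature, container in COMPRESSED_SIGNATURES.items():
        if data.startswith(signature):
            warnings.append(f"compressed or packaged: {container}")
    try:
        # ASCII is a subset of UTF-8, so plain ASCII files pass this check.
        data.decode("utf-8")
    except UnicodeDecodeError:
        warnings.append("not plain text in a standard encoding (UTF-8/ASCII)")
    return warnings
```

A plain UTF-8 CSV file produces an empty list, while a gzip-compressed copy of the same file is flagged twice: once for its gzip signature and once because compressed bytes are not valid UTF-8. Binary formats that are not text at all, such as images, will also trigger the encoding warning, so the result still needs human interpretation.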
- The UK Data Archive's Data Formats table lists the optimal formats for the long-term preservation of data.
- The ANDS Guide to File Formats covers institutional planning implications, obsolescence, file migration, open and proprietary formats, lossy and lossless formats, compression, standards and more.
File naming conventions
A File Naming Convention (FNC) is a framework for naming your files in a way that describes what they contain and how they relate to other files. It is essential to establish an FNC before you begin to collect data, to prevent a backlog of unorganised files that could lead to misplaced or lost data.
Naming records consistently, logically and predictably distinguishes similar records from one another at a glance. This makes records easier to store and retrieve, and lets users browse file names more effectively and efficiently. Naming records according to agreed conventions also makes file naming easier for colleagues, because they do not have to 're-think' the process each time.
The University of Edinburgh has a comprehensive yet easy-to-follow list (with examples and explanations) of 13 Rules for file naming conventions.
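As an illustration only, the Python sketch below encodes one hypothetical convention, `YYYYMMDD_project_description_vNN.ext`, rather than the Edinburgh rules themselves. The helpers `build_name` and `is_valid` are made up for this example: the first assembles a conforming name, and the second checks an existing name against the same pattern.

```python
import re
from datetime import date

# Hypothetical convention, for illustration only:
#   YYYYMMDD_project_description_vNN.ext
#   e.g. 20240315_coralsurvey_site03-temps_v02.csv
NAME_PATTERN = re.compile(
    r"^\d{8}"          # date the file was created, as YYYYMMDD
    r"_[a-z0-9]+"      # short project code, no spaces or underscores
    r"_[a-z0-9-]+"     # description, words joined by hyphens
    r"_v\d{2}"         # zero-padded version number
    r"\.[a-z0-9]+$"    # lower-case file extension
)


def is_valid(name: str) -> bool:
    """Check whether an existing file name follows the convention."""
    return NAME_PATTERN.match(name) is not None


def build_name(day: date, project: str, description: str,
               version: int, ext: str) -> str:
    """Assemble a file name that follows the convention."""
    # Lower-case everything and join description words with hyphens.
    description = "-".join(description.lower().split())
    name = f"{day:%Y%m%d}_{project.lower()}_{description}_v{version:02d}.{ext.lower()}"
    if not is_valid(name):
        raise ValueError(f"name does not follow the convention: {name!r}")
    return name
```

Putting the date first in YYYYMMDD form and zero-padding the version number means a plain alphabetical sort also lists files in chronological and version order (for up to 99 versions).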
Having logical and known naming conventions in place can also help you with version control.
Because digital research data can so easily be changed, copied or overwritten, researchers need to be able to protect their authenticity. Working with outdated versions of files wastes research time and can put valuable data at risk.
Version control can prevent this. Version control is the means by which different versions and drafts of a document (or file or record or dataset) are managed. This is particularly important if data is being used by multiple members of a research team, or if research files are shared across different locations.
Version control involves naming and distinguishing between a series of draft documents (or files, records or datasets) that lead to a final (or approved) version, which in turn may be subject to further amendments. It also provides an audit trail for the revision and update of draft and final versions.
The University of Leicester has some excellent resources on the subject.
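The draft-to-final cycle and audit trail described in the version control paragraphs can be sketched as a small Python class. The numbering scheme (drafts 0.1, 0.2 and so on, approval moving to 1.0, later amendments 1.1, 1.2) is an assumption for illustration, and `VersionedDocument` is a hypothetical name; in practice this role is often filled by a version control system such as Git or by a records management system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class VersionedDocument:
    """Tracks drafts, approved versions and an audit trail for one record.

    Hypothetical scheme: drafts are 0.1, 0.2, ...; approval moves to the
    next whole number (1.0, 2.0, ...); later amendments are 1.1, 1.2, ...
    """
    name: str
    major: int = 0
    minor: int = 0
    history: list[dict] = field(default_factory=list)

    @property
    def version(self) -> str:
        return f"{self.major}.{self.minor}"

    def _log(self, author: str, note: str) -> None:
        # Every change is appended to the audit trail with who, what and when.
        self.history.append({
            "version": self.version,
            "author": author,
            "note": note,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })

    def revise(self, author: str, note: str) -> str:
        """Record a new draft or amendment (minor version bump)."""
        self.minor += 1
        self._log(author, note)
        return self.version

    def approve(self, author: str, note: str = "approved") -> str:
        """Mark the current state as an approved (final) version."""
        self.major += 1
        self.minor = 0
        self._log(author, note)
        return self.version
```

For example, two calls to `revise` produce versions 0.1 and 0.2, `approve` then produces 1.0, and a later correction via `revise` produces 1.1, with all four steps recorded in `history`.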
In the case of research data, a new version of a dataset may be created when an existing dataset is reprocessed, corrected or extended with additional data. Data versioning is one means of tracking changes to ‘dynamic’ data, which changes over time.
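One way to make data versioning concrete is to fingerprint each release of a dataset with a checksum, so that reprocessing, correction or appended data produces a new, distinguishable version. The Python sketch below uses SHA-256 from the standard library's `hashlib`; `record_version` and the fields of its in-memory version log are hypothetical, and a real repository would persist the log and usually keep the earlier files as well.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_version(log: list[dict], path: Path, reason: str) -> bool:
    """Append a version entry if the dataset's content has changed.

    Hypothetical helper. Returns True if a new version was recorded, and
    False if the content is identical to the latest recorded version.
    """
    checksum = sha256_of(path)
    if log and log[-1]["sha256"] == checksum:
        return False
    log.append({
        "version": len(log) + 1,
        "sha256": checksum,
        "size_bytes": path.stat().st_size,
        "reason": reason,  # e.g. "reprocessed", "corrected", "appended"
    })
    return True
```

Because the checksum depends only on the file's content, re-recording an unchanged file is a no-op, while any edit, however small, yields a new version entry.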