Data capture is the process of collecting data that will be processed and used later for a particular purpose. Methods of capturing data range from high-end technologies (e.g. synchrotrons, sensor networks and computer simulation models) to low-end paper instruments used in the field. Data with good metadata attached at the point of capture can expedite data sharing, publishing and citation.
Data capture essentials
Collecting data is a costly activity. Planning ahead helps ensure that captured data are valid and support reuse.
|1. Decide what to collect||Are there existing data that could be reused or incorporated into new data?|
|2. Data capture tools|
Data capture tools should:
|3. Collection process||Processes should be documented, transparent and reproducible.|
|4. Compliance with privacy regulations||The process of capturing data should comply with privacy and ethics regulations. See publishing sensitive data.|
Metadata capture essentials
Metadata are of great value - the more information there is about data, the greater the value of the data. Metadata capture can be automated or manual. Either way, metadata should be captured as closely as possible to the creation of the data. Metadata essentials include:
- Using a community or discipline metadata standard.
- Using a community or discipline vocabulary.
- Following institutional policies and discipline procedures for metadata capture and management.
- Linking metadata to data to ensure provenance and citation accuracy.
- Ensuring metadata can support reuse, e.g. by recording dates, parameter settings, calibrations, software used and the computing environment.
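As an illustration of capturing metadata close to the point of data creation, the following is a minimal Python sketch, not a prescribed tool. It assumes a data file has just been written to disk and records a timestamp, parameter settings, calibrations, software version and computing environment in a JSON "sidecar" file next to it. The function names, field names and the `.meta.json` suffix are hypothetical choices for this example; a real project would map these fields onto its community or discipline metadata standard.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone
from pathlib import Path


def capture_metadata(data_path, parameters, calibrations=None):
    """Build a metadata record for a freshly created data file.

    The field names are illustrative only, not taken from any standard.
    """
    data_path = Path(data_path)
    # A checksum ties the metadata record to this exact version of the data.
    digest = hashlib.sha256(data_path.read_bytes()).hexdigest()
    return {
        "file_name": data_path.name,
        "sha256": digest,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "parameters": parameters,
        "calibrations": calibrations or {},
        "software": {"python": platform.python_version()},
        "environment": {
            "system": platform.system(),
            "machine": platform.machine(),
        },
    }


def write_sidecar(data_path, record):
    """Write the record as JSON next to the data file and return its path."""
    data_path = Path(data_path)
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))
    return sidecar
```

Calling `write_sidecar(path, capture_metadata(path, {"gain": 2}))` immediately after an instrument or script writes `path` keeps the metadata alongside the data. Anyone reusing the file later can recompute the checksum to confirm that the record describes the same data.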
ANDS Data Capture program
The Data Capture program ran from 2010 to 2013. Its 69 projects, at more than 30 institutions, aimed to make it simpler for researchers to routinely capture data and rich metadata as close as possible to the point of creation, and to deposit these data and metadata into well-managed stores. This was achieved by augmenting and adapting existing, commonly used data creation and capture infrastructure. The data creation and data capture phases of research were fully integrated to enable effective ingestion into research data and metadata stores at the institution or elsewhere.
The ANDS Data Capture projects were designed to build infrastructure on at least one of three levels, although most worked across these levels:
- Instruments: to build infrastructure "pipes" between instruments and well supported data and metadata storage facilities.
- Software: to create these "pipes" and other infrastructure to enable better management and description of research data and associated metadata.
- Management: to use this infrastructure to better manage and describe research data and associated metadata.
This integration makes it easier for researchers to contribute data to Research Data Australia directly from the lab, instrument, fieldwork site, etc. It also ensures that higher quality metadata (critical for reuse and discovery) is produced through automated and semi-automated systems.
Selected Data Capture projects
|Spatially Integrated Social Science||These online tools allow researchers to quickly access rich Australian socio-spatial datasets related to voting outcomes and census data, conduct statistical modelling, and visualise spatial relationships in the data.|
|Biomedical Data Platform (Molecular Biology)||The MyTardis solution seamlessly facilitates the capture and annotation of datasets and exposes such data within searchable metadata stores like ANDS' Research Data Australia.|
|Macquarie Papyri||Macquarie Papyri facilitates access to and understanding of the collection of some 900 ancient papyri and related items.|
|Redmap Australia||Redmap (Range Extension Database and Mapping) invites Australians to share sightings of marine species that are 'uncommon' to their local seas.|
|ExSite9||Supports the description of highly structured or unstructured (multimedia) data captured out in the field.|
|Smart Water||Data capture from instruments, reporting on domestic water use.|
|Sydney Harbour Observatory Data Capture||Portal for finding, visualising, and downloading geospatial data relating to Sydney Harbour.|
Best Practices for Preparing Environmental Data Sets to Share and Archive by the Oak Ridge National Laboratory Distributed Active Archive Center.