National eResearch Architecture Taskforce (NeAT) Projects
The NeAT Projects developed discipline-specific eResearch services for national research communities. A summary of each of the NeAT projects and their status is given below:
- ASeSS ASSDA Services for e-Social Sciences
- AusCover Workflow Workflow Services to enable a Large-Scale Temporal-Spatial Ecosystem Digital Information Service
- Aus-e-Lit Collaborative Integration and Annotation Services for Australian Literature Communities
- BioFlow Bioinformatics Workflows
- BioSecurity Collaboration Platform
- Data-MINX A Data Fabric for Characterisation – Microscopy, Imaging, Neutron and X-ray Facilities
- DIAS-B Data Integration and Annotation Service in Biodiversity
- Human Variome Software and Data Support for the Australian Node of the Human Variome Project
- MACDDAP Marine and Climate Data Discovery and Access Project
- NCJRDN National Criminal Justice Research Data Network
- PODD Phenomics Ontology Driven Data Management
- Remote CT Remote Computed Tomography Reconstruction, Simulation and Visualisation Service
- SISS Spatial Information Services Stack
ASeSS ASSDA Services for e-Social Sciences
Aims: The ASeSS project will give the Australian Social Science Data Archive (ASSDA) community, for the first time, simplified, unified, national access to ASSDA datasets from a variety of sources and in a wide variety of data formats. The ASeSS platform will also provide a standardised platform on which topic- and theme-specific ASSDA sub-archives can be created, for example for historical documents, Indigenous data and qualitative data. The activities undertaken during ASeSS development will also place ASSDA in a strong position to cooperate with international data archives on data sharing and the co-development of standardised data tools.
Main participants: ASSDA and ANUSF.
Project Committee Chair: A/Prof. Ken Reed, Chair of the Executive Committee of the Australian Consortium for Social & Political Research (ACSPRI).
Achievements: The ASeSS project has progressed on schedule, focusing on reviewing ASSDA workflows and analysing data sources, storage platforms and tools for use in web-based workflows and data management. Fedora Commons has been selected as the platform for the new system. The inaugural international Social Science Tools Consortium workshop was hosted at ANU to initiate planned, collaborative development of tools for social science data archives.
AusCover Workflow Workflow Services to enable a Large-Scale Temporal-Spatial Ecosystem Digital Information Service
Aims: AusCover is a component of the NCRIS Terrestrial Ecosystems Research Network (TERN) capability and is focused on organising remote sensing data sources and products for terrestrial ecosystems research. AusCover will enable, for the first time in Australia, the storage of these data sets online in a form that makes them directly accessible to the user community.
This project, part of the broader AusCover activity (which is just commencing), will provide easy-to-use workflow tools and services that enable researchers to process AusCover data sets using the ARCS grid (or cloud) computing infrastructure. The same workflow tools will also assist AusCover data providers by making it easier to process raw satellite data into derived data products in the standard data formats that users require.
Main participants: TERN AusCover, CSIRO, Curtin University and IVEC.
Project Committee Chair: Prof. Andy Pitman, Convenor of the ARC Research Network for Earth System Science.
Aus-e-Lit Collaborative Integration and Annotation Services for Australian Literature Communities
Aims: The Aus-e-Lit Project is enhancing the existing AustLit web portal to provide data integration and search services across multiple national databases relevant to Australian literature, and developing tools to enable researchers to annotate information and to create compound digital objects capturing and explaining relationships in the corpus.
Main participants: AustLit and the UQ eResearch Centre.
Project Committee Chair: Dr. Paul Genoni, President of the Association for the Study of Australian Literature (ASAL).
Achievements: The project has delivered a prototype federated search across AustLit and a number of selected external databases, as well as a full-text search across all AustLit data. These new services are available to users through the AustLit portal. Additional databases at the National Library of Australia (NLA) will be added once the NLA’s internal federated search project is completed. In addition, the Aus-e-Lit project has developed prototypes of a Collaborative Annotation Service together with a Compound-object Authoring and Publishing Service. These have been demonstrated at a number of conferences and workshops, including the ASAL conference, and are currently being tested by a small group of selected researchers who are experts in Australian literary studies.
BioFlow Bioinformatics Workflows
Aims: To provide a simple web-based workflow tool that enables life sciences researchers to specify genomics and proteomics workflows that can be executed on the ARCS Grid and that interface with the ARCS Data Fabric for data input and output. The system will be deployable as an “appliance” (the required software, middleware and server hardware) that can be installed at a site and managed remotely if required. The appliance can interface with local HPC systems and/or submit compute jobs to the ARCS Grid. The appliance concept will be tested through trial deployments at the IVEC Informatics Facility at Murdoch University, the Queensland Facility for Advanced Bioinformatics (QFAB) and the proposed Life Sciences Computation Centre (LSCC) in Victoria.
Main participants: Murdoch University (Centre for Comparative Genomics and Australian Bioinformatics Facility), QFAB and the Victorian Life Sciences Computation Initiative.
Project Committee Chair: Dr. David Lovell, CSIRO.
BioSecurity Collaboration Platform
Aims: CSIRO's Australian Animal Health Laboratory (AAHL) is a national centre of excellence in laboratory diagnosis, research and technical advice in animal health. It plays a vital role in maintaining Australia's capability to quickly diagnose exotic and emerging animal diseases. One of AAHL's key features is its high bio-containment facility. The rules governing the access of personnel and materials into the bio-containment facility are critical and must be observed at all times; otherwise the ramifications for national biosecurity, trade and public health can be profound. The sharing of a broad range of data, including images and sequence data (such as pathology and visual records), must also comply with these strict procedures, which is currently difficult to achieve.
The BioSecurity project will implement a collaboration platform at the AAHL facility, comprising two nodes, one on each side of the bio-containment barrier. The collaboration platform will greatly assist in the flow of complex information across the containment barrier. It will enable quick and transparent introduction of information from a variety of data sources including personal, organisational, specialised (such as from pathology and microscopy systems), live in-vivo animal experimental data (e.g. heart rates), sequence data and shared resources arising from simulation models and historical information in both visual and written form. The collaboration platform will provide video conferencing, shared displays, access to applications software for local machines and will integrate critical technology platforms (microscopy and pathology) into a common, shared visual workspace.
Following the successful implementation of the collaboration platform at the AAHL facility, two additional nodes will be implemented within state government departments that collaborate with AAHL on biosecurity research and emergency response. It is expected that this collaboration platform may have broader applicability within the Australian Biosecurity Information Network (ABIN).
Main participants: AAHL and the CSIRO ICT Centre.
Project Committee Chair: Dr. Martyn Jeggo, Director of the Australian Animal Health Laboratory.
Achievements: Detailed user requirements and workflow analysis have been undertaken at AAHL, and work is progressing on matching available advanced collaboration tools to the AAHL requirements and developing a system architecture for the collaboration platform.
Data-MINX A Data Fabric for Characterisation – Microscopy, Imaging, Neutron and X-ray Facilities
Aims: The project will develop services to provide easy and reliable transfer of experimental data from all major Australian characterisation facilities to remote data storage or repositories, particularly the ARCS Data Fabric, and to enable authenticated sharing of data with specified collaborators. The project will also develop services for automated conversion of data to standard formats and automated generation of searchable metadata, as well as the means to publish data so that it is easily discoverable.
Main participants: Intersect, University of Sydney, VeRSI, Monash University, ANSTO, AMMRF, Australian Synchrotron.
Project Committee Chair: A/Prof. Ashley Buckle, Monash University.
Achievements: A comprehensive review was undertaken of the ICAT metadata catalogue used by UK facilities. Initial pilot deployments of ICAT and the DataPortal web interface have been made at OPAL (ANSTO), the Australian Synchrotron and the University of Sydney, and pilot metadata ingest processes have been initiated at OPAL, the USyd AMMRF node and USyd crystallography. The DataPortal has been Shibboleth-enabled to work with AAF. The VeRSI VBL Gateway service, which provides a web interface for data transfer from the synchrotron to repositories (including the ARCS Data Fabric) and the user’s desktop, was installed at OPAL. Development of a prototype Data Transfer Service, based on Open Grid Forum standards and proposed extensions, is under way to extend the capabilities provided by the VeRSI VBL Gateway. Collaborative linkages have been established with the UK STFC (eScience, National Grid Service, ISIS neutron source and Diamond synchrotron), the Open Middleware Infrastructure Institute (OMII-UK) and UKOLN (repository and preservation expertise).
Project Restructure: The DataMINX project is in the process of being restructured. The project has encountered a number of problems, including a late start caused by the merging of two proposals, a lengthy process for recruiting developers, and difficulties in managing a large project with many stakeholders and participants and with developers split over four locations.
The availability of additional funding in this area from the ANDS Data Capture program has led to the proposed restructuring of DataMINX into four projects, which has been agreed by the DataMINX Project Committee:
- Metadata capture at the Synchrotron, funded by ANDS
- Metadata capture at ANSTO, funded by ANDS
- A project focused on data and metadata management for AMMRF, funded by NeAT
- A Data Transfer Service, to be developed and deployed at the characterisation facilities, and supported by ARCS
The stakeholders have agreed to manage the first two projects as a single coordinated activity. Existing staff funded by DataMINX will continue to be funded to work on relevant activities until the new projects begin. At that point DataMINX will be formally wound up and staff redeployed.
DIAS-B Data Integration and Annotation Service in Biodiversity
Aims: The DIAS-B project aims to provide core services required by the NCRIS 5.2.3 Atlas of Living Australia (ALA) capability to support the management and discovery of biodiversity data resources; these services will also have broader applicability. The main outcomes of this project will be an operational metadata repository for the registration, search and integration of biodiversity data resources, and an operational annotation repository for annotations relating to biodiversity data (including flagging of possible errors and data quality issues), together with services to create, recover and harvest annotations.
Main participants: ALA, CSIRO ICT Centre and the UQ eResearch Centre.
Project Committee Chair: Mr. Donald Hobern, Director of the Atlas of Living Australia.
Achievements: The project has evaluated options for a metadata repository and selected Fedora Commons. It has implemented a prototype metadata repository and completed initial metadata harvesting from a limited range of information sources. A prototype and demonstrator of the annotation environment, which works on all browsers without requiring plugins, has been developed and was presented at the recent eResearch Australasia conference.
Human Variome Software and Data Support for the Australian Node of the Human Variome Project
Aims: Diagnostic laboratories across Australia are funded to provide doctors with genetic tests on patients for a range of conditions. Compared to other medical tests, genetic tests can be rapid, non-invasive and decisive. These laboratories have consequently become storehouses of important genetic information relevant to a range of diseases. To date, however, these laboratories, and the diagnosticians, researchers and clinicians who work with them, have worked in isolation from each other. Although some final results may be communicated through peer-reviewed publications or informal collaborations, working information is collected privately according to the requirements of each researcher or clinician and is not typically transferable between them.
The Human Variome project will create a national data repository called the Australian Human Variome Database (AHVD) to hold information on genetic variations associated with human disease that have been characterised by Australian laboratories and clinics. The project will also provide a service to enable submission of laboratory and clinic data to the AHVD using the existing workflows of these laboratories and clinics.
Main participants: Howard Florey Institute and VPAC.
Project Committee Chair: Emeritus Prof. John Coghlan, current University of Melbourne representative to the International Human Variome Project, and former Director of the Howard Florey Institute.
Achievements: A Principles and IT Architecture Committee (PITAC) has been convened to develop use cases and data standards, review existing relevant systems and technologies (including the work of BioGrid Australia), and develop a system architecture. PITAC has held initial meetings, and a draft system specifications document has been created. A document specifying the long-term governance and operational structure for the Australian node of the Human Variome Project has also been drafted.
MACDDAP Marine and Climate Data Discovery and Access Project
Aims: The MACDDAP project is developing software that will allow large marine and climate data sets to be integrated into a "virtual database" and delivered to researchers through a wider range of data streams. This will make marine and climate data throughout Australia more discoverable, more searchable and more conformant with standard vocabularies, and will enable researchers to collect and aggregate data across disciplines for knowledge discovery.
Main participants: IMOS, TPAC, CSIRO, Bureau of Meteorology and Architecta.
Project Committee Chair: Mr. Tim Moltmann, Director of the Integrated Marine Observing System (IMOS).
Achievements: So far the project has delivered:
- A new release of the TPAC Digital Library with geo-spatial search capabilities.
- A first release of the data aggregator service, which enables users to extract data from different data sets, each containing various attributes, time periods and geospatial extents, and combine them into a single user product.
- A new version of the THREDDS Data Server (TDS 4.0) that allows users to create maps according to the Open Geospatial Consortium (OGC) Web Map Service standard.
Work has also been progressing on developing translators between different data formats, migrating the TPAC Digital Library web portal to a more modern framework, adding support for authentication using AAF, and enhancing the IMOS-GeoNetwork Metadata Entry and Search Tool (MEST) to improve interoperability with OPeNDAP and OGC data servers and integration with AAF security capabilities.
NCJRDN National Criminal Justice Research Data Network
Aims: There is growing demand for quantitative research in the Australian criminal justice sector. Agencies are increasingly calling on researchers and criminologists to provide the evidence base for efficient service provision and for the development of effective crime prevention and reduction strategies. Australia is ‘data rich’, with information on reported crime levels, arrest rates, court processing, sentencing, imprisonment rates and re-offending routinely collected and held in government and university-based centres across Australia. However, researchers do not have equal or easy access to these resources.
The NCJRDN project aims to increase the quantity and quality of criminal justice sector research that can be undertaken, by reducing the barriers to accessing that data and promoting cross-jurisdictional research and collaboration. The project will deliver the following services to the criminal justice sector research community:
- “Pooling” currently available administrative datasets to provide a single point of access to various de-identified, unit-record level research datasets sourced from criminal justice sector agencies across Australian jurisdictions
- Encouraging data custodians to release data for research purposes by establishing controlled access mechanisms which are ethically sound and maintain the highest levels of confidentiality
- Developing minimum standards for storing and managing criminal justice data
- Showcasing tools being developed by criminal justice sector researchers in Australia
Main participants: UWA, with input from State Coordinating Agencies in all states and territories.
Project Committee Chair: Dr. Adam Tomison, Director of the Australian Institute of Criminology.
Achievements: A survey of existing data holdings of interest has been initiated with State Coordinating Agencies and data custodians. A scan to identify related data repositories has been undertaken, with surveys sent to managers of relevant repositories asking about data management, data policy and technical information about the structure of their repository.
PODD Phenomics Ontology Driven Data Management
Aims: The Integrated Biological Sciences component of NCRIS contains two major phenomics initiatives: the Australian Plant Phenomics Facility (APPF), which specialises in phenotyping of crop and model plant species, and the Australian Phenomics Network (APN), which specialises in phenotyping of mouse models. Both facilities share a common requirement to gather and annotate data from both high- and low-throughput phenotyping devices.
The PODD project will deliver a data management service that can manage multiple formats (text, image, video) and is not constrained to a finite set of phenotyping platforms and data formats. The project will also provide the capability to manage a repository of the associated metadata using Fedora Commons. A range of tools and other features will be developed to provide web-based discovery interfaces for users, external repositories and services, and to support the automatic capture and annotation of data and metadata from instrumentation where possible. The project will also deliver the ability to publish data, or to make it publicly available after a pre-determined period.
Main participants: APPF, APN and the UQ eResearch Centre.
Project Committee Chair: Ms Adrienne McKenzie, Head of the Australian Phenomics Network Services.
Achievements: User requirements and use cases have been collected and documented, design of data models is well advanced, design of user interfaces has begun, and a development Fedora Commons repository has been implemented and tested with some phenomics data.
Remote CT Remote Computed Tomography Reconstruction, Simulation and Visualisation Service
Aims: The main focus of the Remote CT project will be to develop a three-part service for remote 3D Computed Tomography (CT) reconstruction, simulation, analysis and visualisation. The service will be deployed at the Imaging and Medical Beamline at the Australian Synchrotron and at the ANU micro-CT facility, and is expected to be applicable to other facilities. The first component of the service will give researchers access, from their home institution, to CT data processing at the remote processing facility (for the Australian Synchrotron, this will be the MASSIVE supercomputer), supporting different modes of CT reconstruction, analysis and simulation. The second component will provide 3D visualisation of remotely located datasets and allow seamless transfer of the resulting images to the remote user's desktop. This functionality will be useful for preliminary analysis of raw and reconstructed CT data, for producing publication- and presentation-quality images, and for remote collaboration between the teams participating in CT experiments. The third component will enable secure and efficient transfer of large CT file sets to and from the facilities through a transparent, user-friendly interface.
Main participants: CSIRO, VeRSI, VPAC, ANU and the Australian Synchrotron.
Project Committee Chair: Prof. Rob Lewis, Director of the Monash Centre for Synchrotron Science.
Achievements: Some initial use cases have been documented, an initial code review of the software prototype has been completed, CSIRO has agreed to release the code under an open source licence, and a technical working group is progressing the system design. A small test cluster is being installed at the synchrotron for testing until MASSIVE is in place.
SISS Spatial Information Services Stack
Aims: The Spatial Information Services Stack (SISS) was established in response to the increasing complexity of spatial data integration and the need to interact with multiple communities of interest through access to spatial data, information and services based on open standards for interoperability. SISS aims to provide an information exchange layer above data sources and to work with the providers and consumers of spatial data to help them adopt these access and exchange mechanisms. The SISS project will develop some of the core services needed to create a federated spatial data commons, including an OGC Catalogue Service that provides a register of spatial data registries and a Discovery Portal that provides access to these registers. SISS will build on existing open source software to develop common software components that can be deployed with spatial data holdings, making those holdings accessible within the spatial data commons.
Main participants: AuScope, CSIRO and IVEC.
Project Committee Chair: Dr. Bob Haydon, CEO of AuScope.
Achievements: The SISS technology stack is now operational, and the AuScope Discovery Portal uses all of its components to source data from multiple government organisations and research groups that have deployed the stack. Feedback from government agencies and research groups during the AuScope mid-term review indicates a very positive view of the SISS components, the development approach and the deployment assistance being provided. The AuScope road show and the Spatial@Gov conference have also been successful in demonstrating the SISS and helping organisations understand what deployment requires of them. As a result, all State and Territory Geological Surveys are committed to deploying the SISS against Mineral Occurrences, Mining Activity, Mines and Borehole header databases, and the Bureau of Meteorology announced at Spatial@Gov that it also intends to use SISS. In addition to the government agencies, the CSIRO Minerals Down Under (MDU) flagship program will deploy the SISS against a number of geochemistry datasets, and the data services from the Centre of Excellence for 3D Mineral Mapping will be improved.