Overview
Background
Establishing a harvest to DCI
Optimising records for DCI
Assessing your records for DCI readiness
Table 1: RIF-CS to DCI metadata transform table (crosswalk)
Appendix 1: Additional encoding notes

Overview of the Research Data Australia - Data Citation Index collaboration

ANDS is working with Clarivate Analytics (formerly Thomson Reuters) to implement a process that enables Research Data Australia (RDA) records to be transformed to a Data Citation Index (DCI) compliant format and harvested to DCI from ANDS. This means individual institutions are able to have their records indexed by DCI without having to develop their own feed to DCI and manage the ongoing process.

During 2013, ANDS commenced a pilot project with a small number of provider institutions to test crosswalks, assess metadata quality and establish requirements for a business-as-usual service. The pilot project has now been completed and has provided valuable input to service planning and development.

We are now working on a data source by data source basis to progressively implement a production feed of records from RDA to DCI. This involves close collaboration between Clarivate Analytics, ANDS and provider institutions to assess records and establish business processes for the production feed. Some providers will be able to move forward quite quickly, while in other cases some changes to the RIF-CS metadata may be required before a production harvest to DCI can be established.

Background

Records in the Data Citation Index are intended to:

  • provide attribution for a data object to the person(s) and institution(s) creating the data
  • provide a standard form of citation for each data object to encourage citation (the format of the data citation recommended by Clarivate Analytics follows the DataCite guidelines)
  • track citations and reuse of data in the scientific literature and provide bidirectional links between research articles and the data they use or generate
  • provide a means to discover data associated with research publications.

Broadly, in order to be accepted into the Data Citation Index the records in a data source:

  • must provide the minimum metadata required to validate against the DCI schema
    • ANDS has developed a mapping from RIF-CS to the DCI schema and a guide to optimising records for DCI compliance.
    • Elements needed to create a data citation must be present in the metadata.
  • should describe data objects held in repositories under the control of the ANDS partner or data provider
    • Records should not point to institutional web pages or replicate metadata descriptions for data held in other repositories, e.g. PANGAEA. More information (PDF, 0.5 MB).
    • If your data source does contain such records, they can be "tagged" for exclusion from the harvest to DCI.
  • should describe data collections, datasets or repositories - see RIF-CS Type
  • meet the Clarivate Analytics repository evaluation, selection and coverage policies.

Establishing a harvest from your data source to the Data Citation Index

The high level workflow for including an RDA data source in the DCI harvest involves:

  • RDA provider contacts their Outreach Officer or services@ands.org.au to express interest in establishing a DCI harvest.
  • ANDS and the provider review and discuss record quality and transform as well as the proposed business processes and agree to proceed.
  • ANDS provides an initial harvest from the data source to DCI and advises Clarivate Analytics of the nominated contact for the data source.
  • Clarivate Analytics assess a sample of records in the DCI output against their criteria for inclusion as described above. They also check quality of content, compliance with the DCI metadata schema and the richness of the record as assessed against the content available in the source repository.
  • Clarivate Analytics staff will liaise directly with the nominated contact for the data source to discuss the metadata assessment and to create a Repository Record for the data source in DCI. This record provides the Repository Name in each DCI record. All collection records for the data source will be linked to this record in DCI. The screenshot below shows an example.
  • A production harvest from the data source to DCI is established.
  • Clarivate Analytics provide a DCI admin log-in for use by the nominated data source contact.
  • Records are reharvested from RDA to DCI on a regular basis.

Fig 1: DCI Repository record. All records from a data source will be linked to this record in the Data Citation Index.

Optimising records for the DCI transform

The information in Table 1 and the associated notes is intended to complement and augment the guidance provided in the RDA Content Providers Guide. Records that comply with best practice and incorporate the DCI-specific guidance provided should validate against the DCI schema. In the RIF-CS citationInfo element, the citationMetadata format is preferred, as it ensures that the DCI mandatory elements and data citation are accurately represented in DCI. The screenshot below shows an "exemplar" record in DCI.

Fig 2: Exemplar record in the Data Citation Index

Assessing your records for DCI readiness

An early step in establishing a harvest to DCI is to review the DCI transform of a representative sample of records from your data source. While the focus here is on the transform of records, it is important to also carefully review the accuracy and completeness of content in your records. Incorrect content (for example, misspelling of names) will affect the discoverability and capture of citation metrics for your records. It is also important that the records describe objects that are in scope for the DCI, e.g. they are not secondary records describing data held elsewhere. More information (PDF, 0.5 MB).

To enable you to review your records, ANDS has:

  • documented the RIF-CS to DCI transform mapping
  • created a simple web service that enables Data Source Administrators (DSAs) to preview records in their data source that have been transformed to the DCI metadata format (XML output) using the mapping. To use the web service in the production or demo environment, you need DSA permissions. Once logged in, you can preview your records in RDA Production or Demo.

If you prefer, you can use your web browser to access the service to generate DCI XML output for a specific record. To do this, you can either:

OR

where the URL_FROM_RDA is that part of the URL after the slash (/) when viewing the record in RDA. For example:
http://researchdata.ands.org.au/aad-benthic-sampling-database
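
As a rough illustration of the URL_FROM_RDA rule described above, the following sketch extracts the part of an RDA record URL after the slash. The function name rda_slug is hypothetical and is not part of any ANDS tool; the preview service address itself is not reproduced here.

```python
from urllib.parse import urlparse


def rda_slug(record_url: str) -> str:
    """Return the part of an RDA record URL that follows the host's slash."""
    # urlparse splits the URL into scheme, host and path; only the path is kept.
    path = urlparse(record_url).path
    # Drop the leading slash (and any trailing one) to leave just the slug.
    return path.strip("/")


print(rda_slug("http://researchdata.ands.org.au/aad-benthic-sampling-database"))
# prints: aad-benthic-sampling-database
```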

Fig 3: Screenshot of the DCI preview tool

Depending on your browser, you may need to "View Source" to see the XML output.

Fig 4: Screenshot of XML output (fictitious record)

If you have any queries about using the web service, or assessing your records, please contact services@ands.org.au.
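
Before using the preview service, a quick local pre-check can flag records that lack the elements Table 1 marks as required. The sketch below is only an illustration: it assumes a record has already been reduced to a dictionary keyed by DCI element name (a hypothetical layout), and it is not a substitute for validation against the DCI schema.

```python
# DCI elements marked with * in Table 1. "Date provided" is left out because
# the table lists it as system generated.
REQUIRED_DCI_ELEMENTS = (
    "Record ID", "Repository Name", "Owner", "Author",
    "Title", "Source URL", "Source", "Year", "Abstract",
)


def missing_required(dci_record):
    """Return the required element names that are absent or empty."""
    missing = []
    for name in REQUIRED_DCI_ELEMENTS:
        value = dci_record.get(name)
        # Treat None, empty strings, empty lists and whitespace as missing.
        if not value or not str(value).strip():
            missing.append(name)
    return missing


# Fictitious record with no Year and an empty Abstract.
sample = {
    "Record ID": "example.org/collection/1",
    "Repository Name": "Example University",
    "Owner": "Example University",
    "Author": ["Smith, J."],
    "Title": "Example benthic survey",
    "Source URL": "http://example.org/dataset/1",
    "Source": "Example University",
    "Abstract": "  ",
}
print(missing_required(sample))
# prints: ['Year', 'Abstract']
```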

Table 1: RIF-CS to DCI transform v.3

* indicates the element is required by DCI

# indicates the element will be used to populate the citation element in DCI using the following convention: Author/s (Year): Title. Source. Source URL

Note: where more than one RIF-CS element can be used to populate a DCI element, the options are given in preferred order.

Follow the hyperlink to see additional DCI encoding advice about the RIF-CS element.
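
The "preferred order" rule in the note above can be pictured as a simple first-match lookup. This sketch assumes a record flattened into a dictionary keyed by the RIF-CS paths used in Table 1 (a hypothetical representation, not the actual ANDS transform) and uses the two options listed for the DCI "Source" element.

```python
from typing import Mapping, Optional, Sequence

# Options for the DCI "Source" element, in the preferred order given in Table 1.
SOURCE_PREFERENCE = (
    "registryObject:collection:citationInfo:citationMetadata:publisher",
    "registryObject@Group",
)


def first_available(record: Mapping[str, str],
                    preference: Sequence[str]) -> Optional[str]:
    """Return the first non-empty value found, following the preference order."""
    for path in preference:
        value = record.get(path, "").strip()
        if value:
            return value
    return None  # none of the candidate elements is populated


# Fictitious record without citationMetadata, so the Group is used.
record = {"registryObject@Group": "Example University"}
print(first_available(record, SOURCE_PREFERENCE))
# prints: Example University
```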

DCI element name

DCI description

Maps to RIF-CS element

Record ID*

The unique and persistent identifier for the record as it appears in the repository/database. This will be used to identify changed/updated records in future updates received

registryObject:key

Date provided*

Date record was extracted for Clarivate Analytics

System generated

Repository Name*

Name of repository/database from which record was extracted

registryObject@Group

This will be used to associate the record with the correct repository record

Owner*

Owner of the data repository/database

registryObject@Group

Author*#

Names of the authors/persons/groups/organization responsible for creating the data - the people who should gain credit from the data citation. Typically one or more name strings, ideally parsed into Lastname, Forename, Suffix. Group Authors can be included

registryObject:collection:citationInfo:citationMetadata:contributor

OR

relatedObject:Party:name:relationType=IsPrincipalInvestigatorOf

OR

relatedObject:Party:name:relationType=author

OR

relatedObject:Party:name:relationType=coInvestigator

OR

relatedObject:Party:name:relationType=isOwnedBy

OR

relatedObject:Party:name:relationType=hasCollector

OR

registryObject@Group

Author role

One role per author as an attribute of the Author element. Role in creating the data resource, e.g., Editor, Creator, Curator, Repository Manager, Principal Investigator etc. If possible indicate the role to be applied across all authors if not available on individual records

relatedObject:Party:relation

(where Type is one of those specified under Author above)

Researcher ID

ResearcherID (from http://www.researcherid.com) or ORCID ID if available

registryObject:Party:identifier:type=[any]

(where the party is mapped to "Author" in the citation)

any party identifier will be transferred

Author address

One address per author if available, with identification of which author it refers. Ideally provide parsed into Organization, Address, City, ZIP/Postcode, Country. If unable to provide parsed, provide complete address string with elements separated by commas or line breaks

relatedObject:Party:physical address


relatedObject:Party:electronic address

Title*#

Title of the data resource featured in the record. This could be a data study, data set, or other data resource. Data studies with multiple data sets should ideally be supplied with the data sets as separate records referencing the parent study; if this is not possible, a combined record for the data study can include references to the data sets used in the study.

registryObject:collection:name

(Title as displayed in RDA)

Source URL*#

Full URL/DOI or other web URI which can be used to link to the resource

registryObject:collection:citationInfo:citationMetadata:identifier:type="doi"

OR

registryObject:collection:citationInfo:citationMetadata:identifier:type="handle"

OR

registryObject:collection:citationInfo:citationMetadata:identifier:type="uri"

OR

registryObject:collection:citationInfo:citationMetadata:identifier:type="purl"

OR

registryObject:collection:identifier:type="doi"

OR

registryObject:collection:identifier:type="handle"

OR

registryObject:collection:identifier:type="uri"

OR

registryObject:collection:identifier:type="purl"

OR

registryObject:collection:citationInfo:citationMetadata:identifier:type="url"

OR

registryObject:collection:location:address:electronic:type="url"

Source*#

Publisher/distributor of the data resource. This may be the repository itself or could be a third party supplier or institution; also include any recognised abbreviation of the publisher if available.

registryObject:collection:citationInfo:citationMetadata:publisher

OR

registryObject@Group

Year*#

A publication year for the data is required to enable citation tracking. Provide either the:


Date the data resource with the given title was created by the Author(s) - not the date provided to Clarivate Analytics


or


Date the data resource with the given title was deposited in the online collection

registryObject:collection:citationInfo:citationMetadata:date:type="publication date"

OR

registryObject:collection:citationInfo:citationMetadata:date:type="issued"

OR

registryObject:collection:citationInfo:citationMetadata:date:type="created"

OR

registryObject:collection:dates:type="issued"

OR

registryObject:collection:dates:type="available"

OR

registryObject:collection:dates:type="created"

OR

registryObject:Collection@dateModified

OR

registryObject:Collection@dateAccessioned

OR

YYYY the record was ingested into RDA

Abstract*

Description of the digital resource

registryObject:collection:description:type="full"

AND/OR

registryObject:collection:description:type="brief"

Where present, the types below will also be transformed

registryObject:collection:description:type="SignificanceStatement"

registryObject:collection:description:type="Notes"

registryObject:collection:description:type="Lineage"

Parent record reference

ID of the parent record (e.g. if a dataset is related to a study) where the parent and child are provided as separate records

registryObject:relatedObject:key:relationType=isPartOf

Version

Version identifier for the version of the data

registryObject:collection:citationInfo:citationMetadata:version

Rights/Licensing

Any rights restriction or licensing statements for the data (e.g. CC)

registryObject:collection:rights:type=rightsStatement

AND

registryObject:collection:rights:type=accessRights

AND

registryObject:collection:rights:type=licence

Language

Language of the data set in the repository

Default to English

Author keywords

Any author keywords provided by the data creator to describe the data

registryObject:collection:subject

Geospatial data

Geographical location of the study, including country, or geospatial coordinate data

registryObject:coverage:spatial

Time

Time period relating to the data

registryObject:coverage:temporal

Methodology

Method employed in obtaining the data

registryObject:collection:relatedInfo:type="reuseInformation"

Named person

Personal or organisational names relating to the subject of the data

registryObject:collection:subject:type="AU-ANL:PEAU"

AND

registryObject:collection:subject:type="orcid"

Grant No.

Grant number(s)

registryObject:activity:identifier[@type='arc' or 'nhmrc']

(where relation Type =isOutputOf)

Funding organisation

Funding organisation(s)

registryObject:activity:party:title

(where relation Type =isFundedBy AND where activity relation Type=isOutputOf)

Citations

Data relating to the bibliographic citations and references associated with the data resource. If possible, bibliographic fields should be sent separately, but a concatenated reference is also acceptable. A PubMedID can be provided as a substitute for a full citation.

registryObject:collection:relatedInfo:type="publication"

Appendix 1: Additional notes to clarify and optimise RIF-CS encoding for the Data Citation Index

RIF-CS Group

  • Displays in DCI as "Repository Name"
  • Displays in DCI as "Source" where citationInfo:citationMetadata:publisher is not provided
    "Source" will be used to populate the Publisher element in the DCI citation statement where citationInfo:citationMetadata:publisher is not provided
    "Repository Name" is used to link all provider records to the appropriate DCI repository record.

RIF-CS Related Object (Parties)

  • The DCI information model does not provide for separate records for related parties, activities and services. So, where citationMetadata is not provided in the RIF-CS, related Parties of specified Types are used to populate the required "Author" element in DCI.
  • As this element is used to form the citation statement in DCI, care should be taken to select the appropriate Type for each related Party so that only those who should be credited in the citation statement are included.
  • The accepted Types are documented in the transform table.

RIF-CS Related Info (type=publication)

  • Displays in DCI as "Citations". Do not confuse with RIF-CS Citation Information.
  • While this element is optional in DCI, it is highly recommended that this information be provided where possible as it allows Clarivate Analytics to create bi-directional links between a record in the DCI and records for related publications held elsewhere in the Web of Science suite of products. For users, this enables easy navigation between related resources in the Web of Science.
  • Where possible, provide a PubMedID for a related publication as this is sufficient information for Clarivate Analytics to link to a related publication. Otherwise, include a fullCitation for the related publication as a Note in the relatedInfo element and choose relation type=isCitedBy as per the (fictitious) example below.
    <relatedInfo type="publication">
      <identifier type="DOI">10.1016/j.amjmed.2014.06.036</identifier>
      <title>Acute porphyrias in the USA: features of 108 subjects from Porphyria Consortium</title>
      <relation type="isCitedBy"/>
      <notes>Bonkovsky, HL, Maddukuri, VC, Yazici, C. (2014) Acute porphyrias in the USA: features of 108 subjects from Porphyria Consortium. American Journal of Medicine, vol 16, 2014, p.1-6</notes>
    </relatedInfo>
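
For providers who generate RIF-CS programmatically, a relatedInfo element like the one above could be assembled with Python's standard library, which avoids hand-typed markup errors such as unquoted attributes. This is a minimal sketch: the function name is hypothetical, the RIF-CS namespace is omitted for brevity, and the element order simply follows the example above and should be checked against the RIF-CS schema.

```python
import xml.etree.ElementTree as ET


def related_publication(identifier, id_type, title, full_citation,
                        relation_type="isCitedBy"):
    """Build a relatedInfo element describing a related publication."""
    info = ET.Element("relatedInfo", {"type": "publication"})
    ident = ET.SubElement(info, "identifier", {"type": id_type})
    ident.text = identifier
    ET.SubElement(info, "title").text = title
    # The relation type is carried as an attribute, as in the example above.
    ET.SubElement(info, "relation", {"type": relation_type})
    # The full citation is placed in notes when no PubMedID is available.
    ET.SubElement(info, "notes").text = full_citation
    return info


# Fictitious values taken from the example above.
element = related_publication(
    "10.1016/j.amjmed.2014.06.036",
    "DOI",
    "Acute porphyrias in the USA: features of 108 subjects from Porphyria Consortium",
    "Bonkovsky, HL, Maddukuri, VC, Yazici, C. (2014) Acute porphyrias in the USA: "
    "features of 108 subjects from Porphyria Consortium. "
    "American Journal of Medicine, vol 16, 2014, p.1-6",
)
print(ET.tostring(element, encoding="unicode"))
```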

RIF-CS Citation Information

  • Displays in DCI as "How to cite this resource". Do not confuse with DCI Citations.
  • Where it is provided, citationMetadata is the preferred source of content for the DCI citation elements including: Author (Year) Title. Publisher. Identifier.
  • Where citationMetadata is not provided, the elements making up the citation will be populated according to the transform table. Note that the Publisher will be inferred from RIF-CS Group.
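
To preview roughly how a citation statement might read, the convention given with Table 1 (Author/s (Year): Title. Source. Source URL) can be sketched as a small formatter. The function is illustrative only; the "; " separator between authors is an assumption, and DCI's own rendering may differ.

```python
def format_citation(authors, year, title, source, source_url):
    """Assemble a citation as: Author/s (Year): Title. Source. Source URL"""
    author_part = "; ".join(authors)  # separator is an assumption
    # Trailing full stops are trimmed so the joined text has no doubled "..".
    return (f"{author_part} ({year}): {title.rstrip('.')}. "
            f"{source.rstrip('.')}. {source_url}")


# Fictitious values for illustration.
print(format_citation(
    ["Smith, J.", "Lee, K."],
    "2014",
    "Example benthic survey",
    "Example University",
    "http://example.org/dataset/1",
))
# prints: Smith, J.; Lee, K. (2014): Example benthic survey. Example University. http://example.org/dataset/1
```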

Tagging records to be excluded from your data source harvest to DCI

  • Some RDA data sources will contain records that are considered 'out of scope' for DCI (but in scope for RDA), for example RDA records describing data held in an external repository such as PANGAEA, where it is also described and discoverable. This may occur where a researcher has elected to deposit their data with a domain-specific repository but also wishes the data to be discoverable through their institutional metadata store and RDA. In such cases, the 'out of scope' records can be tagged in the ANDS Registry so that they are programmatically excluded from the harvest to DCI but still appear in RDA.
  • Use the tag excludeDCI and create it as a 'secret' rather than 'public' tag so that it is not visible to RDA users. More about tagging.
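
The effect of the excludeDCI tag can be pictured as a filter applied before the harvest. The sketch below is not the ANDS Registry implementation; it assumes each record is a dictionary with a hypothetical "tags" list and only illustrates the exclusion logic.

```python
EXCLUDE_TAG = "excludeDCI"


def records_for_dci(records):
    """Keep only the records that do not carry the exclusion tag."""
    return [r for r in records if EXCLUDE_TAG not in r.get("tags", ())]


# Fictitious records: the second one describes data held in an external repository.
records = [
    {"key": "example.org/collection/1", "tags": []},
    {"key": "example.org/collection/2", "tags": ["excludeDCI"]},
    {"key": "example.org/collection/3"},
]
print([r["key"] for r in records_for_dci(records)])
# prints: ['example.org/collection/1', 'example.org/collection/3']
```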

Date                 Change history
9 December 2013      Updated to provide details of Phase 2 of the pilot project. Transform (crosswalk) updated to v2.
11 September 2014    Updated to reflect business-as-usual service. Transform (crosswalk) updated to v3.
6 March 2017         Updated to change Thomson Reuters to Clarivate Analytics.