
Establishing a harvest from your data source to the Data Citation Index


The high level workflow for including a Research Data Australia (RDA) data source in the Data Citation Index (DCI) harvest involves:

  • RDA provider contacts their Outreach Officer to express interest in establishing a DCI harvest
  • ANDS and the provider review and discuss record quality and transform as well as the proposed business processes and agree to proceed
  • ANDS provides an initial harvest from the data source to DCI and advises Clarivate Analytics of the nominated contact for the data source
  • Clarivate Analytics assess a sample of records in the DCI output against their criteria for inclusion. They also check quality of content, compliance with the DCI metadata schema and the richness of the record as assessed against the content available in the source repository
  • Clarivate Analytics staff will liaise directly with the nominated contact for the data source to discuss the metadata assessment and to create a Repository Record for the data source in DCI. This record provides the Repository Name in each DCI record. All collection records for the data source will be linked to this record in DCI. The screenshot below shows an example
  • production harvest from the data source to DCI established
  • Clarivate Analytics provide a DCI admin login for use by the nominated data source contact
  • records are reharvested from RDA to DCI on a regular basis.

Fig 1: DCI Repository record. All records from a data source will be linked to this record in the Data Citation Index


Optimising records for the DCI transform

The information provided in Table 1 and the associated notes is intended to complement and augment the guidance provided in the RDA Content Providers Guide. Records that comply with best practice and incorporate the DCI-specific guidance should validate against the DCI schema. In the RIF-CS citationInfo element, the citationMetadata format is preferred over other citation formats, because it ensures that the DCI mandatory elements and the data citation are accurately represented in the DCI. The screenshot below shows an "exemplar" record in DCI.

Fig 2: Exemplar record in the Data Citation Index


Assessing your records for DCI readiness

An early step in establishing a harvest to DCI is to review the DCI transform of a representative sample of records from your data source. While the focus here is on the transform of records, it is important to also carefully review the accuracy and completeness of content in your records. Incorrect content (for example, misspelling of names) will affect the discoverability and capture of citation metrics for your records. It is also important that the records describe objects that are in scope for the DCI, e.g. they are not secondary records describing data held elsewhere.

More information on DCI readiness (PDF, 0.5 MB)

To enable you to review your records, ANDS has:

  • documented the RIF-CS to DCI transform mapping
  • created a simple web service that enables Data Source Administrators (DSAs) to preview records in their data source that have been transformed to the DCI metadata format (XML output) using the mapping. To use the web service, in the Production or Demo environment, you need to have DSA permissions.

Fig 3: Screenshot of the DCI preview tool


You can also choose to export the transformed records as a file.

Fig 4: Screenshot of the DCI export tool


Depending on your browser, you may need to "View Source" to see the XML output.

Fig 5: Sample XML output


If you have any queries about using the web service, or assessing your records, please contact

Table 1: RIF-CS to DCI transform v.3

* indicates the element is required by DCI

# indicates the element will be used to populate the citation element in DCI using the following convention: Author/s (Year): Title. Source. Source URL

Note: where more than one RIF-CS element can be used to populate a DCI element, the options are given in preferred order.

Follow the hyperlink to see additional DCI encoding advice about the RIF-CS element.
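
The citation convention given above (Author/s (Year): Title. Source. Source URL) can be illustrated with a minimal Python sketch. The function name, the "; " separator between multiple authors, and the sample values are assumptions made for illustration only; they are not taken from the ANDS transform itself.

```python
def format_dci_citation(authors, year, title, source, source_url):
    """Assemble a citation string as: Author/s (Year): Title. Source. Source URL

    The "; " separator between authors is an assumption; the convention
    above does not say how multiple authors are joined.
    """
    author_part = "; ".join(authors)
    return f"{author_part} ({year}): {title}. {source}. {source_url}"


# Hypothetical values, for illustration only.
citation = format_dci_citation(
    authors=["Smith, J.", "Lee, K."],
    year=2014,
    title="Reef survey data",
    source="Example University",
    source_url="https://doi.org/10.1234/example",
)
print(citation)
```

A sketch like this can be useful for checking how a record's citation might read before requesting a preview of the actual transform.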

DCI element name

DCI description

Maps to RIF-CS element

Record ID*

The unique and persistent identifier for the record as it appears in the repository/database. This will be used to identify changed/updated records in future updates received


Date provided*

Date record was extracted for Clarivate Analytics

System generated

Repository Name*

Name of repository/database from which record was extracted


This will be used to associate the record with the correct repository record


Owner of the data repository/database



Names of the authors/persons/groups/organization responsible for creating the data - the people who should gain credit from the data citation. Typically one or more name strings, ideally parsed into Lastname, Forename, Suffix. Group Authors can be included














Author role

One role per author as an attribute of the Author element. Role in creating the data resource, e.g., Editor, Creator, Curator, Repository Manager, Principal Investigator etc. If possible indicate the role to be applied across all authors if not available on individual records


(where Type is one of those specified under Author above)

Researcher ID

ResearcherID or ORCID ID, if available


(where the party is mapped to "Author" in the citation)

any party identifier will be transferred

Author address

One address per author if available, with identification of which author it refers to. Ideally provide the address parsed into Organization, Address, City, ZIP/Postcode, Country. If a parsed address cannot be provided, provide the complete address string with elements separated by commas or line breaks

relatedObject:Party:physical address

relatedObject:Party:electronic address


Title of the data resource featured in the record. This could be a data study, data set, or other data resource. Data studies with multiple data sets should ideally be supplied with the data sets as separate records referencing the parent study; if this is not possible, a combined record for the data study can include references to the data sets used in the study.


(Title as displayed in RDA)

Source URL*#

Full URL/DOI or other web URI which can be used to link to the resource

registryObject:collection:citationInfo:citationMetadata:identifier: type="doi"


registryObject:collection:citationInfo:citationMetadata:identifier: type="handle"


registryObject:collection:citationInfo:citationMetadata:identifier: type="uri"


registryObject:collection:citationInfo:citationMetadata:identifier: type="purl"










registryObject:collection:citationInfo:citationMetadata:identifier: type="url"




Publisher/distributor of the data resource. This may be the repository itself or could be a third party supplier or institution; also include any recognised abbreviation of the publisher if available.





A publication year for the data is required to enable citation tracking. Provide either the:

Date the data resource with the given title was created by the Author(s) - not the date provided to Clarivate Analytics


Date the data resource with the given title was deposited in the online collection

registryObject:collection:citationInfo:citationMetadata:date: type="publication date"


registryObject:collection:citationInfo:citationMetadata:date: type="issued"


registryObject:collection:citationInfo:citationMetadata:date: type="created"












YYYY the record was ingested into RDA


Description of the digital resource




Where present, the types below will also be transformed




Parent record reference

ID of the parent record (e.g. if a dataset is related to a study) where the parent and child are provided as separate records



Version identifier for the version of the data




Any rights restriction or licensing statements for the data (e.g. CC BY)







Language of the data set in the repository

Default to English

Author keywords

Any author keywords provided by the data creator to describe the data


Geospatial data

Geographical location of the study, including country, or geospatial coordinate data



Time period relating to the data



Method employed in obtaining the data


Named person

Personal or organisational names relating to the subject of the data




Grant No.

Grant number(s)

registryObject:activity:identifier[@type='arc' or 'nhmrc']

(where relation Type = isOutputOf)

Funding organisation

Funding organisation(s)


(where relation Type = isFundedBy AND where activity relation Type = isOutputOf)


Data relating to the bibliographic citations and references associated with the data resource.

If possible, bibliographic fields should be sent separately, but a concatenated reference is also acceptable. A PubMedID can be provided as a substitute for a full citation
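
Table 1 notes that, where more than one RIF-CS element can populate a DCI element, the options are listed in preferred order. The following minimal Python sketch shows that idea for the Source URL row, using only the citationMetadata identifier types that appear in the table (doi, handle, uri, purl, url). Some rows between "purl" and "url" are missing from the table as shown here, so the real transform may consult other elements in between; the function name and the (type, value) tuple input are hypothetical.

```python
# Identifier types in the order they appear under "Source URL" in Table 1.
# Assumption: rows missing from the table are simply not represented here.
PREFERRED_IDENTIFIER_TYPES = ["doi", "handle", "uri", "purl", "url"]


def pick_source_identifier(identifiers):
    """Return the first (type, value) pair matching the preferred order.

    `identifiers` is a list of (type, value) tuples, a hypothetical
    in-memory form of citationMetadata identifier elements.
    Returns None when no listed type is present.
    """
    for wanted in PREFERRED_IDENTIFIER_TYPES:
        for id_type, value in identifiers:
            if id_type.lower() == wanted:
                return id_type, value
    return None


# Hypothetical record carrying both a URL and a DOI: the DOI is preferred.
chosen = pick_source_identifier(
    [("url", "http://example.org/data/1"), ("doi", "10.1234/example")]
)
print(chosen)
```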


Appendix 1: Additional notes to clarify and optimise RIF-CS encoding for the Data Citation Index

RIF-CS Group

  • Displays in DCI as "Repository Name"
  • Displays in DCI as "Source" where citationInfo:citationMetadata:publisher is not provided
  • "Source" will be used to populate the Publisher element in the DCI citation statement where citationInfo:citationMetadata:publisher is not provided
  • "Repository Name" is used to link all provider records to the appropriate DCI repository record.

RIF-CS Related Object (Parties)

  • The DCI information model does not provide for separate records for related parties, activities and services. So, where citationMetadata is not provided in the RIF-CS, related Parties of specified Types are used to populate the required "Author" element in DCI.
  • As this element is used to form the citation statement in DCI, care should be taken to select the appropriate Type for each related Party so that only those who should be credited in the citation statement are included.
  • The accepted Types are documented in the transform table.

RIF-CS Related Info (type=publication)

  • Displays in DCI as "Citations". Do not confuse with RIF-CS Citation Information.
  • While this element is optional in DCI, it is highly recommended that this information be provided where possible, as it allows Clarivate Analytics to create bi-directional links between a record in the DCI and records for related publications held elsewhere in the Web of Science suite of products. For users, this enables easy navigation between related resources in the Web of Science.
  • Where possible, provide a PubMedID for a related publication as this is sufficient information for Clarivate Analytics to link to a related publication. Otherwise, include a fullCitation for the related publication as a Note in the relatedInfo element and choose relation type=isCitedBy as per the (fictitious) example below.
    <relatedInfo type="publication">
      <identifier type="doi">10.1016/j.amjmed.2014.06.036</identifier>
      <title>Acute porphyrias in the USA: features of 108 subjects from Porphyria Consortium</title>
      <relation type="isCitedBy"/>
      <notes>Bonkovsky, HL, Maddukuri, VC, Yazici, C. (2014) Acute porphyrias in the USA: features of 108 subjects from Porphyria Consortium. American Journal of Medicine, vol 16, 2014, p.1-6</notes>
    </relatedInfo>
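
If relatedInfo elements are generated by a script rather than typed by hand, building them with an XML library helps avoid malformed markup such as unquoted attributes. The following is a minimal Python sketch using the standard library; the function name is hypothetical, the RIF-CS namespace is omitted for brevity, and the child element order simply follows the example above rather than being checked against the RIF-CS schema.

```python
import xml.etree.ElementTree as ET


def build_related_publication(identifier, id_type, title, relation_type, full_citation):
    """Build a relatedInfo element of type "publication".

    Simplification: no namespace is applied, and schema validity is not checked.
    """
    related = ET.Element("relatedInfo", {"type": "publication"})
    ET.SubElement(related, "identifier", {"type": id_type}).text = identifier
    ET.SubElement(related, "title").text = title
    ET.SubElement(related, "relation", {"type": relation_type})
    ET.SubElement(related, "notes").text = full_citation
    return related


# Same (fictitious) values as the example above.
element = build_related_publication(
    identifier="10.1016/j.amjmed.2014.06.036",
    id_type="doi",
    title="Acute porphyrias in the USA: features of 108 subjects from Porphyria Consortium",
    relation_type="isCitedBy",
    full_citation="Bonkovsky, HL, Maddukuri, VC, Yazici, C. (2014) Acute porphyrias in the USA: "
    "features of 108 subjects from Porphyria Consortium. "
    "American Journal of Medicine, vol 16, 2014, p.1-6",
)
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```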

RIF-CS Citation Information

  • Displays in DCI as "How to cite this resource". Do not confuse with DCI Citations.
  • Where it is provided, citationMetadata is the preferred source of content for the DCI citation elements including: Author (Year) Title. Publisher. Identifier.
  • Where citationMetadata is not provided, the elements making up the citation will be populated according to the transform table. Note that the Publisher will be inferred from RIF-CS Group.
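
The fallback described above (use citationMetadata:publisher where present, otherwise infer the Publisher from the RIF-CS Group) can be sketched in Python as follows. This is an illustration under stated assumptions, not the ANDS transform: the function name and sample records are hypothetical, and the namespace URI is the one assumed here for RIF-CS registry objects.

```python
import xml.etree.ElementTree as ET

# Assumed RIF-CS namespace for registryObject documents.
RIFCS_NS = {"rif": "http://ands.org.au/standards/rif-cs/registryObjects"}

PUBLISHER_PATH = "rif:collection/rif:citationInfo/rif:citationMetadata/rif:publisher"


def resolve_source(registry_object):
    """Return the citationMetadata publisher, or the group attribute as a fallback."""
    publisher = registry_object.find(PUBLISHER_PATH, RIFCS_NS)
    if publisher is not None and publisher.text and publisher.text.strip():
        return publisher.text.strip()
    return registry_object.get("group")


# Hypothetical records: one with a publisher, one without citationInfo.
WITH_PUBLISHER = """<registryObject
    xmlns="http://ands.org.au/standards/rif-cs/registryObjects"
    group="Example University">
  <collection type="dataset">
    <citationInfo>
      <citationMetadata>
        <publisher>Example Data Centre</publisher>
      </citationMetadata>
    </citationInfo>
  </collection>
</registryObject>"""

WITHOUT_PUBLISHER = """<registryObject
    xmlns="http://ands.org.au/standards/rif-cs/registryObjects"
    group="Example University">
  <collection type="dataset"/>
</registryObject>"""

source_a = resolve_source(ET.fromstring(WITH_PUBLISHER))
source_b = resolve_source(ET.fromstring(WITHOUT_PUBLISHER))
print(source_a)
print(source_b)
```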

Tagging records to be excluded from your data source harvest to DCI

  • Some RDA data sources will contain records that are considered 'out of scope' for DCI (but in scope for RDA). An example is an RDA record that describes data held in an external repository, such as PANGAEA, where the data is also described and discoverable. This may occur, for example, where a researcher has elected to deposit their data with a domain-specific repository but also wishes the data to be discoverable through their institutional metadata store and RDA. In such cases, the 'out of scope' records can be tagged in the ANDS Registry; tagged records are programmatically excluded from the harvest to DCI but still appear in RDA.
  • Use the tag excludeDCI and create it as a 'secret' rather than 'public' tag so that it is not visible to RDA users. More about tagging.
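
The effect of the excludeDCI tag can be pictured with a minimal Python sketch. The exclusion itself happens inside the ANDS Registry; the dictionary-based record structure and function name below are hypothetical and only illustrate the filtering logic.

```python
EXCLUDE_TAG = "excludeDCI"


def records_for_dci(records):
    """Keep only records that do not carry the excludeDCI tag.

    Each record is a hypothetical dict with an optional "tags" list.
    """
    return [r for r in records if EXCLUDE_TAG not in r.get("tags", [])]


# Hypothetical records: the second describes data held in an external repository.
records = [
    {"key": "example.edu/collection/1", "tags": []},
    {"key": "example.edu/collection/2", "tags": ["excludeDCI"]},
    {"key": "example.edu/collection/3"},
]
harvested_keys = [r["key"] for r in records_for_dci(records)]
print(harvested_keys)
```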


Change history

9 December 2013

Updated to provide details of Phase 2 of the pilot project. Transform (crosswalk) updated to v2.

11 September 2014

Updated to reflect business-as-usual service. Transform (crosswalk) updated to v3.

6 March 2017

Updated to change Thomson Reuters to Clarivate Analytics