The high-level workflow for including a Research Data Australia (RDA) data source in the Data Citation Index (DCI) harvest involves:
- RDA provider contacts their Outreach Officer or email@example.com to express interest in establishing a DCI harvest
- ANDS and the provider review and discuss record quality, the transform and the proposed business processes, then agree to proceed
- ANDS provides an initial harvest from the data source to DCI and advises Clarivate Analytics of the nominated contact for the data source
- Clarivate Analytics assess a sample of records in the DCI output against their criteria for inclusion. They also check quality of content, compliance with the DCI metadata schema and the richness of the record as assessed against the content available in the source repository
- Clarivate Analytics staff will liaise directly with the nominated contact for the data source to discuss the metadata assessment and to create a Repository Record for the data source in DCI. This record provides the Repository Name in each DCI record. All collection records for the data source will be linked to this record in DCI. The screenshot below shows an example.
- a production harvest from the data source to DCI is established
- Clarivate Analytics provide a DCI admin login for use by the nominated data source contact
- records are reharvested from RDA to DCI on a regular basis.
Fig 1: DCI Repository record. All records from a data source will be linked to this record in the Data Citation Index
Optimising records for the DCI transform
The information provided in Table 1 and the associated notes is intended to complement and augment the guidance provided in the RDA Content Providers Guide. Records that comply with best practice and incorporate the DCI-specific guidance provided should validate against the DCI schema. In the RIF-CS citationInfo element, the citationMetadata format is preferred because it ensures that the DCI mandatory elements and the data citation are accurately represented in the DCI. The screenshot below shows an "exemplar" record in DCI.
Fig 2: Exemplar record in the Data Citation Index
Assessing your records for DCI readiness
An early step in establishing a harvest to DCI is to review the DCI transform of a representative sample of records from your data source. While the focus here is on the transform of records, it is important to also carefully review the accuracy and completeness of content in your records. Incorrect content (for example, misspelling of names) will affect the discoverability and capture of citation metrics for your records. It is also important that the records describe objects that are in scope for the DCI, e.g. they are not secondary records describing data held elsewhere.
To enable you to review your records, ANDS has:
- documented the RIF-CS to DCI transform mapping
- created a simple web service that enables Data Source Administrators (DSAs) to preview records in their data source that have been transformed to the DCI metadata format (XML output) using the mapping. To use the web service in the Production or Demo environment, you need DSA permissions.
Fig 3: Screenshot of the DCI preview tool
You can also choose to export the transformed records as a file.
Fig 4: Screenshot of the DCI export tool
Depending on your browser, you may need to "View Source" to see the XML output.
Fig 5: Sample XML output
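When reviewing an exported file, a quick tally of element names can help show which elements are populated across your sample of records. The following is a minimal sketch in Python, assuming the export has been saved as well-formed XML. The function name and the sample string are illustrative only, and the element names in the sample are placeholders rather than names taken from the DCI schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(xml_text: str) -> Counter:
    """Count how often each element name occurs, ignoring XML namespaces."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for element in root.iter():
        # ElementTree writes namespaced tags as "{uri}local"; keep the local part.
        counts[element.tag.rsplit("}", 1)[-1]] += 1
    return counts


# Placeholder sample: these element names are NOT taken from the DCI schema.
SAMPLE = (
    "<records>"
    "<record><title>A</title></record>"
    "<record><title>B</title><year>2014</year></record>"
    "</records>"
)

if __name__ == "__main__":
    for name, total in sorted(count_elements(SAMPLE).items()):
        print(name, total)
```

An element that appears less often than the number of records in the tally may indicate records that are missing that content.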
If you have any queries about using the web service, or assessing your records, please contact firstname.lastname@example.org.
Table 1: RIF-CS to DCI transform v.3
* indicates the element is required by DCI
# indicates the element will be used to populate the citation element in DCI using the following convention: Author/s (Year): Title. Source. Source URL
Note: where more than one RIF-CS element can be used to populate a DCI element, the options are given in preferred order.
Follow the hyperlink to see additional DCI encoding advice about the RIF-CS element.
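The citation convention noted above (Author/s (Year): Title. Source. Source URL) can be illustrated with a short Python sketch. This is not the ANDS transform itself: the function name, the parameter names and the semicolon used to separate multiple authors are assumptions made for illustration, and the sample values are invented.

```python
def format_dci_citation(authors, year, title, source, source_url):
    """Join citation parts as: Author/s (Year): Title. Source. Source URL"""
    # Assumption: multiple authors are separated by "; ". The source only
    # specifies the overall order of the parts, not the author separator.
    author_text = "; ".join(authors)
    return f"{author_text} ({year}): {title}. {source}. {source_url}"


if __name__ == "__main__":
    print(
        format_dci_citation(
            ["Smith, J", "Lee, K"],
            2014,
            "Example data collection",
            "Example Repository",
            "https://example.org/collection/1",
        )
    )
```

Previewing the assembled string in this way can help confirm that the elements marked with # in the table carry the content you expect to see in the citation.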
DCI element name
Maps to RIF-CS element
The unique and persistent identifier for the record as it appears in the repository/database. This will be used to identify changed/updated records in future updates received
Date record was extracted for Clarivate Analytics
Name of repository/database from which record was extracted
This will be used to associate the record with the correct repository record
Owner of the data repository/database
Names of the authors/persons/groups/organization responsible for creating the data - the people who should gain credit from the data citation. Typically one or more name strings, ideally parsed into Lastname, Forename, Suffix. Group Authors can be included
One role per author as an attribute of the Author element. Role in creating the data resource, e.g., Editor, Creator, Curator, Repository Manager, Principal Investigator etc. If roles are not available on individual records, indicate where possible a single role to be applied across all authors
(where Type is one of those specified under Author above)
(where the party is mapped to "Author" in the citation)
any party identifier will be transferred
One address per author if available, with identification of which author it refers to. Ideally provide parsed into Organization, Address, City, ZIP/Postcode, Country. If unable to provide parsed, provide complete address string with elements separated by commas or line breaks
Title of the data resource featured in the record. This could be a data study, data set, or other data resource. Data studies with multiple data sets should ideally be supplied with the data sets as separate records referencing the parent study; if this is not possible, a combined record for the data study can include references to the data sets used in the study.
(Title as displayed in RDA)
Full URL/DOI or other web URI which can be used to link to the resource
Publisher/distributor of the data resource. This may be the repository itself or could be a third party supplier or institution; also include any recognised abbreviation of the publisher if available.
A publication year for the data is required to enable citation tracking. Provide either the:
Date the data resource with the given title was created by the Author(s) - not the date provided to Clarivate Analytics
Date the data resource with the given title was deposited in the online collection
registryObject:collection:citationInfo:citationMetadata:date:type= "publication date"
YYYY the record was ingested into RDA
Description of the digital resource
Where present, the types below will also be transformed
Parent record reference
ID of parent record (e.g. if a dataset is related to a study) where the parent and child are provided as separate records
Version identifier for the version of the data
Any rights restriction or licensing statements for the data (e.g. CC BY)
Language of the data set in the repository
Default to English
Any author keywords provided by the data creator to describe the data
Geographical location of the study, including country, or geospatial coordinate data
Time period relating to the data
Method employed in obtaining the data
Personal or organisational names relating to the subject of the data
registryObject:activity:identifier[@type='arc' or 'nhmrc']
(where relation Type =isOutputOf)
(where relation Type =isFundedBy AND where activity relation Type=isOutputOf)
Data relating to the bibliographic citations and references associated with the data resource.
If possible bibliographic fields should be sent separately, but a concatenated reference is also acceptable. PubMedID can be provided as a substitute for a full citation
Appendix 1: Additional notes to clarify and optimise RIF-CS encoding for the Data Citation Index
RIF-CS Group
- Displays in DCI as "Repository Name"
- Displays in DCI as "Source" where citationInfo:citationMetadata:publisher is not provided
- "Source" will be used to populate the Publisher element in the DCI citation statement where citationInfo:citationMetadata:publisher is not provided
- "Repository Name" is used to link all provider records to the appropriate DCI repository record.
RIF-CS Related Object (Parties)
- The DCI information model does not provide for separate records for related parties, activities and services. So, where citationMetadata is not provided in the RIF-CS, related Parties of specified Types are used to populate the required "Author" element in DCI.
- As this element is used to form the citation statement in DCI, care should be taken to select the appropriate Type for each related Party so that only those who should be credited in the citation statement are included.
- The accepted Types are documented in the transform table.
RIF-CS Related Info (type=publication)
- Displays in DCI as "Citations". Do not confuse with RIF-CS Citation Information.
- While this element is optional in DCI, it is highly recommended that this information be provided where possible, as it allows Clarivate Analytics to create bi-directional links between a record in the DCI and records for related publications held elsewhere in the Web of Science suite of products. For users, this enables easy navigation between related resources in the Web of Science.
- Where possible, provide a PubMedID for a related publication as this is sufficient information for Clarivate Analytics to link to a related publication. Otherwise, include a fullCitation for the related publication as a Note in the relatedInfo element and choose relation type=isCitedBy as per the (fictitious) example below.
  <relatedInfo type="publication">
    <identifier type="DOI">10.1016/j.amjmed.2014.06.036</identifier>
    <title>Acute porphyrias in the USA: features of 108 subjects from Porphyria Consortium</title>
    <relation type="isCitedBy"/>
    <notes>Bonkovsky, HL, Maddukuri, VC, Yazici, C. (2014) Acute porphyrias in the USA: features of 108 subjects from Porphyria Consortium. American Journal of Medicine, vol 16, 2014, p.1-6</notes>
  </relatedInfo>
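Providers who generate RIF-CS programmatically may prefer to build this fragment with an XML library, which guards against unquoted attributes and unclosed tags. The following is a sketch using Python's standard xml.etree.ElementTree module. The function name is hypothetical, the RIF-CS namespace is omitted for brevity, and the values passed in are those of the fictitious example.

```python
import xml.etree.ElementTree as ET


def build_related_publication(identifier: str, title: str, full_citation: str) -> ET.Element:
    """Build a relatedInfo element of type "publication" with relation isCitedBy."""
    related = ET.Element("relatedInfo", {"type": "publication"})
    ET.SubElement(related, "identifier", {"type": "DOI"}).text = identifier
    ET.SubElement(related, "title").text = title
    ET.SubElement(related, "relation", {"type": "isCitedBy"})
    # The full citation for the related publication goes in the notes element.
    ET.SubElement(related, "notes").text = full_citation
    return related


if __name__ == "__main__":
    element = build_related_publication(
        "10.1016/j.amjmed.2014.06.036",
        "Acute porphyrias in the USA: features of 108 subjects from Porphyria Consortium",
        "Bonkovsky, HL, Maddukuri, VC, Yazici, C. (2014) Acute porphyrias in the USA: "
        "features of 108 subjects from Porphyria Consortium. "
        "American Journal of Medicine, vol 16, 2014, p.1-6",
    )
    ET.indent(element)  # pretty-printing helper, available from Python 3.9
    print(ET.tostring(element, encoding="unicode"))
```

Because the library serialises the tree, the output is always well formed, whatever text is supplied for the title or notes.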
RIF-CS Citation Information
- Displays in DCI as "How to cite this resource". Do not confuse with DCI Citations.
- Where it is provided, citationMetadata is the preferred source of content for the DCI citation elements including: Author (Year) Title. Publisher. Identifier.
- Where citationMetadata is not provided, the elements making up the citation will be populated according to the transform table. Note that the Publisher will be inferred from RIF-CS Group.
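The publisher fallback described in these notes can be summarised in a few lines of Python. This is a sketch of the rule as stated here, not the transform code used by ANDS; the function and parameter names are hypothetical, and treating a blank publisher as "not provided" is an assumption.

```python
from typing import Optional


def resolve_publisher(citation_publisher: Optional[str], group: str) -> str:
    """Prefer the citationMetadata publisher; otherwise fall back to the RIF-CS Group."""
    # Assumption: an empty or whitespace-only publisher counts as not provided.
    if citation_publisher and citation_publisher.strip():
        return citation_publisher.strip()
    return group


if __name__ == "__main__":
    print(resolve_publisher("Example University Press", "Example University"))
    print(resolve_publisher(None, "Example University"))
```

The second call shows why the Group value matters: when no publisher is recorded in citationMetadata, the Group is what appears in the citation.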
Tagging records to be excluded from your data source harvest to DCI
- Some RDA data sources will contain records that are considered 'out of scope' for DCI (but in scope for RDA). Examples include RDA records that describe data held in an external repository, such as PANGAEA, where it is also described and discoverable. This may occur, for example, where a researcher has elected to deposit their data with a domain-specific repository but also wishes the data to be discoverable through their institutional metadata store and RDA. In such cases, the 'out of scope' records can be tagged in the ANDS Registry, where they will be programmatically excluded from the harvest to DCI but still appear in RDA.
- Use the tag excludeDCI and create it as a 'secret' rather than 'public' tag so that it is not visible to RDA users. More about tagging.
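The effect of the tag can be pictured with a small Python sketch. The exclusion itself is carried out by the ANDS Registry, so this is only an illustration of the rule: the record structure (a dictionary with "key" and "tags" entries) and the function name are hypothetical.

```python
EXCLUDE_TAG = "excludeDCI"


def records_for_dci(records):
    """Return only the records that do not carry the exclusion tag."""
    return [record for record in records if EXCLUDE_TAG not in record.get("tags", [])]


if __name__ == "__main__":
    # Hypothetical records; real registry objects hold far more than this.
    sample = [
        {"key": "collection-1", "tags": []},
        {"key": "collection-2", "tags": ["excludeDCI"]},
        {"key": "collection-3"},
    ]
    for record in records_for_dci(sample):
        print(record["key"])
```

All three records would remain visible in RDA; only the tagged one is left out of the harvest to DCI.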
- 9 December 2013: Updated to provide details of Phase 2 of the pilot project. Transform (crosswalk) updated to v2.
- 11 September 2014: Updated to reflect business-as-usual service. Transform (crosswalk) updated to v3.
- 6 March 2017: Updated to change Thomson Reuters to Clarivate Analytics.