CrossRef metadata best practice to support KPIs for funding agencies

Version History

Warnings

Some of the CrossRef deposit schema elements required to implement the best practice described here have not been implemented as of 2013-09-09 as they are still being reviewed by the TWG.

Background

Funding agencies and publishers are interested in being able to measure Key Performance Indicators (KPIs) related to mandates such as the February 22nd, 2013 OSTP memo on Public Access to the Results of Federally Funded Research. CrossRef is extending its Application Programming Interfaces (APIs) to enable funding agencies and publishers to query CrossRef metadata in support of generating such KPIs. Organisations such as CHORUS and SHARE can make use of these APIs in order to create KPI Dashboards measuring, amongst other things:

  1. Publications relating to research funded by particular agencies.
  2. The licenses under which said publications have been released.
  3. The location of the full text of the Best Available Version (BAV) for said publications for both reading and Text & Data Mining (TDM) applications.
  4. The long-term preservation arrangements that have been made for the Version of Record (VOR) of said publications.

The CrossRef extended APIs, of course, will only work if publishers supply the appropriate metadata. This document outlines the metadata that publishers will need to provide in order to support such KPI reporting.

Conventions

Although this document is not an RFC, it will follow the conventions of RFC 2119 in the use of the following terms:

  1. must - This word, or the terms “REQUIRED” or “SHALL”, mean that the definition is an absolute requirement for meeting best practice.
  2. must not - This phrase, or the phrase “SHALL NOT”, mean that the definition is an absolute prohibition for meeting best practice.
  3. should - This word, or the adjective “RECOMMENDED”, mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.
  4. should not - This phrase, or the phrase “NOT RECOMMENDED”, mean that there may exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful, but the full implications should be understood and the case carefully weighed before implementing any behavior described with this label.

Summary

In order to support basic agency and publisher KPIs:

In order to enhance the utility of CrossRef metadata to agencies and in order to enable more sophisticated agency/publisher KPIs:

  - Publishers should consider participating in CrossMark in order to record updates, errata, corrigenda, retractions and withdrawals.
  - Publishers should consider depositing abstracts using CrossRef’s JATS abstract support.
  - Publishers should consider collecting and depositing ORCIDs for publication authors.
  - Publishers should consider making the bibliographic metadata and references for documents resulting from agency funding maximally available by overriding CrossRef opt-outs using the <metadata_distribution_opt> and <reference_distribution_opt> elements.

Funding Information

CrossRef supports the recording of funding information for a publication via its FundRef program. FundRef defines an open, standard registry of funder names and funder identifiers that can be used to increase the accuracy of the funding information recorded. Although FundRef supports recording award numbers along with funder identifiers, it does not define standards for recording award numbers, as practice varies greatly across funders.

To support funder KPIs, members must deposit funder metadata using the specifications defined for the FundRef program. Specifically, when depositing metadata you:

  1. must include funder information.
  2. must not deposit your funder names without at least trying to map them to FundRef identifiers in the FundRef registry. Depositing funder names that are included in the FundRef registry, but without their respective FundRef Funder Identifiers, will pollute the FundRef metadata and lower the value of the service for all participants. Note that the KPI APIs will only work for Funder metadata that includes FundRef Funder Identifiers.
  3. should include award numbers in FundRef metadata when possible. Although the standard KPI API does not make direct use of award numbers, individual agencies may be able to make use of included award numbers where found.
  4. should deposit FundRef data as part of a CrossMark record if you (the publisher) already are (or are planning on becoming) a participant in CrossMark. There are two reasons for this. First, it ensures that the Funder metadata is available both in a standard machine-readable format and via a standard UI for readers. Second, it ensures that the Funder metadata is made maximally reusable via a CC Zero license waiver. Note that publishers do not need to have implemented CrossMark yet to deposit Funder metadata via CrossMark. We expect that publishers may take a year or more before they have fully implemented all of CrossMark’s features.

See CrossRef’s Help pages for technical details on depositing FundRef metadata.
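The rules in the list above can be sketched as a pre-deposit check. This is a minimal illustration only, not part of any CrossRef tooling: the record layout (dicts with `name`, `identifier` and `award_numbers` keys), the `registry` lookup table and the funder name used in the usage note are all hypothetical stand-ins for whatever a publisher's production system and the FundRef registry actually provide.

```python
def check_funders(funders, registry):
    """Return (errors, warnings) for a list of funder records.

    funders:  list of dicts with hypothetical keys
              'name', 'identifier', 'award_numbers'
    registry: dict mapping funder names to identifiers; a hypothetical
              stand-in for a lookup against the FundRef registry
    """
    errors, warnings = [], []

    # Rule 1 (must): funder information has to be present.
    if not funders:
        errors.append("no funder information (rule 1)")

    for funder in funders:
        name = funder.get("name", "")

        if not funder.get("identifier"):
            if name in registry:
                # Rule 2 (must not): a name that is in the registry
                # was deposited without its identifier.
                errors.append(
                    f"{name}: in registry but deposited without identifier (rule 2)"
                )
            else:
                # A mapping was attempted but failed; the KPI APIs only
                # work for funders that carry an identifier.
                warnings.append(f"{name}: no registry match; KPI APIs will not see it")

        # Rule 3 (should): award numbers are recommended, not required.
        if not funder.get("award_numbers"):
            warnings.append(f"{name}: no award numbers (rule 3)")

    return errors, warnings
```

Calling `check_funders([{"name": "Example Funder"}], {"Example Funder": "10.13039/000000000"})` would report one rule 2 error and one rule 3 warning; the identifier shown is a placeholder, not a real registry entry.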

License information

One of the main drivers for the FundRef KPI API is that many funders are required to report on the public availability of the results of funder-financed research. Funders are therefore interested in understanding how publications related to funded research are licensed.

To deposit license information, publishers must use the <license_ref> element. The value of the <license_ref> element must be a stable HTTP URI which points to a human readable document that either includes (or guides the reader to) any copyright and/or licensing information related to the CrossRef DOI of the content. The URI must point either to a location on the publisher’s site or to the stable location of any well-known licenses such as those of the Creative Commons.

Note that it is entirely acceptable to record a <license_ref> URI as a “placeholder.” If you are still working out specific licensing terms, the URI you record should point to a blank page or even a simple re-assertion of the document’s copyright. There is a big difference between recording at least some <license_ref> URI and recording no <license_ref> URI at all. The former indicates an intent to eventually clarify licensing information, whereas the latter indicates that the licensing information is likely to remain ambiguous.

Use of the <license_ref> element is best explained through examples.

The <license_ref> for content licensed under the popular CC-BY license would look like this:

<license_ref>http://creativecommons.org/licenses/by/3.0/deed.en_US</license_ref>

Whereas the Journal of Psychoceramics might record that their content is licensed under a proprietary license like this:

<license_ref>http://www.psychoceramics.org/license_v1.html</license_ref>

You can deposit multiple <license_ref> elements, so the following would indicate that a document was available under a dual license (e.g. one for commercial applications and one for non-commercial applications):

<license_ref>http://www.psychoceramics.org/non_commercial_license_v1.html</license_ref>
<license_ref>http://www.psychoceramics.org/commercial_license_v1.html</license_ref>
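As a rough illustration of how a metadata consumer might read such elements, the following sketch extracts every <license_ref> value from a fragment and rejects values that are not HTTP URIs. It rests on several assumptions: the fragment is wrapped in a throwaway root element so that the standard-library parser accepts it, `https` is accepted alongside `http`, and the function name `license_uris` is invented for this example. It cannot check the "stable" part of the requirement, which is a publisher commitment rather than something visible in the URI.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# The dual-license example from the text above.
DUAL_LICENSE = """
<license_ref>http://www.psychoceramics.org/non_commercial_license_v1.html</license_ref>
<license_ref>http://www.psychoceramics.org/commercial_license_v1.html</license_ref>
"""


def license_uris(fragment):
    """Return the license URIs in a fragment, in document order.

    Raises ValueError if a <license_ref> value is not an HTTP(S) URI.
    """
    # Wrap in a dummy root (an assumption of this sketch) because a bare
    # sequence of sibling elements is not a well-formed XML document.
    root = ET.fromstring(f"<root>{fragment}</root>")
    uris = []
    for element in root.iter("license_ref"):
        uri = (element.text or "").strip()
        parts = urlparse(uri)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an HTTP URI: {uri!r}")
        uris.append(uri)
    return uris
```

Here `license_uris(DUAL_LICENSE)` yields both license URIs, while a value such as a bare license name (for example `CC-BY`) raises `ValueError`.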

Embargoes

Publishers may want to record that a document is under embargo. That is, it is available under access control and a proprietary license for a set period of time, after which it is available under an open license. Publishers wishing to record embargoes should use the optional start_date attribute on the <license_ref> element. The following records that the content is under a proprietary license from its date of publication on February 3, 2014 and that it is under a CC-BY license a year later on February 3, 2015:

<license_ref start_date="2014-02-03">http://www.psychoceramics.org/license_v1.html</license_ref>
<license_ref start_date="2015-02-03">http://creativecommons.org/licenses/by/3.0/deed.en_US</license_ref>

Note that the value of the start_date attribute must be recorded using the format YYYY-MM-DD. The start_date attribute can be combined with multiple <license_ref> elements to indicate that a document is under a proprietary license during an embargo, but that it is then under a dual (commercial/non-commercial) license a year later:

<license_ref start_date="2014-02-03">http://www.psychoceramics.org/license_v1.html</license_ref>
<license_ref start_date="2015-02-03">http://www.psychoceramics.org/non_commercial_license_v1.html</license_ref>
<license_ref start_date="2015-02-03">http://www.psychoceramics.org/commercial_license_v1.html</license_ref>
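One way a consumer might interpret these dated elements is sketched below: given a date, it returns the license URIs in effect on that date. The interpretation is an assumption of this sketch, not something the schema documentation is quoted as saying: a <license_ref> without start_date is treated as applying from the beginning, and the group of licenses with the most recent start_date on or before the query date is taken to supersede earlier ones. The function name `licenses_in_effect` and the dummy root wrapper are also inventions for the example.

```python
import xml.etree.ElementTree as ET
from datetime import date

# The embargo-then-dual-license example from the text above.
EMBARGO_THEN_DUAL = """
<license_ref start_date="2014-02-03">http://www.psychoceramics.org/license_v1.html</license_ref>
<license_ref start_date="2015-02-03">http://www.psychoceramics.org/non_commercial_license_v1.html</license_ref>
<license_ref start_date="2015-02-03">http://www.psychoceramics.org/commercial_license_v1.html</license_ref>
"""


def licenses_in_effect(fragment, on):
    """Return the license URIs assumed to be in effect on the date `on`."""
    root = ET.fromstring(f"<root>{fragment}</root>")

    started = []
    for element in root.iter("license_ref"):
        raw = element.get("start_date")
        # Assumption: no start_date means "in effect from the beginning".
        start = date.fromisoformat(raw) if raw else date.min
        if start <= on:
            started.append((start, element.text.strip()))

    if not started:
        return []  # queried before any license has started

    # Assumption: the most recent start date supersedes earlier licenses.
    latest = max(start for start, _ in started)
    return [uri for start, uri in started if start == latest]
```

With this reading, a query for a date during the embargo (say `date(2014, 6, 1)`) returns only the proprietary license, and a query on or after `date(2015, 2, 3)` returns the two dual-license URIs.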

Note that there is no corresponding end_date attribute for the <license_ref> element. This is because including end dates could introduce ambiguities. For example:

You might ask why one should record a license that starts in the future? Wouldn’t it be better to just update the <license_ref> element at the time the license changes? By recording that another license takes effect in the future, you are informing the consumer of the metadata that the current restricted license is only for the embargo period. In short, you are recording the intent to change the license when the embargo is done.

In the above examples, the <license_ref> element is unqualified and should therefore be considered to apply to the content pointed to by any <resource> URIs included in the CrossRef metadata. The CrossRef metadata schema supports recording different licenses for different versions of the resource, and this will be discussed below. However, first let’s look at the role the <resource> element plays in providing funding agency KPIs.

Recording links to full text and/or archived versions of documents, etc.

Funders are not just interested in reporting on the licensing terms of funder-financed research. They are also interested in making sure that the full text content of the BAV is made available for reading, automated processing and archiving.

To this end, publishers need to be able to record links to the full text of the content a DOI refers to. Additionally, publishers will want to offer different versions (e.g. AM or VOR) and different representations (e.g. PDF for viewing, XML for TDM, etc.) of the content tailored for specific applications.

The <resource> element in CrossRef metadata has most often been used to record an HTTP URI pointing at the publisher’s landing page for the CrossRef DOI in question. However, the CrossRef schema has long supported the recording of multiple <resource> elements in order to enable, for example:

CrossRef has extended the ability to record multiple <resource> elements in order to allow the recording of URIs which point to the full text of content identified by the CrossRef DOI. The publisher can record multiple representations of the full text (e.g. PDF, XML, plain text) using the new mime_type attribute and then, through their access control systems, control who is able to reach which representation and under which conditions.
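To make the idea of multiple representations concrete, here is a small sketch of how a TDM client might choose among <resource> elements by their mime_type attribute. The XML fragment is a simplified illustration written for this example rather than a schema-complete deposit, the two full-text URIs are hypothetical, and the function name `pick_representation` and its default preference order are likewise assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment: one PDF and one XML representation.
FULL_TEXT = """
<resource mime_type="application/pdf">http://www.psychoceramics.org/fulltext/10.5555/12345678.pdf</resource>
<resource mime_type="application/xml">http://www.psychoceramics.org/fulltext/10.5555/12345678.xml</resource>
"""


def pick_representation(
    fragment,
    preferred=("application/xml", "text/plain", "application/pdf"),
):
    """Return (mime_type, uri) for the first preferred type present, else None."""
    root = ET.fromstring(f"<root>{fragment}</root>")

    # Index the resources that declare a mime_type; landing-page
    # resources without the attribute are ignored in this sketch.
    by_type = {
        element.get("mime_type"): element.text.strip()
        for element in root.iter("resource")
        if element.get("mime_type")
    }

    for mime_type in preferred:
        if mime_type in by_type:
            return mime_type, by_type[mime_type]
    return None
```

A mining client would typically prefer structured XML, so `pick_representation(FULL_TEXT)` selects the XML resource here; a reader-facing tool could pass `preferred=("application/pdf",)` instead. Whether the chosen URI is actually reachable remains subject to the publisher's access control.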

Note that, by recording a <resource> that points to the full text, you are not necessarily guaranteeing that the URI will be accessible.

Note also that the publisher could theoretically choose to only deposit <resource> elements for full text representations once an embargo has ended. However, this approach may prove fraught, as any mistakes or delays in the redeposit process might lead the funding agency to believe that the publisher has not made the relevant content accessible at the end of the embargo period.

Further detail on using the <resource> element for recording links to full text can be found on the Prospect support site and in the CrossRef deposit schema documentation for the <collection> and <resource> elements.

Different Licenses for Different Versions of the Content

Some publishers may want to record different licenses for different versions of the <resource> element recorded in CrossRef metadata. For example, one <resource> element may point to a URI intended for subscribed readers, while another <resource> element may point to a version of the document intended for Text and Data Mining (TDM) applications. Similarly, a publisher may choose to apply one license to the “Author Accepted Manuscript” (AM) and another to the “Version of Record” (VOR).

To accommodate these scenarios, the <license_ref> element supports an applies_to attribute. Similarly, the <resource> element has been extended to support a content_version attribute. Publishers can use these attributes together to apply specific licenses to specific versions of the resource. For example, to indicate the “VOR” version of a document is licensed under a proprietary license, but that the “AM” version of the same document is licensed under an open license, the <license_ref> and <resource> elements could be combined like this:

<license_ref applies_to="vor">http://www.psychoceramics.org/license_v1.html</license_ref>
<!-- … -->
<license_ref applies_to="am">http://creativecommons.org/licenses/by/3.0/deed.en_US</license_ref>
<!--- other CrossRef Metadata -->
<resource content_version="vor">http://www.psychoceramics.org/fulltext/vor/10.5555/12345678</resource>
<!-- … -->
<resource content_version="am">http://www.psychoceramics.org/fulltext/am/10.5555/12345678</resource>

The <license_ref> and <resource> elements, along with their respective start_date, applies_to, and content_version attributes, can all be combined to support more complex assertions. For example, the following says that a document is available to readers only under a proprietary license during an embargo period, but is then available to the public for reading under a more open license and for non-commercial TDM applications under a specific TDM license:

<license_ref start_date="2014-02-03" applies_to="vor">http://www.psychoceramics.org/license_v1.html</license_ref>
<!-- … -->
<license_ref start_date="2015-02-03" applies_to="am">http://www.psychoceramics.org/open_license.html</license_ref>
<!-- … -->
<license_ref start_date="2015-02-03" applies_to="tdm">http://www.psychoceramics.org/nc_tdm_license.html</license_ref>
<!--- other CrossRef Metadata -->
<resource content_version="vor">http://www.psychoceramics.org/fulltext/vor/10.5555/12345678</resource>
<!-- … -->
<resource content_version="am">http://www.psychoceramics.org/fulltext/am/10.5555/12345678</resource>
<resource content_version="tdm">http://www.psychoceramics.org/fulltext/tdm/10.5555/12345678.xml</resource>
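The combined example can be read mechanically by pairing each <resource> with the <license_ref> elements whose applies_to value matches its content_version. The sketch below does this for a given date. It makes the same assumptions as any consumer-side reading would have to: unqualified <license_ref> elements apply to every resource (as stated earlier for unqualified licenses), a missing start_date means "from the beginning", the latest started license supersedes earlier ones, and the function name `licenses_by_version` is invented for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date

# The combined example from the text above, without the placeholder comments.
COMBINED = """
<license_ref start_date="2014-02-03" applies_to="vor">http://www.psychoceramics.org/license_v1.html</license_ref>
<license_ref start_date="2015-02-03" applies_to="am">http://www.psychoceramics.org/open_license.html</license_ref>
<license_ref start_date="2015-02-03" applies_to="tdm">http://www.psychoceramics.org/nc_tdm_license.html</license_ref>
<resource content_version="vor">http://www.psychoceramics.org/fulltext/vor/10.5555/12345678</resource>
<resource content_version="am">http://www.psychoceramics.org/fulltext/am/10.5555/12345678</resource>
<resource content_version="tdm">http://www.psychoceramics.org/fulltext/tdm/10.5555/12345678.xml</resource>
"""


def licenses_by_version(fragment, on):
    """Map each resource's content_version to the licenses in effect on `on`."""
    root = ET.fromstring(f"<root>{fragment}</root>")
    report = {}

    for resource in root.iter("resource"):
        version = resource.get("content_version")
        started = []
        for license_ref in root.iter("license_ref"):
            applies_to = license_ref.get("applies_to")
            # Unqualified licenses (no applies_to) apply to every resource.
            if applies_to not in (None, version):
                continue
            raw = license_ref.get("start_date")
            start = date.fromisoformat(raw) if raw else date.min
            if start <= on:
                started.append((start, license_ref.text.strip()))

        # Assumption: the most recent start date supersedes earlier ones.
        latest = max((start for start, _ in started), default=None)
        report[version] = [uri for start, uri in started if start == latest]

    return report
```

Queried for a date inside the embargo, such as `date(2014, 6, 1)`, only the "vor" entry has a license; the "am" and "tdm" entries come back empty because their licenses have not yet started. Queried a year later, all three versions carry a license.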

Detailed information on recording licensing information in CrossRef metadata can be found in the CrossRef schema documentation for the <license_ref> element.

Bonus Points

The more metadata that publishers record for publications arising from agency-funded research, the more useful that metadata will be to said agencies and the more value they will see from publishers. Whereas the above sections detail metadata elements that agencies will expect in order to be able to compile basic KPIs and offer portal services, additional metadata will allow agencies to create even more sophisticated KPIs and services. As such, publishers should seriously consider depositing the following additional metadata elements in their CrossRef deposits.

Distributing Standard Bibliographic Metadata

Metadata deposited to CrossRef is made freely available via numerous CrossRef query APIs. However, all deposited metadata is subject to opt-outs in the case of bulk distribution APIs and data dumps. In order to make sure that bibliographic metadata for publications arising from agency funding is maximally available, publishers should consider setting the value of the <metadata_distribution_opts> element for DOIs to any. Further details can be found in CrossRef’s schema documentation for the <metadata_distribution_opts> element.

Distributing References

References made in publications arising from agency funding can provide agencies with an overview of what literature is considered important in the fields that they fund. Many publishers deposit references to CrossRef as part of their participation in CrossRef’s CitedBy service. However, participation in CitedBy does not automatically make references available via CrossRef’s standard APIs. In order to distribute references along with standard bibliographic metadata, publishers need to set the <reference_distribution_opt> element to any for each DOI deposit where they want to make references openly available. By setting this element, references for the DOI will be distributed without restriction through all of CrossRef’s APIs and bulk metadata dumps. Further details can be found in CrossRef’s schema documentation for the <reference_distribution_opt> element.

CrossMark

CrossMark provides a standard mechanism for alerting researchers to updates to published documents, including corrections, errata, corrigenda, retractions and withdrawals. Use of the CrossMark service sends a signal to researchers and agencies that publishers are committed to maintaining the integrity of the scholarly record.

CrossMark also provides a standard, cross-publisher user interface that researchers can use to view FundRef information and licensing information. This user interface works both from publisher landing pages and from published PDFs. More information can be found on the CrossMark support site.

Abstracts

Many funding agencies are interested in building custom portals which highlight agency-funded research. In order to provide users of these portals with the best experience, agencies will want, where possible, to display abstracts along with standard bibliographic metadata.

CrossRef supports the deposit of abstracts conforming to the JATS abstract element. Further details can be found in the CrossRef Schema Documentation of the <abstract> element.

ORCIDs

ORCIDs are unique identifiers for researchers. CrossRef supports the deposit of ORCIDs for authors. The presence of ORCIDs in CrossRef metadata will, in turn, allow agencies to tie agency-funded research publications directly to researchers. Widespread use of ORCIDs in CrossRef deposits could even let agencies start to develop publication KPIs for researchers that they fund. Further details on CrossRef’s ORCID support can be found in the CrossRef Schema Documentation of the <ORCID> element.