The JISC-funded ticTOCs project has developed a service to enable users to discover, subscribe to, and re-use Table of Contents (TOC) RSS feeds for thousands of journals from a wide range of publishers.
The initial phase of the project included an analysis of publishers' current practices with regard to the provision of RSS TOCs. The analysis revealed a range of issues which may affect the ability of end users and feed aggregators to use feeds effectively.
Just because a particular publisher’s feed "looks good" in the major feed readers isn't enough to make said feed truly usable. RSS feeds are designed to be aggregated. When aggregated, even minor variations in feeds can be greatly distracting and make for an unpleasant user experience. For instance, there is currently wide variation amongst scholarly publishers in the use of the <channel><title> element. Examples encountered after having looked at only ten publishers during the early stages of the ticTOCs project included:
<title>Nature</title>
<title>BMJ Current Issue</title>
<title>British Journal of Visual Impairment current issue</title>
<title>Journal of Geophysics and Engineering latest papers</title>
<title>British Journal of Criminology – current issue</title>
<title>Journal of managerial psychology: table of contents</title>
<title>Science Direct Publication: Biochemical and Biophysical research communications</title>
<title>SpringerLink - Journal</title>
<title>Blackwell Synergy: International Journal of Cosmetic Science: Table of Contents</title>
<title>NATURE –LONDON-</title>
Variations in practice amongst publisher feeds can be irritating for end-users, but they can be insurmountable for automated processes. RSS feeds are increasingly being consumed by knowledge discovery and data mining services. In these cases, variations in date formats, the practice of lumping all authors together in one <dc:creator> element, or generating invalid XML can render the RSS feed useless to the service accessing it.
The draft recommendations below are a result of this initial analysis and are ultimately intended to facilitate good practice in the production and provision of TOC RSS feeds. The guidelines include general recommendations for good practice, specific recommendations on the use of RSS modules, and an example RSS TOC feed. Ultimately, we expect that industry-wide adoption of these best practices will help drive more traffic to publisher web sites. Note that most of these recommendations can also be applied to non-TOC RSS feeds such as thematic feeds, automated search result feeds, etc.
The Best Practice recommendation group included the following members:
The recommendations were based on early work developed by Malcolm Moffat (ICBL, Heriot-Watt University).
Your feedback and comments are much appreciated. Please email rss_best_practice@crossref.org
Use the RSS 1.0 specification because of its greater flexibility. RSS 1.0 extended with RSS 1.0 modules is ideally suited to the provision of RSS table of contents type materials. A number of publishers are already utilising this approach. Some publishers offer a choice of formats for their TOCs (e.g. RSS 1.0 and RSS 2.0).
Use RSS 1.0 Modules (e.g. Dublin Core Module, CONTENT Module and PRISM Module) to extend TOC RSS feed functionality [see details and examples below].
Validate TOC RSS feeds using an RSS validation tool (e.g. the W3C Feed Validator or the Redland RSS 1.0 validator).
Do not include HTML markup in standard RSS feed elements. For example, avoid using HTML tags (such as <b>, <p>, <a href>, etc.) in the <item><description> element. The <item><description> should only include plain text, as it is not possible to know how the feed will be presented, and including markup can prevent your feed from being displayed correctly.
Include abstracts/summaries in your feeds. There is increasing evidence that providing users with more information in a feed will drive more users to your site. Conversely, there is also evidence that users will choose not to subscribe to "partial-feeds" due to the inconvenience associated with reading them. The “full vs partial feed” is the subject of intense debate. At the very least you should familiarize yourself with the issue before making a decision about whether or not to include abstracts/summaries in your feeds.
Use the RSS 1.0 Content Module to present HTML marked up content. For example the <content:encoded> element can be used to provide an alternative marked up version of the <item><description> [see details and examples below].
Do not restrict access to TOC RSS feeds. RSS is an excellent and cost-effective way of driving traffic to, and increasing brand awareness of, publishers' content. Restricting access to the feeds themselves (e.g. to subscribers only) negates many of the potential benefits that RSS can bring.
Understand the purpose of each RSS TOC feed you provide. Provide multiple feeds, rather than diluting the message of one with information irrelevant to its audience. For example it may be appropriate to provide separate feeds for the current TOC issue only and a combined TOC for a number of recent issues.
Ensure your web server is configured to serve RSS files using a media type of application/xml. Using the correct media type allows client-side applications such as browsers and feed readers to identify and process content more accurately. See Section 9 "Notes on Media Types" for further discussion.
Provide OPML files to enable aggregators or end users to utilise a number of your feeds. In some instances it may be appropriate to provide a range of OPML files (e.g. OPML files for each subject category) or to enable end users to create custom OPML files on the fly.
Name your feeds correctly. The title of the RSS feed is encoded in a number of places:
the /opml/body/outline[@title] attribute of the outline element in an OPML file
the /rdf/channel/title element in the RSS feed
As discussed above, RSS feeds are designed to be read in aggregate. When RSS feeds from different sources are intermingled in one view, variations in the content and formatting of the title element can become distracting and/or confusing. Most of these variations in title formatting amongst publishers occur because the publisher has chosen to "annotate" the title with a description of the feed (e.g. "current issue", "latest papers", "table of contents", etc.). These annotations can vary in their naming conventions, their formatting and their placement (e.g. prepended or appended). In addition to being confusing to the researcher, such variations in the title element make it difficult for automated processes to make sense of the title as they have no consistent way of distinguishing the actual content of the title element from the annotation.
The practice of annotating the channel title should generally be deprecated. Ideally, if explanatory text is needed to describe the feed, then this should be placed in the <description> element of the feed.
However, some might object that RSS readers do not consistently make use of the feed description element and that title annotations are still necessary in order to allow users to distinguish between different feeds. In this case we recommend that such annotations be confined to non-table-of-contents feeds (e.g. saved searches, "most cited articles", etc.) and that the <title> element of the RSS feed for the table of contents of the current issue should contain only the official name of the publication.
Finally, in order to introduce some consistency where annotations are included, publishers should:
Thus, the title of the TOC feed for the current issue of the Journal of Psychoceramics would be encoded like this:
<title>Journal of Psychoceramics</title>
Whereas the feed of most cited articles from the Journal of Psychoceramics would be encoded like this:
<title>Journal of Psychoceramics [most cited articles]</title>
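An aggregator can separate the publication name from an annotation once the annotation is placed consistently. The following Python sketch assumes, based only on the two examples above, that an annotation is a single bracketed suffix at the end of the title. The function name `split_feed_title` is hypothetical and not part of any specification.

```python
import re

# Assumption: an annotation, when present, is one "[...]" suffix at the end
# of the title, as in the "most cited articles" example above.
_ANNOTATION = re.compile(r"^(?P<name>.*?)\s*\[(?P<note>[^\[\]]+)\]\s*$")


def split_feed_title(title):
    """Return (publication_name, annotation or None) for a channel title."""
    match = _ANNOTATION.match(title)
    if match:
        return match.group("name").strip(), match.group("note").strip()
    return title.strip(), None


print(split_feed_title("Journal of Psychoceramics"))
print(split_feed_title("Journal of Psychoceramics [most cited articles]"))
```

Titles annotated in other ways (for example "BMJ Current Issue") cannot be split reliably by such a rule, which is the practical argument for a single convention.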
RSS 1.0 Modules are XML-namespace based compartmentalised extensions to RSS 1.0. This namespace based modularization allows RSS 1.0 to be extended without the need for rewrites of the core RSS 1.0 specification and without the need for consensus on each and every element. In a nutshell, RSS 1.0 Modules allow the basic RSS 1.0 format to be extended in standard ways by specifying which modules and namespaces are being used. A range of Standard and Proposed RSS 1.0 modules are available which enable RSS 1.0 to be extended in a multitude of ways (e.g. provision of information on feed status or frequency of feed updates, and extension for use with audio or wiki type content).
From the perspective of TOC feeds four RSS 1.0 modules are particularly relevant:
mod_admin provides administrative properties that can be used to help improve the robustness and reliability of broad RSS usage between providers, aggregators, clients, and other users.
mod_content is a module to extend RSS to permit the inclusion of actual content rather than just metadata descriptions of content.
mod_dc is a module to extend RSS via the use of the Dublin Core element set.
mod_prism is a module to extend RSS via the use of the PRISM element set.
Full details of these modules can be found from the links above. The remainder of this section provides brief outlines of these four modules along with examples of commonly used elements and short notes on their usage.
The RSS 1.0 Admin Module 'mod_admin' provides two useful properties that can be added to an RSS feed in order to provide a means for consumers of a feed to provide feedback on errors encountered in the feed. This kind of mechanism is essential in a large syndication network in which feeds may be aggregated and re-syndicated before reaching the eventual consumer: there needs to be clear labelling within the feed itself as the client may not be consuming it from its original location.
The two additional elements that the admin module specifies are as follows:
The first and most important piece of information is a means for submitting error reports. The admin:errorReportsTo element should contain a URI that can be used to report any problem with the RSS feed. In the most typical scenario this URI will be a mailto: URI that provides an email address of a person, an email alias that may forward messages to one or more people, or an automated bug tracking system that will capture the error report for later review. The preferred mechanism is up to the publisher, although a mail alias is the simplest and least brittle approach, allowing messages to be handled manually if necessary but without introducing a personal email address into the feed.
The second element defined in the specification is the admin:generatorAgent element. This should be a unique URI that identifies the application that generated the RSS feed, i.e. a URI that identifies the software rather than the website of origin. Providing this additional information provides a means for identifying feeds that may all suffer from a common problem (because of a consistent bug in the software), or a feed that was generated by an older version of some software containing a problem that has subsequently been fixed. The latter scenario is relevant when one considers that RSS feeds may be aggregated and stored for some time.
The following example illustrates the usage of each element. Note that as the values of both elements are URIs, and are hence pointers to other web resources, then the "rdf:resource" attribute is used rather than element content.
<channel rdf:about="...">
<admin:generatorAgent rdf:resource="http://www.example.org/software/platform/1.0"/>
<admin:errorReportsTo rdf:resource="mailto:errors@example.org"/>
</channel>
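A consumer of a feed might read these two properties as sketched below in Python, using only the standard library. The namespace URIs are those commonly declared for RSS 1.0 and mod_admin and are assumed here. The channel URI in the sample and the function name `admin_properties` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as commonly declared for RSS 1.0 and mod_admin (assumed).
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "admin": "http://webns.net/mvcb/",
}
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# The rdf:about value is a placeholder chosen for this illustration.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/rss/1.0/"
    xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://www.example.org/journal/rss">
    <admin:generatorAgent rdf:resource="http://www.example.org/software/platform/1.0"/>
    <admin:errorReportsTo rdf:resource="mailto:errors@example.org"/>
  </channel>
</rdf:RDF>"""


def admin_properties(feed_xml):
    """Return the generatorAgent and errorReportsTo URIs, or None if absent."""
    channel = ET.fromstring(feed_xml).find("rss:channel", NS)
    result = {}
    for name in ("generatorAgent", "errorReportsTo"):
        element = channel.find("admin:" + name, NS) if channel is not None else None
        # The URI is carried in the rdf:resource attribute, not in element text.
        result[name] = element.get(RDF_RESOURCE) if element is not None else None
    return result


print(admin_properties(SAMPLE))
```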
The RSS 1.0 Content Module 'mod_content' was originally intended to allow RSS 1.0 to be extended to permit the inclusion of actual content, rather than just metadata descriptions, within RSS 1.0. It has, however, become common practice to use this module to enable inclusion of HTML marked up versions of <item><description> within feeds. This is accomplished by the use of the <content:encoded> element as illustrated below.
Element | Typical Example |
<content:encoded> | <content:encoded> <![CDATA[ <p> <b>Biochemistry: Designer enzymes</b> </p> <p>Nature 448, 757 (2007). <a href="http://dx.doi.org/10.1038/448757a">doi:10.1038/448757a</a> </p> <p>Authors: Michael P. Robertson & William G. Scott</p> <p>Evolution has crafted thousands of enzymes that are efficient catalysts for a plethora of reactions. Human attempts at enzyme design trail far behind, but may benefit from exploiting evolutionary tactics.</p> ]]> </content:encoded> |
Using the RSS 1.0 Content module in this way allows both metadata descriptions and HTML marked up content to be simultaneously presented within a feed. Aggregators, feed readers, and other third parties can then choose which form to utilise.
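Where the marked up version is the primary source, a plain-text <item><description> can be derived from it rather than maintained separately. The following Python sketch shows one possible approach using the standard library's `html.parser`. The class and function names are hypothetical, and the set of tags treated as block-level is an assumption made for this illustration.

```python
from html.parser import HTMLParser

# Assumption: these tags separate blocks of text and should yield a space.
_BLOCK_TAGS = {"p", "div", "li", "br"}


class _TextExtractor(HTMLParser):
    """Collect character data and discard all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in _BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def plain_text(html_fragment):
    """Return the fragment's text with tags removed and whitespace collapsed."""
    extractor = _TextExtractor()
    extractor.feed(html_fragment)
    extractor.close()
    return " ".join("".join(extractor.parts).split())


encoded = "<p><b>Biochemistry: Designer enzymes</b></p><p>Nature 448, 757 (2007).</p>"
print(plain_text(encoded))
```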
The RSS 1.0 Dublin Core Module 'mod_dc' allows RSS 1.0 to be extended to utilise the Dublin Core Metadata Element Set.
Typically used elements include:
Element | Typical Examples |
<dc:publisher> | <dc:publisher>Institute of Physics Publishing</dc:publisher> |
<dc:language> | <dc:language>en</dc:language> |
<dc:rights> | <dc:rights>Copyright Institute of Physics Publishing 2007</dc:rights> |
<dc:date> | <dc:date>2007-03-23</dc:date> |
<dc:title> | <dc:title>Rock classification based on resistivity patterns</dc:title> |
<dc:creator> | <dc:creator>Linek, Margarete</dc:creator> <dc:creator>Jungmann, Matthias</dc:creator> <dc:creator>Berlage, Thomas</dc:creator> |
<dc:identifier> | <dc:identifier>doi:10.1088/1742-2132/4/2/006</dc:identifier> |
Usage Notes:
<dc:creator> A 'creator' is an entity primarily responsible for making the content of the resource. Examples of a 'creator' include a person, an organization, or a service. According to the Dublin Core Usage Guide "Creators should be listed separately, preferably in the same order that they appear in the publication. Personal names should be listed surname or family name first, followed by forename or given name. When in doubt, give the name as it appears, and do not invert."
<dc:date> The recommended best practice for encoding the date value is defined in a profile of ISO 8601 [W3CDTF] and follows the YYYY-MM-DD format.
<dc:identifier> An 'identifier' is an unambiguous reference to the resource within a given context. The recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system such as the Digital Object Identifier (DOI).
<dc:language> The recommended best practice for the values of the Language element is defined by RFC 1766 [RFC1766], which includes a two-letter Language Code (taken from the ISO 639 standard [ISO639]), followed optionally by a two-letter Country Code (taken from the ISO 3166 standard [ISO3166]). For example, 'en' for English, 'fr' for French, or 'en-GB' for English used in the United Kingdom. Generally, this document recommends that the Country Code not be used to qualify the Language Code, as the unqualified form will have the widest acceptance by downstream applications (i.e. 'en' for English should be preferred to 'en-GB' or 'en-US').
See the Dublin Core Usage Guide for further details on the usage of individual Dublin Core elements.
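The date and language notes above lend themselves to simple automated checks before a feed is published. The Python sketch below is illustrative only. It covers just the date-only forms of W3CDTF (YYYY, YYYY-MM and YYYY-MM-DD), does not check that a day exists in the given month, and uses hypothetical function names.

```python
import re

# Date-only W3CDTF forms: YYYY, YYYY-MM or YYYY-MM-DD.
# Limitation: a value such as 2009-02-31 is accepted by this pattern.
_W3CDTF_DATE = re.compile(r"\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?")


def is_w3cdtf_date(value):
    """Return True if value looks like a date-only W3CDTF string."""
    return _W3CDTF_DATE.fullmatch(value) is not None


def primary_language(tag):
    """Drop any country qualifier from a language tag: 'en-GB' -> 'en'."""
    return tag.split("-", 1)[0].strip().lower()


print(is_w3cdtf_date("2007-03-23"), is_w3cdtf_date("23/03/2007"))
print(primary_language("en-GB"))
```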
The RSS 1.0 PRISM Module 'mod_prism' allows RSS 1.0 to be extended to utilise the PRISM specification (Publisher Requirements for Industry Standard Metadata). The PRISM element set comprises a range of metadata elements specifically selected for use in electronic publishing contexts and is therefore highly appropriate for use within TOC related RSS feeds.
Typically used elements include:
Element | Typical Example |
<prism:doi>* | <prism:doi>10.1093/bjc/azn067</prism:doi> |
<prism:url>* | <prism:url>http://dx.doi.org/10.1093/bjc/azn067</prism:url> |
<prism:publicationName> | <prism:publicationName>British Journal of Criminology</prism:publicationName> |
<prism:publicationDate> | <prism:publicationDate>2009-01</prism:publicationDate> |
<prism:issn> | <prism:issn>0007-0955</prism:issn> |
<prism:eIssn> | <prism:eIssn>1464-3529</prism:eIssn> |
<prism:copyright> | <prism:copyright>Copyright Oxford Journals 2009</prism:copyright> |
<prism:rightsAgent> | <prism:rightsAgent>journals.permissions@oxfordjournals.org</prism:rightsAgent> |
<prism:volume> | <prism:volume>49</prism:volume> |
<prism:number> | <prism:number>1</prism:number> |
<prism:startingPage> | <prism:startingPage>68</prism:startingPage> |
<prism:endingPage> | <prism:endingPage>87</prism:endingPage> |
* Elements require use of PRISM 2.0 namespace (or above), otherwise PRISM 1.2 namespace is sufficient.
Usage Notes:
<prism:doi> The DOI should be given in "bare" form, i.e. without the "doi:" prefix which is to be used in citation formats (and also used within the <dc:identifier> element).
<prism:publicationDate> The Publication Date is defined as "Announced date and time when the resource is released to the public...The publication date for an issue is the date that it became available for sale." It is NOT the cover date, which is covered by the prism:coverDate and prism:coverDisplayDate elements. Recommended best practice is to use the W3CDTF profile of ISO 8601 [W3CDTF], which follows the YYYY-MM-DD format.
<prism:url> The URL should be the URL for the article specified using the DOI with the DOI Proxy Server, e.g. "http://dx.doi.org/10.1093/bjc/azn067". It must not be the URL of the final article.
This document strongly recommends the use of PRISM 2.0 (or above) as this allows for the direct inclusion of DOI and associated URL into feeds.
See the PRISM specifications for further details on individual PRISM elements.
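Because the same DOI appears in three different forms across <prism:doi>, <dc:identifier> and <prism:url>, it can help to derive all three from a single value. The Python sketch below illustrates this. The function name `doi_forms` is hypothetical, and the percent-encoding of unusual characters in the URL form is an assumption rather than something stated in these recommendations.

```python
from urllib.parse import quote


def doi_forms(doi):
    """Return the three renderings of a DOI used in TOC feed elements."""
    bare = doi.strip()
    # Accept input that already carries the citation-style "doi:" prefix.
    if bare.lower().startswith("doi:"):
        bare = bare[4:].strip()
    return {
        "prism:doi": bare,                      # bare form, no prefix
        "dc:identifier": "doi:" + bare,         # citation form
        # Assumption: characters other than "/" that are unsafe in a URL
        # are percent-encoded before the DOI is appended to the proxy URL.
        "prism:url": "http://dx.doi.org/" + quote(bare, safe="/"),
    }


print(doi_forms("doi:10.1093/bjc/azn067"))
```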
An example RSS 1.0 TOC feed utilising the Admin, Content, Dublin Core and PRISM modules is available here. Both <channel> and <item> elements are shown below.
<channel rdf:about="http://www.nature.com/nature/current_issue/rss">
<title>Nature</title>
<description>Nature is a weekly international journal publishing the finest peer-reviewed research in all fields of science and technology on the basis of its originality, importance, interdisciplinary interest, timeliness, accessibility, elegance and surprising conclusions. Nature also provides rapid, authoritative, insightful and arresting news and interpretation of topical and coming trends affecting science, scientists and the wider public.</description>
<link>http://www.nature.com/nature/current_issue/</link>
<admin:generatorAgent rdf:resource="http://www.nature.com/"/>
<admin:errorReportsTo rdf:resource="mailto:feedback@nature.com"/>
<dc:publisher>Nature Publishing Group</dc:publisher>
<dc:language>en</dc:language>
<dc:rights>© 2009 Nature Publishing Group</dc:rights>
<prism:publicationName>Nature</prism:publicationName>
<prism:issn>0028-0836</prism:issn>
<prism:eIssn>1476-4679</prism:eIssn>
<prism:copyright>© 2009 Nature Publishing Group</prism:copyright>
<prism:rightsAgent>permissions@nature.com</prism:rightsAgent>
<image rdf:resource="http://www.nature.com/includes/rj_globnavimages/nature_logo.gif"/>
<items>
<rdf:Seq>
...
<rdf:li rdf:resource="http://dx.doi.org/10.1038/458587a"/>
...
</rdf:Seq>
</items>
</channel>
<item rdf:about="http://dx.doi.org/10.1038/458587a">
<title>Cosmology: Dark matter and dark energy</title>
<link>http://dx.doi.org/10.1038/458587a</link>
<description>Observations continue to indicate that the Universe is dominated by invisible components — dark matter and dark energy. Shedding light on this cosmic darkness is a priority for astronomers and physicists.</description>
<content:encoded><![CDATA[
<p>
<b>Cosmology: Dark matter and dark energy</b>
</p>
<p>Nature 458, 587 (2009). <a href="http://dx.doi.org/10.1038/458587a">doi:10.1038/458587a</a>
</p>
<p>Authors: Robert Caldwell
& Marc Kamionkowski</p>
<p>Observations continue to indicate that the Universe is dominated by invisible components — dark matter and dark
energy. Shedding light on this cosmic darkness is a priority for astronomers and physicists.</p>
]]></content:encoded>
<dc:title>Cosmology: Dark matter and dark energy</dc:title>
<dc:creator>Robert Caldwell</dc:creator>
<dc:creator>Marc Kamionkowski</dc:creator>
<dc:identifier>doi:10.1038/458587a</dc:identifier>
<dc:source>Nature 458, 587 (2009)</dc:source>
<dc:date>2009-04-01</dc:date>
<prism:publicationName>Nature</prism:publicationName>
<prism:publicationDate>2009-04-01</prism:publicationDate>
<prism:volume>458</prism:volume>
<prism:number>7238</prism:number>
<prism:section>News and Views Q&amp;A</prism:section>
<prism:startingPage>587</prism:startingPage>
<prism:endingPage>589</prism:endingPage>
<prism:doi>10.1038/458587a</prism:doi>
<prism:url>http://dx.doi.org/10.1038/458587a</prism:url>
</item>
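An <item> structured like the one above can be consumed with a generic XML parser, which is the main benefit of keeping each value in its own element. The Python sketch below reads a shortened copy of the item. The namespace URIs are those commonly declared for RSS 1.0, Dublin Core and PRISM 2.0 and are assumed here, and the function name `read_items` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for RSS 1.0, Dublin Core and PRISM 2.0.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "prism": "http://prismstandard.org/namespaces/basic/2.0/",
}

# Shortened copy of the example item, wrapped in an rdf:RDF root.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/rss/1.0/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:prism="http://prismstandard.org/namespaces/basic/2.0/">
  <item rdf:about="http://dx.doi.org/10.1038/458587a">
    <title>Cosmology: Dark matter and dark energy</title>
    <link>http://dx.doi.org/10.1038/458587a</link>
    <dc:creator>Robert Caldwell</dc:creator>
    <dc:creator>Marc Kamionkowski</dc:creator>
    <dc:date>2009-04-01</dc:date>
    <prism:volume>458</prism:volume>
    <prism:startingPage>587</prism:startingPage>
    <prism:doi>10.1038/458587a</prism:doi>
  </item>
</rdf:RDF>"""


def read_items(feed_xml):
    """Yield one dict per <item>, keeping each dc:creator as a separate entry."""
    root = ET.fromstring(feed_xml)  # raises ParseError if not well-formed
    for item in root.findall("rss:item", NS):
        yield {
            "title": item.findtext("rss:title", default="", namespaces=NS),
            "creators": [c.text for c in item.findall("dc:creator", NS)],
            "date": item.findtext("dc:date", default=None, namespaces=NS),
            "doi": item.findtext("prism:doi", default=None, namespaces=NS),
        }


for entry in read_items(SAMPLE):
    print(entry)
```

A feed that lumps all authors into one <dc:creator> would produce a single-entry list here, leaving the consumer to guess at the separator.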
There are several ways in which a website might expose RSS feeds in order that a user might find and subscribe to items of interest. One option is to provide a browsable directory of RSS feeds that exists separately from the rest of the publication website. Users would be directed to this list of feeds from help documentation or via additions to typical "alerting" options. The user would then browse the directory and click on feeds of interest in order to subscribe. Assuming that the feed is valid and has been delivered using a suitable media type, the subscription process will be relatively straightforward. The main downside to this option, however, is that the user must cease their browsing of the publications and switch to browsing the RSS feeds, making the user experience less than ideal.
A better approach, optimised for the typical use case of a user subscribing to one, or a few, RSS feeds in a single session, is to link the RSS feed directly from the main body of the website, e.g. from a journal homepage or TOC page. This allows a user to quickly subscribe to a feed without having to browse to another section of the website. A standard set of feed icons is being used across the web to include links to RSS feeds on web pages, providing users with a recognisable way to find and identify that RSS feeds are available (see RSS feed icons, below).
A further mechanism for associating one or more RSS feeds with a web page is via the RSS "auto-discovery" mechanism. This involves placing some metadata in the head of the web page that identifies an associated RSS feed, providing a title and a link to the document. The format of this metadata is standardised and is supported in all modern web browsers and feed readers. If a web browser auto-discovers an RSS feed then it will provide the user with additional options to subscribe to the feed, typically by placing an icon in the location bar or activating additional menu items. Feed readers support auto-discovery by using it as a fall-back mechanism for finding a feed if the user attempts to subscribe with the URL of the TOC page (for example) instead of a direct link to the RSS feed. The reader will auto-discover the feed and will be able to successfully carry out the user's intended action.
It is recommended that publishers support both auto-discovery and direct, context-sensitive links to RSS feeds from their content browsing pages, as these provide the best user experience and conform with current best practices for RSS feed discovery and subscription across the web. Directories of RSS feeds are also useful but are typically of more interest to power users who may, for example, wish to quickly find and subscribe to a large number of feeds in a single session. Directories of RSS feeds can also be published in a machine-readable fashion using a format known as OPML. See below for recommendations on its usage.
Many web sites use simple textual labels (e.g. "RSS Feed", "Subscribe", etc.) to present RSS links to users. These links often fit easily into existing navigation. However, because the labels vary across sites, a user often has to search through a page in order to find out whether an RSS feed is available. One partial remedy for this is to support RSS auto-discovery, as the user's web browser may include additional cues that a feed is available (e.g. an icon in the location bar), but this does not help if the user is using a desktop RSS reader and needs to find and copy the link in order to subscribe.
To improve the user experience for finding an RSS feed on a page, some effort has been made to standardise on a common way to present RSS feed links to users. The Feed Icons website provides a set of downloadable RSS icons that can be used on a publisher's website to link to RSS feeds. While the icons are available in a range of colours, allowing some tailoring for a site's existing design, it is strongly recommended that publishers use the orange and white icon, which is the most commonly used. The icon is available in several sizes, allowing it to be easily incorporated into navigation. For accessibility reasons, publishers should ensure that the icon and RSS feed links have appropriate alternative labels. The icon may also be added next to existing textual links.
A number of websites offering RSS feeds also include additional icons or "badges" that provide quick subscription links for specific RSS reader applications, e.g. Google Reader, My Yahoo!, etc. While these may provide some additional convenience, there are some downsides. Firstly, the icons take up additional space on the page, which may be better used elsewhere. Secondly, the list of applications is likely to change over time, requiring some maintenance of the links. Thirdly, and most importantly, a confusion of additional links may hinder users in finding and using the RSS feeds provided. Publishers are therefore recommended to use the standard RSS feed icons rather than application-specific badges.
However, if a publisher does feel that it would be useful to provide a user with a wider range of subscription options, it is recommended that a service like Add to Any be used. This, and similar services, provide an easy way to include multiple subscription links on a page in an unobtrusive way.
As described in the introductory section above, RSS "auto-discovery" defines a standard way for a user agent, such as a web browser or feed reader, to find the RSS feed associated with a page. In short, it is the metadata equivalent of a standard feed icon.
Auto-discovery is enabled by adding a <link/> element into the <head/> of an HTML document. An example head element containing an auto-discovery link is shown below:
<head>
<title>Journal of Tree Studies</title>
<link rel="alternate" title="Journal of Tree Studies" href="http://www.example.org/journals/trees.rss" type="application/rss+xml">
</head>
There are three required attributes and one optional attribute on the link element.
Attribute | Required? | Description |
rel | Yes | Should always have a fixed value of "alternate" |
href | Yes | Should contain a link to the RSS feed |
type | Yes | Should indicate the media type of the RSS feed. Some applications support multiple values for this attribute, e.g. allowing the link to indicate whether a feed is RSS or Atom. It is recommended that publishers standardise on the use of "application/rss+xml". |
title | No | A title for the RSS feed |
Note that while strictly speaking the title attribute is optional it is recommended that publishers always include it in their auto-discovery links, making sure that the labelling is clear to enable users to easily identify and subscribe to the feed of interest.
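The way a feed reader might apply these attributes when given the URL of a web page rather than of a feed can be sketched as follows. This Python example uses the standard library's `html.parser`. The class and function names are hypothetical, it only recognises the recommended "application/rss+xml" type, and it does not resolve relative href values.

```python
from html.parser import HTMLParser


class FeedLinkFinder(HTMLParser):
    """Collect <link rel="alternate"> elements that advertise an RSS feed."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if "alternate" in rel and attributes.get("type") == "application/rss+xml":
            # title is optional, so it may be None here.
            self.feeds.append((attributes.get("title"), attributes.get("href")))


def discover_feeds(html):
    """Return (title, href) pairs for every auto-discovery link in the page."""
    finder = FeedLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.feeds


PAGE = """<html><head>
<title>Journal of Tree Studies</title>
<link rel="alternate" title="Journal of Tree Studies"
      href="http://www.example.org/journals/trees.rss" type="application/rss+xml">
</head><body></body></html>"""

print(discover_feeds(PAGE))
```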
It is possible to include multiple auto-discovery links within a single page, allowing a list of RSS feeds to be discovered. A user would typically be presented with this list of feeds, allowing them to choose the specific feed to which they would like to subscribe. It is common to see multiple auto-discovery links in two situations:
The first scenario has the potential to confuse users: the formats may be different but the content is essentially the same. The second scenario has value in providing a way for a user to choose from a selection of alternatives. Publishers are therefore recommended not to provide multiple auto-discovery links to expose different formats, and instead to use multiple links as a way to present alternative feeds.
Typically RSS feeds are subscribed to on an individual basis, i.e. as a user discovers a feed of interest they will subscribe to it using their feed reader. However, there are scenarios in which it is useful to be able to find and subscribe to a collection of RSS feeds, e.g. all feeds produced by a specific publisher, or all feeds from journals in a specific subject. Supporting this option will allow RSS feed aggregators and indexers to more easily find all feeds exposed from a particular site, or allow an institutional librarian to find all feeds from a specific subject category for incorporation into a library web site or OPAC, or for bulk importing into a feed reader application.
The standard mechanism for exposing collections of RSS feeds is through a technology known as OPML. OPML is an XML vocabulary that can be used to describe a simple directory of RSS feeds that includes the title of the feed, a link to the home page of the feed (e.g. the journal homepage), and a link to the RSS feed itself. A simple OPML document containing two RSS feeds is illustrated below:
<opml version="2.0">
<head>
<title>RSS Feeds for Botany</title>
</head>
<body>
<outline title="Journal of Flowers" type="rss"
xmlUrl="http://www.example.org/journal/flowers/latest.rss"
htmlUrl="http://www.example.org/journal/flowers/latest"/>
<outline title="Journal of Trees" type="rss"
xmlUrl="http://www.example.org/journal/trees/latest.rss"
htmlUrl="http://www.example.org/journal/trees/latest"/>
</body>
</opml>
More information on the OPML format can be found in the specification. OPML is well supported in RSS applications, with the majority of applications allowing a user to import an OPML file and automatically subscribe to all of the listed feeds.
Publishers are therefore recommended to publish OPML documents that list all of the feeds from their website and, ideally, OPML documents for subject categories that list all journal feeds in that category. Unlike RSS feeds there is no standard way to link to OPML feeds from web pages. Various attempts have been made at defining an auto-discovery mechanism for OPML using the HTML link element, but none of these have achieved wide adoption.
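OPML documents of this kind are simple enough to generate from a list of feeds, for example one document per subject category. The Python sketch below shows one way to do so with the standard library. The function name `build_opml` is hypothetical. The `text` attribute is added alongside `title` on the assumption that OPML 2.0 processors expect it on every outline element.

```python
import xml.etree.ElementTree as ET

# Feed data reused from the OPML example document: (title, feed URL, home page).
FEEDS = [
    ("Journal of Flowers",
     "http://www.example.org/journal/flowers/latest.rss",
     "http://www.example.org/journal/flowers/latest"),
    ("Journal of Trees",
     "http://www.example.org/journal/trees/latest.rss",
     "http://www.example.org/journal/trees/latest"),
]


def build_opml(title, feeds):
    """Return an OPML 2.0 document (as a string) listing the given feeds."""
    opml = ET.Element("opml", {"version": "2.0"})
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for feed_title, xml_url, html_url in feeds:
        ET.SubElement(body, "outline", {
            "text": feed_title,   # assumed to be expected by OPML 2.0 readers
            "title": feed_title,
            "type": "rss",
            "xmlUrl": xml_url,
            "htmlUrl": html_url,
        })
    return ET.tostring(opml, encoding="unicode")


print(build_opml("RSS Feeds for Botany", FEEDS))
```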
Gathering meaningful usage statistics on your RSS usage is probably even more fraught and difficult to do than gathering regular web statistics. This is due to a number of idiosyncrasies around RSS usage, including:
Web based RSS aggregators (e.g. Bloglines, Google Reader) serve as proxies for many users.
All feed readers (including browsers and email clients with RSS reading capabilities) will automatically download feeds multiple times a day.
There is an increase in specialized applications for filtering and re-syndicating RSS feeds. We expect to see more of these developed by librarians and researchers.
There are generally three major types of statistics that you may want to gather about your RSS usage:
Subscribers
Clickthroughs
Impressions
Each is discussed in more detail below.
There is evidence that the vast majority of RSS consumption occurs through one of the major feed aggregators: Google, iGoogle, MyYahoo, Bloglines, etc.
The good news is that web based feed readers are generally well-behaved and, when their bots retrieve your RSS feeds, they will typically use the UserAgent string to inform you of how many subscribers they have to the feed they are retrieving. So, for instance looking in your web server logs you might see an entry like this:
GET /journal_of_psychoceramics/toc_rss1.xml - x.x.x.x HTTP/1.1 Bloglines/3.1+(http://www.bloglines.com;+43+subscribers)
This indicates that there are 43 Bloglines subscribers to the “Journal of Psychoceramics” RSS feed. Obviously, if you offer your RSS feed in multiple formats (RSS 1.0, RSS 2.0, Atom), you would need to sum the subscribers of all the formats to get an overall count.
The bad news is that the term “subscribers” does not mean “active subscribers”. In other words, this statistic gives no evidence that said subscribers are actually reading your feeds (or that they even continue to use the Bloglines service). It is also possible that some “subscribers” have subscribed to your content in multiple systems. For example, a user moving from Bloglines to Google Reader is unlikely to delete their old Bloglines subscriptions, and therefore all of their subscriptions might be double-counted.
Still, the subscriber count is not entirely useless. At the very least, the act of subscribing to something usually indicates interest and intent.
Measuring “click-throughs” might give one a better idea of how many people are actually actively engaged with your content, though one has to be careful to not sacrifice user experience for the sake of accurate statistics. As discussed above, creating a “partial feed” and forcing a user to click-through to your site in order to get an article abstract just so that you can boost your click-through statistics may ultimately be counter-productive. Again, at the very least you should familiarize yourself with the “full vs partial feed” debate before making a decision about how much you want to depend on click-throughs for measuring usage of your RSS feeds.
Gathering click-through statistics from RSS feeds is generally accomplished by making sure that links in the RSS feed are encoded in some way that lets the publisher look at the referrer and determine that the link was followed from an RSS feed. This is often done by appending parameters to the link URL.
Example:
http://www.zyz.com/index.html?source=rss
If you generate your RSS feeds dynamically, you might even wish to customize the links based on the UserAgent of the application retrieving the feed:
Example:
http://www.zyz.com/index.html?source=bloglines_rss
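A minimal Python sketch of this kind of link tagging is shown here. The `source` parameter name and the `rss` and `bloglines_rss` values come from the examples above; the `KNOWN_READERS` table, the `google_rss` label and the function names are hypothetical and would need to be adapted to the feed readers a publisher actually sees in its logs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical mapping from User-Agent substrings to source labels.
KNOWN_READERS = {
    "bloglines": "bloglines_rss",
    "feedfetcher-google": "google_rss",
}


def source_label(user_agent):
    """Pick a label for the requesting feed reader, defaulting to plain 'rss'."""
    lowered = user_agent.lower()
    for needle, label in KNOWN_READERS.items():
        if needle in lowered:
            return label
    return "rss"


def tag_link(url, user_agent=""):
    """Append a source parameter while preserving any existing query string."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    params = parse_qsl(query, keep_blank_values=True)
    params.append(("source", source_label(user_agent)))
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))


if __name__ == "__main__":
    print(tag_link("http://www.zyz.com/index.html"))
    print(tag_link("http://www.zyz.com/index.html", "Bloglines/3.1"))
```

Parsing and rebuilding the URL, rather than concatenating strings, keeps links that already carry query parameters intact.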
We recommend that, where possible, scholarly publishers use the DOI to link to their content. In order to measure click-throughs and still use the DOI for linking to your content, you will want to structure your links using DOI Parameter Passing. In practice, this would mean that a link to an article that might normally be recorded like this:
Normal Link Element Example:
<link>http://dx.doi.org/10.5555/5551212</link>
Would instead be encoded like this:
Parameter Passing Enabled Link Element Example:
<link>http://dx.doi.org/openurl?url_ver=Z39.88-2003&amp;rfr_id=info:sid/psychoceramicsjournal.org&amp;rft_id=doi:10.5555/5551212&amp;rfr_dat="rss%3dyes%26source%3dbloglines_rss"</link>
Note that for DOI parameter passing to work, the publisher must have implemented a service capable of using the nested parameters and have agreed to participate in DOI Parameter Passing.
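The following Python sketch shows one way such a link element might be assembled. It copies the parameter names and values from the example above, but it is only an illustration: the function names are hypothetical, the quotation marks that surround the rfr_dat value in the example are not reproduced, and the exact syntax should be checked against the DOI Parameter Passing documentation before use.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape


def doi_openurl(doi, referrer_id, tracking):
    """Build a parameter-passing URL in the shape of the example above."""
    # The nested tracking parameters are percent-encoded so that their own
    # "=" and "&" characters do not collide with the outer query string.
    nested = quote(tracking, safe="")
    return (
        "http://dx.doi.org/openurl?url_ver=Z39.88-2003"
        f"&rfr_id=info:sid/{referrer_id}"
        f"&rft_id=doi:{doi}"
        f"&rfr_dat={nested}"
    )


def link_element(url):
    """Wrap a URL in a <link> element, escaping '&' so the feed stays valid XML."""
    return f"<link>{escape(url)}</link>"


if __name__ == "__main__":
    url = doi_openurl(
        "10.5555/5551212",
        "psychoceramicsjournal.org",
        "rss=yes&source=bloglines_rss",
    )
    print(link_element(url))
```

Note that the ampersands separating the outer parameters must be written as `&amp;` inside the feed, otherwise the feed is no longer well-formed XML.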
Again, one of the problems with gathering statistics on RSS uptake is that the act of reading an RSS feed entry does not normally trigger a page GET on your site. One way in which publishers have tried to get around this is to embed a uniquely named, invisible graphic in the feed item. This way, any rendering of the RSS feed item would trigger a GET of the graphic from your site. In theory, this should mean that you can detect when a particular item is being read, but in practice this technique is not very accurate because:
Some feed readers render all RSS posts, regardless of whether a user is actually reading the post. For example, this occurs when scrolling through a list of entries in Google Reader.
Many feed readers will cache content for offline viewing.
Some feed readers or bots consuming RSS feeds might strip posts of graphics.
As noted in the introduction to these recommendations, RSS feeds are increasingly used for purposes beyond simply syndicating information to users: feeds are also typically processed, aggregated and repurposed as part of a growing range of knowledge discovery and data mining services. The recommendations in this document aim to support and encourage these diverse users by ensuring that there is rich, well-expressed metadata included in each feed.
However, to properly enable these forms of reuse, RSS feeds should ideally be clearly annotated with some form of rights statement. Publishers are encouraged to consider the license under which the metadata embedded in their feeds may be reused; such reuse is likely to cover both commercial and non-commercial uses of the data.
The Creative Commons website defines a standard way to include licensing metadata in an RSS 1.0 feed. This approach allows a publisher to use any of the existing range of Creative Commons licensing options and to also provide pointers to additional licensing terms, e.g. to directly enable academic (non-commercial) usage, but require additional agreements for commercial usage.
There are as many methods for creating RSS feeds as there are for creating HTML web pages. Here is a brief summary of some of the methods.
Manual Coding. This is not a desirable option as it is very time-consuming; however, for publishers of single journals it may be a viable option.
Online/Desktop Creation Tools. There are many generic tools for creating feeds; however, many of these tools do not support the use of modules.
Content Management Systems. Many content management systems have the option to export metadata in RSS format. Open Journal Systems [http://pkp.sfu.ca/?q=ojs] is one example specifically for journal tables of contents, which has an RSS plugin to export feeds in a variety of formats, including RSS 1.0 with DC and PRISM modules.
Databases. RSS feeds can be generated from data stored in a database.
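To make the database option more concrete, here is a small Python sketch that builds an RSS 1.0 document from an in-memory SQLite database using only the standard library. The table layout, the demo rows, the example.org URLs and the function names are all hypothetical; only the RSS 1.0, RDF and Dublin Core namespace URIs are standard. A real TOC feed would carry far more metadata (for example PRISM elements) than this sketch does.

```python
import sqlite3
import xml.etree.ElementTree as ET

RSS = "http://purl.org/rss/1.0/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("", RSS)
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)


def make_demo_db():
    """Create a tiny, hypothetical article database for the demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE articles (doi TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE authors (doi TEXT, position INTEGER, name TEXT);
    """)
    conn.execute("INSERT INTO articles VALUES (?, ?)",
                 ("10.5555/5551212", "An Example Article"))
    conn.executemany("INSERT INTO authors VALUES (?, ?, ?)", [
        ("10.5555/5551212", 1, "Author, First"),
        ("10.5555/5551212", 2, "Author, Second"),
    ])
    return conn


def build_feed(conn, feed_url, journal_title, journal_url):
    """Serialise every article in the database as an RSS 1.0 item."""
    root = ET.Element(f"{{{RDF}}}RDF")
    channel = ET.SubElement(root, f"{{{RSS}}}channel", {f"{{{RDF}}}about": feed_url})
    ET.SubElement(channel, f"{{{RSS}}}title").text = journal_title
    ET.SubElement(channel, f"{{{RSS}}}link").text = journal_url
    ET.SubElement(channel, f"{{{RSS}}}description").text = (
        f"Table of contents for {journal_title}")
    items = ET.SubElement(channel, f"{{{RSS}}}items")
    seq = ET.SubElement(items, f"{{{RDF}}}Seq")
    articles = conn.execute("SELECT doi, title FROM articles ORDER BY doi").fetchall()
    for doi, title in articles:
        url = "http://dx.doi.org/" + doi
        ET.SubElement(seq, f"{{{RDF}}}li", {f"{{{RDF}}}resource": url})
        item = ET.SubElement(root, f"{{{RSS}}}item", {f"{{{RDF}}}about": url})
        ET.SubElement(item, f"{{{RSS}}}title").text = title
        ET.SubElement(item, f"{{{RSS}}}link").text = url
        ET.SubElement(item, f"{{{DC}}}identifier").text = "doi:" + doi
        authors = conn.execute(
            "SELECT name FROM authors WHERE doi = ? ORDER BY position", (doi,))
        for (name,) in authors:
            # One dc:creator element per author, never a single lumped string.
            ET.SubElement(item, f"{{{DC}}}creator").text = name
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_feed(make_demo_db(), "http://www.example.org/toc_rss1.xml",
                     "Journal of Psychoceramics", "http://www.example.org/"))
```

Building the feed with an XML library rather than string templates means that special characters in titles and author names are escaped automatically.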
It is essential that Feeds and OPML files are valid. Here are some online validation tools.
Feed Validator for Atom and RSS
W3C Feed Validation Service, for Atom and RSS
OPML Validator Beta
Note that the above-mentioned validation tools will only verify that feeds or OPML files are valid; they will not confirm that they follow the best practices recommended here. CrossRef will develop and host a tool that can be used to confirm both that your feeds are valid and that they conform to these recommendations. We will update this document with a link to the tool when it is ready for use.
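Before submitting a feed to one of the online validators, a quick local check for well-formedness can catch the most basic errors, such as an unescaped ampersand. The Python sketch below uses the standard library XML parser; the function name is hypothetical, and a well-formed document is not necessarily a valid feed, so this complements rather than replaces the validators listed above.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, "") if the text parses as XML, else (False, reason)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


if __name__ == "__main__":
    good = "<rss><channel><title>Science &amp; Society</title></channel></rss>"
    bad = "<rss><channel><title>Science & Society</title></channel></rss>"
    for sample in (good, bad):
        ok, reason = check_well_formed(sample)
        print("well-formed" if ok else "not well-formed: " + reason)
```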
This section provides some additional discussion on the correct use of media types for RSS feeds.
Media types are an important feature of the web architecture that enable browsers and other user agents (e.g. RSS feed readers) to correctly identify and process content that they retrieve from the web. In the context of RSS this is particularly important because, without the correct media type, a user may be shown XML markup rather than being offered the option to subscribe to an RSS feed using their configured feed reader.
Several media types have been commonly used or recommended for delivering RSS feeds. This advice is often contradictory and does not always take into account all relevant use cases. These recommendations take a pragmatic approach that attempts to address the most common issues. The following table outlines some commonly used media types and the issues related to them:
Media Type | Issues |
text/xml | This media type is often used for delivering RSS feeds or XML documents. However, due to the way that the media type has been specified, user agents may incorrectly process the contents with a character set of US-ASCII. This is highly likely to cause problems with RSS feeds containing bibliographic information, which typically includes characters from the wider Unicode character set. |
application/rss+xml | This media type was proposed as a standard for delivering RSS feeds, and is still widely used. It is also used as an identifier in RSS auto-discovery links to indicate that the link refers to an RSS feed. However, while the media type is well supported in RSS readers, it is not formally registered. |
application/rdf+xml | This is the correct, standard media type for delivering RDF/XML documents such as RSS 1.0 feeds, as specified in this document. However, this media type is not well supported in browsers and in many cases may cause confusion for an end user, who may be prompted to download the feed rather than subscribe to it. |
application/atom+xml | This is the correct, standard media type for delivering Atom feeds. It should be used for delivering only Atom documents, and not RSS 1.0 feeds. |
application/xml | This is the standard, default media type for delivering XML documents over the web. It does not suffer from the same character encoding issues as text/xml. The media type is well supported in RSS readers, so offers the same advantages as application/rss+xml. |
To encourage the use of standardised media types, this document therefore recommends the use of application/xml as the default media type for delivering RSS feeds. It is also recommended that feeds are delivered with UTF-8 character encoding, which means delivering the feed with a content type of: application/xml; charset=UTF-8.
Note: this might seem to introduce a slight discrepancy between the recommended media type for delivering an RSS feed and the media type used to link to an RSS feed from an RSS Autodiscovery link. In the latter case, a media type of application/rss+xml is used. However, it is important to recognise that when linking to a feed the media type is being used simply as a label; it does not define the processing behaviour of the feed reader.
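The two roles of the media type can be illustrated with a short Python sketch: one function prepares the HTTP headers for delivering a feed, using the `charset` parameter that HTTP Content-Type headers define for the character encoding, and the other produces an HTML auto-discovery link that labels the feed as application/rss+xml. The function names are hypothetical and the sketch is independent of any particular web server or framework.

```python
from html import escape

# Media type recommended above for delivering the feed itself.
FEED_CONTENT_TYPE = "application/xml; charset=UTF-8"

# Media type used only as a label in auto-discovery links.
AUTODISCOVERY_TYPE = "application/rss+xml"


def feed_response(feed_text):
    """Return (headers, body) for serving a feed encoded as UTF-8."""
    body = feed_text.encode("utf-8")
    headers = {
        "Content-Type": FEED_CONTENT_TYPE,
        "Content-Length": str(len(body)),
    }
    return headers, body


def autodiscovery_link(feed_url, title):
    """Return an HTML <link> element pointing a browser at the feed."""
    return (
        f'<link rel="alternate" type="{AUTODISCOVERY_TYPE}" '
        f'title="{escape(title)}" href="{escape(feed_url)}">'
    )


if __name__ == "__main__":
    headers, _body = feed_response("<rss/>")
    print(headers["Content-Type"])
    print(autodiscovery_link("http://www.example.org/toc_rss1.xml",
                             "Journal of Psychoceramics"))
```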
An example Atom TOC feed based on the cut-down (and slightly modified) version of the example RSS feed given in Sect. 4 is shown here for illustrative purposes only. Both <feed> and <entry> elements are displayed.
<feed xmlns="..." xmlns:dc="..." xmlns:prism="...">
  <title type="text">Nature</title>
  <author>
    <name>Nature Publishing Group</name>
  </author>
  <updated/>
  <id/>
  <link rel="self" type="application/atom+xml" href="http://www.nature.com/nature/current_issue/atom"/>
  <!--
  <image rdf:resource="http://www.nature.com/includes/rj_globnavimages/nature_logo.gif"/>
  -->
  <icon/>
  <rights/>
  <dc:publisher>Nature Publishing Group</dc:publisher>
  <dc:language>en</dc:language>
  <dc:rights>© 2007 Nature Publishing Group</dc:rights>
  <prism:publicationName>Nature</prism:publicationName>
  <prism:issn>0028-0836</prism:issn>
  <prism:eIssn>1476-4679</prism:eIssn>
  <prism:copyright>© 2007 Nature Publishing Group</prism:copyright>
  <prism:rightsAgent>permissions@nature.com</prism:rightsAgent>
  <entry>
    ...
  </entry>
  ...
</feed>
<entry>
  <id/>
  <title>Structure-based activity prediction for an enzyme of unknown function</title>
  <link rel="alternate" type="text/html" href="http://dx.doi.org/10.1038/nature05981"/>
  <summary>With many genomes sequenced, a pressing challenge in biology is predicting the function of the proteins that the genes encode. When proteins are unrelated to others of known activity, bioinformatics inference for function becomes problematic. It would thus be useful to interrogate protein structures for </summary>
  <updated/>
  <author>
    <name>...</name>
  </author>
  <content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">
      <p><b>Structure-based activity prediction for an enzyme of unknown function</b></p>
      <p>Nature 448, 775 (2007). <a href="http://dx.doi.org/10.1038/nature05981">doi:10.1038/nature05981</a></p>
      <p>Authors: Johannes C. Hermann, Ricardo Marti-Arbona, Alexander A. Fedorov, Elena Fedorov, Steven C. Almo, Brian K. Shoichet &amp; Frank M. Raushel</p>
      <p>With many genomes sequenced, a pressing challenge in biology is predicting the function of the proteins that the genes encode. When proteins are unrelated to others of known activity, bioinformatics inference for function becomes problematic. It would thus be useful to interrogate protein structures for </p>
    </div>
  </content>
  <dc:title>Structure-based activity prediction for an enzyme of unknown function</dc:title>
  <dc:creator>Hermann, Johannes C.</dc:creator>
  <dc:creator>Marti-Arbona, Ricardo</dc:creator>
  <dc:creator>Fedorov, Alexander A.</dc:creator>
  <dc:creator>Fedorov, Elena</dc:creator>
  <dc:creator>Almo, Steven C.</dc:creator>
  <dc:creator>Shoichet, Brian K.</dc:creator>
  <dc:creator>Raushel, Frank M.</dc:creator>
  <dc:identifier>doi:10.1038/nature05981</dc:identifier>
  <dc:source>Nature 448, 775 (2007)</dc:source>
  <dc:date>2007-07-01</dc:date>
  <prism:publicationName>Nature</prism:publicationName>
  <prism:publicationDate>2007-07-01</prism:publicationDate>
  <prism:volume>448</prism:volume>
  <prism:number>7155</prism:number>
  <prism:section>Article</prism:section>
  <prism:startingPage>775</prism:startingPage>
  <prism:endingPage>779</prism:endingPage>
</entry>
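To show why separate, well-structured elements matter to automated consumers, the following Python sketch reads the bibliographic metadata from an Atom entry like the one above. The namespace URIs are elided in the example, so the values used here are assumptions: the Atom and Dublin Core URIs are the standard ones, while the PRISM URI depends on the PRISM version a publisher uses. The `SAMPLE` document is a cut-down copy of the example entry, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "dc": "http://purl.org/dc/elements/1.1/",
    # Assumed PRISM namespace; the URI differs between PRISM versions.
    "prism": "http://prismstandard.org/namespaces/1.2/basic/",
}

# Cut-down copy of the example entry, with the assumed namespaces declared.
SAMPLE = """\
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">
  <title type="text">Nature</title>
  <entry>
    <title>Structure-based activity prediction for an enzyme of unknown function</title>
    <dc:creator>Hermann, Johannes C.</dc:creator>
    <dc:creator>Marti-Arbona, Ricardo</dc:creator>
    <dc:identifier>doi:10.1038/nature05981</dc:identifier>
    <prism:volume>448</prism:volume>
    <prism:startingPage>775</prism:startingPage>
    <prism:endingPage>779</prism:endingPage>
  </entry>
</feed>
"""


def entry_metadata(entry):
    """Collect title, authors, DOI, volume and page range from one entry."""
    def text(path):
        return (entry.findtext(path, default="", namespaces=NS) or "").strip()

    return {
        "title": text("atom:title"),
        # Each author arrives in its own dc:creator element, so no
        # error-prone splitting of a combined author string is needed.
        "authors": [c.text.strip() for c in entry.findall("dc:creator", NS) if c.text],
        "doi": text("dc:identifier"),
        "volume": text("prism:volume"),
        "pages": (text("prism:startingPage"), text("prism:endingPage")),
    }


if __name__ == "__main__":
    feed = ET.fromstring(SAMPLE)
    for entry in feed.findall("atom:entry", NS):
        print(entry_metadata(entry))
```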