JournalTOCs Blog

News and Opinions about current awareness on new research

At last we got “200 OK” from Atypon for Maney


wget for Maney journal RSS feeds

One year ago (on 21st March 2014, to be exact) we contacted Helen Duce, the Head of E-Publishing at Maney Publishing, because after Maney migrated to its new e-publishing platform, Atypon’s Literatum, JournalTOCs was unable to crawl the TOC RSS feeds of Maney’s journals.

JournalTOCs relies on simple, effective RSS feeds to get the latest articles from over 25,000 journals. To fetch them, it uses a very basic form of the equally simple and effective Unix command wget:

wget -O newtocs.tmp "journal-RSS-feed-URL" 2>&1

That is it: a plain wget that has nothing to hide and uses none of its rich options to force crawling.

As we can only communicate with the publishers, we couldn’t discuss the problem directly with Atypon. So, we contacted Maney many times. While Helen was very helpful, Atypon was telling Maney that everything was OK at their end, but we knew that we were being refused access to the RSS feeds.

Today, Helen gave us the good news that Maney has finally heard back from Atypon on this issue. It turns out that our IP range was blocked by Maney Online (Atypon) because of “abuse monitoring”: JournalTOCs was crawling content (RSS feeds), and Atypon flagged that crawling as abuse.

Fortunately the misunderstanding has been resolved. Atypon has recognized that crawling RSS feeds is not abuse. The very reason for having RSS feeds is to enable other services to crawl and reuse your feeds to facilitate the widest dissemination of your content, which at the end of the day benefits your business because it increases the number of visitors to your site.

We are glad to be able to access the RSS feeds of Maney again. We will restore the Maney journals that were selected by the JournalTOCs Index and start to update their TOCs. In the last year, usage (number of followers) of Maney’s journals has decreased at JournalTOCs, but we hope that once users see that Maney’s journals are being updated, they will start to follow Maney journals again.

Publishers that are changing platforms should check that their RSS feeds remain accessible to aggregators and discovery services. By working together, publishers, discovery services, aggregators and e-publishing platforms can make a positive impact on the dissemination of research.
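A check of this kind can be scripted. The following is only a minimal sketch using Python's standard library, not the JournalTOCs crawler itself; the names `check_feed` and `describe` are hypothetical, and the wording of the verdicts is an assumption made for illustration.

```python
import urllib.error
import urllib.request


def check_feed(url, opener=urllib.request.urlopen, timeout=30):
    """Return the HTTP status code for a plain GET of the feed URL.

    `opener` is injectable so the check can be exercised without a network.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # A refused request surfaces here, e.g. 403 Forbidden.
        return err.code


def describe(status):
    """Turn a status code into a short human-readable verdict."""
    if status == 200:
        return "200 OK: feed is reachable"
    if status in (401, 403):
        return f"{status}: access refused, the crawler may be blocked"
    return f"{status}: unexpected response"
```

Calling `describe(check_feed(feed_url))` for each journal feed, from a machine outside the publisher's own network, would show whether an external aggregator gets “200 OK” or a refusal.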

“the success of these systems [link resolvers and knowledgebases] and services is ultimately dependent upon the cooperation of the various players across the supply chain of electronic resource metadata”
(van Ballegooie, Marlene (2015) Knowledgebases: The Cornerstone of E-Resource Management and Access. Serials Review 40(4) pp. 259-266. DOI: 10.1080/00987913.2014.977127)

Written by Santiago Chumbe

March 13th, 2015 at 1:02 pm

Why publishers should never NOINDEX their RSS feeds


NoIndex

Last week, JournalTOCs stopped indexing all of the 40 journals published by OA Publishing London because this publisher took the unusual and illogical measure of requesting aggregators not to index (aggregate) the RSS feeds for the current issues of its journals. Tables of Contents from the OA Publishing London journals will no longer be updated at JournalTOCs. Those who have been following any of the 40 journals will not be able to keep up with new issues.

Why would OA Publishing London want to stop aggregators and search engines from crawling and collecting its RSS feeds? Years ago, it might have made some sense to use the noindex meta-tag for RSS feeds, but nowadays there is no need to noindex such feeds. Google and other modern search engines can easily identify RSS feeds, and they act on that by not including RSS feeds in web search results.

Publishers should, in reality, very much want their RSS feeds to be indexed, because it can help aggregators and search engines to direct users to where the newest content is. Search engines are smart enough to understand the difference between a feed and a webpage, and use the feed as a pointer to the webpage where the real source of the content resides. Allowing search engines to index RSS feeds is therefore an important way to drive traffic to the webpages of the actual content.

There is no scenario in which a publisher is not interested in having their latest content indexed. Older feed generators, such as the deprecated Feedburner, still provide users with the outdated option to noindex feeds to prevent them from being penalized by search engines. Publishers need to be reassured that this is no longer an issue, and that indexed feeds do not create penalty situations. Google itself will normally not show RSS feeds in search results.

The noindex meta-tag is not good for publishers. Any publisher who wants to enable RSS readers, aggregators and APIs to reuse details of their content should make sure to remove the noindex meta-tag from their RSS pages and from their software that generates RSS feeds.

The noindex meta-tag to be removed looks like this:

<meta name="robots" content="noindex">

This code tells search engines and aggregators that they should not index or crawl the content of the RSS feeds.
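To find out whether a page still carries the tag, the markup can be scanned for it. This is a minimal sketch built on Python's standard `html.parser` module; the class `RobotsMetaParser` and the function `has_noindex` are hypothetical names introduced for this example, and the sketch only looks at `<meta name="robots">` tags in the markup it is given.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # content may hold several directives, e.g. "noindex, nofollow"
            for directive in content.split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(markup):
    """Return True if the markup contains a robots meta-tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(markup)
    return "noindex" in parser.directives
```

Running `has_noindex` over the output of the software that generates the feeds is one way to confirm that the tag has really been removed.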

So, if you want the abstracts of your latest publications to be indexed by JournalTOCs, a search engine, an aggregator or any other web service, and thus ensure that hundreds of thousands of potential readers can discover your content, you should make sure you ARE NOT using the noindex meta-tag.

The noindex meta-tag can help with search engine optimization (SEO), but it should be used wisely, rather than on the assumption that it is always a good idea to use it. noindex should only be used for web pages that you don’t want showing up in search results or that you want to hide from the external world, for example a test page, an archive page, or something similar that is not relevant to the publisher’s business. These should have the noindex tag so that they don’t end up taking the place of the really important pages in search results: Google’s algorithm tends to avoid placing multiple links from the same domain on the front page, unless the website has a good ranking.

For optimal crawling, Google also recommends using RSS/Atom feeds

RSS pages (feeds) are not only relevant pages; they are used by search engines and aggregators to redirect users to your relevant webpages! They help to market your real content. They are good for everyone: readers, authors, end users and your business.

Written by Santiago Chumbe

January 26th, 2015 at 5:07 pm

Crowdsourcing the journal selection process


Selecting the Best Journals with Crowdsourcing

This year JournalTOCs started moving to a crowdsourcing model to maintain its growing database of journals.

Reaching the 24,000 journals milestone was the turning point. This number practically represents the bulk of relevant journals that have been selected and added by the selection team of JournalTOCs. In May we recognized that the selection process would greatly benefit from the contributions of professionals interested in having all the relevant journals in JournalTOCs.

The decision to use crowdsourcing was mainly based on two facts:

  1. Our small selection team cannot cope with the hundreds of requests we receive every day, most of them from relatively new Open Access (OA) publishers, asking us to add their journals to JournalTOCs. Very few of those journals pass the selection process.
  2. We had a growing number of talented and enthusiastic users, principally professional academic librarians, who have been helping us with the discovery and evaluation of new journals. Almost all the journals suggested by those users have passed the selection criteria.

Crowdselection works for JournalTOCs because the selection process relies upon the knowledge and requirements of those who actually need to use or provide access to the missing journals. In some ways our approach is inspired by a crowdsourcing strategy used in the investment market, where the average price produced by ‘grey markets’ has proven to be more accurate than the predictions made by the experts.

For example, grey markets run last year on both the Royal Mail and Twitter IPOs were more accurate in predicting prices than bankers and their advisers. On the Twitter IPO, the grey market predicted that shares would be worth $44 at the end of the first day of trading. They actually ended up at $45.06, which is incredibly close, particularly when you consider that the price set by the “expert” bankers was $26.

It was natural then to provide our valued users with the means to add and edit journals. Without realizing it, we had started to use crowdsourcing to expand and update JournalTOCs. Thus, gradually, crowdselection is effectively accomplishing the selection process that was once the province of the specialized team. The initial results are very encouraging.

Adding new journals and updating existing ones involves a few simple steps. The user has tools to first verify that neither the publisher nor the journal is already registered with JournalTOCs. After this, the journal and, if necessary, the publisher too can be added to the database. Crowdselection only adds journals that meet the following Selection Criteria:

  • The journal is a scientific or academic journal that publishes peer-reviewed research papers.
  • The journal must have an editor, an editorial board and a verifiable peer-review system in place.
  • The journal must publish TOC RSS feeds for its most recent issues.
  • The journal can be a magazine provided that it has a proven record of publishing only technical and professional reviewed material that is relevant to industry, government and research (e.g. Harvard Business Review Magazine)
  • The journal is an active journal that has published different issues in this year and the previous year. Brand new journals with only one issue published cannot be added to JournalTOCs. In particular, we are careful with new Open Access journals published by dubious houses.

Crowdselection includes an automated system that verifies new journals, and the user who has created the journal is contacted if we notice that further guidance is needed.

A positive consequence of using crowdsourcing to maintain the entire database would be the possibility of making all the features of JournalTOCs Premium that do not require institutional customisation freely available to anyone, starting with the users who have helped to maintain the database of journals.
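To make the criteria concrete, they can be written down as a simple checklist. The sketch below is purely illustrative and is not the automated verification system that JournalTOCs runs: the `JournalCandidate` fields and the `failed_criteria` function are hypothetical, the yes/no answers are assumed to come from a human reviewer, and the special case for magazines is left out.

```python
from dataclasses import dataclass


@dataclass
class JournalCandidate:
    """Answers a reviewer has recorded for a suggested journal (hypothetical)."""
    peer_reviewed: bool
    has_editor_and_board: bool
    has_toc_rss_feed: bool
    issue_years: tuple  # years in which the journal published issues


def failed_criteria(journal, current_year):
    """Return the list of selection criteria the candidate does not meet."""
    failures = []
    if not journal.peer_reviewed:
        failures.append("not peer-reviewed")
    if not journal.has_editor_and_board:
        failures.append("no editor or editorial board")
    if not journal.has_toc_rss_feed:
        failures.append("no TOC RSS feed")
    # Active means issues in both the current and the previous year.
    if not {current_year, current_year - 1} <= set(journal.issue_years):
        failures.append("not active in this year and the previous year")
    return failures
```

An empty list means the candidate meets every criterion covered by the sketch; anything else names the reasons a reviewer would need to follow up.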

Written by Santiago Chumbe

September 29th, 2014 at 12:26 pm

How to grab an RSS feed of the latest articles of a journal and have it show up as a widget on another website

To grab an RSS feed for a particular journal from JournalTOCs, you can use the API call journals. For example:

http://www.journaltocs.ac.uk/api/journals/0143-3369?output=articles&user=super.journaltocs@gmail.com

The above call will grab the feeds produced and normalized by JournalTOCs for the journal with ISSN 0143-3369. You must provide the email address you have used to register with JournalTOCs as the value for the parameter “user”.
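The call above can be put together and its result read with a few lines of Python. This is a minimal sketch under stated assumptions: the helper names `build_journal_url` and `parse_articles` are hypothetical, and the response is assumed to be an RSS 1.0 (RDF) feed whose `<item>` elements use the `http://purl.org/rss/1.0/` namespace.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

API_BASE = "http://www.journaltocs.ac.uk/api/journals/"
RSS = "{http://purl.org/rss/1.0/}"  # assumed default namespace of the feed


def build_journal_url(issn, user_email):
    """Build the 'journals' API call for one ISSN and a registered email."""
    query = urlencode({"output": "articles", "user": user_email})
    return f"{API_BASE}{issn}?{query}"


def parse_articles(feed_xml):
    """Extract title and link of each article from the returned feed."""
    root = ET.fromstring(feed_xml)
    articles = []
    for item in root.iter(f"{RSS}item"):
        articles.append({
            "title": (item.findtext(f"{RSS}title") or "").strip(),
            "link": (item.findtext(f"{RSS}link") or "").strip(),
        })
    return articles
```

The feed itself could be fetched with `urllib.request.urlopen(build_journal_url(issn, email)).read()`, and the resulting list of titles and links is what a widget on another website would render. Note that `urlencode` writes the `@` of the email address as `%40`.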

By default, the links to the individual articles are the original links provided by the publisher, or the OpenURL links created with your institutional OpenURL if one is available. If you want those links to include your ezProxy, you need to use a Premium account. In this case, you or your Account Administrator need to go to your “Service Configuration” window, select the “Accounts” tab and find the “Links to use for the articles returned by the API” section. In this section, tick the “Append the Institutional ezProxy” option and hit “Save”. Now your RSS feeds will include your proxy-server string in the URLs that go to individual articles (the <link> element in the RSS feeds). Please use your browser’s “View Page Source” to view the RSS content.

Normalized Journal TOC RSS feeds

Written by Santiago Chumbe

August 27th, 2014 at 6:11 pm

Systematic identification of OA articles from hybrid journals


JournalTOCs is pleased to announce that the automated identification of Open Access (OA) articles from hybrid journals has started to work today.

This is a highly important development in the efforts being made towards enabling systematic and easy identification of Open Access articles for aggregators, discovery services and A&I providers.

Publishers start to enable the systematic identification of Open Access at the Article Level

These first results are the product of collaboration between JournalTOCs and more than 10 established, forward-thinking commercial publishers.

Being able to systematically and consistently identify Open Access articles, regardless of where they have been published, has huge potential for the progress of Open Access and could play a vital role in the success of using the hybrid model to migrate subscription-based titles to full Open Access in a sustainable way for authors, readers, librarians and publishers.

The technology behind this new service is the simple and easy to use TOC RSS feeds. RSS feeds are also relatively easy to implement.

A publisher wanting to support the automated discovery of Open Access from its journals only needs to create its RSS feeds by following these best practices and these steps.

Example showing how an OA article from a hybrid journal is identified by JournalTOCs:

OA article in a Hybrid journal

http://www.journaltocs.ac.uk/index.php?action=search&query=1740-0597

At this stage the OA articles are only identified as such by the Open Access logo and an orange background. As more publishers implement the <cc:license> and <dc:rights> standard elements in their RSS feeds, we will be able to provide information on the type of CC licence and the copyright holder for each OA article. The information will be obtained by combining the possible implementations of the <cc:license> and <dc:rights> elements:

Article copyright
Article copyright belongs to the publisher:
<dc:rights>Copyright © Publication_Year Publisher_Name</dc:rights>
Example:
<dc:rights>Copyright © 2014 ScienceMed Publisher Ltd</dc:rights>
 
Article copyright belongs to the author(s):
<dc:rights>Copyright © Publication_Year First_Author_Surname, First_Author_Initial [et al]</dc:rights>
Example:
<dc:rights>Copyright © 2014 Smith J.</dc:rights>
Type of Creative Commons licence (only for OA articles)
– for CC-BY licences:
<cc:license rdf:resource="http://creativecommons.org/licenses/by/4.0/" />
– for CC-BY-NC licenses:
<cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/4.0/" />
– for CC-BY-NC-SA licenses:
<cc:license rdf:resource="http://creativecommons.org/licenses/by-nc-sa/4.0/" />
– for CC-BY-NC-ND licenses:
<cc:license rdf:resource="http://creativecommons.org/licenses/by-nc-nd/4.0/" />
Subscription-based or non-OA articles
<cc:license></cc:license>
Example of an RSS feed’s root element showing all the namespaces required to enable OA discovery at the article level:
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:prism="http://prismstandard.org/namespaces/basic/2.0/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:cc="http://web.resource.org/cc/"
xmlns="http://purl.org/rss/1.0/"
>
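
On the consuming side, an aggregator could read these elements along the following lines. This is a minimal sketch and not the code JournalTOCs runs: the function names `describe_item` and `describe_feed` and the shape of the returned dictionaries are hypothetical, and the sketch assumes that a `<cc:license>` pointing at a creativecommons.org licence URL marks an OA article, while an empty `<cc:license>` marks a subscription-based one.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "cc": "http://web.resource.org/cc/",
    "rss": "http://purl.org/rss/1.0/",
}
RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"
CC_PREFIX = "http://creativecommons.org/licenses/"


def describe_item(item):
    """Return OA status, licence type and rights statement for one <item>."""
    licence = item.find("cc:license", NS)
    resource = licence.get(RDF_RESOURCE, "") if licence is not None else ""
    licence_type = None
    if resource.startswith(CC_PREFIX):
        # e.g. ".../licenses/by-nc/4.0/" -> "CC-BY-NC"
        code = resource[len(CC_PREFIX):].split("/")[0]
        if code:
            licence_type = "CC-" + code.upper()
    return {
        "open_access": licence_type is not None,
        "licence": licence_type,
        "rights": item.findtext("dc:rights", default=None, namespaces=NS),
    }


def describe_feed(feed_xml):
    """Describe every article in an RSS 1.0 (RDF) feed."""
    root = ET.fromstring(feed_xml)
    return [describe_item(item) for item in root.findall("rss:item", NS)]
```

The copyright holder is taken verbatim from `<dc:rights>`, so telling publisher-owned from author-owned copyright would still need the naming conventions shown above.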

Written by Santiago Chumbe

April 7th, 2014 at 4:54 pm