OpenCitations, an infrastructure organization for open scholarship

Authors : Silvio Peroni, David Shotton

OpenCitations is an infrastructure organization for open scholarship dedicated to the publication of open citation data as Linked Open Data using Semantic Web technologies, thereby providing a disruptive alternative to traditional proprietary citation indexes.

Open citation data are valuable for bibliometric analysis, increasing the reproducibility of large-scale analyses by enabling publication of the source data. Following brief introductions to the development and benefits of open scholarship and to Semantic Web technologies, this paper describes OpenCitations and its data sets, tools, services, and activities.

These include the OpenCitations Data Model; the SPAR (Semantic Publishing and Referencing) Ontologies; OpenCitations’ open software of generic applicability for searching, browsing, and providing REST APIs over resource description framework (RDF) triplestores; Open Citation Identifiers (OCIs) and the OpenCitations OCI Resolution Service; the OpenCitations Corpus (OCC), a database of open downloadable bibliographic and citation data made available in RDF under a Creative Commons public domain dedication; and the OpenCitations Indexes of open citation data, of which the first and largest is COCI, the OpenCitations Index of Crossref Open DOI-to-DOI Citations, which currently contains over 624 million bibliographic citations and is receiving considerable usage by the scholarly community.

DOI : https://doi.org/10.1162/qss_a_00023

The NIH Open Citation Collection: A public access, broad coverage resource

Authors : B. Ian Hutchins, Kirk L. Baker, Matthew T. Davis, Mario A. Diwersy, Ehsanul Haque, Robert M. Harriman, Travis A. Hoppe, Stephen A. Leicht, Payam Meyer, George M. Santangelo

Citation data have remained hidden behind proprietary, restrictive licensing agreements, which raises barriers to entry for analysts wishing to use the data, increases the expense of performing large-scale analyses, and reduces the robustness and reproducibility of the conclusions.

For the past several years, the National Institutes of Health (NIH) Office of Portfolio Analysis (OPA) has been aggregating and enhancing citation data that can be shared publicly. Here, we describe the NIH Open Citation Collection (NIH-OCC), a public access database for biomedical research that is made freely available to the community.

This dataset, which has been carefully generated from unrestricted data sources such as MEDLINE, PubMed Central (PMC), and Crossref, now underlies the citation statistics delivered in the NIH iCite analytic platform.

We have also included data from a machine learning pipeline that identifies, extracts, resolves, and disambiguates references from full-text articles available on the internet. Open citation links are available to the public in a major update of iCite (https://icite.od.nih.gov).

DOI : https://doi.org/10.1371/journal.pbio.3000385

COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations

Authors : Ivan Heibi, Silvio Peroni, David Shotton

In this paper, we present COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations. COCI is the first open citation index created by OpenCitations, in which we have applied the concept of citations as first-class data entities, and it contains more than 445 million DOI-to-DOI citation links derived from the data available in Crossref.

These citations are described in RDF by means of the newly extended version of the OpenCitations Data Model (OCDM).

We introduce the workflow we have developed for creating these data, and also show the additional services that facilitate the access to and querying of these data via different access points: a SPARQL endpoint, a REST API, bulk downloads, Web interfaces, and direct access to the citations via HTTP content negotiation.
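As a rough illustration of the REST API access point mentioned above, the following Python sketch builds a request URL for the citations received by a DOI and pulls the citing DOIs out of a JSON response. The base URL, the `citations` operation, and the `citing`/`cited` field names are assumptions about the service rather than details given in the abstract, and the sample payload is a hand-written placeholder, not real COCI data. No network request is made.

```python
import json
from urllib.parse import quote

# Assumption: base URL of the COCI REST API (not stated in the abstract).
COCI_API = "https://opencitations.net/index/coci/api/v1"


def citations_url(doi):
    """Build the URL asking for all citations received by `doi`."""
    # Keep "/" unescaped so the DOI stays readable in the path.
    return f"{COCI_API}/citations/{quote(doi, safe='/')}"


def citing_dois(payload):
    """Return the citing DOIs listed in a JSON response body."""
    # Assumption: the response is a JSON array of objects, one per
    # citation, each carrying "citing" and "cited" DOI fields.
    return [row["citing"] for row in json.loads(payload)]


# Hypothetical placeholder response; these DOIs do not refer to real papers.
sample_payload = json.dumps([
    {"citing": "10.1000/example.a", "cited": "10.1000/example.target"},
    {"citing": "10.1000/example.b", "cited": "10.1000/example.target"},
])

print(citations_url("10.1162/qss_a_00023"))
print(citing_dois(sample_payload))
```

In practice the string passed to `citing_dois` would be the body returned by an HTTP GET on the URL from `citations_url`; keeping the two steps separate makes the parsing testable offline.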

Finally, we present statistics regarding the use of COCI citation data, and we introduce several projects that have already started to use COCI data for different purposes.

URL : https://arxiv.org/abs/1904.06052

Open data to evaluate academic researchers: an experiment with the Italian Scientific Habilitation

Authors : Angelo Di Iorio, Silvio Peroni, Francesco Poggi

The need for scholarly open data is ever increasing. While there are large repositories of open access articles and free publication indexes, there are still only a few examples of free citation networks, and their coverage is partial.

One result is that most evaluation processes based on citation counts rely on commercial citation databases. Things are changing under the pressure of the Initiative for Open Citations (I4OC), which campaigns for scholarly publishers to make their citations fully open.

This paper investigates the growth of open citations through an experiment on the Italian Scientific Habilitation, the national process for university professor qualification, which instead uses data from commercial indexes.

We simulated the procedure using only open data and explored similarities and differences with the official results. The outcomes of the experiment show that the amount of open citation data currently available is not yet enough to obtain similar results.

URL : https://arxiv.org/abs/1902.03287

Towards Open Data for the Citation Content Analysis

Authors : Jose Manuel Barrueco, Thomas Krichel, Sergey Parinov, Victor Lyapunov, Oxana Medvedeva, Varvara Sergeeva

The paper presents the first results of the CitEcCyr project, funded by RANEPA. The project aims to create a source of open citation data for research papers written in Russian.

Compared to existing sources of citation data, CitEcCyr is working to add the following: a) a transparent and distributed architecture for the technology that generates the citation data; b) openness of all software built or used and of all citation data created; c) an extended set of citation data sufficient for citation content analysis; d) services for public control over the quality of the citation data and the citing activity of researchers.

URL : https://arxiv.org/abs/1710.00302