Open access publishing trend analysis: statistics beyond the perception

Authors : Elisabetta Poltronieri, Elena Bravo, Moreno Curti, Maurizio Ferri, Cristina Mancini


The purpose of this analysis was twofold: to track the number of open access journals acquiring an impact factor, and to investigate the distribution of subject categories pertaining to these journals. As a case study, journals in which the researchers of the National Institute of Health (Istituto Superiore di Sanità) in Italy have published were surveyed.


Data were collected by searching open access journals listed in the Directory of Open Access Journals, which were then compared with those having an impact factor as tracked by the Journal Citation Reports for the years 2010-2012. Journal Citation Reports subject categories were matched with Medical Subject Headings to provide a larger content classification.


A survey was performed to determine the Directory journals matching the Journal Citation Reports list, and their inclusion in a given subject area.


In the years 2010-2012, an increase in the number of journals was observed for Journal Citation Reports (+4.93%) and for the Directory (+18.51%). The discipline showing the highest increment was medicine (315 occurrences, 26%).


From 2010 to 2012, the number of open access journals with an impact factor has gradually risen, with a prevalence of journals relating to medicine and biological science disciplines, suggesting that authors increasingly prefer to publish in open access journals.



Agreements between Industry and Academia on Publication Rights : A Retrospective Study of Protocols and Publications of Randomized Clinical Trials

Authors : Benjamin Kasenda, Erik von Elm, John J. You, Anette Blümle, Yuki Tomonaga, Ramon Saccilotto et al.


Little is known about publication agreements between industry and academic investigators in trial protocols and the consistency of these agreements with corresponding statements in publications.

We aimed to investigate (i) the existence and types of publication agreements in trial protocols, (ii) the completeness and consistency of the reporting of these agreements in subsequent publications, and (iii) the frequency of co-authorship by industry employees.

Methods and Findings

We used a retrospective cohort of randomized clinical trials (RCTs) based on archived protocols approved by six research ethics committees between 13 January 2000 and 25 November 2003.

Only RCTs with industry involvement were eligible. We investigated the documentation of publication agreements in RCT protocols and statements in corresponding journal publications. Of 647 eligible RCT protocols, 456 (70.5%) mentioned an agreement regarding publication of results. Of these 456, 393 (86.2%) documented an industry partner’s right to disapprove or at least review proposed manuscripts; 39 (8.6%) agreements were without constraints of publication.

The remaining 24 (5.3%) protocols referred to separate agreement documents not accessible to us. Of those 432 protocols with an accessible publication agreement, 268 (62.0%) trials were published. Most agreements documented in the protocol were not reported in the subsequent publication (197/268 [73.5%]).

Of 71 agreements reported in publications, 52 (73.2%) were concordant with those documented in the protocol. In 14 of 37 (37.8%) publications in which statements suggested unrestricted publication rights, at least one co-author was an industry employee.

In 25 protocol-publication pairs, author statements in publications suggested no constraints, but 18 corresponding protocols documented restricting agreements.
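
The proportions reported above can be re-derived from the counts given in the abstract. The following Python sketch is only an arithmetic check under that assumption; the function and variable names are illustrative and are not taken from the study.

```python
def pct(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# (label, numerator, denominator, percentage as reported in the abstract)
REPORTED = [
    ("protocols mentioning a publication agreement", 456, 647, 70.5),
    ("agreements with industry right to review or disapprove", 393, 456, 86.2),
    ("agreements without publication constraints", 39, 456, 8.6),
    ("protocols referring to inaccessible separate documents", 24, 456, 5.3),
    ("trials published among accessible agreements", 268, 432, 62.0),
    ("agreements not reported in the publication", 197, 268, 73.5),
    ("reported agreements concordant with the protocol", 52, 71, 73.2),
    ("unrestricted-rights publications with an industry co-author", 14, 37, 37.8),
]


def check_all(rows):
    """Recompute each percentage and pair it with the reported value."""
    return [(label, pct(n, d), reported) for label, n, d, reported in rows]


if __name__ == "__main__":
    for label, computed, reported in check_all(REPORTED):
        status = "ok" if computed == reported else "MISMATCH"
        print(f"{label}: computed {computed}%, reported {reported}% [{status}]")
```

The denominators are also internally consistent: 456 - 24 = 432 protocols with an accessible agreement, and 268 - 197 = 71 agreements reported in publications.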


Publication agreements constraining academic authors’ independence are common. Journal articles seldom report on publication agreements, and, if they do, statements can be discrepant with the trial protocol.

URL : Agreements between Industry and Academia on Publication Rights : A Retrospective Study of Protocols and Publications of Randomized Clinical Trials


Using the Semantic Web for Rapid Integration of WikiPathways with Other Biological Online Data Resources

Authors : Andra Waagmeester, Martina Kutmon, Anders Riutta, Ryan Miller, Egon L. Willighagen, Chris T. Evelo, Alexander R. Pico

The diversity of online resources storing biological data in different formats provides a challenge for bioinformaticians to integrate and analyse their biological data.

The semantic web provides a standard to facilitate knowledge integration using statements built as triples describing a relation between two objects. WikiPathways, an online collaborative pathway resource, is now available in the semantic web through a SPARQL endpoint.

Having biological pathways in the semantic web allows rapid integration with data from other resources that contain information about elements present in pathways using SPARQL queries.
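
To make the triple model concrete, here is a minimal Python sketch of subject-predicate-object statements and a two-step pattern match over them, roughly what a SPARQL basic graph pattern expresses. It is an in-memory illustration only: the identifiers (ex:pathway1, ex:hasPart, ex:sameAs, other:gene123) are hypothetical and are not drawn from the WikiPathways vocabularies or from any real resource.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical statements: two pathways sharing one gene, plus a link
# from that gene to an identifier in some other (imaginary) resource.
TRIPLES: List[Triple] = [
    ("ex:pathway1", "ex:hasPart", "ex:geneA"),
    ("ex:pathway1", "ex:hasPart", "ex:metaboliteB"),
    ("ex:pathway2", "ex:hasPart", "ex:geneA"),
    ("ex:geneA", "ex:sameAs", "other:gene123"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def pathways_for_external_id(triples: List[Triple], external_id: str) -> List[str]:
    """Join two patterns: external id -> local element -> containing pathways."""
    elements = {subj for subj, _, _ in match(triples, p="ex:sameAs", o=external_id)}
    pathways = {
        subj
        for element in elements
        for subj, _, _ in match(triples, p="ex:hasPart", o=element)
    }
    return sorted(pathways)


if __name__ == "__main__":
    print(pathways_for_external_id(TRIPLES, "other:gene123"))
```

The join in `pathways_for_external_id` is the point of the sketch: once elements carry shared identifiers, a question that spans two resources reduces to chaining triple patterns.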

In order to convert WikiPathways content into meaningful triples we developed two new vocabularies that capture the graphical representation and the pathway logic, respectively. Each gene, protein, and metabolite in a given pathway is defined with a standard set of identifiers to support linking to several other biological resources in the semantic web.

WikiPathways triples were loaded into the Open PHACTS discovery platform and are available through its Web API, to be used in various tools for drug development.

We combined various semantic web resources with the newly converted WikiPathways content using a variety of SPARQL query types and third-party resources, such as the Open PHACTS API. The ability to use pathway information to form new links across diverse biological data highlights the utility of integrating WikiPathways in the semantic web.

URL : Using the Semantic Web for Rapid Integration of WikiPathways with Other Biological Online Data Resources


Assessment of and Response to Data Needs of Clinical and Translational Science Researchers and Beyond

Objective and Setting

As universities and libraries grapple with data management and “big data,” the need for data management solutions across disciplines is particularly relevant in clinical and translational science (CTS) research, which is designed to traverse disciplinary and institutional boundaries.

At the University of Florida Health Science Center Library, a team of librarians undertook an assessment of the research data management needs of CTS researchers, including an online assessment and follow-up one-on-one interviews.

Design and Methods

The 20-question online assessment was distributed to all investigators affiliated with UF’s Clinical and Translational Science Institute (CTSI) and 59 investigators responded. Follow-up in-depth interviews were conducted with nine faculty and staff members.


Results indicate that UF’s CTS researchers have diverse data management needs that are often specific to their discipline or current research project and span the data lifecycle. A common theme in responses was the need for consistent data management training, particularly for graduate students; this led to localized training within the Health Science Center and CTSI, as well as campus-wide training.

Another campus-wide outcome was the creation of an action-oriented Data Management/Curation Task Force, led by the libraries and with participation from Research Computing and the Office of Research.


Initiating conversations with affected stakeholders and campus leadership about best practices in data management and implications for institutional policy shows the library’s proactive leadership and furthers our goal to provide concrete guidance to our users in this area.

URL : Assessment of and Response to Data Needs of Clinical and Translational Science Researchers and Beyond

OpenTrials: towards a collaborative open database of all available information on all clinical trials

OpenTrials is a collaborative and open database for all available structured data and documents on all clinical trials, threaded together by individual trial.

With a versatile and expandable data schema, it is initially designed to host and match the following documents and data for each trial: registry entries; links, abstracts, or texts of academic journal papers; portions of regulatory documents describing individual trials; structured data on methods and results extracted by systematic reviewers or other researchers; clinical study reports; and additional documents such as blank consent forms, blank case report forms, and protocols.

The intention is to create an open, freely re-usable index of all such information and to increase discoverability, facilitate research, identify inconsistent data, enable audits on the availability and completeness of this information, support advocacy for better data and drive up standards around open data in evidence-based medicine.

The project has phase I funding. This will allow us to create a practical data schema and populate the database initially through web-scraping, basic record linkage techniques, crowd-sourced curation around selected drug areas, and import of existing sources of structured data and documents.
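
As an illustration of what "basic record linkage" could look like when documents are threaded together by individual trial, the following Python sketch groups documents by a registry identifier found in their text. It is a sketch under stated assumptions, not the OpenTrials implementation: the document layout (`source`, `text`), the function names, and the sample identifiers are all hypothetical placeholders. The only external fact relied on is that ClinicalTrials.gov identifiers take the form "NCT" followed by eight digits.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# ClinicalTrials.gov identifiers: "NCT" followed by eight digits.
# Optional whitespace or a hyphen is tolerated to absorb formatting noise.
NCT_PATTERN = re.compile(r"\bNCT\s*-?\s*(\d{8})\b", re.IGNORECASE)


def extract_trial_ids(text: str) -> Set[str]:
    """Return normalised NCT identifiers mentioned in a piece of text."""
    return {"NCT" + digits for digits in NCT_PATTERN.findall(text)}


def thread_by_trial(documents: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group document sources under every trial identifier they mention."""
    threads: Dict[str, List[str]] = defaultdict(list)
    for doc in documents:
        for trial_id in sorted(extract_trial_ids(doc["text"])):
            threads[trial_id].append(doc["source"])
    return dict(threads)


if __name__ == "__main__":
    # Placeholder records; the identifiers do not refer to real trials here.
    sample = [
        {"source": "registry", "text": "Registry entry NCT00000001"},
        {"source": "journal", "text": "Trial registration: nct 00000001."},
        {"source": "csr", "text": "Clinical study report for NCT-00000002"},
        {"source": "consent_form", "text": "Blank consent form, no identifier"},
    ]
    print(thread_by_trial(sample))
```

Documents without a recognisable identifier, such as the consent form in the sample, fall outside this exact-match approach; those are the cases where fuzzier matching or crowd-sourced curation would have to take over.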

It will also allow us to create user-friendly web interfaces onto the data and conduct user engagement workshops to optimise the database and interface designs.

Where other projects have set out to manually and perfectly curate a narrow range of information on a smaller number of trials, we aim to use a broader range of techniques and attempt to match a very large quantity of information on all trials. We are currently seeking feedback and additional sources of structured data.

URL : OpenTrials: towards a collaborative open database of all available information on all clinical trials

Data publication with the structural biology data grid supports live analysis

Access to experimental X-ray diffraction image data is fundamental for validation and reproduction of macromolecular models and indispensable for development of structural biology processing methods. Here, we established a diffraction data publication and dissemination system, Structural Biology Data Grid (SBDG), to preserve primary experimental data sets that support scientific publications.

Data sets are accessible to researchers through a community driven data grid, which facilitates global data access. Our analysis of a pilot collection of crystallographic data sets demonstrates that the information archived by SBDG is sufficient to reprocess data to statistics that meet or exceed the quality of the original published structures.

SBDG has extended its services to the entire community and is used to develop support for other types of biomedical data sets. It is anticipated that access to the experimental data sets will enhance the paradigm shift in the community towards a much more dynamic body of continuously improving data analysis.

URL : Data publication with the structural biology data grid supports live analysis

DOI : 10.1038/ncomms10882

Wikidata as a semantic framework for the Gene Wiki initiative

Open biological data are distributed over many resources making them challenging to integrate, to update and to disseminate quickly. Wikidata is a growing, open community database which can serve this purpose and also provides tight integration with Wikipedia.

To improve the state of biological data and to facilitate data management and dissemination, we imported all human and mouse genes, and all human and mouse proteins into Wikidata.

In total, 59 721 human genes and 73 355 mouse genes have been imported from NCBI and 27 306 human proteins and 16 728 mouse proteins have been imported from the Swissprot subset of UniProt. As Wikidata is open and can be edited by anybody, our corpus of imported data serves as the starting point for integration of further data by scientists, the Wikidata community and citizen scientists alike.

The first use case for these data is to populate Wikipedia Gene Wiki infoboxes directly from Wikidata with the data integrated above. This enables immediate updates of the Gene Wiki infoboxes as soon as the data in Wikidata are modified.

Although Gene Wiki pages are currently only on the English language version of Wikipedia, the multilingual nature of Wikidata allows for usage of the data we imported in all 280 different language Wikipedias.

Apart from the Gene Wiki infobox use case, a SPARQL endpoint and exporting functionality to several standard formats (e.g. JSON, XML) enable use of the data by scientists.
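
A hedged sketch of how a scientist might use the SPARQL endpoint with JSON output follows. It only builds the query and the request URL and does not contact the network. The endpoint address is the public Wikidata Query Service; the property and item identifiers used (P351 for Entrez Gene ID, P703 for "found in taxon", Q15978631 for Homo sapiens) are our assumptions about the current Wikidata data model and should be verified before use. The function names are hypothetical.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint.
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def build_gene_query(limit: int = 5) -> str:
    """Build a SPARQL query listing human genes with an Entrez Gene ID.

    Assumed identifiers (verify before use):
      wdt:P351     Entrez Gene ID
      wdt:P703     found in taxon
      wd:Q15978631 Homo sapiens
    """
    return f"""
SELECT ?gene ?geneLabel ?entrez WHERE {{
  ?gene wdt:P351 ?entrez ;
        wdt:P703 wd:Q15978631 .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {int(limit)}
""".strip()


def build_request_url(query: str) -> str:
    """Return a GET URL asking the endpoint for JSON results."""
    return WIKIDATA_SPARQL + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    print(build_request_url(build_gene_query(limit=5)))
```

The resulting URL can be fetched with any HTTP client; the response would be a JSON document of variable bindings, one per matching gene.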

In summary, we created a fully open and extensible data resource for human and mouse molecular biology and biochemistry data. This resource enriches all the Wikipedias with structured information and serves as a new linking hub for the biological semantic web.

URL : Wikidata as a semantic framework for the Gene Wiki initiative

DOI : 10.1093/database/baw015