Data sharing and reanalysis of randomized controlled trials in leading biomedical journals with a full data sharing policy: survey of studies published in The BMJ and PLOS Medicine

Authors : Florian Naudet, Charlotte Sakarovitch, Perrine Janiaud, Ioana Cristea, Daniele Fanelli, David Moher, John P A Ioannidis

Objectives

To explore the effectiveness of data sharing by randomized controlled trials (RCTs) in journals with a full data sharing policy and to describe potential difficulties encountered in the process of performing reanalyses of the primary outcomes.

Design

Survey of published RCTs.

Setting

PubMed/Medline.

Eligibility criteria

RCTs that had been submitted to and published by The BMJ and PLOS Medicine after these journals adopted their data sharing policies.

Main outcome measure

The primary outcome was data availability, defined as the eventual receipt of complete data with clear labelling. Primary outcomes were reanalyzed to assess the extent to which studies could be reproduced. Difficulties encountered were described.

Results

37 RCTs (21 from The BMJ and 16 from PLOS Medicine) published between 2013 and 2016 met the eligibility criteria. 17/37 (46%, 95% confidence interval 30% to 62%) satisfied the definition of data availability, and 14 of the 17 (82%, 59% to 94%) were fully reproduced on all their primary outcomes. Of the remaining three RCTs, errors were identified in two, although the reanalyses reached similar conclusions, and one paper did not provide enough information in its Methods section to reproduce the analyses. Difficulties identified included problems in contacting corresponding authors and a lack of resources on their part for preparing the datasets. In addition, data sharing practices varied across study groups.
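The abstract does not state how the confidence intervals were computed. As a sketch, assuming a Wilson score interval (our choice, not confirmed by the source), the snippet below reproduces the reported 59% to 94% for 14/17 and gives about 31% to 62% for 17/37, one point above the reported lower bound of 30%, so the authors may have used a different method.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


for label, k, n in [("data available", 17, 37), ("fully reproduced", 14, 17)]:
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.0%} (95% CI {lo:.0%} to {hi:.0%})")
```

For comparison, a simple Wald interval (p ± 1.96 × standard error) happens to give 30% to 62% for 17/37 but exceeds 100% at the upper end for 14/17, which is one reason score-based intervals are usually preferred for small samples.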

Conclusions

Data availability was not optimal in two journals with a strong policy for data sharing. When investigators shared data, most reanalyses largely reproduced the original results. Data sharing practices need to become more widespread and streamlined to allow meaningful reanalyses and reuse of data.


A scoping review of comparisons between abstracts and full reports in primary biomedical research

Authors : Guowei Li, Luciana P. F. Abbade, Ikunna Nwosu, Yanling Jin, Alvin Leenus, Muhammad Maaz, Mei Wang, Meha Bhatt, Laura Zielinski, Nitika Sanger, Bianca Bantoto, Candice Luo, Ieta Shams, Hamnah Shahid, Yaping Chang, Guangwen Sun, Lawrence Mbuagbaw, Zainab Samaan, Mitchell A. H. Levine, Jonathan D. Adachi, Lehana Thabane

Background

Evidence shows that research abstracts are commonly inconsistent with their corresponding full reports, and may mislead readers.

In this scoping review, part of our series on the state of reporting of primary biomedical research, we summarized evidence from systematic reviews and surveys to investigate the current state of inconsistent abstract reporting and to evaluate factors associated with improved reporting, based on comparisons between abstracts and their full reports.

Methods

We searched EMBASE, Web of Science, MEDLINE, and CINAHL from January 1st 1996 to September 30th 2016 to retrieve eligible systematic reviews and surveys. Our primary outcome was the level of inconsistency between abstracts and corresponding full reports, expressed either as a percentage (with a lower percentage indicating better reporting) or as a categorical rating (such as major/minor difference or high/medium/low inconsistency), as reported by the authors.

We used medians and interquartile ranges to describe the level of inconsistency across studies. No quantitative syntheses were conducted. Data from the included systematic reviews or surveys was summarized qualitatively.
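A median and interquartile range of this kind can be computed with Python's standard `statistics` module. This is a minimal sketch: the input percentages are hypothetical, and because several quartile conventions exist, the choice of `method` can shift the quartiles slightly relative to other software.

```python
from statistics import median, quantiles


def summarize_inconsistency(levels_pct: list[float]) -> dict[str, float]:
    """Median, quartiles and range of per-study inconsistency levels (in %)."""
    if len(levels_pct) < 2:
        raise ValueError("need at least two studies to compute quartiles")
    # method="inclusive" treats the values as the whole population of studies;
    # the default "exclusive" method can give slightly different quartiles.
    q1, _, q3 = quantiles(levels_pct, n=4, method="inclusive")
    return {
        "median": median(levels_pct),
        "q1": q1,
        "q3": q3,
        "min": min(levels_pct),
        "max": max(levels_pct),
    }


# Hypothetical per-study percentages, for illustration only (not the review's data).
example = [4, 12, 20, 35, 39, 48, 55, 78]
print(summarize_inconsistency(example))
```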

Results

Seventeen studies that addressed this topic were included. The reported level of inconsistency had a median of 39% (interquartile range: 14% – 54%) and ranged from 4% to 78%. In studies that separated major from minor inconsistency, the level of major inconsistency ranged from 5% to 45% (median: 19%, interquartile range: 7% – 31%); major inconsistencies included discrepancies in specifying the study design or sample size, designating a primary outcome measure, presenting main results, and drawing a conclusion.

A longer interval between a conference abstract and the publication of the full report was the only factor found to be marginally or significantly associated with an increased likelihood of reporting inconsistencies.

Conclusions

This scoping review revealed that abstracts are frequently inconsistent with full reports; efforts are needed to improve the consistency of abstract reporting in the primary biomedical research community.

URL : A scoping review of comparisons between abstracts and full reports in primary biomedical research

DOI : https://doi.org/10.1186/s12874-017-0459-5

Open access policies of high impact medical journals: a cross-sectional study

Authors : Tim Ellison, Tim Koder, Laura Schmidt, Amy Williams, Christopher Winchester

Introduction

Journal publishers increasingly offer governmental and charitable research funders the option to pay for open access with a Creative Commons Attribution (CC BY) licence, which allows sharing and adaptation of published materials for commercial as well as non-commercial use.

The Open Access Scholarly Publishers Association recommends this licence as the least restrictive Creative Commons licence available. We set out to investigate whether pharmaceutical companies are offered the same options.

Methods

On 24 May 2017, using Journal Selector (Sylogent, Newtown, PA, USA), we identified journals with a 2015 impact factor of at least 15 and excluded from the analysis journals that publish only review articles.
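The eligibility rule can be expressed as a simple filter. This sketch assumes a hypothetical record layout and made-up journals (Journal Selector's actual export format is not described in the abstract); it only illustrates that "at least 15" is an inclusive threshold and that review-only journals are dropped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Journal:
    # Hypothetical record layout, for illustration only.
    name: str
    impact_factor_2015: float
    review_only: bool


def eligible(journals: list[Journal], min_if: float = 15.0) -> list[Journal]:
    """Keep journals with a 2015 impact factor >= min_if that do not publish only reviews."""
    return [j for j in journals if j.impact_factor_2015 >= min_if and not j.review_only]


# Made-up journals, for illustration only.
sample = [
    Journal("Journal A", 47.8, False),
    Journal("Review Journal B", 30.1, True),
    Journal("Journal C", 15.0, False),
    Journal("Journal D", 9.2, False),
]
print([j.name for j in eligible(sample)])  # ['Journal A', 'Journal C']
```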

Between 29 June 2017 and 26 July 2017, we collected information about the journals’ open access policies from their websites and/or by email contact. We contacted the journals by email again between 6 December 2017 and 2 January 2018 to confirm our findings.

Results

Thirty-seven non-review journals listed in the Journal Selector database, from 14 publishers, had a 2015 impact factor of at least 15. All 37 journals offered some form of access with varying embargo periods of up to 12 months.

Of these journals, 23 (62%) offered immediate open access with a CC BY licence under certain circumstances (e.g. to specific research funders). Of these 23, only one journal confirmed that it offered a CC BY licence to commercial funders/pharmaceutical companies.

Conclusion

The open access policies of most medical journals with high impact factors restrict the dissemination of medical research funded by the pharmaceutical industry.

To give the scientific community freedom to read, reuse and adapt medical publications, publishers and academic journal editors would ideally allow pharmaceutical companies to fund unrestricted and immediate open access with a CC BY licence.

URL : Open access policies of high impact medical journals: a cross-sectional study

DOI : https://doi.org/10.1101/250613

DataMed – an open source discovery index for finding biomedical datasets

Authors : Xiaoling Chen, Anupama E Gururaj, Burak Ozyurt, Ruiling Liu, Ergin Soysal, Trevor Cohen, Firat Tiryaki, Yueling Li, Nansu Zong, Min Jiang, Deevakar Rogith, Mandana Salimi, Hyeon-eui Kim, Philippe Rocca-Serra, Alejandra Gonzalez-Beltran, Claudiu Farcas, Todd Johnson, Ron Margolis, George Alter, Susanna-Assunta Sansone, Ian M Fore, Lucila Ohno-Machado, Jeffrey S Grethe, Hua Xu

Objective

Finding relevant datasets is important for promoting data reuse in the biomedical domain, but it is challenging given the volume and complexity of biomedical data. Here we describe the development of an open source biomedical data discovery system called DataMed, with the goal of promoting the building of additional data indexes in the biomedical domain.

Materials and Methods

DataMed, which can efficiently index and search diverse types of biomedical datasets across repositories, was developed through the National Institutes of Health–funded biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium.

It consists of 2 main components: (1) a data ingestion pipeline that collects and transforms original metadata information to a unified metadata model, called DatA Tag Suite (DATS), and (2) a search engine that finds relevant datasets based on user-entered queries.
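The first component maps heterogeneous repository metadata onto one shared model. The sketch below illustrates only that idea: the field names, repository names and mappings are hypothetical and far simpler than the actual DATS model, and this is not DataMed's code.

```python
from typing import Any

# Hypothetical, heavily simplified target fields; the real DATS model is much richer.
UNIFIED_FIELDS = ("title", "description", "keywords", "source_repository")

# Hypothetical per-repository mappings from native metadata keys to unified fields.
FIELD_MAPS: dict[str, dict[str, str]] = {
    "repo_x": {"name": "title", "summary": "description", "tags": "keywords"},
    "repo_y": {"dataset_title": "title", "abstract": "description", "subjects": "keywords"},
}


def to_unified(repository: str, record: dict[str, Any]) -> dict[str, Any]:
    """Map one native metadata record onto the unified fields; unmapped keys are dropped."""
    mapping = FIELD_MAPS[repository]
    unified: dict[str, Any] = {field: None for field in UNIFIED_FIELDS}
    for native_key, value in record.items():
        target = mapping.get(native_key)
        if target is not None:
            unified[target] = value
    unified["source_repository"] = repository
    return unified


print(to_unified("repo_y", {"dataset_title": "Example cohort", "subjects": ["genomics"], "size_gb": 12}))
```

Keeping the mappings as data rather than code means a new repository can, in this sketch, be added by writing one more mapping entry.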

In addition to describing its architecture and techniques, we evaluated individual components within DataMed, including the accuracy of the ingestion pipeline, the prevalence of the DATS model across repositories, and the overall performance of the dataset retrieval engine.

Results and Conclusion

Our manual review showed that the ingestion pipeline achieved an accuracy of 90% and that core elements of DATS varied in frequency across repositories. On a manually curated benchmark dataset, the DataMed search engine, which implements advanced natural language processing and terminology services, achieved an inferred average precision of 0.2033 and a precision at 10 (P@10, the proportion of relevant results among the top 10 search results) of 0.6022.
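The two retrieval metrics can be sketched as follows. Inferred average precision estimates average precision when only a sample of results has relevance judgments; this simplified sketch instead computes ordinary average precision and P@10 on a fully judged, hypothetical ranked list, so it is not the exact evaluation used for DataMed.

```python
def precision_at_k(ranked_ids: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of the top-k results that are relevant (always divides by k)."""
    return sum(1 for doc in ranked_ids[:k] if doc in relevant) / k


def average_precision(ranked_ids: list[str], relevant: set[str]) -> float:
    """Sum of precision at each rank holding a relevant result, divided by |relevant|."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Hypothetical ranked results and judgments; "d11" is relevant but never retrieved.
ranked = [f"d{i}" for i in range(1, 11)]
relevant = {"d1", "d3", "d4", "d8", "d11"}
print(precision_at_k(ranked, relevant), average_precision(ranked, relevant))
```

Because average precision divides by all relevant items, a relevant dataset that is never retrieved lowers the score, whereas P@10 looks only at the first page of results.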

The DataMed system is now publicly available as an open source package for the biomedical community.

DOI : https://doi.org/10.1093/jamia/ocx121


Balancing the local and the universal in maintaining ethical access to a genomics biobank

Authors : Catherine Heeney, Shona M. Kerr

Background

Issues of balancing data accessibility with ethical considerations and governance of a genomics research biobank, Generation Scotland, are explored within the evolving policy landscape of the past ten years. During this time data sharing and open data access have become increasingly important topics in biomedical research.

Decisions around data access are influenced both by the local, through arrangements for governance and practices such as linkage to health records, and by the global, through policies for biobanking and for sharing data with large-scale biomedical research data resources and consortia.

Methods

We review policy-relevant literature that applies to the conduct of biobanks in two areas: support for open access, and the protection of data subjects and of researchers managing a bioresource.

We present examples of decision making within a biobank based upon observations of the Generation Scotland Access Committee. We reflect upon how the drive towards open access raises ethical dilemmas for established biorepositories containing data and samples from human subjects.

Results

Despite much discussion in the science policy literature about standardisation, the contextual aspects of biobanking are often overlooked. Drawing on our engagement with Generation Scotland (GS), we demonstrate the importance of local arrangements in creating a responsive ethical approach to biorepository governance.

We argue that governance decisions regarding access to the biobank are intertwined with considerations about maintenance and viability at the local level. We show that in addition to the focus upon ever more universal and standardised practices, the local expertise gained in the management of such repositories must be supported.

Conclusions

A commitment to open access in genomics research has found almost universal backing in science and health policy circles, but repositories of data and samples from human subjects may have to operate under managed access, to protect privacy, align with participant consent and ensure that the resource can be managed in a sustainable way.

Data access committees need to be reflexive and flexible to cope with changing technology and with the opportunities and threats arising from the wider data sharing environment. Understanding these interactions also involves nurturing what is particular about the biobank in its local context.

URL : Balancing the local and the universal in maintaining ethical access to a genomics biobank

DOI : https://doi.org/10.1186/s12910-017-0240-7

Biotea: semantics for Pubmed Central

Authors : Alexander Garcia, Federico Lopez, Leyla Garcia, Olga Giraldo, Victor Bucheli, Michel Dumontier

A significant portion of biomedical literature is represented in a manner that makes it difficult for consumers to find or aggregate content through a computational query. One approach to facilitate reuse of the scientific literature is to structure this information as linked data using standardized web technologies.

In this paper we present the second version of Biotea, a semantic, linked data version of the open-access subset of PubMed Central that has been enhanced with specialized annotation pipelines that use existing infrastructure from the National Center for Biomedical Ontology.

We expose our models, services, software and datasets. Our infrastructure enables manual and semi-automatic annotation; the resulting data are represented as RDF-based linked data and can readily be queried using the SPARQL query language.
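In RDF, data are stored as subject, predicate, object triples, and a basic SPARQL query matches triple patterns that contain variables. The plain-Python sketch below mimics only that pattern-matching step, using hypothetical `ex:` identifiers; it is not Biotea's software, and real queries would run against a SPARQL endpoint or an RDF library.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Hypothetical triples; the identifiers are illustrative, not taken from Biotea's datasets.
TRIPLES: list[Triple] = [
    ("ex:article1", "ex:mentions", "ex:BRCA1"),
    ("ex:article1", "ex:title", "Example article one"),
    ("ex:article2", "ex:mentions", "ex:TP53"),
    ("ex:article3", "ex:mentions", "ex:BRCA1"),
]


def match(pattern: tuple[Optional[str], Optional[str], Optional[str]],
          triples: list[Triple]) -> list[Triple]:
    """Return triples matching a (subject, predicate, object) pattern; None acts as a variable."""
    return [t for t in triples if all(p is None or p == v for p, v in zip(pattern, t))]


# Roughly what  SELECT ?a WHERE { ?a ex:mentions ex:BRCA1 }  asks for.
print(sorted(s for s, _, _ in match((None, "ex:mentions", "ex:BRCA1"), TRIPLES)))
```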

We illustrate the utility of our system with several use cases. Our datasets, methods and techniques are available at http://biotea.github.io.

URL : Biotea: semantics for Pubmed Central

DOI : https://doi.org/10.7717/peerj.4201