On the Reuse of Scientific Data

Authors : Irene V. Pasquetto, Bernadette M. Randles, Christine L. Borgman

While science policy promotes data sharing and open data, these are not ends in themselves. Arguments for data sharing are to reproduce research, to make public assets available to the public, to leverage investments in research, and to advance research and innovation.

To achieve these expected benefits of data sharing, data must actually be reused by others. Data sharing practices, especially motivations and incentives, have received far more study than has data reuse, perhaps because of the array of contested concepts on which reuse rests and the disparate contexts in which it occurs.

Here we explicate concepts of data, sharing, and open data as a means to examine data reuse. We explore distinctions between use and reuse of data.

Lastly we propose six research questions on data reuse worthy of pursuit by the community: How can uses of data be distinguished from reuses? When is reproducibility an essential goal? When is data integration an essential goal? What are the tradeoffs between collecting new data and reusing existing data? How do motivations for data collection influence the ability to reuse data? How do standards and formats for data release influence reuse opportunities?

We conclude by summarizing the implications of these questions for science policy and for investments in data reuse.

DOI : http://doi.org/10.5334/dsj-2017-008

Research Data Reusability: Conceptual Foundations, Barriers and Enabling Technologies

Author : Costantino Thanos

High-throughput scientific instruments are generating massive amounts of data. Today, one of the main challenges faced by researchers is to make the best use of the world’s growing wealth of data. Data (re)usability is becoming a distinct characteristic of modern scientific practice.

By data (re)usability, we mean the ease with which data produced by some research communities (producer communities) can be used for legitimate scientific research by one or more other research communities (consumer communities).

Data (re)usability allows the reanalysis of evidence, reproduction and verification of results, minimizing duplication of effort, and building on the work of others. It has four main dimensions: policy, legal, economic and technological. The paper addresses the technological dimension of data reusability.

The conceptual foundations of data reuse as well as the barriers that hamper data reuse are presented and discussed. The data publication process is proposed as a bridge between the data author and user and the relevant technologies enabling this process are presented.

DOI : http://dx.doi.org/10.3390/publications5010002

Knowledge Sharing as a Social Dilemma in Pharmaceutical Innovation

Author : Daria Kim

The article addresses the problem of restricted access to industry-sponsored clinical trial data. In particular, it analyses the intersection of the competing claims that mandatory disclosure of pharmaceutical test data impedes innovation incentives, and that access facilitates new drug development.

These claims are characterised in terms of public-good and common-resource dilemmas. The analysis finds that confidentiality protection of primary research data plays an ambiguous role.

While secrecy, as such, does not solve the public-good problem in pharmaceutical innovation (in the presence of regulatory instruments that protect the originator drug against generic competition), it is likely to exacerbate the common-resource problem, in view of data as a source of verified and new knowledge.

It is argued that the claim of the research-based industry that disclosure of clinical data impedes innovation incentives is misplaced and should not be leveraged against the pro-access policies. The analysis proposes that regulation should adhere to the principle that protection should be confined to competition by imitation.

This implies that the rules of access should be designed in such a way that third-party use of data does not interfere with protection against generic competition. At the same time, the long-term collective benefit can be maximised when the ‘cooperative choice’ – i.e. when everyone shares data – becomes the ‘dominant strategy’.

This can be achieved only when access is not subject to the authorisation of the initial trial sponsors, and when primary data is aggregated, refined and managed on the collective basis.

URL : https://ssrn.com/abstract=2834493

Reproducible and reusable research: Are journal data sharing policies meeting the mark?

Authors : Nicole A Vasilevsky, Jessica Minnier, Melissa A Haendel, Robin E Champieux

Background

There is wide agreement in the biomedical research community that research data sharing is a primary ingredient for ensuring that science is more transparent and reproducible.

Publishers could play an important role in facilitating and enforcing data sharing; however, many journals have not yet implemented data sharing policies and the requirements vary widely across journals. This study set out to analyze the pervasiveness and quality of data sharing policies in the biomedical literature.

Methods

The online author’s instructions and editorial policies for 318 biomedical journals were manually reviewed to analyze the journal’s data sharing requirements and characteristics.

The data sharing policies were ranked using a rubric to determine if data sharing was required, recommended, required only for omics data, or not addressed at all. The data sharing method and licensing recommendations were examined, as well as any mention of reproducibility or similar concepts.

The data was analyzed for patterns relating to publishing volume, Journal Impact Factor, and the publishing model (open access or subscription) of each journal.

Results

11.9% of journals analyzed explicitly stated that data sharing was required as a condition of publication. 9.1% of journals required data sharing, but did not state that it would affect publication decisions. 23.3% of journals had a statement encouraging authors to share their data but did not require it.

There was no mention of data sharing in 31.8% of journals. Impact factors were significantly higher for journals with the strongest data sharing policies compared to all other data sharing mark categories. Open access journals were not more likely to require data sharing than subscription journals.
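As a rough reading aid, the reported percentages can be converted into approximate journal counts. The sketch below assumes the 318 journals named in the Methods as the denominator; the category labels are shorthand introduced here, and the rounded counts are estimates rather than figures quoted from the abstract.

```python
# Back-of-the-envelope conversion of the reported percentages into
# approximate journal counts. Labels are shorthand, not the paper's wording.
TOTAL_JOURNALS = 318  # journals reviewed, per the Methods paragraph

reported_shares = {
    "required as a condition of publication": 11.9,
    "required, no stated effect on publication": 9.1,
    "encouraged but not required": 23.3,
    "not mentioned": 31.8,
}


def approximate_counts(shares, total):
    """Turn percentage shares into rounded (estimated) counts."""
    return {label: round(total * pct / 100) for label, pct in shares.items()}


counts = approximate_counts(reported_shares, TOTAL_JOURNALS)

# Share of journals falling into categories the abstract does not itemise
# in these two paragraphs (for example, policies covering only omics data).
remaining_share = round(100 - sum(reported_shares.values()), 1)

for label, n in counts.items():
    print(f"{label}: ~{n} journals")
print(f"share in categories not itemised here: {remaining_share}%")
```

The four itemised shares do not add up to 100%, so the remainder presumably falls under the other rubric categories mentioned in the Methods.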

Discussion

Our study confirmed earlier investigations which observed that only a minority of biomedical journals require data sharing, and a significant association between higher Impact Factors and journals with a data sharing requirement.

Moreover, while 65.7% of the journals in our study that required data sharing addressed the concept of reproducibility, as with earlier investigations, we found that most data sharing policies did not provide specific guidance on the practices that ensure data is maximally available and reusable.

DOI : https://peerj.com/articles/3208/

State of the art report on open access publishing of research data in the humanities

Authors : Stefan Buddenbohm, Nathanael Cretin, Elly Dijk, Bertrand Gaiffe, Maaike De Jong, Jean-Luc Minel, Blandine Nouvel

Publishing research data as open data is not yet common practice for researchers in the arts and humanities, and lags behind other scientific fields, such as the natural sciences. Moreover, even when humanities researchers publish their data in repositories and archives, these data are often hard for other researchers in the field to find and use.

The goal of Work Package 7 of the HaS (Humanities at Scale) DARIAH project is to develop an open data platform for the humanities. Work in task 7.1 is a joint effort of Data Archiving and Networked Services (DANS), Centre National de la Recherche Scientifique (CNRS) and the University of Göttingen – State and University Library (UGOE-SUB).

This report gives an overview of the various aspects that are connected to open access publishing of research data in the humanities. After the introduction, where we give definitions of key concepts, we describe the research data life cycle.

We present an overview of the different stakeholders involved and we look into advantages and obstacles for researchers to share research data. Furthermore, a description of the European data repositories is given, followed by certification standards of trusted digital data repositories.

The possibility of data citation is important for sharing open data and is also described in this report. We also discuss the standards and use of metadata in the humanities. Finally, we discuss a best-practice example of an open access research data system in the humanities: the French open research data ecosystem.

With this report we provide information and guidance on open access publishing of humanities research data for researchers. The report is the result of a desk study of the current state of open access research data and the specific challenges for the humanities. It will serve as input for Task 7.2, which will deliver a design and sustainability plan for an open humanities data platform, and for Task 7.3, which will deliver this platform.

URL : https://halshs.archives-ouvertes.fr/halshs-01357208

Sharing data increases citations

Authors : Thea Marie Drachen, Ole Ellegaard, Asger Væring Larsen, Søren Bertil Fabricius Dorch

This paper presents some indications of a citation advantage related to sharing data, using astrophysics as a case. Through bibliometric analyses we find a citation advantage for astrophysical papers in core journals.

The advantage arises when indexed papers are associated with data by bibliographical links: these papers receive, on average, significantly more citations per paper per year than papers not associated with links to data.

DOI : https://www.liberquarterly.eu/article/10.18352/lq.10149/

The Journal Article as a Means to Share Data: a Content Analysis of Supplementary Materials from Two Disciplines

Authors : Jeremy Kenyon, Nancy Sprague, Edward Flathers

INTRODUCTION

The practice of publishing supplementary materials with journal articles is becoming increasingly prevalent across the sciences.

We sought to understand better the content of these materials by investigating the differences between the supplementary materials published by authors in the geosciences and plant sciences.

METHODS

We conducted a random stratified sampling of four articles from each of 30 journals published in 2013. In total, we examined 297 supplementary data files for a range of different factors.

RESULTS

We identified many similarities between the practices of authors in the two fields, including the formats used (Word documents, Excel spreadsheets, PDFs) and the small size of the files.

There were differences identified in the content of the supplementary materials: the geology materials contained more maps and machine-readable data; the plant science materials included much more tabular data and multimedia content.

DISCUSSION

Our results suggest that the data shared through supplementary files in these fields may not lend itself to reuse. Code and related scripts are not often shared, nor is much ‘raw’ data. Instead, the files often contain summary data, modified for human reading and use.

CONCLUSION

Given these and other differences, our results suggest implications for publishers, librarians, and authors, and may require shifts in behavior if effective data sharing is to be realized.

DOI : http://doi.org/10.7710/2162-3309.2112