Ten principles for machine-actionable data management plans

Authors : Tomasz Miksa, Stephanie Simms, Daniel Mietchen, Sarah Jones

Data management plans (DMPs) are documents accompanying research proposals and project outputs. DMPs are created as free-form text and describe the data and tools employed in scientific investigations. They are often seen as an administrative exercise and not as an integral part of research practice.

There is now widespread recognition that the DMP can have more thematic, machine-actionable richness with added value for all stakeholders: researchers, funders, repository managers, research administrators, data librarians, and others.

The research community is moving toward a shared goal of making DMPs machine-actionable to improve the experience for all involved by exchanging information across research tools and systems and embedding DMPs in existing workflows.

This will enable parts of the DMP to be automatically generated and shared, thus reducing administrative burdens and improving the quality of information within a DMP.
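As a rough illustration of what "machine-actionable" could mean in practice, the sketch below stores a plan as structured data rather than free-form text, so that another system can read it or derive values from it. All field names (`project`, `contact_orcid`, `size_gb`, and so on), the sample values, and the helper `storage_needed_gb` are hypothetical and are not taken from the paper or from any published DMP schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Dataset:
    # Hypothetical fields chosen for illustration only.
    title: str
    license: str
    repository: str
    size_gb: float


@dataclass
class Plan:
    project: str
    contact_orcid: str
    datasets: list = field(default_factory=list)


def storage_needed_gb(plan: Plan) -> float:
    """Derive a value automatically instead of asking the researcher to retype it."""
    return sum(d.size_gb for d in plan.datasets)


plan = Plan(
    project="Example project",
    contact_orcid="0000-0000-0000-0000",  # placeholder identifier, not a real ORCID
    datasets=[
        Dataset("Survey responses", "CC-BY-4.0", "institutional-repo", 10.0),
        Dataset("Analysis scripts", "MIT", "code-archive", 2.5),
    ],
)

# Serialise the plan so that other tools (funder, repository, library) could consume it.
plan_json = json.dumps(asdict(plan), indent=2)

# A receiving system can parse the same record back without reading prose.
restored = json.loads(plan_json)
```

Here a repository could, for example, read `restored["datasets"]` to anticipate deposits, which is the kind of exchange across systems that the abstract describes.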

This paper presents 10 principles to put machine-actionable DMPs (maDMPs) into practice and realize their benefits. The principles contain specific actions that various stakeholders are already undertaking or should undertake in order to work together across research communities to achieve the larger aims of the principles themselves.

We describe existing initiatives to highlight how much progress has already been made toward achieving the goals of maDMPs as well as a call to action for those who wish to get involved.

DOI : https://doi.org/10.1371/journal.pcbi.1006750

Revisiting the Term Predatory Open Access Publishing

Author : Aamir Raoof Memon

Since the 1990s, scholarly publishing has been transformed from a subscription print-based paradigm to an open access and digital publishing model, but this transformation has been accompanied by unethical and predatory publishing practices.

‘Pay-to-publish’ predatory journals abuse the open-access publishing model, and their main intention is to make money out of authors for their editor–owners. The defining characteristic of predatory journals is the lack of a proper peer review process, despite their claims to the contrary.

The spectrum of victims of predatory journals varies widely and includes inexperienced, early-career and naive researchers from both developing and high- to upper middle-income countries, together with experienced researchers.

To counter this, several blacklists and whitelists have been created. Beall’s list of potential or probable predatory journals remained the go-to list until its sudden closure.

Later, similar lists such as the Stop Predatory Journals website (https://predatoryjournals.com), and institutional lists such as those published by the University Grants Commission (UGC) India, and several other commercial bodies and associations appeared; however, they have been criticized for several reasons, including their poor methodology and lack of transparency.

The world of scholarly publishing is not purely black and white, and there are always some grey areas; therefore, we cannot rely on any such listings.

DOI : https://doi.org/10.3346/jkms.2019.34.e99

Online division of labour: emergent structures in Open Source Software

Authors : María J. Palazzi, Jordi Cabot, Javier Luis Cánovas Izquierdo, Albert Solé-Ribalta, Javier Borge-Holthoefer

The development of Open Source Software fundamentally depends on the participation and commitment of volunteer developers to progress. Several works have presented strategies to increase the on-boarding and engagement of new contributors, but little is known about how these diverse groups of developers self-organise to work together.

To understand this, one must consider that, on one hand, platforms like GitHub provide a virtually unlimited development framework: any number of actors can potentially join to contribute in a decentralised, distributed, remote, and asynchronous manner.

On the other, however, it seems reasonable that some sort of hierarchy and division of labour must be in place to meet human biological and cognitive limits, and also to achieve some level of efficiency.

These latter features (hierarchy and division of labour) should translate into recognisable structural arrangements when projects are represented as developer-file bipartite networks.

In this paper we analyse a set of popular open source projects from GitHub, placing the accent on three key properties: nestedness, modularity and in-block nestedness, which typify the emergence of heterogeneities among contributors, the emergence of subgroups of developers working on specific subgroups of files, and a mixture of the previous two, respectively.
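To make the developer-file representation concrete, the following sketch encodes a project as a mapping from developers to the files they touched and computes a simple mean pairwise overlap. This is a simplified stand-in for nestedness, not the estimator used by the authors; the function name `mean_pairwise_overlap` and the toy projects are assumptions made for illustration.

```python
from itertools import combinations


def mean_pairwise_overlap(contrib):
    """Average, over developer pairs, of shared files / size of the smaller file set.

    contrib maps each developer to the set of files they edited, i.e. the
    adjacency of a developer-file bipartite network. A value near 1 means
    less active developers mostly work on subsets of the files touched by
    more active ones (a nested pattern).
    """
    scores = []
    for a, b in combinations(contrib, 2):
        smaller = min(len(contrib[a]), len(contrib[b]))
        if smaller == 0:
            continue  # skip developers with no recorded files
        shared = len(contrib[a] & contrib[b])
        scores.append(shared / smaller)
    return sum(scores) / len(scores) if scores else 0.0


# Toy nested project: each developer's files are a subset of the previous one's.
nested = {
    "dev_a": {"f1", "f2", "f3", "f4"},
    "dev_b": {"f1", "f2", "f3"},
    "dev_c": {"f1", "f2"},
    "dev_d": {"f1"},
}

# Toy modular project: two separate groups working on disjoint files.
modular = {
    "dev_a": {"f1", "f2"},
    "dev_b": {"f1", "f2"},
    "dev_c": {"f3", "f4"},
    "dev_d": {"f3", "f4"},
}

nested_score = mean_pairwise_overlap(nested)
modular_score = mean_pairwise_overlap(modular)
```

In the nested toy project every pair overlaps fully, while in the modular one only the two within-group pairs do, so the score drops. In-block nestedness would correspond to a nested pattern appearing inside each group rather than across the whole project.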

These analyses show that indeed projects evolve into internally organised blocks. Furthermore, the distribution of sizes of such blocks is bounded, connecting our results to the celebrated Dunbar number both in off- and on-line environments.

Our analyses create a link between bio-cognitive constraints, group formation and online working environments, opening up a rich scenario for future research on (online) work team assembly.

URL : https://arxiv.org/abs/1903.03375

Responsible data sharing in international health research: a systematic review of principles and norms

Authors : Shona Kalkman, Menno Mostert, Christoph Gerlinger, Johannes J. M. van Delden, Ghislaine J. M. W. van Thiel

Background

Large-scale linkage of international clinical datasets could lead to unique insights into disease aetiology and facilitate treatment evaluation and drug development.

To this end, multi-stakeholder consortia are currently designing several disease-specific translational research platforms to enable international health data sharing.

Despite the recent adoption of the EU General Data Protection Regulation (GDPR), the procedures for how to govern responsible data sharing in such projects are not at all spelled out yet. In search of a first, basic outline of an ethical governance framework, we set out to explore relevant ethical principles and norms.

Methods

We performed a systematic review of literature and ethical guidelines for principles and norms pertaining to data sharing for international health research.

Results

We observed an abundance of principles and norms with considerable convergence at the aggregate level of four overarching themes: societal benefits and value; distribution of risks, benefits and burdens; respect for individuals and groups; and public trust and engagement.

However, at the level of principles and norms we identified substantial variation in the phrasing and level of detail, the number and content of norms considered necessary to protect a principle, and the contextual approaches in which principles and norms are used.

Conclusions

While providing some helpful leads for further work on a coherent governance framework for data sharing, the current collection of principles and norms prompts important questions about how to streamline terminology regarding de-identification and how to harmonise the identified principles and norms into a coherent governance framework that promotes data sharing while securing public trust.

DOI : https://doi.org/10.1186/s12910-019-0359-9

Let It Flow: The Monopolization of Academic Content Providers and How It Threatens the Democratization of Information

Author : Dana Lachenmayer

The monopolization of academic journal publishers concentrates power and valuable information into the hands of a few players in the marketplace. It has detrimental effects on how information flows and is accessed.

This, in turn, has profound effects on how a nation progresses. Placed in a theoretical framework, utilizing the marketplace of ideas and the economies that coincide, this article takes a look at the history of Elsevier in order to chart this course toward monopolization.

It exhibits the effect it has already had on the academic community, while offering two models of Open Access as a much sounder option.

DOI : https://doi.org/10.1080/0361526X.2018.1556189

Assessing Data Management Support Needs of Bioengineering and Biomedical Research Faculty

Authors : Christie A. Wiley, Margaret H. Burnette

Objectives

This study explores data management knowledge, attitudes, and practices of bioengineering and biomedical researchers in the context of National Institutes of Health-funded research projects. Specifically, this study seeks to answer the following questions:

  1. What is the nature of biomedical and bioengineering research on the Illinois campus and what kinds of data are being generated?
  2. To what degree are biomedical and bioengineering researchers aware of best practices for data management and what are the actual data management behaviors?
  3. What aspects of data management present the greatest challenges and frustrations?
  4. To what degree are biomedical and bioengineering researchers aware of data sharing opportunities and data repositories, and what are their attitudes towards data sharing?
  5. To what degree are researchers aware of campus services and support for data management planning, data sharing, and data deposit, and what is the level of interest in instruction in these areas?

Methods

Librarians on the University of Illinois at Urbana-Champaign campus conducted semi-structured interviews with bioengineering and biomedical researchers to explore researchers’ knowledge of data management best practices, awareness of library campus services, data management behavior, and challenges in managing research data.

The topics covered during the interviews were current research projects, data types, format, description, campus repository usage, data-sharing, awareness of library campus services, data reuse, the anticipated impact on public health, and challenges (interview questions are provided in the Appendix).

Results

This study revealed that the majority of researchers explore broad research topics, use various file storage solutions, generate large amounts of data, and adhere to differing discipline-specific practices. Researchers expressed both familiarity and unfamiliarity with DMP Tool.

Roughly half of the researchers interviewed reported having documented protocols for file names, file backup, and file storage. Findings also suggest that there is ambiguity about what it means to share research data and confusion about terminology such as “repository” and “data deposit”. Many researchers equate publication to data sharing.

Conclusions

The interviews reveal significant data literacy gaps that present opportunities for library instruction in the areas of file organization, project workflow and documentation, metadata standards, and data deposit options.

The interviews also provide invaluable insight into biomedical and bioengineering research in general and contribute to the authors’ understanding of the challenges facing the researchers we strive to support.

URL : Assessing Data Management Support Needs of Bioengineering and Biomedical Research Faculty

Alternative location : https://escholarship.umassmed.edu/jeslib/vol8/iss1/1/

Data objects and documenting scientific processes: An analysis of data events in biodiversity data papers

Authors : Kai Li, Jane Greenberg, Jillian Dunic

The data paper, an emerging scholarly genre, describes research datasets and is intended to bridge the gap between the publication of research data and scientific articles. Research examining how data papers report data events, such as data transactions and manipulations, is limited.

The research reported in this paper addresses this limitation and investigates how data events are inscribed in data papers. A content analysis was conducted examining the full texts of 82 data papers, drawn from the curated list of data papers connected to the Global Biodiversity Information Facility (GBIF).

Data events recorded for each paper were organized into a set of 17 categories. Many of these categories are described together in the same sentence, which indicates the messiness of data events in the laboratory space.

The findings challenge the degree to which data papers are a distinct genre compared to research papers and the degree to which they describe data-centric research processes in a thorough way.

This paper also discusses how our results could inform a better data publication ecosystem in the future.

URL : Data objects and documenting scientific processes: An analysis of data events in biodiversity data papers

Alternative location : https://arxiv.org/abs/1903.06215