Research Data Management Challenges in Citizen Science Projects and Recommendations for Library Support Services. A Scoping Review and Case Study

Authors : Jitka Stilund Hansen, Signe Gadegaard, Karsten Kryger Hansen, Asger Væring Larsen, Søren Møller, Gertrud Stougård Thomsen, Katrine Flindt Holmstrand

Citizen science (CS) projects are part of a new era of data aggregation and harmonisation that facilitates interconnections between different datasets. Increasing the value and reuse of CS data has received growing attention with the appearance of the FAIR principles and systematic research data management (RDM) practices, which are often promoted by university libraries.

However, RDM initiatives in CS appear diversified, and whether CS has special needs in terms of RDM is unclear. Therefore, the aim of this article is firstly to identify RDM challenges for CS projects and secondly to discuss how university libraries may help address such challenges.

A scoping review and a case study of Danish CS projects were performed to identify RDM challenges. 48 articles were selected for data extraction. Four academic project leaders were interviewed about RDM practices in their CS projects.

Challenges and recommendations identified in the review and case study are often not specific to CS. However, finding CS data, engaging specific populations, attributing volunteers and handling sensitive data, including health data, are some of the challenges requiring special attention by CS project managers. Scientific requirements or national practices do not always encompass the nature of CS projects.

Based on the identified challenges, it is recommended that university libraries focus their services on 1) identifying legal and ethical issues that the project managers should be aware of in their projects, 2) elaborating these issues in a Terms of Participation that also specifies data handling and sharing for the citizen scientist, and 3) motivating the project manager to adopt good data handling practices.

Adhering to the FAIR principles and good RDM practices in CS projects will continuously secure contextualisation and data quality. High data quality increases the value and reuse of the data and, therefore, the empowerment of the citizen scientists.

DOI : http://doi.org/10.5334/dsj-2021-025

Visual Summary Identification From Scientific Publications via Self-Supervised Learning

Authors : Shintaro Yamamoto, Anne Lauscher, Simone Paolo Ponzetto, Goran Glavaš, Shigeo Morishima

The exponential growth of scientific literature yields the need to support users to both effectively and efficiently analyze and understand a body of research work. This exploratory process can be facilitated by providing graphical abstracts, that is, visual summaries of scientific publications.

Accordingly, previous work recently presented an initial study on automatic identification of a central figure in a scientific publication, to be used as the publication’s visual summary.

This study, however, has been limited to a single (biomedical) domain. This is primarily because the current state of the art relies on supervised machine learning, which typically requires large amounts of labeled data: the only annotated data set existing until now covered only biomedical publications.

In this work, we build a novel benchmark data set for visual summary identification from scientific publications, which consists of papers presented at conferences from several areas of computer science. We couple this contribution with a new self-supervised learning approach to learn a heuristic matching of in-text references to figures with figure captions.

Our self-supervised pre-training, executed on a large unlabeled collection of publications, attenuates the need for large annotated data sets for visual summary identification and facilitates domain transfer for this task. We evaluate our self-supervised pre-training for visual summary identification on both the existing biomedical and our newly presented computer science data set.

The experimental results suggest that the proposed method is able to outperform the previous state-of-the-art without any task-specific annotations.

DOI : https://doi.org/10.3389/frma.2021.719004

Open science, the replication crisis, and environmental public health

Author : Daniel J. Hicks

Concerns about a crisis of mass irreplicability across scientific fields (“the replication crisis”) have stimulated a movement for open science, encouraging or even requiring researchers to publish their raw data and analysis code.

Recently, a rule at the US Environmental Protection Agency (US EPA) would have imposed a strong open data requirement. The rule prompted significant public discussion about whether open science practices are appropriate for fields of environmental public health.

The aims of this paper are to assess (1) whether the replication crisis extends to fields of environmental public health; and (2) in general whether open science requirements can address the replication crisis.

There is little empirical evidence for or against mass irreplicability in environmental public health specifically. Without such evidence, strong claims about whether the replication crisis extends to environmental public health – or not – seem premature.

Distinguishing three concepts – reproducibility, replicability, and robustness – makes it clear that open data initiatives can promote reproducibility and robustness but do little to promote replicability.

I conclude by reviewing some of the other benefits of open science and by offering some suggestions for funding streams to mitigate the costs of adopting open science practices in environmental public health.

DOI : https://doi.org/10.1080/08989621.2021.1962713

Citizen-driven participatory research conducted through knowledge intermediary units. A thematic synthesis of the literature on “Science Shops”

Authors : Anne-Sophie Gresle, Eduardo Urias, Rosario Scandurra, Bálint Balázs, Irene Jimeno, Leonardo de la Torre Ávila, Maria Jesus Pinazo

A Science Shop acts as a mission-oriented intermediary unit between the scientific sphere and civil society organizations. It seeks to facilitate citizen-driven open science projects that respond to the needs of civil society organizations and which, typically, include students in the work process.

We performed a thematic analysis of systematically selected literature on Science Shops to understand how the scientific literature reflects the historical evolution of Science Shops in different settings and what factors the literature associates with the rise and fall of the Science Shop.

We used the PRISMA methodology to search eight databases for scientific papers published in indexed journals in English, French and Spanish, and employed the thematic theory approach to extract and systematize our results. Twenty-six scientific articles met the inclusion criteria.

We identified three meta-categories and ten sub-topics which can serve as key pointers to guide the set-up and future work of Science Shops. Our results identify a major paradox: Science Shops incorporate public values in their scientific agendas but have difficulties sustaining themselves institutionally as they do not fit the current dominant research paradigm. Science Shops represent a persuasive complementary approach to the way science is defined, executed and produced today.

DOI : https://doi.org/10.22323/2.20050202

How Long Can We Build It? Ensuring Usability of a Scientific Code Base

Authors : Klaus Rechert, Jurek Oberhauser, Rafael Gieschke

Software, and in particular source code, has become an important component of scientific publications and is therefore now subject to research data management. Maintaining source code so that it remains a usable and valuable scientific contribution is, and will remain, a huge task.

Not all code contributions can be actively maintained forever. Eventually, there will be a significant backlog of legacy source code. In this article we analyse the requirements for applying the concept of long-term reusability to source code.

We use a simple case study to identify gaps and provide a technical infrastructure based on an emulator to support automated builds of historic software in the form of source code.

DOI : https://doi.org/10.2218/ijdc.v16i1.770

Clinical trial transparency and data sharing among biopharmaceutical companies and the role of company size, location and product type: a cross-sectional descriptive analysis

Authors : Sydney A Axson, Michelle M Mello, Deborah Lincow, Catherine Yang, Cary P Gross, Joseph S Ross, Jennifer Miller

Objectives

To examine company characteristics associated with better transparency and to apply a tool used to measure and improve clinical trial transparency among large companies and drugs, to smaller companies and biologics.

Design

Cross-sectional descriptive analysis.

Setting and participants

Novel drugs and biologics approved by the Food and Drug Administration (FDA) in 2016 and 2017, and their company sponsors.

Main outcome measures

Using established Good Pharma Scorecard (GPS) measures, companies and products were evaluated on their clinical trial registration, results dissemination and FDA Amendments Act (FDAAA) implementation; companies were ranked using these measures and a multicomponent data sharing measure.

Associations of company transparency scores with company size (large vs non-large), location (US vs non-US) and sponsored product type (drug vs biologic) were also examined.

Results

26% of products (16/62) had publicly available results for all clinical trials supporting their FDA approval and 67% (39/58) had public results for trials in patients by 6 months after their FDA approval; 58% (32/55) were FDAAA compliant.

Large companies were significantly more transparent than non-large companies (overall median transparency score of 95% (IQR 91–100) vs 59% (IQR 41–70), p<0.001), attributable to higher FDAAA compliance (median of 100% (IQR 88–100) vs 57% (0–100), p=0.01) and better data sharing (median of 100% (IQR 80–100) vs 20% (IQR 20–40), p<0.01). No significant differences were observed by company location or product type.
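
As a quick consistency check, the product-level percentages reported above can be recomputed from the counts given in parentheses. The sketch below is illustrative only and is not part of the study; the dictionary labels and the `percent` helper are hypothetical names introduced here.

```python
# Counts copied from the Results paragraph: (numerator, denominator).
# The labels are shorthand introduced for this sketch, not the study's terms.
reported = {
    "results public for all trials": (16, 62),
    "results public for patient trials by 6 months": (39, 58),
    "FDAAA compliant": (32, 55),
}


def percent(numerator: int, denominator: int) -> int:
    """Return numerator/denominator as a whole-number percentage."""
    return round(100 * numerator / denominator)


# Recompute each share so it can be compared with the figure in the text.
percentages = {label: percent(n, d) for label, (n, d) in reported.items()}

for label, value in percentages.items():
    print(f"{label}: {value}%")
```

The recomputed values agree with the 26%, 67% and 58% stated in the text. Note that the three denominators differ (62, 58 and 55 products), so the percentages are not shares of one common total.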

Conclusions

It was feasible to apply the GPS transparency measures and ranking tool to non-large companies and biologics. Large companies are significantly more transparent than non-large companies, driven by better data sharing procedures and implementation of FDAAA trial reporting requirements.

Greater research transparency is needed, particularly among non-large companies, to maximise the benefits of research for patient care and scientific innovation.

DOI : http://dx.doi.org/10.1136/bmjopen-2021-053248

Why Open Access: Economics and Business Researchers’ Perspectives

Authors : Carmen López-Vergara, Pilar Flores Asenjo, Alfonso Rosa-García

Public research policies have been promoting open-access publication in recent years as an adequate model for the dissemination of scientific knowledge. However, its use varies widely across disciplines.

This study explores the determinants of open-access publication among academic researchers of economics and business, as well as their assessment of different economic measures focused on publication stimulus.

To do so, a survey of Spanish business and economics researchers was conducted. They reported that, on average, 19% of their publications appeared in open-access journals, whether hybrid or fully Gold Route open access. Almost 80% of the researchers foresee a future increase in the volume of open-access publications.

When determining where to publish their research results, the main criterion for the selection of a scientific journal is the impact factor. Regarding open access, the most valued aspect is the visibility and dissemination it provides.

Although the cost of publication is not the most relevant criterion in the choice of a journal, three out of four researchers consider that a reduction in fees and an increase in funding are measures that would boost the open-access model.

DOI : https://doi.org/10.3390/publications9030037