Data Management Plans: Implications for Automated Analyses

Authors : Ngoc-Minh Pham, Heather Moulaison-Sandy, Bradley Wade Bishop, Hannah Gunderman

Data management plans (DMPs) are an essential part of planning data-driven research projects and ensuring long-term access and use of research data and digital objects; however, as text-based documents, DMPs must be analyzed manually for conformance to funder requirements.

This study presents a comparison of DMP evaluations for 21 funded projects using 1) an automated means of analysis to identify elements that align with best practices in support of open research initiatives and 2) a manually applied scorecard measuring these same elements.

The automated analysis revealed that terms related to availability (90% of DMPs), metadata (86% of DMPs), and sharing (81% of DMPs) were reliably supplied. Manual analysis revealed 86% (n = 18) of funded DMPs were adequate, with strong discussions of data management personnel (average score: 2 out of 2), data sharing (average score 1.83 out of 2), and limitations to data sharing (average score: 1.65 out of 2).
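As a rough illustration of what a term-presence check over DMP texts could look like, the sketch below computes the share of documents that mention each family of terms. It is not the authors' method: the term lists, the sample texts, and the function name `term_coverage` are all assumptions made for this example.

```python
import re

# Hypothetical term families; the study's actual vocabulary is not given here.
TERM_FAMILIES = {
    "availability": ["available", "availability"],
    "metadata": ["metadata"],
    "sharing": ["share", "shared", "sharing"],
}


def term_coverage(dmps, families=TERM_FAMILIES):
    """Return, per term family, the fraction of documents containing any of its terms."""
    coverage = {}
    for family, terms in families.items():
        # Whole-word, case-insensitive match on any term in the family.
        pattern = re.compile(
            r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
            re.IGNORECASE,
        )
        hits = sum(1 for text in dmps if pattern.search(text))
        coverage[family] = hits / len(dmps) if dmps else 0.0
    return coverage


# Invented sample texts, standing in for real DMPs.
sample_dmps = [
    "Data will be made available in a public repository with rich metadata.",
    "We will share raw files on request.",
    "Metadata follow a community standard; sharing is restricted by consent.",
]

coverage = term_coverage(sample_dmps)
for family, share in coverage.items():
    print(f"{family}: {share:.0%} of DMPs")
```

A presence check of this kind only says whether a topic is mentioned, not how well it is addressed, which is consistent with the abstract's remark that the automated results are less granular than the manual scorecard.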

This study reveals that the automated approach to DMP assessment yields results that are less granular than, yet similar to, those of manual assessment, and produces them more efficiently. Additional observations and recommendations are also presented to make data management planning exercises and automated analysis even more useful going forward.

URL : Data Management Plans: Implications for Automated Analyses

DOI : http://doi.org/10.5334/dsj-2023-002

Prise de décision par les pouvoirs publics et partage des données de la recherche, une approche par le risque [Decision-making by public authorities and the sharing of research data: a risk-based approach]

Author : Fleur Nadine Ndjock

If scientific research is funded primarily by the State through budget allocations, it is logical that the results of this research should contribute to effective decision-making by public authorities for the development of a country.

To that end, research data must be shared, since decision-making requires information, and research data, whether observational, experimental, or simulated, are important in the strategic decision-making process.

This article has a twofold objective: first, to establish a typology of the risks incurred in sharing research data with public authorities; second, to examine the concepts needed to account for what is at stake in these risks, since they are decisive and can influence the motivations for sharing the data.

DOI : https://doi.org/10.4000/ctd.8301

An iterative and interdisciplinary categorisation process towards FAIRer digital resources for sensitive life-sciences data

Authors : Romain David, Christian Ohmann, Jan‑Willem Boiten, Mónica Cano Abadía, Florence Bietrix, Steve Canham, Maria Luisa Chiusano, Walter Dastrù, Arnaud Laroquette, Dario Longo, Michaela Th. Mayrhofer, Maria Panagiotopoulou, Audrey S. Richard, Sergey Goryanin, Pablo Emilio Verde

For life science infrastructures, sensitive data generate an additional layer of complexity. Cross-domain categorisation and discovery of digital resources related to sensitive data present major interoperability challenges. To support this FAIRification process, a toolbox demonstrator has been developed that aims to support the discovery of digital objects related to sensitive data (e.g., regulations, guidelines, best practice, tools).

The toolbox is based upon a categorisation system developed and harmonised across a cluster of 6 life science research infrastructures. Three different versions were built, tested by subsequent pilot studies, finally leading to a system with 7 main categories (sensitive data type, resource type, research field, data type, stage in data sharing life cycle, geographical scope, specific topics).

The 109 resources tagged in pilot study 3 were used as the initial content for the toolbox demonstrator, a software tool that allows digital objects linked to sensitive data to be searched and filtered using the categorisation system.
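To make the idea of filtering by the categorisation system concrete, here is a minimal sketch of a tagged resource and a filter over the seven main categories listed above. It is an illustration only, not the toolbox's actual data model: the `Resource` class, the `filter_resources` function, the resource titles, and all tag values are hypothetical.

```python
from dataclasses import dataclass, field

# The seven main categories named in the abstract.
CATEGORIES = (
    "sensitive data type",
    "resource type",
    "research field",
    "data type",
    "stage in data sharing life cycle",
    "geographical scope",
    "specific topics",
)


@dataclass
class Resource:
    """A digital object with zero or more tag values per category (hypothetical model)."""
    title: str
    tags: dict = field(default_factory=dict)  # category name -> set of tag values


def filter_resources(resources, criteria):
    """Keep resources that carry every requested tag value in the given categories."""
    for category in criteria:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
    return [
        r for r in resources
        if all(value in r.tags.get(category, set())
               for category, value in criteria.items())
    ]


# Invented example entries; tag values are placeholders.
resources = [
    Resource("Example guideline A", {
        "resource type": {"guideline"},
        "geographical scope": {"Europe"},
    }),
    Resource("Example tool B", {
        "resource type": {"tool"},
        "geographical scope": {"Europe", "global"},
    }),
    Resource("Example regulation C", {
        "resource type": {"regulation"},
        "geographical scope": {"global"},
    }),
]

european = filter_resources(resources, {"geographical scope": "Europe"})
european_tools = filter_resources(
    resources, {"geographical scope": "Europe", "resource type": "tool"}
)
print([r.title for r in european_tools])
```

Storing a set of values per category lets one resource sit under several tags in the same category, and combining criteria narrows the result set in the way a faceted search would.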

Important next steps are a broad evaluation of the usability and user-friendliness of the toolbox, extension to more resources, broader adoption by different life-science communities, and a long-term vision for maintenance and sustainability.

URL : An iterative and interdisciplinary categorisation process towards FAIRer digital resources for sensitive life-sciences data

DOI : https://doi.org/10.1038/s41598-022-25278-z

Data Quality Assurance at Research Data Repositories

Authors : Maxi Kindling, Dorothea Strecker

This paper presents findings from a survey on the status quo of data quality assurance practices at research data repositories. The personalised online survey was conducted among repositories indexed in re3data in 2021. It covered the scope of the repository, types of data quality assessment, quality criteria, responsibilities, details of the review process, and data quality information and yielded 332 complete responses.

The results demonstrate that most repositories perform data quality assurance measures, and overall, research data repositories significantly contribute to data quality. Quality assurance at research data repositories is multifaceted and nonlinear, and although there are some common patterns, individual approaches to ensuring data quality are diverse.

The survey showed that data quality assurance sets high expectations for repositories and requires a lot of resources. Several challenges were discovered: for example, the adequate recognition of the contribution of data reviewers and repositories, the path dependence of data review on review processes for text publications, and the lack of data quality information. The study could not confirm that the certification status of a repository is a clear indicator of whether a repository conducts in-depth quality assurance.

URL : Data Quality Assurance at Research Data Repositories

DOI : http://doi.org/10.5334/dsj-2022-018

Putting FAIR principles in the context of research information: FAIRness for CRIS and CRIS for FAIRness

Authors : Otmane Azeroual, Joachim Schöpfel, Janne Pölönen, Anastasija Nikiforova

Digitization in the research domain refers to the increasing integration and analysis of research information in the process of research data management. However, it is not clear whether this information is actually used and, more importantly, whether the data are of sufficient quality for value and knowledge to be extracted from them.

FAIR principles (Findability, Accessibility, Interoperability, Reusability) represent a promising asset to achieve this. Since their publication, they have rapidly proliferated and have become part of (inter-)national research funding programs.

A special feature of the FAIR principles is the emphasis on the legibility, readability, and understandability of data. At the same time, they are a prerequisite for the reliability, trustworthiness, and quality of data. In this sense, the paper addresses the importance of applying FAIR principles to research information and to the systems that manage it, such as Current Research Information Systems (CRIS), an underrepresented research subject.

Supporting the call for a "one-stop-shop and register-once-use-many approach", we argue that CRIS is a key component of the research infrastructure landscape, directly targeted and enabled by operational application and the promotion of FAIR principles.

We hypothesize that the improvement of FAIRness is a bidirectional process, where CRIS promotes FAIRness of data and infrastructures, and FAIR principles push further improvements to the underlying CRIS.

URL : https://hal.archives-ouvertes.fr/hal-03836525

Deep Impact: A Study on the Impact of Data Papers and Datasets in the Humanities and Social Sciences

Authors : Barbara McGillivray, Paola Marongiu, Nilo Pedrazzini, Marton Ribary, Mandy Wigdorowitz, Eleonora Zordan

The humanities and social sciences (HSS) have recently witnessed an exponential growth in data-driven research. In response, attention has been afforded to datasets and accompanying data papers as outputs of the research and dissemination ecosystem.

In 2015, two data journals dedicated to HSS disciplines appeared in this landscape: Journal of Open Humanities Data (JOHD) and Research Data Journal for the Humanities and Social Sciences (RDJ).

In this paper, we analyse the state of the art in the landscape of data journals in HSS, using JOHD and RDJ as exemplars, by measuring the performance and deep impact of data-driven projects, including metrics (citation count, Altmetrics, views, downloads, tweets) of data papers in relation to associated research papers and the reuse of associated datasets.

Our findings indicate that data papers are published following the deposit of datasets in a repository and usually following research articles; that data papers have a positive impact on both the metrics of research papers associated with them and on data reuse; and that Twitter hashtags targeted at specific research campaigns can lead to increases in data papers’ views and downloads.

HSS data papers improve the visibility of datasets they describe, support accompanying research articles, and add to transparency and the open research agenda.

URL : Deep Impact: A Study on the Impact of Data Papers and Datasets in the Humanities and Social Sciences

DOI : https://doi.org/10.3390/publications10040039