Open science in Spain: Influence of personal and contextual factors on deposit patterns

Author

Background

This study investigates factors influencing the deposit of academic publications and research data in open access repositories by Spanish researchers.

Methods

Using survey data from a sample of Spanish academics, the research examines the impact of personal attributes (e.g., gender, age, knowledge of open science) and contextual variables (e.g., academic discipline, institutional type) on deposit behaviours. Quantitative methods, including chi-square tests and regression analysis, reveal significant associations between knowledge of open science and deposit practices.
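As an illustration of the kind of chi-square test of association mentioned above, here is a minimal sketch in plain Python. The counts are hypothetical and are not taken from the study; the function name `chi_square_statistic` and the 2x2 layout (familiarity with open science versus deposit behaviour) are assumptions made only for this example.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the row and column variables.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical survey counts (NOT from the study).
# Rows: familiar with open science / not familiar.
# Columns: deposits in a repository / does not deposit.
hypothetical_counts = [
    [60, 40],
    [30, 70],
]

stat = chi_square_statistic(hypothetical_counts)
degrees_of_freedom = (len(hypothetical_counts) - 1) * (len(hypothetical_counts[0]) - 1)

# Standard critical value of the chi-square distribution for df = 1 at alpha = 0.05.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841

print(f"chi-square = {stat:.3f}, df = {degrees_of_freedom}")
print("association significant at 0.05:", stat > CRITICAL_VALUE_DF1_ALPHA_05)
```

A statistic above the critical value would indicate that, in this made-up table, deposit behaviour is not independent of familiarity with open science. In practice a statistics library would also report an exact p-value.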

Results

Researchers familiar with open science principles were more likely to deposit multiple versions of articles and datasets, albeit with varying intensity. Key findings highlight disciplinary and institutional differences: researchers in Life Sciences and Experimental Sciences showed higher engagement with both article and data deposits, whereas Health Sciences lagged. Gender differences were also observed, with male researchers depositing articles and datasets more frequently than their female counterparts, though age showed limited impact. Public institutions exhibited lower data deposit rates despite mandates supporting open access.

Conclusions

The study underscores the need for tailored policies, including awareness campaigns, infrastructure investment, and discipline-specific strategies, to promote equitable and widespread adoption of open science practices. Findings contribute to understanding open science implementation, emphasizing the interplay of individual, institutional, and systemic factors.

URL : Open science in Spain: Influence of personal and contextual factors on deposit patterns

DOI : https://doi.org/10.12688/f1000research.160207.1

 

Data Makers and Users’ Views on Useful Paradata. Priorities in Documenting Data Creation, Curation, Manipulation and Use in Archaeology

Authors : Isto Huvila, Lisa Andersson, Olle Sköld, Ying-Hsang Liu

Understanding data and making it (re)usable requires adequate documentation of the data, but also information on how it has been created, curated, manipulated and used, termed paradata in the data documentation literature. This paper reports the results of a survey study (N=91) of data-creating and data-(re)using archaeologists’ views on which information related to data creation, curation, manipulation and use (termed here paradata) they consider important when working with data. Data makers’ and users’ perceptions align to a considerable degree.

It is important to have an explanation of the original general context of data creation and to know the purpose, procedures and methods of data making, analysis and documentation. The findings underline the need to continue developing and testing ideas for how to capture and document paradata, and to find ways to help data makers adopt proven practices that facilitate paradata making.

Simultaneously, it is crucial that the paradata aimed at facilitating data use is relevant for data users rather than, for instance, technical or administrative details considered useful primarily by data makers.

URL : Data Makers and Users’ Views on Useful Paradata. Priorities in Documenting Data Creation, Curation, Manipulation and Use in Archaeology

DOI : https://doi.org/10.2218/ijdc.v19i1.892

Research Data in Scientific Publications: A Cross-Field Analysis

Authors : Puyu Yang, Giovanni Colavizza

Data sharing is fundamental to scientific progress, enhancing transparency, reproducibility, and innovation across disciplines. Despite its growing significance, the variability of data-sharing practices across research fields remains insufficiently understood, limiting the development of effective policies and infrastructure.

This study investigates the evolving landscape of data-sharing practices, specifically focusing on the intentions behind data release, reuse, and referencing. Leveraging the PubMed open dataset, we developed a model to identify mentions of datasets in the full text of publications. Our analysis reveals that data release is the most prevalent sharing mode, particularly in fields such as Commerce, Management, and the Creative Arts.

In contrast, STEM fields, especially the Biological and Agricultural Sciences, show significantly higher rates of data reuse. However, the humanities and social sciences are slower to adopt these practices. Notably, dataset referencing remains low across most disciplines, suggesting that datasets are not yet fully recognized as research outputs.

A temporal analysis highlights an acceleration in data releases after 2012, yet obstacles such as data discoverability and compatibility for reuse persist. Our findings can inform institutional and policy-level efforts to improve data-sharing practices, enhance dataset accessibility, and promote broader adoption of open science principles across research domains.

Arxiv : https://arxiv.org/abs/2502.01407

Researchers and Research Data: Improving and Incentivising Sharing and Archiving

Authors : Minna Ventsel, Beth Montague-Hellen

There has been much discussion within the scientific community about reproducibility in research, with questions raised about the integrity of research when the findings of some studies could not be reproduced or confirmed. Researchers need to adhere to the FAIR (findable, accessible, interoperable, and reusable) principles to contribute to collaborative and open science, but these open data principles can also support reproducibility and help address issues around ensuring data integrity.

This article uses observations and metrics from activities related to data sharing and research integrity, undertaken by a Research Integrity and Data Specialist at the Francis Crick Institute, to discuss potential reasons behind the slow uptake of FAIR data practices. We then suggest solutions undertaken at the Francis Crick Institute that other institutes and universities can follow to improve the integrity of research from a data perspective.

One major solution discussed is the implementation of a data archive system at the Francis Crick Institute to ensure the integrity of data long term, comply with our funders’ data management requirements, and to safeguard our researchers against any potential research integrity allegations in the future.

URL : Researchers and Research Data: Improving and Incentivising Sharing and Archiving

DOI : https://doi.org/10.2218/v19i1.983

Evolution of the “long tail” concept for scientific data

Authors : Gretchen R. Stahlman, Inna Kouper

This review paper explores the evolution of discussions about “long-tail” scientific data in the scholarly literature. The “long-tail” concept, originally used to explain trends in digital consumer goods, was first applied to scientific data in 2007 to refer to a vast array of smaller, heterogeneous data collections that cumulatively represent a substantial portion of scientific knowledge. However, these datasets, often referred to as “long-tail data,” are frequently mismanaged or overlooked due to inadequate data management practices and institutional support.

This paper examines the changing landscape of discussions about long-tail data over time, situated within broader ecosystems of research data management and the natural interplay between “big” and “small” data.

The review also bridges discussions on data curation in Library & Information Science (LIS) and domain-specific contexts, contributing to a more comprehensive understanding of the long-tail concept’s utility for effective data management outcomes. The review aims to provide a more comprehensive understanding of this concept, its terminological diversity in the literature, and its utility for guiding data management, overall informing current and future information science research and practice.

Arxiv : https://arxiv.org/abs/2412.13307

When data sharing is an answer and when (often) it is not: Acknowledging data-driven, non-data, and data-decentered cultures

Authors : Isto Huvila, Luanne S. Sinnamon

Contemporary research and innovation policies and advocates of data-intensive research paradigms continue to urge increased sharing of research data. Such paradigms are underpinned by a pro-data, normative data culture that has become dominant in the contemporary discourse. Earlier research on research data sharing has directed little attention to its alternatives as more than a deficit. The present study aims to provide insights into researchers’ perspectives, rationales and practices of (non-)sharing of research data in relation to their research practices.

We address two research questions: (RQ1) what underpinning patterns can be identified in researchers’ (non-)sharing of research data, and (RQ2) how attitudes and data sharing are linked to researchers’ general practices of conducting their research. We identify and describe data-decentered culture and non-data culture as alternatives and parallels to the data-driven culture, and describe researchers’ de-inscriptions of how they resist and appropriate predominant notions of data in their data practices by problematizing the notion of data, asserting exceptions to the general case of data sharing, and resisting or opting out of data sharing.

URL : When data sharing is an answer and when (often) it is not: Acknowledging data-driven, non-data, and data-decentered cultures

DOI : https://doi.org/10.1002/asi.24957

FAIR GPT: A virtual consultant for research data management in ChatGPT

Authors : Renat Shigapov, Irene Schumm

FAIR GPT is the first virtual consultant in ChatGPT designed to help researchers and organizations make their data and metadata compliant with the FAIR (Findable, Accessible, Interoperable, Reusable) principles. It provides guidance on metadata improvement, dataset organization, and repository selection.

To ensure accuracy, FAIR GPT uses external APIs to assess dataset FAIRness, retrieve controlled vocabularies, and recommend repositories, minimizing hallucination and improving precision. It also assists in creating documentation (data and software management plans, README files, and codebooks), and selecting proper licenses. This paper describes its features, applications, and limitations.

Arxiv : https://arxiv.org/abs/2410.07108