Tag: data repositories

Scientific production on data repositories and open science published in the Web of Science database: Methodi Ordinatio and content analysis

Post author: Hans Dillaerts
Post date: 22 December 2025

Authors : Sinval Adalberto Rodrigues-Junior, Marcelo Votto Texeira

The opening of scientific data proposed by the Open Science movement presupposes careful planning for data collection, organization, and treatment, aiming at their sharing, accessibility, and reuse. Data repositories have been conceived as structures necessary to enable open access to data.

This study aimed to analyze the influence of data repositories on the disclosure and sharing of scientific data proposed by the Open Science movement. The Methodi Ordinatio, developed to organize a portfolio of scientific publications, was adopted to analyze the subject of ‘Data Repositories’ and ‘Open Science’.

The studies were ranked using the InOrdinatio index, and the 15 best-ranked studies were included and analyzed through Bardin’s content analysis. Most studies describe the structure involved in data repositories within the biological, chemical, and health areas.
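The ranking step can be illustrated with a minimal sketch. The abstract does not give the formula; the version below assumes the form of the InOrdinatio index commonly given in the Methodi Ordinatio literature, InOrdinatio = IF/1000 + α × (10 − (research year − publication year)) + citations, where α is a weight from 1 to 10 chosen by the researcher. The `Paper` class, the function names, and the default α are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    """Hypothetical record for one publication in the portfolio."""
    title: str
    impact_factor: float  # journal impact factor
    year: int             # publication year
    citations: int        # citation count at the time of the search


def in_ordinatio(paper: Paper, research_year: int, alpha: float = 10) -> float:
    """Assumed InOrdinatio form: IF/1000 + alpha*(10 - age) + citations."""
    age = research_year - paper.year
    return paper.impact_factor / 1000 + alpha * (10 - age) + paper.citations


def rank_portfolio(papers, research_year, alpha=10, top_n=15):
    """Sort papers by descending InOrdinatio score and keep the top_n."""
    ordered = sorted(
        papers,
        key=lambda p: in_ordinatio(p, research_year, alpha),
        reverse=True,
    )
    return ordered[:top_n]
```

A higher α gives more weight to recent papers, while a lower α lets citation counts dominate the ranking.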

Other studies addressed data reuse, data organization and analysis processes and tools, as well as data selection and classification algorithms. The units of analysis selected for the content analysis were categorized as open access, information technologies, data processing, and information retrieval.

Systems (processes and structures), metadata standards, ontologies, semantic web, data types, and their management were addressed by these studies. It is concluded that open data repositories are growing rapidly. Production with the greatest impact has occurred in the biological and biomedical/health areas, highlighting the structure involved in repositories within these fields.

Data repositories provide systems for depositing, managing, searching, accessing, and reusing data based on processes and technologies (often developed as open-source software), in alignment with the proposed Open Science model.

URL : Scientific production on data repositories and open science published in the Web of Science database: Methodi Ordinatio and content analysis

DOI : https://doi.org/10.1590/2318-0889202537e2513075

Tags: data repositories, Marcelo Votto Texeira, open access, open science, scientific data, Sinval Adalberto Rodrigues-Junior

A decentralized future for the open-science databases

Post author: Hans Dillaerts
Post date: 24 September 2025

Authors : Gaurav Sharma, Viorel Munteanu, Nika Mansouri Ghiasi, Jineta Banerjee, Susheel Varma, Luca Foschini, Kyle Ellrott, Onur Mutlu, Dumitru Ciorbă, Roel A. Ophoff, Viorel Bostan, Christopher E. Mason, Jason H. Moore, Despoina Sousoni, Arunkumar Krishnan, Mihai Dimian, Gustavo Stolovitzky, Fabio G. Liberante, Taras K. Oleksyk, Serghei Mangul

Continuous and reliable access to curated biological data repositories is indispensable for accelerating rigorous scientific inquiry and fostering reproducible research. Centralized repositories, though widely used, are vulnerable to single points of failure arising from cyberattacks, technical faults, natural disasters, or funding and political uncertainties.

This can lead to widespread data unavailability, data loss, integrity compromises, and substantial delays in critical research, ultimately impeding scientific progress. Centralizing essential scientific resources in a single geopolitical or institutional hub is inherently dangerous, as any disruption can paralyze diverse ongoing research.

The rapid acceleration of data generation, combined with an increasingly volatile global landscape, necessitates a critical re-evaluation of the sustainability of centralized models. Implementing federated and decentralized architectures presents a compelling and future-oriented pathway to substantially strengthen the resilience of scientific data infrastructures, thereby mitigating vulnerabilities and ensuring the long-term integrity of data.

Here, we examine the structural limitations of centralized repositories, evaluate federated and decentralized models, and propose a hybrid framework for resilient, FAIR, and sustainable scientific data stewardship. Such an approach offers a significant reduction in exposure to governance instability, infrastructural fragility, and funding volatility, and also fosters fairness and global accessibility.

The future of open science depends on integrating these complementary approaches to establish a globally distributed, economically sustainable, and institutionally robust infrastructure that safeguards scientific data as a public good, further ensuring continued accessibility, interoperability, and preservation for generations to come.

DOI : https://doi.org/10.48550/arXiv.2509.19206

Open science in Spain: Influence of personal and contextual factors on deposit patterns

Post author: Hans Dillaerts
Post date: 16 February 2025

Author : Daniel de Gracia Palomera

Background

This study investigates factors influencing the deposit of academic publications and research data in open access repositories by Spanish researchers.

Methods

Using survey data from a sample of Spanish academics, the research examines the impact of personal attributes (e.g., gender, age, knowledge of open science) and contextual variables (e.g., academic discipline, institutional type) on deposit behaviours. Quantitative methods, including chi-square tests and regression analysis, reveal significant associations between knowledge of open science and deposit practices.
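As an illustration of the kind of chi-square test of independence mentioned above, here is a minimal standard-library sketch. The contingency table in the usage note is purely hypothetical and is not data from the study; the function names are likewise illustrative. The p-value helper applies only to 2×2 tables (one degree of freedom) and omits the Yates continuity correction.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_2x2(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))
```

For example, with hypothetical counts where rows are "familiar with open science" (yes/no) and columns are "deposited data" (yes/no), `chi_square_statistic([[30, 10], [20, 40]])` gives the test statistic, and `p_value_2x2` converts it to a p-value.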

Results

Researchers familiar with open science principles were more likely to deposit multiple versions of articles and datasets, albeit with varying intensity. Key findings highlight disciplinary and institutional differences: researchers in Life Sciences and Experimental Sciences showed higher engagement with both article and data deposits, whereas Health Sciences lagged. Gender differences were also observed, with male researchers depositing articles and datasets more frequently than their female counterparts, though age showed limited impact. Public institutions exhibited lower data deposit rates despite mandates supporting open access.

Conclusions

The study underscores the need for tailored policies, including awareness campaigns, infrastructure investment, and discipline-specific strategies, to promote equitable and widespread adoption of open science practices. Findings contribute to understanding open science implementation, emphasizing the interplay of individual, institutional, and systemic factors.

URL : Open science in Spain: Influence of personal and contextual factors on deposit patterns

DOI : https://doi.org/10.12688/f1000research.160207.1

Tags: Daniel de Gracia Palomera, data repositories, data sharing, open repositories, Open Science practices, research data, scientific practices, self-archiving, Spain

FAIRness of Research Data in the European Humanities Landscape

Post author: Hans Dillaerts
Post date: 5 March 2024

Authors : Ljiljana Poljak Bilić, Kristina Posavec

This paper explores the landscape of research data in the humanities in the European context, delving into their diversity and the challenges of defining and sharing them. It investigates three aspects: the types of data in the humanities, their representation in repositories, and their alignment with the FAIR principles (Findable, Accessible, Interoperable, Reusable).

By reviewing datasets in repositories, this research determines the dominant data types, their openness, licensing, and compliance with the FAIR principles. It provides insight into the heterogeneous nature of humanities data, their representation in repositories, and their alignment with the FAIR principles, highlighting the need for better accessibility and reusability to raise the overall quality and utility of humanities research data.

URL : FAIRness of Research Data in the European Humanities Landscape

DOI : https://doi.org/10.3390/publications12010006

Tags: data repositories, dataset, FAIR Principles, Humanities, Kristina Posavec, Ljiljana Poljak Bilić, openness, research data

Building a Trustworthy Data Repository: CoreTrustSeal Certification as a Lens for Service Improvements

Post author: Hans Dillaerts
Post date: 12 January 2024

Authors : Cara Key, Clara Llebot, Michael Boock

Objective

The university library aims to provide university researchers with a trustworthy institutional repository for sharing data. The library sought CoreTrustSeal certification in order to measure the quality of data services in the institutional repository, and to promote researchers’ confidence when depositing their work.

Methods

The authors served on a small team of library staff who collaborated to compose the certification application. They describe the self-assessment process, as they iterated through cycles of compiling information and responding to reviewer feedback.

Results

The application team gained understanding of data repository best practices, shared knowledge about the institutional repository, and identified areas of service improvements necessary to meet certification requirements. Based on the application and feedback, the team took measures to enhance preservation strategies, governance, and public-facing policies and documentation for the repository.

Conclusions

The university library gained a better understanding of top-notch data services and measurably improved these services by pursuing and obtaining CoreTrustSeal certification.

URL : Building a Trustworthy Data Repository: CoreTrustSeal Certification as a Lens for Service Improvements

DOI : https://doi.org/10.7191/jeslib.761

Tags: Cara Key, certification, Clara Llebot, CoreTrustSeal, data repositories, data sharing, institutional repositories, Michael Boock, research data

More than data repositories: perceived information needs for the development of social sciences and humanities research infrastructures

Post author: Hans Dillaerts
Post date: 18 December 2023

Authors : Anna Sendra, Elina Late, Sanna Kumpulainen

Introduction

The digitalization of social sciences and humanities research necessitates research infrastructures. However, this transformation is still incipient, highlighting the need to better understand how to successfully support data-intensive research.

Method

Starting from a case study of building a national infrastructure for conducting data-intensive research, this study aims to understand the information needs of digital researchers regarding the facility and explore the importance of evaluation in its development.

Analysis

Thirteen semi-structured interviews with social sciences and humanities scholars and computer and data scientists processed through a thematic analysis revealed three themes (developing a research infrastructure, needs and expectations of the research infrastructure, and an approach to user feedback and user interactions).

Results

Findings reveal that developing an infrastructure for conducting data-intensive research is a complicated task influenced by contrasting information needs between social sciences and humanities scholars and computer and data scientists, such as the demand for increased support of the former. Findings also highlight the limited role of evaluation in its creation.

Conclusions

The development of infrastructures for conducting data-intensive research requires further discussion that particularly considers the disciplinary differences between social sciences and humanities scholars and computer and data scientists. Suggestions on how to better design this kind of facility are also raised.

URL : More than data repositories: perceived information needs for the development of social sciences and humanities research infrastructures

DOI : https://doi.org/10.47989/ir284598

Tags: Anna Sendra, data repositories, Elina Late, research data, research data management, Sanna Kumpulainen, scientific practices, Social Sciences and Humanities

Data Quality Assurance at Research Data Repositories

Post author: Hans Dillaerts
Post date: 23 November 2022

Authors : Maxi Kindling, Dorothea Strecker

This paper presents findings from a survey on the status quo of data quality assurance practices at research data repositories. The personalised online survey was conducted among repositories indexed in re3data in 2021. It covered the scope of the repository, types of data quality assessment, quality criteria, responsibilities, details of the review process, and data quality information and yielded 332 complete responses.

The results demonstrate that most repositories perform data quality assurance measures, and overall, research data repositories significantly contribute to data quality. Quality assurance at research data repositories is multifaceted and nonlinear, and although there are some common patterns, individual approaches to ensuring data quality are diverse.

The survey showed that data quality assurance sets high expectations for repositories and requires a lot of resources. Several challenges were discovered: for example, the adequate recognition of the contribution of data reviewers and repositories, the path dependence of data review on review processes for text publications, and the lack of data quality information. The study could not confirm that the certification status of a repository is a clear indicator of whether a repository conducts in-depth quality assurance.

URL : Data Quality Assurance at Research Data Repositories

DOI : http://doi.org/10.5334/dsj-2022-018

Tags: data repositories, Dorothea Strecker, Maxi Kindling, research data