Does the Use of Unusual Combinations of Datasets Contribute to Greater Scientific Impact?

Authors : Yulin Yu, Daniel M. Romero

Scientific datasets play a crucial role in contemporary data-driven research, as they allow for the progress of science by facilitating the discovery of new patterns and phenomena. This mounting demand for empirical research raises important questions on how strategic data utilization in research projects can stimulate scientific advancement.

In this study, we examine a hypothesis inspired by recombination theory: that innovative combinations of existing knowledge, including unusual combinations of datasets, can lead to high-impact discoveries. We investigate the scientific outcomes of such atypical data combinations in more than 30,000 publications that draw on over 6,000 datasets curated within ICPSR, one of the largest social science databases.

This study offers four important insights. First, combining datasets, particularly those infrequently paired, significantly contributes to both scientific and broader impacts (e.g., dissemination to the general public). Second, the combination of datasets with atypically combined topics has the opposite effect — the use of such data is associated with fewer citations.

Third, younger and less experienced research teams tend to use atypical combinations of datasets in research at a higher frequency than their older and more experienced counterparts.

Lastly, despite the benefits of data combination, papers that amalgamate data remain infrequent. This finding suggests that the unconventional combination of datasets is an under-utilized but powerful strategy correlated with the scientific and broader impact of scientific discoveries.
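As a rough illustration of what "infrequently paired" could mean in practice, the sketch below counts how often each pair of datasets is used together across a set of papers, then scores a paper by its rarest pair. This is not the authors' measure, which the abstract does not specify; the function names and the toy input format (one list of dataset identifiers per paper) are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations

def pair_counts(papers):
    """Count co-use of dataset pairs; `papers` is a list of dataset-ID lists (assumed format)."""
    counts = Counter()
    for datasets in papers:
        # Deduplicate and sort so each unordered pair has one canonical key.
        for pair in combinations(sorted(set(datasets)), 2):
            counts[pair] += 1
    return counts

def rarest_pair_count(datasets, counts):
    """Return the co-use count of the least common pair in one paper, or None if it uses < 2 datasets."""
    pairs = list(combinations(sorted(set(datasets)), 2))
    if not pairs:
        return None
    return min(counts[p] for p in pairs)
```

A lower score marks a more unusual combination under this toy definition; a paper that uses a single dataset has no pairs and returns None.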

URL : https://arxiv.org/abs/2402.05024

Agile Research Data Management with Open Source: LinkAhead

Authors : Daniel Hornung, Florian Spreckelsen, Thomas Weiß

Research data management (RDM) in academic environments is increasingly recognized as an important part of good scientific practice and as an area with great potential for saving time and money. Nevertheless, there is a shortage of appropriate tools that fulfill the specific requirements of scientific research.

We identified where the requirements in science deviate from those in other fields and proposed a list of requirements that RDM software should meet to become a viable option. We analyzed a number of currently available technologies and tool categories against these requirements and identified areas where no tools can satisfy researchers’ needs.

Finally, we assessed the open-source research data management system (RDMS) LinkAhead for compatibility with the proposed features and found that it fulfills the requirements in the area of semantic, flexible data handling, in which other tools show weaknesses.

DOI : https://doi.org/10.48694/inggrid.3866

Building a Trustworthy Data Repository: CoreTrustSeal Certification as a Lens for Service Improvements

Authors : Cara Key, Clara Llebot, Michael Boock

Objective

The university library aims to provide university researchers with a trustworthy institutional repository for sharing data. The library sought CoreTrustSeal certification in order to measure the quality of data services in the institutional repository, and to promote researchers’ confidence when depositing their work.

Methods

The authors served on a small team of library staff who collaborated to compose the certification application. They describe the self-assessment process, as they iterated through cycles of compiling information and responding to reviewer feedback.

Results

The application team gained understanding of data repository best practices, shared knowledge about the institutional repository, and identified areas of service improvements necessary to meet certification requirements. Based on the application and feedback, the team took measures to enhance preservation strategies, governance, and public-facing policies and documentation for the repository.

Conclusions

The university library gained a better understanding of top-notch data services and measurably improved these services by pursuing and obtaining CoreTrustSeal certification.

DOI : https://doi.org/10.7191/jeslib.761

The Future of Data in Research Publishing: From Nice to Have to Need to Have?

Authors : Christine L. Borgman, Amy Brand

Science policy promotes open access to research data for purposes of transparency and reuse of data in the public interest. We expect demands for open data in scholarly publishing to accelerate, at least partly in response to the opacity of artificial intelligence algorithms.

Open data should be findable, accessible, interoperable, and reusable (FAIR), and also trustworthy and verifiable. The current state of open data in scholarly publishing is in transition from ‘nice to have’ to ‘need to have.’

Research data are valuable, interpretable, and verifiable only in context of their origin, and with sufficient infrastructure to facilitate reuse. Making research data useful is expensive; benefits and costs are distributed unevenly.

Open data also poses risks for provenance, intellectual property, misuse, and misappropriation in an era of trolls and hallucinating AI algorithms. Scholars and scholarly publishers must make evidentiary data more widely available to promote public trust in research.

To make research processes more trustworthy, transparent, and verifiable, stakeholders need to make greater investments in data stewardship and knowledge infrastructures.

DOI : https://doi.org/10.1162/99608f92.b73aae77

More than data repositories: perceived information needs for the development of social sciences and humanities research infrastructures

Authors : Anna Sendra, Elina Late, Sanna Kumpulainen

Introduction

The digitalization of social sciences and humanities research necessitates research infrastructures. However, this transformation is still incipient, highlighting the need to better understand how to successfully support data-intensive research.

Method

Starting from a case study of building a national infrastructure for conducting data-intensive research, this study aims to understand the information needs of digital researchers regarding the facility and explore the importance of evaluation in its development.

Analysis

Thirteen semi-structured interviews with social sciences and humanities scholars and computer and data scientists processed through a thematic analysis revealed three themes (developing a research infrastructure, needs and expectations of the research infrastructure, and an approach to user feedback and user interactions).

Results

Findings reveal that developing an infrastructure for conducting data-intensive research is a complicated task influenced by contrasting information needs between social sciences and humanities scholars and computer and data scientists, such as the former group's demand for increased support. Findings also highlight the limited role of evaluation in its creation.

Conclusions

The development of infrastructures for conducting data-intensive research requires further discussion that particularly considers the disciplinary differences between social sciences and humanities scholars and computer and data scientists. Suggestions on how to better design this kind of facility are also raised.

DOI : https://doi.org/10.47989/ir284598

Establishing an early indicator for data sharing and reuse

Authors : Agata Piękniewska, Laurel L. Haak, Darla Henderson, Katherine McNeill, Anita Bandrowski, Yvette Seger

Funders, publishers, scholarly societies, universities, and other stakeholders need to be able to track the impact of programs and policies designed to advance data sharing and reuse. With the launch of the NIH data management and sharing policy in 2023, establishing a pre-policy baseline of sharing and reuse activity is critical for the biological and biomedical community.

Toward this goal, we tested the utility of mentions of research resources, databases, and repositories (RDRs) as a proxy measurement of data sharing and reuse. We captured and processed text from Methods sections of open access biological and biomedical research articles published in 2020 and 2021 and made available in PubMed Central.

We used natural language processing to identify text strings to measure RDR mentions. In this article, we demonstrate our methodology, provide normalized baseline data sharing and reuse activity in this community, and highlight actions authors and publishers can take to encourage data sharing and reuse practices.
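To make the idea of counting RDR mentions concrete, here is a minimal sketch that matches a fixed list of resource names against Methods text and reports the share of articles with at least one mention. It is an assumption-laden stand-in, not the authors' pipeline: the abstract does not describe their NLP method, and the name list, function names, and plain regular-expression matching are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical resource names for illustration; the study's actual RDR list is not given here.
RDR_NAMES = ["GenBank", "Dryad", "Zenodo", "Protein Data Bank"]

def count_rdr_mentions(methods_text, names=RDR_NAMES):
    """Count case-insensitive whole-word mentions of each resource name."""
    counts = Counter()
    for name in names:
        pattern = r"\b" + re.escape(name) + r"\b"
        counts[name] = len(re.findall(pattern, methods_text, flags=re.IGNORECASE))
    return counts

def share_with_mentions(articles):
    """Normalize across a corpus: fraction of articles mentioning at least one resource."""
    if not articles:
        return 0.0
    hits = sum(1 for text in articles if sum(count_rdr_mentions(text).values()) > 0)
    return hits / len(articles)
```

Simple string matching like this misses abbreviations and variant spellings, which is one reason a fuller NLP approach may be preferable.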

DOI : https://doi.org/10.1002/leap.1586

Quantitative survey of researchers' practices and needs regarding the management of research data, algorithms, and source code at the institutions of the Toulouse site

Authors : Danielle Brunet, Soraya Demay, Pierre Diaz, Borbala Goncz, Laure Leclerc, Flora Poupinot, Sibilla Michelle

The Comité de réflexion pour le partage et la valorisation des données de la recherche et la coordination de la Science Ouverte (CéSO) of the Université de Toulouse carried out a quantitative survey on the management of research data, algorithms, and source code.

Addressed to the entire scientific community of the Toulouse site, the survey aimed to take stock of researchers' practices, knowledge, and needs regarding research data management. The results will help refine the services offered across the Toulouse site.

The survey covers the member institutions of the Université de Toulouse as well as partner research organizations: Université Toulouse Capitole, Université Toulouse – Jean Jaurès, Université Toulouse III – Paul Sabatier, Institut national polytechnique de Toulouse (Toulouse INP), Institut national des sciences appliquées de Toulouse (INSA Toulouse), Institut supérieur de l’aéronautique et de l’espace (ISAE-SUPAERO), Institut national universitaire Champollion (INU Champollion), École nationale de l’aviation civile (ENAC), École nationale d’ingénieurs de Tarbes (ENIT), École nationale supérieure d’architecture de Toulouse (ENSA Toulouse), École nationale vétérinaire de Toulouse (ENVT), École nationale supérieure de formation de l’enseignement agricole (ENSFEA), Institut catholique d’arts et métiers (ICAM), École nationale supérieure des mines d’Albi-Carmaux (IMT Mines d’Albi), Toulouse Business School (TBS), Centre national d’études spatiales (CNES), Centre national de la recherche scientifique (CNRS), Institut national de recherche pour l’agriculture, l’alimentation et l’environnement (INRAE), Institut national de la santé et de la recherche médicale (Inserm), Institut de recherche pour le développement (IRD), Office national d’études et de recherche aérospatiales (Onera), and Météo-France.

Original location : https://ut3-toulouseinp.hal.science/hal-04262708v1/