Dataset Artefacts are the Hidden Drivers of the Declining Disruptiveness in Science

Authors : Vincent Holst, Andres Algaba, Floriano Tori, Sylvia Wenmackers, Vincent Ginis

Park et al. [1] reported a decline in the disruptiveness of scientific and technological knowledge over time. Their main finding is based on the computation of CD indices, a measure of disruption in citation networks [2], across almost 45 million papers and 3.9 million patents.

Because of a plotting mistake, database entries with zero references were omitted from the CD index distributions but kept in the analysis, hiding a large number of outliers with the maximum CD index of one [1]. Our reanalysis shows that the reported decline in disruptiveness can be attributed to a relative decline of these database entries with zero references. Notably, this was not caught by the robustness checks included in the manuscript.
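
To see why an entry with zero references ends up at the maximum CD index, here is a minimal sketch. It assumes the usual formulation of the index, CD = (n_f - n_b) / (n_f + n_b + n_r), where n_f counts later papers citing only the focal paper, n_b those citing both the focal paper and at least one of its references, and n_r those citing only its references. The function name, the toy network, and the paper labels are all hypothetical, and the citation time window used in practice is left out for brevity.

```python
from typing import Dict, Set


def cd_index(focal: str, references: Dict[str, Set[str]]) -> float:
    """CD index of `focal`, given each paper's set of cited papers.

    Simplification (assumption): no citation time window is applied.
    """
    predecessors = references.get(focal, set())
    n_focal_only = n_both = n_refs_only = 0
    for paper, cited in references.items():
        if paper == focal:
            continue
        cites_focal = focal in cited
        cites_predecessor = bool(cited & predecessors)
        if cites_focal and cites_predecessor:
            n_both += 1
        elif cites_focal:
            n_focal_only += 1
        elif cites_predecessor:
            n_refs_only += 1
    total = n_focal_only + n_both + n_refs_only
    if total == 0:
        raise ValueError("CD index is undefined without forward citations")
    return (n_focal_only - n_both) / total


# Hypothetical toy citation network: paper -> papers it cites.
toy_network = {
    "A": set(),           # recorded with zero references
    "B": {"P1", "P2"},    # cites two predecessors
    "C1": {"A"},
    "C2": {"A", "B"},
    "C3": {"B", "P1"},
    "C4": {"P1"},
    "C5": {"P2", "B"},
}

print(cd_index("A", toy_network))  # 1.0: no references, so n_b = n_r = 0
print(cd_index("B", toy_network))  # -0.25
```

In this sketch, any cited paper whose reference list is empty scores exactly one no matter who cites it, which is why such entries behave as outliers rather than as evidence of disruption.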

The regression adjustment fails to control for the hidden outliers as they correspond to a discontinuity in the CD index. Proper evaluation of the Monte-Carlo simulations reveals that, because of the preservation of the hidden outliers, even random citation behaviour replicates the observed decline in disruptiveness.

Finally, while these papers and patents with supposedly zero references are the hidden drivers of the reported decline, their source documents predominantly do make references, exposing them as pure dataset artefacts.

DOI : https://zenodo.org/doi/10.5281/zenodo.10656940

From Data Creator to Data Reuser: Distance Matters

Authors : Christine L. Borgman, Paul T. Groth

Sharing research data is complex, labor-intensive, expensive, and requires infrastructure investments by multiple stakeholders. Open science policies focus on data release rather than on data reuse, yet reuse is also difficult, expensive, and may never occur. Investments in data management could be made more wisely by considering who might reuse data, how, why, for what purposes, and when.

Data creators cannot anticipate all possible reuses or reusers; our goal is to identify factors that may aid stakeholders in deciding how to invest in research data, how to identify potential reuses and reusers, and how to improve data exchange processes.

Drawing upon empirical studies of data sharing and reuse, we develop the theoretical construct of distance between data creator and data reuser, identifying six distance dimensions that influence the ability to transfer knowledge effectively: domain, methods, collaboration, curation, purposes, and time and temporality.

These dimensions are primarily social in character, with associated technical aspects that can decrease – or increase – distances between creators and reusers. We identify the order of expected influence on data reuse and ways in which the six dimensions are interdependent.

Our theoretical framing of the distance between data creators and prospective reusers leads to recommendations to four categories of stakeholders on how to make data sharing and reuse more effective: data creators, data reusers, data archivists, and funding agencies.

arXiv : https://arxiv.org/abs/2402.07926

Additional experiments required: A scoping review of recent evidence on key aspects of Open Peer Review

Authors : Tony Ross-Hellauer, Serge P.J.M. Horbach

Diverse efforts are underway to reform the journal peer review system. Combined with growing interest in Open Science practices, Open Peer Review (OPR) has become of central concern to the scholarly community. However, it remains uncertain what OPR is understood to encompass and how effective some of its elements are in meeting the expectations of diverse communities.

This scoping review updates previous efforts to summarize research on OPR to May 2022. Following the PRISMA methodological framework, it addresses the question: “What evidence has been reported in the scientific literature from 2017 to May 2022 regarding uptake, attitudes, and efficacy of two key aspects of OPR (Open Identities and Open Reports)?”

The review identifies, analyses and synthesizes 52 studies matching inclusion criteria, finding that OPR is growing, but still far from common practice. Our findings indicate positive attitudes towards Open Reports and more sceptical approaches to Open Identities.

Changes in reviewer behaviour seem limited and no evidence for lower acceptance rates of review invitations or slower turnaround times is reported in those studies examining those issues. Concerns about power dynamics and potential backfiring on critical reviews are in need of further experimentation.

We conclude with an overview of evidence gaps and suggestions for future research. Also, we discuss implications for policy and practice, both in the scholarly communications community and the research evaluation community more broadly.

DOI : https://doi.org/10.1093/reseval/rvae004

Is gold open access helpful for academic purification? A causal inference analysis based on retracted articles in biochemistry

Authors : Er-Te Zheng, Zhichao Fang, Hui-Zhen Fu

The relationship between transparency and credibility has long been a subject of theoretical and analytical exploration within the realm of social sciences, and it has recently attracted increasing attention in the context of scientific research. Retraction serves as a pivotal mechanism in addressing concerns about research integrity.

This study aims to empirically examine the relationship between the level of open access and the effectiveness of the current mechanism, specifically academic purification centered on retracted articles. We used matching and Difference-in-Difference (DiD) methods to examine whether gold open access is helpful for academic purification in the field of biochemistry.

We collected gold open access (Gold OA) and non-open access (non-OA) retracted biochemistry articles as the treatment group, and matched them with corresponding unretracted articles as the control group, covering 2005 to 2021 and drawing on the Web of Science and Retraction Watch databases.
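
The basic DiD contrast behind such a design can be sketched as follows. This is only the generic two-group, two-period estimator, not the specification used in the study; the function name and all citation counts are made up for illustration.

```python
from statistics import mean
from typing import Sequence


def did_estimate(
    treated_pre: Sequence[float],
    treated_post: Sequence[float],
    control_pre: Sequence[float],
    control_post: Sequence[float],
) -> float:
    """Difference-in-differences: change in the treated group
    minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical yearly citation counts (illustrative numbers only).
retracted_before = [10.0, 12.0, 8.0]   # retracted articles, before retraction
retracted_after = [4.0, 6.0, 5.0]      # retracted articles, after retraction
matched_before = [9.0, 11.0, 10.0]     # matched unretracted articles, same periods
matched_after = [8.0, 10.0, 9.0]

effect = did_estimate(
    retracted_before, retracted_after, matched_before, matched_after
)
print(effect)  # -4.0
```

Subtracting the control group's change removes the citation trend that would have occurred anyway, so the remaining difference is attributed to retraction under the usual parallel-trends assumption. Computing this estimate separately for Gold OA and non-OA articles and comparing the two is one way to frame the question the study asks.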

The results showed that compared to non-OA, Gold OA is advantageous in reducing the retraction time of flawed articles, but does not demonstrate a significant advantage in reducing citations after retraction. This indicates that Gold OA may help expedite the detection and retraction of flawed articles, ultimately promoting the practice of responsible research.

DOI : https://doi.org/10.1016/j.ipm.2023.103640

The experiences of COVID-19 preprint authors: a survey of researchers about publishing and receiving feedback on their work during the pandemic

Authors : Narmin Rzayeva, Susana Oliveira Henriques, Stephen Pinfield, Ludo Waltman

The COVID-19 pandemic caused a rise in preprinting, triggered by the need for open and rapid dissemination of research outputs. We surveyed authors of COVID-19 preprints to learn about their experiences with preprinting their work and also with publishing their work in a peer-reviewed journal.

Our research had the following objectives: (1) to learn about authors’ experiences with preprinting, their motivations, and future intentions; (2) to consider preprints in terms of their effectiveness in enabling authors to receive feedback on their work; and (3) to compare the impact of feedback on preprints with the impact of comments of editors and reviewers on papers submitted to journals. In our survey, 78% of the new adopters of preprinting reported the intention to also preprint their future work.

The boost in preprinting may therefore have a structural effect that will last after the pandemic, although future developments will also depend on other factors, including the broader growth in the adoption of open science practices. A total of 53% of the respondents reported that they had received feedback on their preprints. However, more than half of the feedback was received through “closed” channels, that is, privately to the authors.

This means that preprinting was a useful way to receive feedback on research, but the value of feedback could be increased further by facilitating and promoting “open” channels for preprint feedback. Almost a quarter of the feedback received by respondents consisted of detailed comments, showing the potential of preprint feedback to provide valuable comments on research.

Respondents also reported that, compared to preprint feedback, journal peer review was more likely to lead to major changes to their work, suggesting that journal peer review provides significant added value compared to feedback received on preprints.

DOI : https://doi.org/10.7717/peerj.15864

Does the Use of Unusual Combinations of Datasets Contribute to Greater Scientific Impact?

Authors : Yulin Yu, Daniel M. Romero

Scientific datasets play a crucial role in contemporary data-driven research, as they allow for the progress of science by facilitating the discovery of new patterns and phenomena. This mounting demand for empirical research raises important questions on how strategic data utilization in research projects can stimulate scientific advancement.

In this study, we examine the hypothesis inspired by the recombination theory, which suggests that innovative combinations of existing knowledge, including the use of unusual combinations of datasets, can lead to high-impact discoveries. We investigate the scientific outcomes of such atypical data combinations in more than 30,000 publications that leverage over 6,000 datasets curated within one of the largest social science databases, ICPSR.

This study offers four important insights. First, combining datasets, particularly those infrequently paired, significantly contributes to both scientific and broader impacts (e.g., dissemination to the general public). Second, the combination of datasets with atypically combined topics has the opposite effect: the use of such data is associated with fewer citations.

Third, younger and less experienced research teams tend to use atypical combinations of datasets in research at a higher frequency than their older and more experienced counterparts.

Lastly, despite the benefits of data combination, papers that amalgamate data remain infrequent. This finding suggests that the unconventional combination of datasets is an under-utilized but powerful strategy correlated with the scientific and broader impact of scientific discoveries.

arXiv : https://arxiv.org/abs/2402.05024

In the Academy, Data Science Is Lonely: Barriers to Adopting Data Science Methods for Scientific Research

Authors : Gabrielle O’Brien, Jordan Mick

Data science has been heralded as a transformative family of methods for scientific discovery. Despite this excitement, putting these methods into practice in scientific research has proven challenging. We conducted a qualitative interview study of 25 researchers at the University of Michigan, all scientists who currently work outside of data science (in fields such as astronomy, education, chemistry, and political science) and wish to adopt data science methods as part of their research program.

Semi-structured interviews explored the barriers they faced and strategies scientists used to persevere. These scientists quickly identified that they lacked the expertise to confidently implement and interpret new methods.

For most, independent study was unsuccessful, owing to limited time, missing foundational skills, and difficulty navigating the marketplace of educational data science resources. Overwhelmingly, participants reported isolation in their endeavors and a desire for a greater community. Many sought to bootstrap a community on their own, with mixed results.

Based on their narratives, we provide preliminary recommendations for academic departments, training programs, campus-wide data science initiatives, and universities to build supportive communities of practice that cultivate expertise. These community relationships may be key to growing the research capacity of scientific institutions. 

DOI : https://doi.org/10.1162/99608f92.7ca04767