Data Science at the Singularity

Author : David Donoho

Something fundamental to computation-based research has really changed in the last ten years. In certain fields, progress is simply dramatically more rapid than previously. Researchers in affected fields are living through a period of profound transformation, as the fields undergo a transition to frictionless reproducibility (FR).

This transition markedly changes the rate of spread of ideas and practices, affects scientific mindsets and the goals of science, and erases memories of much that came before. The emergence of FR flows from 3 data science principles that matured together after decades of work by many technologists and numerous research communities.

The mature principles involve data sharing, code sharing, and competitive challenges, however implemented in the particularly strong form of frictionless open services. Empirical Machine Learning is today’s leading adherent field; its hidden superpower is adherence to frictionless reproducibility practices; these practices are responsible for the striking and surprising progress in AI that we see everywhere; they can be learned and adhered to by researchers in whatever research field, automatically increasing the rate of progress in each adherent field.

URL : Data Science at the Singularity

DOI : https://doi.org/10.1162/99608f92.b91339ef

Digital Scholarly Journals Are Poorly Preserved: A Study of 7 Million Articles

Author : Martin Paul Eve

Introduction

Digital preservation underpins the persistence of scholarly links and citations through the digital object identifier (DOI) system. We do not currently know, at scale, the extent to which articles assigned a DOI are adequately preserved.

Methods

We construct a database of preservation information from original archival sources and then examine the preservation statuses of 7,438,037 DOIs in a random sample.

Results

Of the 7,438,037 works examined, there were 5.9 million copies spread over the archives used in this work. Furthermore, a total of 4,342,368 of the works that we studied (58.38%) were present in at least one archive. However, this left 2,056,492 works in our sample (27.64%) that are seemingly unpreserved.

The remaining 13.98% of works in the sample were excluded either for being too recent (published in the current year), not being journal articles, or having insufficient date metadata for us to identify the source.
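
The shares reported above can be rechecked from the counts. The following is a minimal Python sketch, assuming each percentage is taken over the full sample of 7,438,037 DOIs and that the excluded works are whatever remains after the preserved and unpreserved counts are subtracted. The names `share`, `SAMPLE_SIZE`, `PRESERVED`, and `UNPRESERVED` are illustrative and do not come from the study.

```python
# Counts as reported in the abstract above.
SAMPLE_SIZE = 7_438_037   # DOIs in the random sample
PRESERVED = 4_342_368     # works present in at least one archive
UNPRESERVED = 2_056_492   # works that are seemingly unpreserved


def share(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100 * count / total


preserved_pct = share(PRESERVED, SAMPLE_SIZE)
unpreserved_pct = share(UNPRESERVED, SAMPLE_SIZE)

# Assumption: excluded works are the remainder of the sample.
excluded = SAMPLE_SIZE - PRESERVED - UNPRESERVED
excluded_pct = share(excluded, SAMPLE_SIZE)

print(f"preserved:   {preserved_pct:.2f}%")
print(f"unpreserved: {unpreserved_pct:.2f}%")
print(f"excluded:    {excluded_pct:.2f}% ({excluded:,} works)")
```

Under these assumptions the recomputed shares agree with the reported 58.38%, 27.64%, and 13.98% to within about 0.01 percentage points. The small differences are consistent with rounding in the abstract.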

Discussion

Our study is limited by design in several ways. Among these are the facts that it uses only a subset of archives, it only tracks articles with DOIs, and it does not account for institutional repository coverage. Nonetheless, as an initial attempt to gauge the landscape, our results will still be of interest to libraries, publishers, and researchers.

Conclusion

This work reveals an alarming preservation deficit. Only 0.96% of Crossref members (n = 204) can be confirmed to digitally preserve over 75% of their content in three or more of the archives that we studied. (Note that when, in this article, we write “preserved,” we mean “that we were able to confirm as preserved,” as per the specified limitations of this study.) A slightly larger proportion, i.e., 8.5% (n = 1,797), preserved over 50% of their content in two or more archives.

However, many members, i.e., 57.7% (n = 12,257), only met the threshold of having 25% of their material in a single archive. Most worryingly, 32.9% (n = 6,982) of Crossref members seem not to have any adequate digital preservation in place, which is against the recommendations of the Digital Preservation Coalition.

URL : Digital Scholarly Journals Are Poorly Preserved: A Study of 7 Million Articles

DOI : https://doi.org/10.31274/jlsc.16288

Comparison of effect estimates between preprints and peer-reviewed journal articles of COVID-19 trials

Authors : Mauricia Davidson, Theodoros Evrenoglou, Carolina Graña, Anna Chaimani, Isabelle Boutron

Background

Preprints are increasingly used to disseminate research results, providing multiple sources of information for the same study. We assessed the consistency in effect estimates between preprint and subsequent journal article of COVID-19 randomized controlled trials.

Methods

The study utilized data from the COVID-NMA living systematic review of pharmacological treatments for COVID-19 (covid-nma.com) up to July 20, 2022. We identified randomized controlled trials (RCTs) evaluating pharmacological treatments vs. standard of care/placebo for patients with COVID-19 that were originally posted as preprints and subsequently published as journal articles.

Trials that did not report the same analysis in both documents were excluded. Data were extracted independently by pairs of researchers with consensus to resolve disagreements. Effect estimates extracted from the first preprint were compared to effect estimates from the journal article.

Results

The search identified 135 RCTs originally posted as a preprint and subsequently published as a journal article. We excluded 26 RCTs that did not meet the eligibility criteria, of which 13 RCTs reported an interim analysis in the preprint and a final analysis in the journal article. Overall, 109 preprint–article RCTs were included in the analysis.

The median (interquartile range) delay between preprint and journal article was 121 (73–187) days, and the median sample size was 150 (71–464) participants. Of the RCTs, 76% had been prospectively registered, 60% received industry or mixed funding, and 72% were multicentric trials. The overall risk of bias was rated as ‘some concern’ for 80% of RCTs.

We found that 81 preprint–article pairs of RCTs were consistent for all outcomes reported. There were nine RCTs with at least one outcome with a discrepancy in the number of participants with outcome events or the number of participants analyzed, which yielded a minor change in the estimate of the effect. Furthermore, six RCTs had at least one outcome missing in the journal article and 14 RCTs had at least one outcome added in the journal article compared to the preprint. There was a change in the direction of effect in one RCT. No changes in statistical significance or conclusions were found.

Conclusions

Effect estimates were generally consistent between COVID-19 preprints and subsequent journal articles. The main results and interpretation did not change in any trial. Nevertheless, some outcomes were added and deleted in some journal articles.

URL : Comparison of effect estimates between preprints and peer-reviewed journal articles of COVID-19 trials

DOI : https://doi.org/10.1186/s12874-023-02136-8

Open science platforms fighting clandestine abuses of piracy and phishing: The Open Science Framework Case

Authors : Ayumi Ikeda, Fumiya Yonemitsu, Naoto Yoshimura, Kyoshiro Sasaki, Yuki Yamada

The Open Science Framework (OSF) is an important and useful platform for researchers to practice open science. However, OSF has recently been misused for criminal purposes, especially on search boards for watching pirated copyright works, leading to phishing sites.

This misuse can negatively influence the OSF server function; therefore, it is important to take appropriate measures. To protect the sound base of open science in the future, this paper reports cases where OSF has been abused for illegal activities and discusses various measures, including those already implemented by OSF management.

URL : Open science platforms fighting clandestine abuses of piracy and phishing: The Open Science Framework Case

DOI : https://doi.org/10.31234/osf.io/xtuen

From Code to Tenure: Valuing Research Software in Academia

Authors : Eric A. Jensen, Daniel S. Katz

Research software is a driving force in today’s academic ecosystem, with most researchers relying on it to do their work, and many writing some of their own code. Despite its importance, research software is typically not included in tenure, promotion, and recognition policies and processes.

In this article, we invite discussions on how to value research software, integrate it into academic evaluations, and ensure its sustainability. We build on discussions hosted by the US Research Software Sustainability Institute and by the international Research Software Engineering community to outline a set of possible activities aimed at elevating the role of research software in academic career paths, recognition, and beyond.

One is a study to investigate the role of software contributions in academic promotions. Another is to document and share successful academic recognition practices for research software. A third is to create guidance documents for faculty hiring and tenure evaluations. Each of these proposed activities is a building block of a larger effort to create a more equitable, transparent, and dynamic academic ecosystem.

We’ve assembled 44 such ideas as a starting point and posted them as issues in GitHub. Our aim is to encourage engagement with this effort. Readers are invited to do this by adding potential activities or commenting on existing ideas to improve them.

The issues page can also serve to inform the community of ongoing activities so that efforts aren’t duplicated. Similarly, if someone else has already made strides in a particular area, point out their work to build collective knowledge.

Finally, the issues page is also intended to allow anyone interested in collaborating on a specific activity to indicate their willingness to do so. This living list serves as a hub for collective action and thought, with the overall aim of recognizing the value of creating and contributing research software.

URL : From Code to Tenure: Valuing Research Software in Academia

DOI : https://doi.org/10.21428/6ffd8432.8f39775d

Promoting values-based assessment in review, promotion, and tenure processes

Authors : Caitlin Carter, Michael R. Dougherty, Erin C. McKiernan, Greg Tananbaum

Criteria and guidelines for review, promotion, and tenure (RPT) processes form the bedrock of institutional and departmental policies, and are a major driver of faculty behavior, influencing the time faculty spend on different activities like outreach, publishing practices, and more.

However, research shows that many RPT guidelines emphasize quantity over quality when evaluating research and teaching, and favor bibliometrics over qualitative measures of broader impact.

RPT processes rarely explicitly recognize or reward the various public dimensions of faculty work (e.g., outreach, research sharing, science communication), or, when they do, relegate them to the service category, which is undervalued and often falls heavily on women and underrepresented groups.

There is a need to correct this mismatch between institutional missions or values—often focused on aspects like community engagement, equity, diversity, and inclusion, or public good—and the behaviors that are rewarded in academic assessments. We describe recent efforts to promote RPT reform and realign institutional incentives using a values-based approach, including an overview of workshops we ran at the 2023 Council of Graduate Departments of Psychology (COGDOP) Annual Meeting, the Association for Psychological Science (APS) Annual Convention, and the American Anthropological Association (AAA) Department Leaders Summer Institute.

These workshops were designed to guide participants through the process of brainstorming what values are important to them as departments, institutions, or more broadly as disciplines, and which faculty behaviors might embody these values and could be considered in RPT evaluations. We discuss how similar activities could promote broader culture change.

URL : Promoting values-based assessment in review, promotion, and tenure processes

DOI : https://doi.org/10.21428/6ffd8432.9eadd603

The impact of COVID-19 on the debate on open science: An analysis of expert opinion

Authors : Melanie Benson Marshall, Stephen Pinfield, Pamela Abbott, Andrew Cox, Juan Pablo Alperin, Natascha Chtena, Isabelle Dorsch, Alice Fleerackers, Monique Oliveira, Isabella Peters

This study is an analysis of the international debate on open science that took place during the pandemic. It addresses the question, how did the COVID-19 pandemic impact the debate on open science?

The study takes the form of a qualitative analysis of a large corpus of key articles, editorials, blogs and thought pieces about the impact of COVID on open science, published during the pandemic in English, German, Portuguese, and Spanish.

The findings show that many authors believed that it was clear that the experience of the pandemic had illustrated or strengthened the case for open science, with language such as a “stress test”, “catalyst”, “revolution” or “tipping point” frequently used. It was commonly believed that open science had played a positive role in the response to the pandemic, creating a clear ‘line of sight’ between open science and societal benefits.

Whilst the arguments about open science deployed in the debate were not substantially new, the focuses of debate changed in some key respects. There was much less attention given to business models for open access and critical perspectives on open science, but open data sharing, preprinting, information quality and misinformation became most prominent in debates. There were also moves to reframe open science conceptually, particularly in connecting science with society and addressing broader questions of equity.

URL : The impact of COVID-19 on the debate on open science: An analysis of expert opinion

DOI : https://doi.org/10.31235/osf.io/xy874