Enabling preprint discovery, evaluation, and analysis with Europe PMC

Authors : Mariia Levchenko, Michael Parkin, Johanna McEntyre, Melissa Harrison

Preprints provide an indispensable tool for rapid and open communication of early research findings. Preprints can also be revised and improved based on scientific commentary uncoupled from journal-organised peer review. The uptake of preprints in the life sciences has increased significantly in recent years, especially during the COVID-19 pandemic, when immediate access to research findings became crucial to address the global health emergency.

With ongoing expansion of new preprint servers, improving discoverability of preprints is a necessary step to facilitate wider sharing of the science reported in preprints. To address the challenges of preprint visibility and reuse, Europe PMC, an open database of life science literature, began indexing preprint abstracts and metadata from several platforms in July 2018. Since then, Europe PMC has continued to increase coverage through addition of new servers, and expanded its preprint initiative to include the full text of preprints related to COVID-19 in July 2020 and then the full text of preprints supported by the Europe PMC funder consortium in April 2022.

The preprint collection can be searched via the website and programmatically, with abstracts and the open access full text of COVID-19 and Europe PMC funder preprint subsets available for bulk download in a standard machine-readable JATS XML format. This enables automated information extraction for large-scale analyses of the preprint corpus, accelerating scientific research of the preprint literature itself.
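As a rough illustration of what searching the preprint collection programmatically might look like, the sketch below only builds a query URL; it does not send a request. The endpoint path, the `SRC:PPR` preprint filter, and the parameter names (`resultType`, `pageSize`, `cursorMark`) are assumptions based on the public Europe PMC RESTful web service and should be checked against its current documentation. The function name `preprint_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL of the Europe PMC REST search endpoint.
BASE = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def preprint_search_url(terms: str, page_size: int = 25, cursor_mark: str = "*") -> str:
    """Build a search URL restricted to preprints (hypothetical helper)."""
    params = {
        # SRC:PPR is assumed to limit results to the preprint source.
        "query": f"({terms}) AND SRC:PPR",
        "format": "json",
        "resultType": "lite",
        "pageSize": page_size,
        # "*" is assumed to request the first page of a cursor-paged result set.
        "cursorMark": cursor_mark,
    }
    return f"{BASE}?{urlencode(params)}"


url = preprint_search_url("malaria")
print(url)
```

The resulting URL could then be fetched with any HTTP client. Paging through a large result set would mean passing the cursor value returned by each response back in as `cursor_mark`.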

This publication describes steps taken to build trust, improve discoverability, and support reuse of life science preprints in Europe PMC. Here we discuss the benefits of indexing preprints alongside peer-reviewed publications, and challenges associated with this process.

DOI : https://doi.org/10.1101/2024.04.19.590240

Re-use of research data in the social sciences. Use and users of digital data archive

Authors : Elina Late, Michael Ochsner

The aim of this paper is to investigate the re-use of research data deposited in a digital data archive in the social sciences. The study examines the quantity, type, and purpose of data downloads by analyzing enriched user log data collected from a Swiss data archive. The findings show that quantitative datasets are increasingly downloaded from the digital archive and that downloads focus heavily on a small share of the datasets.

The most frequently downloaded datasets are survey datasets collected by research organizations offering possibilities for longitudinal studies. Users typically download only one dataset, but a group of heavy downloaders accounts for a remarkable share of all downloads. The main user group downloading data from the archive is students, who use the data in their studies. Furthermore, datasets downloaded for research purposes are often, but not always, used in scholarly publications.

Enriched log data from data archives offer an interesting macro-level perspective on the use and users of the services and help in understanding the increasing role of repositories in the social sciences. The study provides insights into the potential of collecting and using log data for studying and evaluating data archive use.

DOI : https://doi.org/10.1371/journal.pone.0303190

Text mining arXiv: a look through quantitative finance papers

Author : Michele Leonardo Bianchi

This paper explores articles hosted on the arXiv preprint server with the aim of uncovering valuable insights hidden in this vast collection of research. Employing text mining techniques and natural language processing methods, we examine the contents of quantitative finance papers posted on arXiv from 1997 to 2022.

We extract and analyze crucial information from the entire documents, including the references, to understand topic trends over time and to identify the most cited researchers and journals in this domain. Additionally, we compare numerous algorithms to perform topic modeling, including state-of-the-art approaches.

arXiv : https://arxiv.org/abs/2401.01751

Data journals: incentivizing data access and documentation within the scholarly communication system

Author : William H. Walters

Data journals provide strong incentives for data creators to verify, document and disseminate their data. They also bring data access and documentation into the mainstream of scholarly communication, rewarding data creators through existing mechanisms of peer-reviewed publication and citation tracking.

These same advantages are not generally associated with data repositories, or with conventional journals’ data-sharing mandates. This article describes the unique advantages of data journals.

It also examines the data journal landscape, presenting the characteristics of 13 data journals in the fields of biology, environmental science, chemistry, medicine and health sciences.

These journals vary considerably in size, scope, publisher characteristics, length of data reports, data hosting policies, time from submission to first decision, article processing charges, bibliographic index coverage and citation impact.

They are similar, however, in their peer review criteria, their open access license terms and the characteristics of their editorial boards.

DOI : http://doi.org/10.1629/uksg.510

Playing the Bullshit Game: How Empty and Misleading Communication Takes Over Organizations

Author : André Spicer

Why is bullshit so common in some organizations? Existing explanations focus on the characteristics of bullshitters, the nature of the audience, and social structural factors which encourage bullshitting.

In this paper, I offer an alternative explanation: bullshitting is a social practice that organizational members engage in to become part of a speech community, to get things done in that community, and to reinforce their identity.

When the practice of bullshitting works, it can gradually expand from a small group to take over an entire organization and industry. When bullshitting backfires, previously sacred concepts can become seen as empty and misleading talk.

DOI : https://doi.org/10.1177/2631787720929704

Data-sharing recommendations in biomedical journals and randomised controlled trials: an audit of journals following the ICMJE recommendations

Authors : Maximilian Siebert, Jeanne Fabiola Gaba, Laura Caquelin, Henri Gouraud, Alain Dupuy, David Moher, Florian Naudet

Objective

To explore the implementation of the International Committee of Medical Journal Editors (ICMJE) data-sharing policy which came into force on 1 July 2018 by ICMJE-member journals and by ICMJE-affiliated journals declaring they follow the ICMJE recommendations.

Design

A cross-sectional survey of data-sharing policies in 2018 on journal websites and in data-sharing statements in randomised controlled trials (RCTs).

Setting

ICMJE website; PubMed/Medline.

Eligibility criteria

ICMJE-member journals and 489 ICMJE-affiliated journals that published an RCT in 2018, had an accessible online website and were not considered as predatory journals according to Beall’s list. One hundred RCTs for member journals and 100 RCTs for affiliated journals with a data-sharing policy, submitted after 1 July 2018.

Main outcome measures

The primary outcome for the policies was the existence of a data-sharing policy (explicit data-sharing policy, no data-sharing policy, policy merely referring to ICMJE recommendations) as reported on the journal website, especially in the instructions for authors.

For RCTs, our primary outcome was the intention to share individual participant data set out in the data-sharing statement.

Results

Eight (out of 14; 57%) member journals had an explicit data-sharing policy on their website (three were more stringent than the ICMJE requirements, one was less demanding and four were compliant), five (35%) additional journals stated that they followed the ICMJE requirements, and one (8%) had no policy online. In RCTs published in these journals, there were data-sharing statements in 98 out of 100, with expressed intention to share individual patient data reaching 77 out of 100 (77%; 95% CI 67% to 85%).

One hundred and forty-five (out of 489) ICMJE-affiliated journals (30%; 26% to 34%) had an explicit data-sharing policy on their website (11 were more stringent than the ICMJE requirements, 85 were less demanding and 49 were compliant) and 276 (56%; 52% to 61%) merely referred to the ICMJE requirements.

In RCTs published in affiliated journals with an explicit data-sharing policy, data-sharing statements were rare (25%), and expressed intentions to share data were found in 22% (15% to 32%).
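The interval arithmetic behind a figure such as "77 out of 100 (95% CI 67% to 85%)" can be sketched as follows. The abstract does not say which interval method the authors used; the exact (Clopper-Pearson) binomial interval shown here is an assumption, so its bounds need not match the reported ones to the last digit. The function names are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_of_decreasing(f, iters: int = 60) -> float:
    """Bisection on [0, 1] for a function that decreases from positive to negative."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(successes: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Two-sided exact (Clopper-Pearson) interval for a binomial proportion."""
    half = alpha / 2
    if successes == 0:
        lower = 0.0
    else:
        # Lower bound: the p at which P(X >= successes) equals alpha/2.
        lower = _root_of_decreasing(
            lambda p: half - (1 - binom_cdf(successes - 1, n, p))
        )
    if successes == n:
        upper = 1.0
    else:
        # Upper bound: the p at which P(X <= successes) equals alpha/2.
        upper = _root_of_decreasing(lambda p: binom_cdf(successes, n, p) - half)
    return lower, upper


low, high = exact_ci(77, 100)
print(f"{low:.3f} to {high:.3f}")
```

Other common choices, such as the Wilson score interval, give slightly narrower bounds for the same counts.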

Conclusion

The implementation of ICMJE data-sharing requirements in online journal policies was suboptimal for ICMJE-member journals and poor for ICMJE-affiliated journals.

In the RCTs these journals published, implementation of the policy was good in member journals and of concern for affiliated journals. We suggest conducting continuous audits of medical journal data-sharing policies in the future.

DOI : http://dx.doi.org/10.1136/bmjopen-2020-038887

Alter-Value in Data Reuse: Non-Designated Communities and Creative Processes

Author : Guillaume Boutard

This paper builds on the investigation of data reuse in creative processes to discuss ‘epistemic pluralism’ and data ‘alter-value’ in research data management. Focussing on a specific non-designated community, we conducted semi-structured interviews with five artists in relation to five works.

Data reuse is a critical component of all these works. The qualitative content analysis brings to light agonistic-antagonistic practices in data reuse and shows multiple deconstructions of the notion of data value as it is portrayed in the data reuse literature.

Finally, the paper brings to light the benefits of including such practices in the conceptualization of data curation.

DOI : http://doi.org/10.5334/dsj-2020-023