An analysis of the effects of sharing research data, code, and preprints on citations

Authors : Giovanni Colavizza, Lauren Cadwallader, Marcel LaFlamme, Grégory Dozot, Stéphane Lecorney, Daniel Rappo, Iain Hrynaszkiewicz

Calls to make scientific research more open have gained traction with a range of societal stakeholders. Open Science practices include but are not limited to the early sharing of results via preprints and openly sharing outputs such as data and code to make research more reproducible and extensible. Existing evidence shows that adopting Open Science practices has effects in several domains.

In this study, we investigate whether adopting one or more Open Science practices leads to significantly higher citations for an associated publication, which is one form of academic impact. We use a novel dataset known as Open Science Indicators, produced by PLOS and DataSeer, which includes all PLOS publications from 2018 to 2023 as well as a comparison group sampled from the PMC Open Access Subset. In total, we analyze approximately 122,000 publications. We calculate publication and author-level citation indicators and use a broad set of control variables to isolate the effect of Open Science Indicators on received citations.

We show that Open Science practices are adopted to different degrees across scientific disciplines. We find that the early release of a publication as a preprint correlates with a significant positive citation advantage of about 20.2% on average. We also find that sharing data in an online repository correlates with a smaller yet still positive citation advantage of 4.3% on average.

However, we do not find a significant citation advantage for sharing code. Further research is needed on additional or alternative measures of impact beyond citations. Our results are likely to be of interest to researchers, as well as publishers, research funders, and policymakers.

Arxiv : https://arxiv.org/abs/2404.16171

Data services at the academic library: a natural history of horses and unicorns

Authors : Jeffrey Oliver, Fernando Rios, Kiriann Carin, Chun Ly

Objective

Increases in data-intensive research at colleges and universities are driving demand for data services provided by academic libraries. The current work investigates the distribution of library data services, how such services are offered, and the effect of resourcing on the number of services offered by a library.

Methods

We used a web-based inventory of 25 academic libraries at U.S. Research 1 (R1) Carnegie institutions to assess the state of data services at university libraries. We categorized and quantified services, and tested for an effect of library resourcing on the size of library data service portfolios.

Results

Support for data management and geospatial services was relatively widespread, with increasing support in areas of data analyses and data visualization. There was significant variation among services in the modality in which they were offered (web, consult, instruction) and library resourcing had a significant effect on the number of data services a library offered.

Conclusions

While a core subset of these data services is offered at most academic libraries, more specialized topics are restricted to well-resourced libraries. In light of the influence of resource scarcity on the number of services a library can offer, intra- and inter-campus partnerships will be critical to ensure campus support for data service needs.

DOI : https://doi.org/10.7191/jeslib.780

Assessing Quality Variations in Early Career Researchers’ Data Management Plans

Author : Jukka Rantasaari

This paper aims to better understand early career researchers’ (ECRs’) research data management (RDM) competencies by assessing the contents and quality of data management plans (DMPs) developed during a multi-stakeholder RDM course. We also aim to identify differences between DMPs in relation to several background variables (e.g., discipline, course track).

The Basics of Research Data Management (BRDM) course has been held at two multi-faculty, research-intensive universities in Finland since 2020. In this study, 223 ECRs’ DMPs created in the BRDM courses of 2020–2022 were assessed, using the recommendations and criteria of the Finnish DMP Evaluation Guide + General Finnish DMP Guidance (FDEG).

The median quality of DMPs appeared to be satisfactory. The differences in rating according to FDEG’s three-point performance criteria were not statistically significant between DMPs developed in separate years, course tracks or disciplines. However, using content analysis, differences were found between disciplines or course tracks regarding DMPs’ key characteristics such as sharing, storing, and preserving data.

The difference between DMPs that contained a data table (DtDMPs) and prose DMPs was also highly significant. DtDMPs better acknowledged the data handling needs of different data types and improved the overall quality of a DMP.

The results illustrated that the ECRs had learned the basic RDM competencies and grasped their significance to the integrity, reliability, and reusability of data. However, more focused further training is needed to reach advanced competency, especially in handling and sharing personal data, legal issues, long-term preservation, and funders’ data policies.

Equally important to the cultural change that makes RDM an organic part of research practices is merging research support services, processes, and infrastructure into the research projects’ processes. Additionally, incentives are needed for sharing and reusing data.

DOI : https://doi.org/10.2218/ijdc.v18i1.873

Between Flat-Earthers and Fitness Coaches: Who is Citing Scientific Publications in YouTube Video Descriptions?

Authors : Olga Zagovora, Katrin Weller

In this study, we undertake an extensive analysis of YouTube channels that reference research publications in their video descriptions, offering a unique insight into the intersection of digital media and academia. Our investigation focuses on three principal aspects: the background of YouTube channel owners, their thematic focus, and the nature of their operational dynamics, specifically addressing whether they work individually or in groups. Our results highlight a strong emphasis on content related to science and engineering, as well as health, particularly in channels managed by individual researchers and academic institutions.

However, there is a notable variation in the popularity of these channels, with professional YouTubers and commercial media entities often outperforming in terms of viewer engagement metrics like likes, comments, and views. This underscores the challenge academic channels face in attracting a wider audience. Further, we explore the role of academic actors on YouTube, scrutinizing their impact in disseminating research and the types of publications they reference.

Despite a general inclination towards professional academic topics, these channels displayed a varied effectiveness in spotlighting highly cited research. Often, they referenced a wide array of publications, indicating a diverse but not necessarily impact-focused approach to content selection.

Arxiv : https://arxiv.org/abs/2404.15083

The role of non-scientific factors vis-a-vis the quality of publications in determining their scholarly impact

Authors : Giovanni Abramo, Ciriaco Andrea D’Angelo, Leonardo Grilli

In the evaluation of scientific publications’ impact, the interplay between intrinsic quality and non-scientific factors remains a subject of debate. While peer review traditionally assesses quality, bibliometric techniques gauge scholarly impact. This study investigates the role of non-scientific attributes alongside quality scores from peer review in determining scholarly impact.

Leveraging data from the first Italian Research Assessment Exercise (VTR 2001-2003) and Web of Science citations, we analyse the relationship between quality scores, non-scientific factors, and publication short- and long-term impact.

Our findings shed light on the significance of non-scientific elements overlooked in peer review, offering policymakers and research management insights in choosing evaluation methodologies. Sections delve into the debate, identify non-scientific influences, detail methodologies, present results, and discuss implications.

Arxiv : https://arxiv.org/abs/2404.05345

Sentiment Analysis of Citations in Scientific Articles Using ChatGPT: Identifying Potential Biases and Conflicts of Interest

Author : Walid Hariri

Scientific articles play a crucial role in advancing knowledge and informing research directions. One key aspect of evaluating scientific articles is the analysis of citations, which provides insights into the impact and reception of the cited works. This article introduces the innovative use of large language models, particularly ChatGPT, for comprehensive sentiment analysis of citations within scientific articles.

By leveraging advanced natural language processing (NLP) techniques, ChatGPT can discern the nuanced positivity or negativity of citations, offering insights into the reception and impact of cited works. Furthermore, ChatGPT’s capabilities extend to detecting potential biases and conflicts of interest in citations, enhancing the objectivity and reliability of scientific literature evaluation.

This study showcases the transformative potential of artificial intelligence (AI)-powered tools in enhancing citation analysis and promoting integrity in scholarly research.

Arxiv : https://arxiv.org/abs/2404.01800

PubTator 3.0: an AI-powered literature resource for unlocking biomedical knowledge

Authors : Chih-Hsuan Wei, Alexis Allot, Po-Ting Lai, Robert Leaman, Shubo Tian, Ling Luo, Qiao Jin, Zhizheng Wang, Qingyu Chen, Zhiyong Lu

PubTator 3.0 (https://www.ncbi.nlm.nih.gov/research/pubtator3/) is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches for key concepts like proteins, genetic variants, diseases and chemicals. It currently provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text articles from the PMC open access subset, updated weekly.

PubTator 3.0’s online interface and API utilize these precomputed entity relations and synonyms to provide advanced search capabilities and enable large-scale analyses, streamlining many complex information needs. We showcase the retrieval quality of PubTator 3.0 using a series of entity pair queries, demonstrating that PubTator 3.0 retrieves a greater number of articles than either PubMed or Google Scholar, with higher precision in the top 20 results.
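
The abstract does not spell out how precision in the top 20 results was computed. As a rough illustration only, the standard precision-at-k metric can be sketched as below; the function name, the document identifiers, and the relevance judgments are hypothetical and do not come from the paper.

```python
def precision_at_k(ranked_ids, relevant_ids, k=20):
    """Return the fraction of the top-k ranked results that are relevant.

    ranked_ids   -- result identifiers in ranked order (hypothetical data)
    relevant_ids -- set of identifiers judged relevant (hypothetical data)
    k            -- cutoff; 20 mirrors the "top 20 results" in the abstract
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Dividing by k (not by len(top_k)) means a system that returns
    # fewer than k results gets no credit for the missing positions.
    return hits / k


# Toy example with made-up identifiers: 2 of the top 4 results are relevant.
ranked = ["doc1", "doc2", "doc3", "doc4", "doc5"]
relevant = {"doc1", "doc3", "doc5"}
score = precision_at_k(ranked, relevant, k=4)
```

In this toy case the score is 2 / 4 = 0.5. Comparing search systems on this metric requires the same queries and the same relevance judgments for each system.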

We further show that integrating ChatGPT (GPT-4) with PubTator APIs dramatically improves the factuality and verifiability of its responses. In summary, PubTator 3.0 offers a comprehensive set of features and tools that allow researchers to navigate the ever-expanding wealth of biomedical literature, expediting research and unlocking valuable insights for scientific discovery.

DOI : https://doi.org/10.1093/nar/gkae235