Does the Use of Unusual Combinations of Datasets Contribute to Greater Scientific Impact?

Authors : Yulin Yu, Daniel M. Romero

Scientific datasets play a crucial role in contemporary data-driven research, as they allow for the progress of science by facilitating the discovery of new patterns and phenomena. This mounting demand for empirical research raises important questions on how strategic data utilization in research projects can stimulate scientific advancement.

In this study, we examine a hypothesis inspired by recombination theory, which suggests that innovative combinations of existing knowledge, including the use of unusual combinations of datasets, can lead to high-impact discoveries. We investigate the scientific outcomes of such atypical data combinations in more than 30,000 publications that leverage over 6,000 datasets curated within one of the largest social science databases, ICPSR.

This study offers four important insights. First, combining datasets, particularly those infrequently paired, significantly contributes to both scientific and broader impacts (e.g., dissemination to the general public). Second, the combination of datasets with atypically combined topics has the opposite effect — the use of such data is associated with fewer citations.

Third, younger and less experienced research teams tend to use atypical combinations of datasets in research at a higher frequency than their older and more experienced counterparts.

Lastly, despite the benefits of data combination, papers that amalgamate data remain infrequent. This finding suggests that the unconventional combination of datasets is an under-utilized but powerful strategy correlated with the scientific and broader impact of scientific discoveries.

URL : https://arxiv.org/abs/2402.05024

In the Academy, Data Science Is Lonely: Barriers to Adopting Data Science Methods for Scientific Research

Authors : Gabrielle O’Brien, Jordan Mick

Data science has been heralded as a transformative family of methods for scientific discovery. Despite this excitement, putting these methods into practice in scientific research has proven challenging. We conducted a qualitative interview study of 25 researchers at the University of Michigan, all scientists who currently work outside of data science (in fields such as astronomy, education, chemistry, and political science) and wish to adopt data science methods as part of their research program.

Semi-structured interviews explored the barriers they faced and strategies scientists used to persevere. These scientists quickly identified that they lacked the expertise to confidently implement and interpret new methods.

For most, independent study was unsuccessful, owing to limited time, missing foundational skills, and difficulty navigating the marketplace of educational data science resources. Overwhelmingly, participants reported isolation in their endeavors and a desire for a greater community. Many sought to bootstrap a community on their own, with mixed results.

Based on their narratives, we provide preliminary recommendations for academic departments, training programs, campus-wide data science initiatives, and universities to build supportive communities of practice that cultivate expertise. These community relationships may be key to growing the research capacity of scientific institutions. 

DOI : https://doi.org/10.1162/99608f92.7ca04767

Exploring National Infrastructures to Support Impact Analyses of Publicly Accessible Research: A Need for Trust, Transparency and Collaboration at Scale

Authors : Jennifer Kemp, Charles Watkinson, Christina Drummond

Usage data on research outputs such as books and journals is well established in the scholarly community. However, as research impact is derived from a broader set of scholarly outputs, such as data, code and multimedia, more holistic usage and impact metrics could inform national innovation and research policy.

Usage data reporting standards, such as Project COUNTER, provide the basis for shared statistics reporting practice; however, as mandated access to publicly funded research has increased the demand for impact metrics and analytics, stakeholders are exploring how to scaffold and strengthen shared infrastructure to better support the trusted, multi-stakeholder exchange of usage data across a variety of outputs.

In April 2023, a workshop on Exploring National Infrastructure for Public Access and Impact Reporting, supported by the United States (US) National Science Foundation (NSF), explored these issues. This paper contextualizes the resources shared and recommendations generated in the workshop.

DOI : https://dx.doi.org/10.7302/22166

On the Fast Track to Full Gold Open Access

Author : Robert Kudelić

The world of scientific publishing is changing; the days of the old subscription-based earnings model for publishers seem to be over, and we are entering a new era. An ever-increasing number of journals from disparate publishers appear to be going Gold Open Access. Yet have we rigorously examined the issue in its entirety, or are we touting the strengths while neglecting constructive criticism and a careful weighing of the evidence?

We therefore present the current state of the art on this topic, which is more relevant than ever, in a compact review/bibliometrics style, and suggest solutions that are most likely to be acceptable to all parties. The analysis also suggests a link between trends in scientific publishing and tumultuous world events, which has special significance for the publishing environment on the current world stage.

Arxiv : https://arxiv.org/abs/2311.08313

A framework for improving the accessibility of research papers on arXiv.org

Authors : Shamsi Brinn, Christopher Cameron, David Fielding, Charles Frankston, Alison Fromme, Peter Huang, Mark Nazzaro, Stephanie Orphan, Steinn Sigurdsson, Ryan Tay, Miranda Yang, Qianyu Zhou

The research content hosted by arXiv is not fully accessible to everyone due to disabilities and other barriers. This matters for three reasons: a significant proportion of people have reading and visual disabilities; it is important to our community that arXiv is as open as possible; and if science is to advance, we need wide and diverse participation.

In addition, we have mandates to become accessible, and accessible content benefits everyone. In this paper, we will describe the accessibility problems with research, review current mitigations (and explain why they aren’t sufficient), and share the results of our user research with scientists and accessibility experts.

Finally, we will present arXiv’s proposed next step towards more open science: offering HTML alongside existing PDF and TeX formats. An accessible HTML version of this paper is also available at https://info.arxiv.org/about/accessibility_research_report.html

URL : https://arxiv.org/abs/2212.07286

Agile Research Data Management with Open Source: LinkAhead

Authors : Daniel Hornung, Florian Spreckelsen, Thomas Weiß

Research data management (RDM) in academic scientific environments is increasingly coming into focus as an important part of good scientific practice and as a topic with great potential for saving time and money. Nevertheless, there is a shortage of appropriate tools that fulfill the specific requirements of scientific research.

We identified where the requirements in science deviate from those in other fields and proposed a list of requirements that RDM software should meet to become a viable option. We analyzed a number of currently available technologies and tool categories against these requirements and identified areas where no tools can satisfy researchers’ needs.

Finally, we assessed the open-source research data management system (RDMS) LinkAhead for compatibility with the proposed features and found that it fulfills the requirements in the area of semantic, flexible data handling, in which other tools show weaknesses.

DOI : https://doi.org/10.48694/inggrid.3866

“On the ruins of seriality”: The scientific journal and the nature of the scientific life

Author : Dorien Daling

Twenty-first-century discourse on science has been marked by narratives of crisis. Science is said to be experiencing crises of public trust, of peer review and publishing, of reproducibility and replicability, and of recognition and reward.

The dominant response has been to “repair” the scientific literature and the system of scientific publishing through open science. This paper places the current predicament of scholarly communication in historical perspective by exploring the evolution of the scientific journal in the second half of the twentieth century.

I focus on a new genre of scientific journal invented by Dutch commercial publishers shortly after World War II, and on its effects on the nature of the scientific life. I show that profit-oriented publishers and discipline-building scientists worked together to make postwar science more open, while also arguing that formats of scientific publication have their own agency.

DOI : https://doi.org/10.1016/j.endeavour.2023.100885