Evolution of the “long tail” concept for scientific data

Authors : Gretchen R. Stahlman, Inna Kouper

This review paper explores the evolution of discussions about “long-tail” scientific data in the scholarly literature. The “long-tail” concept, originally used to explain trends in digital consumer goods, was first applied to scientific data in 2007 to refer to a vast array of smaller, heterogeneous data collections that cumulatively represent a substantial portion of scientific knowledge. However, these datasets, often referred to as “long-tail data,” are frequently mismanaged or overlooked due to inadequate data management practices and institutional support.

This paper examines the changing landscape of discussions about long-tail data over time, situated within broader ecosystems of research data management and the natural interplay between “big” and “small” data.

The review also bridges discussions on data curation in Library & Information Science (LIS) and domain-specific contexts. It aims to provide a more comprehensive understanding of the long-tail concept, its terminological diversity in the literature, and its utility for guiding effective data management, informing current and future information science research and practice.

Arxiv : https://arxiv.org/abs/2412.13307

When data sharing is an answer and when (often) it is not: Acknowledging data-driven, non-data, and data-decentered cultures

Authors : Isto Huvila, Luanne S. Sinnamon

Contemporary research and innovation policies and advocates of data-intensive research paradigms continue to urge increased sharing of research data. Such paradigms are underpinned by a pro-data, normative data culture that has become dominant in the contemporary discourse. Earlier research on research data sharing has directed little attention to its alternatives as more than a deficit. The present study aims to provide insights into researchers’ perspectives, rationales and practices of (non-)sharing of research data in relation to their research practices.

We address two research questions: (RQ1) what underpinning patterns can be identified in researchers’ (non-)sharing of research data, and (RQ2) how are attitudes and data sharing linked to researchers’ general practices of conducting their research? We identify and describe data-decentered culture and non-data culture as alternatives and parallels to the data-driven culture, and describe researchers’ de-inscriptions of how they resist and appropriate predominant notions of data in their data practices by problematizing the notion of data, asserting exceptions to the general case of data sharing, and resisting or opting out of data sharing.

DOI : https://doi.org/10.1002/asi.24957

FAIR GPT: A virtual consultant for research data management in ChatGPT

Authors : Renat Shigapov, Irene Schumm

FAIR GPT is a first virtual consultant in ChatGPT designed to help researchers and organizations make their data and metadata compliant with the FAIR (Findable, Accessible, Interoperable, Reusable) principles. It provides guidance on metadata improvement, dataset organization, and repository selection.

To ensure accuracy, FAIR GPT uses external APIs to assess dataset FAIRness, retrieve controlled vocabularies, and recommend repositories, minimizing hallucination and improving precision. It also assists in creating documentation (data and software management plans, README files, and codebooks), and selecting proper licenses. This paper describes its features, applications, and limitations.

Arxiv : https://arxiv.org/abs/2410.07108

Reproducible and Attributable Materials Science Curation Practices: A Case Study

Authors : Ye Li, Sarah Laura Wilson, Micah Altman

While small labs produce much of the fundamental experimental research in Materials Science and Engineering (MSE), little is known about their data management and sharing practices and the extent to which they promote trust in, and transparency of, the published research.

In this research, we conduct a case study of a leading MSE research lab to characterize the limits of current data management and sharing practices concerning reproducibility and attribution. We systematically reconstruct the workflows underpinning four research projects by combining interviews, document review, and digital forensics. We then apply information graph analysis and computer-assisted retrospective auditing to identify where critical research information is unavailable or at risk.

We find that while data management and sharing practices in this leading lab protect against computer and disk failure, they are insufficient to ensure reproducibility or correct attribution of work — especially when a group member withdraws before project completion.

We conclude with recommendations for adjustments to MSE data management and sharing practices to promote trustworthiness and transparency by adding lightweight automated file-level auditing and automated data transfer processes.

DOI : https://doi.org/10.2218/ijdc.v18i1.940

Why academics under-share research data: A social relational theory

Authors : Janice Bially Mattern, Joseph Kohlburn, Heather Moulaison-Sandy

Despite their professed enthusiasm for open science, faculty researchers have been documented as not freely sharing their data; instead, if sharing data at all, they take a minimal approach. A robust research agenda in LIS has documented the data under-sharing practices in which they engage, and the motivations they profess.

Using theoretical frameworks from sociology to complement research in LIS, this article examines the broader context in which researchers are situated, theorizing the social relational dynamics in academia that influence faculty decisions and practices relating to data sharing.

We advance a theory that suggests that the academy has entered a period of transition, and faculty resistance to data sharing through foot-dragging is one response to shifting power dynamics. If the theory is borne out empirically, proponents of open access will need to find a way to encourage open academic research practices without undermining the social value of academic researchers.

DOI : https://doi.org/10.1002/asi.24938

To share or not to share? Image data sharing in the social sciences and humanities

Authors : Elina Late, Mette Skov, Sanna Kumpulainen

Introduction

The paper aims to investigate image data sharing within the social sciences and humanities. While data sharing is encouraged as a part of the open science movement, little is known about the approaches and factors influencing the sharing of image data.

This information is needed because the use of image data in these fields of research is increasing, and data sharing is context dependent.

Method

The study analyses qualitative semi-structured interviews with 14 scholars who incorporate digital images as a core component of their research data.

Analysis

Content analysis is conducted to gather information about scholars’ image data sharing and motivating and impeding factors related to it.

Results

The findings show that image data sharing is not an established research practice, and when it happens it is mostly done via informal means by sharing data through personal contacts. Supporting the scientific community, the open science agenda and fulfilling research funders’ requirements motivate scholars to share their data. Impeding factors relate to the qualities of data, ownership of data, data stewardship, and research integrity.

Conclusion

Advancing image data sharing requires the development of research infrastructures and providing support and guidelines. Better understanding of the scholars’ image data practices is also needed.

DOI : https://doi.org/10.47989/ir292834

Re-use of research data in the social sciences. Use and users of digital data archive

Authors : Elina Late, Michael Ochsner

The aim of this paper is to investigate the re-use of research data deposited in a digital data archive in the social sciences. The study examines the quantity, type, and purpose of data downloads by analyzing enriched user log data collected from a Swiss data archive. The findings show that quantitative datasets are increasingly downloaded from the digital archive and that downloads focus heavily on a small share of the datasets.

The most frequently downloaded datasets are survey datasets collected by research organizations offering possibilities for longitudinal studies. Users typically download only one dataset, but a group of heavy downloaders accounts for a remarkable share of all downloads. The main user group downloading data from the archive is students who use the data in their studies. Furthermore, datasets downloaded for research purposes are often, but not always, used in scholarly publications.

Enriched log data from data archives offer an interesting macro-level perspective on the use and users of the services and help in understanding the increasing role of repositories in the social sciences. The study provides insights into the potential of collecting and using log data for studying and evaluating data archive use.

DOI : https://doi.org/10.1371/journal.pone.0303190