Authors : Gretchen R. Stahlman, Inna Kouper
This review paper explores the evolution of discussions about “long-tail” scientific data in the scholarly literature. The “long-tail” concept, originally used to explain trends in digital consumer goods, was first applied to scientific data in 2007 to refer to a vast array of smaller, heterogeneous data collections that cumulatively represent a substantial portion of scientific knowledge. However, these datasets, often referred to as “long-tail data,” are frequently mismanaged or overlooked due to inadequate data management practices and institutional support.
This paper examines the changing landscape of discussions about long-tail data over time, situated within broader ecosystems of research data management and the natural interplay between “big” and “small” data.
The review also bridges discussions of data curation in Library & Information Science (LIS) and in domain-specific contexts. It aims to provide a more comprehensive understanding of the long-tail concept, its terminological diversity in the literature, and its utility for guiding effective data management, thereby informing current and future information science research and practice.
arXiv : https://arxiv.org/abs/2412.13307