Evolution of the "long tail" concept for scientific data

Authors: Gretchen R. Stahlman, Inna Kouper

This review paper explores the evolution of discussions about "long-tail" scientific data in the scholarly literature. The "long-tail" concept, originally used to explain trends in digital consumer goods, was first applied to scientific data in 2007 to refer to a vast array of smaller, heterogeneous data collections that cumulatively represent a substantial portion of scientific knowledge. However, these datasets, often referred to as "long-tail data," are frequently mismanaged or overlooked because of inadequate data management practices and insufficient institutional support.

This paper examines the changing landscape of discussions about long-tail data over time, situated within broader ecosystems of research data management and the natural interplay between "big" and "small" data.

The review also bridges discussions of data curation in Library & Information Science (LIS) and in domain-specific contexts. It aims to provide a more comprehensive understanding of the long-tail concept, its terminological diversity in the literature, and its utility for guiding effective data management, thereby informing current and future information science research and practice.

arXiv: https://arxiv.org/abs/2412.13307