Big data is not about size: when data transform scholarship

Authors: Jean-Christophe Plantin, Carl Lagoze, Paul N. Edwards, Christian Sandvig

“Big data” discussions typically focus on scale, i.e. the problems and potentials inherent in very large collections. Here, we argue that the most important consequences of “big data” for scholarship stem not from the increasing size of datasets, but instead from a loss of control over the sources of data.

The breakdown of the “control zone” due to the uncertain provenance of data has implications for data integrity, and can disrupt scholarship in multiple ways. A retrospective look at the introduction of larger datasets in weather forecasting and epidemiology shows that more data can at times be counterproductive, or can destabilize existing methods.

Based on these examples, we examine two implications of “big data” for scholarship: the presence of large datasets transforms both the traditional disciplinary structure of the sciences and the infrastructure for scholarly communication.