Au-delà des big data : Les sciences sociales et la multiplication des données numériques

Authors : Étienne Ollion, Julien Boelaert

In public debate as in the academic world, the enthusiasm for big data has been matched only by the criticism the phenomenon has drawn. « An unprecedented empirical opportunity » vs « poor data »; « methodological revolution » vs « fascination with numbers »; « scientific revolution » vs « degradation of the knowledge produced »: the positions are sharply divided.

Drawing on a reading of these debates and of the social science work often grouped under this label, the article argues that this polarized situation is likely to persist as long as the discussion is organized around the ill-defined concept of big data. It proposes to distinguish between the different types of data often grouped under this term.

In doing so, it shows that the big data so often invoked are only a limited aspect of a much larger transformation: the growing and massive availability of digital data, which raises new questions for our disciplines.

Four aspects are explored in particular: disciplinary reorganizations, transformations of quantitative methods, access to and management of data, and the objects of the social sciences and their relationship to theory.


Big data challenges for the social sciences: from society and opinion to replications

Author : Dominique Boullier

Big Data about the social produces predictive correlations for the benefit of brands and web platforms. Beyond « society » and « opinion », for which the text lays out a genealogy, there appear « traces » that the social sciences must theorize as « replications » in order to reap the benefits of the uncertain status of the widespread traceability of entities.

High-frequency replications as a collective phenomenon existed before the emergence of digital networks, but they now leave traces that can be computed. The third generation of social sciences currently emerging must take on the specific nature of the world of data created by digital networks, without reducing it to the categories of the sciences of « society » or « opinion ».

Examples from recent work on Twitter and other digital corpora show how the search for structural effects or market-style trade-offs remains prevalent, even though insights about propagation, virality and memetics could help build a new theoretical framework.


New Horizons for a Data-Driven Economy : A Roadmap for Usage and Exploitation of Big Data in Europe

Editors : José María Cavanillas, Edward Curry, Wolfgang Wahlster

In this book readers will find technological discussions on the existing and emerging technologies across the different stages of the big data value chain. They will learn about legal aspects of big data, the social impact, and about education needs and requirements.

And they will discover the business perspective and how big data technology can be exploited to deliver value within different sectors of the economy.


Cloud-Based Big Data Management and Analytics for Scholarly Resources: Current Trends, Challenges and Scope for Future Research

Authors : Samiya Khan, Kashish A. Shakil, Mansaf Alam

With the shifting focus of organizations and governments towards digitization of academic and technical documents, there has been an increasing need to use this reserve of scholarly documents for developing applications that can facilitate and aid in better management of research.

In addition to this, the evolving nature of research problems has made them essentially interdisciplinary. As a result, there is a growing need for scholarly applications like collaborator discovery, expert finding and research recommendation systems.

This research paper reviews current trends and identifies the challenges in the architecture, services and applications of big scholarly data platforms, with a specific focus on directions for future research.


Big Data Refinement

Author : Eerke A. Boiten

« Big data » has become a major area of research and associated funding, as well as a focus of utopian thinking. In the still growing research community, one of the favourite optimistic analogies for data processing is that of the oil refinery, extracting the essence out of the raw data. Pessimists look for their imagery to the other end of the petrol cycle, and talk about the « data exhausts » of our society.

Obviously, the refinement community knows how to do « refining ». This paper explores the extent to which notions of refinement and data in the formal methods community relate to the core concepts in « big data ». In particular, can the data refinement paradigm be used to explain aspects of big data processing?


How Does National Scientific Funding Support Emerging Interdisciplinary Research: A Comparison Study of Big Data Research in the US and China

Authors : Ying Huang, Yi Zhang, Jan Youtie, Alan L. Porter, Xuefeng Wang

How do funding agencies ramp up their capabilities to support research in a rapidly emerging area?

This paper addresses this question through a comparison of research proposals awarded by the US National Science Foundation (NSF) and the National Natural Science Foundation of China (NSFC) in the field of Big Data.

Big data is characterized by its size and difficulties in capturing, curating, managing and processing it in reasonable periods of time. Although Big Data has its legacy in longstanding information technology research, the field grew very rapidly over a short period.

We find that the extent of interdisciplinarity is a key aspect in how these funding agencies address the rise of Big Data. Our results show that both agencies have been able to marshal funding to support Big Data research in multiple areas, but the NSF relies to a greater extent on multi-program funding from different fields.

We discuss how these interdisciplinary approaches reflect the research hot-spots and innovation pathways in these two countries.

DOI : 10.1371/journal.pone.0154509

Revisiting the Data Lifecycle with Big Data Curation

Author : Line Pouchard

As science becomes more data-intensive and collaborative, researchers increasingly use larger and more complex data to answer research questions.

The capacity of storage infrastructure, the increased sophistication and deployment of sensors, the ubiquitous availability of computer clusters, the development of new analysis techniques, and larger collaborations allow researchers to address grand societal challenges in a way that is unprecedented.

In parallel, research data repositories have been built to host research data in response to the requirements of sponsors that research data be publicly available. Libraries are re-inventing themselves to respond to a growing demand to manage, store, curate and preserve the data produced in the course of publicly funded research.

As librarians and data managers are developing the tools and knowledge they need to meet these new expectations, they inevitably encounter conversations around Big Data. This paper explores definitions of Big Data that have coalesced in the last decade around four commonly mentioned characteristics: volume, variety, velocity, and veracity.

We highlight the issues associated with each characteristic, particularly their impact on data management and curation. We use the methodological framework of the data life cycle model, assess two models developed in the context of Big Data projects, and find them lacking.

We propose a Big Data life cycle model that includes activities focused on Big Data and more closely integrates curation with the research life cycle. These activities include planning, acquiring, preparing, analyzing, preserving, and discovering, with describing the data and assuring quality being an integral part of each activity.

We discuss the relationship between institutional data curation repositories and new long-term data resources associated with high performance computing centers, and reproducibility in computational science.

We apply this model by mapping the four characteristics of Big Data outlined above to each of the activities in the model. This mapping produces a set of questions that practitioners should be asking in a Big Data project.
