Digitising Cultural Complexity: Representing Rich Cultural Data in a Big Data environment

Authors : Jennifer Edmond, Georgina Nugent-Folan

One of the major terminological forces driving ICT integration in research today is that of "big data." While the phrase sounds inclusive and integrative, "big data" approaches are highly selective, excluding input that cannot be effectively structured, represented, or digitised.

Data of this complex sort is precisely the kind that human activity produces, but the technological imperative to enhance signal through the reduction of noise does not accommodate this richness.

Data and the computational approaches that facilitate "big data" have acquired a perceived objectivity that belies their curated, malleable, reactive, and performative nature. In an input environment where anything can "be data" once it is entered into the system as "data," data cleaning and processing, together with the metadata and information architectures that structure and facilitate our cultural archives, acquire the capacity to delimit what data are.

This engenders a process of simplification that has major implications for the potential for future innovation within research environments that depend on rich material yet are increasingly mediated by digital technologies.

This paper presents the preliminary findings of the EU-funded KPLEX (Knowledge Complexity) project, which investigates the delimiting effect that digital mediation and datafication have on rich, complex cultural data.

The paper presents a systematic review of existing implicit definitions of data, elaborating on the implications of these definitions and highlighting the ways in which metadata and computational technologies can restrict the interpretative potential of data.

It sheds light on the gap between analogue or augmented digital practices and fully computational ones, and the strategies researchers have developed to deal with this gap.

The paper proposes a reconceptualisation of data as it is functionally employed within digitally-mediated research so as to incorporate and acknowledge the richness and complexity of our source materials.

URL : https://hal.archives-ouvertes.fr/hal-01629459

D’abord les données, ensuite la méthode ? Big data et déterminisme en sciences sociales

Authors : Jean-Christophe Plantin, Federica Russo

While social scientists have long worked with large quantities of data, for instance through questionnaire surveys, the use of massive, heterogeneous digital data, or "big data," is increasingly common.

By abandoning theory in favour of the search for correlations, does this abundance of data give rise to a new form of determinism?

The history of the social sciences suggests the opposite: the growth of available data led to a progressive rejection of a deterministic hypothesis inherited from the natural sciences, in favour of a methodological autonomy grounded in statistical modelling.

Against this background, this article shows that the emphasis on the size of big data does not signal a return to determinism so much as it reveals the current mismatch between the characteristics of these massive datasets and the methods and infrastructures of the social sciences.

URL : https://socio.revues.org/2328

Learning Analytics and the Academic Library: Professional Ethics Commitments at a Crossroads

Authors : Kyle M.L. Jones, Dorothea Salo

In this paper, the authors address learning analytics and the ways academic libraries are beginning to participate in wider institutional learning analytics initiatives. Since there are moral issues associated with learning analytics, the authors consider how data mining practices run counter to ethical principles in the American Library Association’s “Code of Ethics.”

Specifically, the authors address how learning analytics implicates professional commitments to promote intellectual freedom; protect patron privacy and confidentiality; and balance intellectual property interests between library users, their institution, and content creators and vendors.

The authors recommend that librarians embed their ethical positions in technological designs, practices, and governance mechanisms.

URL : http://crl.acrl.org/index.php/crl/article/view/16603

Big data is not about size: when data transform scholarship

Authors : Jean-Christophe Plantin, Carl Lagoze, Paul N. Edwards, Christian Sandvig

“Big data” discussions typically focus on scale, i.e. the problems and potentials inherent in very large collections. Here, we argue that the most important consequences of “big data” for scholarship stem not from the increasing size of datasets, but instead from a loss of control over the sources of data.

The breakdown of the “control zone” due to the uncertain provenance of data has implications for data integrity, and can be disruptive to scholarship in multiple ways. A retrospective look at the introduction of larger datasets in weather forecasting and epidemiology shows that more data can at times be counter-productive, or destabilize already existing methods.

Based on these examples, we examine two implications of "big data" for scholarship: how the presence of large datasets transforms the traditional disciplinary structure of the sciences, and how it reshapes the infrastructure for scholarly communication.

URL : https://books.openedition.org/editionsmsh/9103

Accelerating Science: A Computing Research Agenda

Authors : Vasant G. Honavar, Mark D. Hill, Katherine Yelick

The emergence of "big data" offers unprecedented opportunities for not only accelerating scientific advances but also enabling new modes of discovery. Scientific progress in many disciplines is increasingly enabled by our ability to examine natural phenomena through the computational lens, i.e., using algorithmic or information processing abstractions of the underlying processes; and our ability to acquire, share, integrate and analyze disparate types of data.

However, there is a huge gap between our ability to acquire, store, and process data and our ability to make effective use of the data to advance discovery. Despite successful automation of routine aspects of data management and analytics, most elements of the scientific process currently require considerable human expertise and effort.

Accelerating science to keep pace with the rate of data acquisition and data processing calls for the development of algorithmic or information processing abstractions, coupled with formal methods and tools for modeling and simulation of natural processes, as well as major innovations in cognitive tools for scientists, i.e., computational tools that leverage and extend the reach of human intellect and partner with humans on a broad range of tasks in scientific discovery (e.g., identifying and prioritizing questions; designing, prioritizing, and executing experiments to answer a chosen question; drawing inferences and evaluating the results; and formulating new questions, in a closed-loop fashion).

This calls for a concerted research agenda aimed at: the development, analysis, integration, sharing, and simulation of algorithmic or information processing abstractions of natural processes, coupled with formal methods and tools for their analysis and simulation; and innovations in cognitive tools that augment and extend human intellect and partner with humans in all aspects of science.

URL : https://arxiv.org/abs/1604.02006


Après l’Internet : le Cloud, les big data et l’Internet des objets

Author : Vincent Mosco

This article identifies the defining features of the next phase of Internet development, focusing on cloud computing, big data analytics, and the Internet of Things.

Together, they extend the possibilities for centralising control over data, deepening the commodification of information, and broadening the reach of the Internet from connecting individuals to forming data-driven networks of objects.

They also raise important social policy questions, including the concentration of power in a handful of companies closely tied to the world of military intelligence; the environmental consequences of building, powering, and connecting populations to a global network of cloud data centres; the privacy and security implications of connecting billions of objects; and the impact of smart devices on the future of work.

URL : https://lesenjeux.univ-grenoble-alpes.fr/2016-dossier/09-Mosco-Fr/index.html