Social Science Data Repositories in Data Deluge: A Case Study at ICPSR Workflow and Practices

Authors : Wei Jeng, Daqing He, Yu Chi

Design/methodology/approach

We conducted two focus group sessions and one individual interview with eight employees at the world’s largest social science data repository, the Interuniversity Consortium for Political and Social Research (ICPSR).

By examining their current actions (activities regarding their work responsibilities) and IT practices, we studied the barriers and challenges of archiving and curating qualitative data at ICPSR.

Purpose

With the recent surge of interest in the data deluge, research on data infrastructures is becoming increasingly important. The Open Archival Information System (OAIS) model has been widely adopted as a framework for creating and maintaining digital repositories.

Because OAIS is a reference model that must be customized for actual practice, this study examines how the current practices in a data repository map to the OAIS environment and its functional components.

Findings

We observed that the OAIS model is robust and reliable in actual service processes for data curation and data archiving. In addition, a data repository's workflow resembles that of digital archives or even digital libraries.

We also find that (1) the cost of preventing disclosure risk and (2) a lack of agreement on standards for text data files are the most apparent obstacles data curation professionals face in handling qualitative data, and that (3) the maturation of data metrics seems to be a promising solution to several challenges in social science data sharing.

Original value

We evaluated the gap between a research data repository's current practices and the adoption of the OAIS model. We also identified answers to questions such as how the current technological infrastructure in a leading data repository such as ICPSR supports its daily operations, what the ideal technologies for such data repositories would be, and what challenges accompany these ideal technologies.

Most importantly, we helped to prioritize challenges and barriers from the data curator's perspective, and we discuss implications for data sharing and reuse in the social sciences.

URL : http://d-scholarship.pitt.edu/31876/

Current Status of Chinese Open Access Institutional Repositories: A Case Study

Authors : K. C. Das, Kunwar Singh

The present study focuses on the current status of Chinese open access institutional repositories (IRs). It attempts to determine that status based on four key parameters: the number of IRs, their types, their subjects and contents, and the software used.

To fulfil these objectives, open access institutional repositories in China were identified using the Directory of Open Access Repositories (OpenDOAR), and the data were collected and analysed for the necessary information.

The study highlights the current status of open access institutional repositories in China and their contribution to a global knowledge base.

URL : Current Status of Chinese Open Access Institutional Repositories: A Case Study

Semantic representation and enrichment of information retrieval experimental data

Authors : Gianmaria Silvello, Georgeta Bordea, Nicola Ferro, Paul Buitelaar, Toine Bogers

Experimental evaluation carried out in international large-scale campaigns is a fundamental pillar of the scientific and technological advancement of information retrieval (IR) systems.

Such evaluation activities produce a large quantity of scientific and experimental data, which are the foundation for all the subsequent scientific production and development of new systems.

In this work, we discuss how to semantically annotate and interlink these data, with the goal of enhancing their interpretation, sharing, and reuse. We discuss the underlying evaluation workflow and propose a Resource Description Framework (RDF) model for those parts of the workflow.

We use expertise retrieval as a case study to demonstrate the benefits of our semantic representation approach. We employ this model as a means of exposing experimental data as linked open data (LOD) on the Web and as a basis for enriching these data and automatically connecting them with expertise topics and expert profiles.

In this context, a topic-centric approach for expert search is proposed, addressing the extraction of expertise topics, their semantic grounding with the LOD cloud, and their connection to IR experimental data.

Several methods for expert profiling and expert finding are analysed and evaluated. Our results show that it is possible to construct expert profiles starting from automatically extracted expertise topics and that topic-centric approaches outperform state-of-the-art language modelling approaches for expert finding.

URL : https://aran.library.nuigalway.ie/handle/10379/5862

Thinking About the Media Relations Between the Book and Hypertext Through Geoff Ryman's 253 and Paul La Farge's Luminous Airplanes

Author : Gaëlle Debeaux

Through two examples of contemporary novels that straddle two media, this article analyses the processes of remediation and transmediation involved when a narrative is developed both in a book and in a hypertext.

The aim is to understand how hypertextual logic can be transposed into the logic of the book, how stories are told on these two devices, and how the same story can be developed from one device to the other.

URL : https://itineraires.revues.org/3405

Experiences in integrated data and research object publishing using GigaDB

Authors : Scott C Edmunds, Peter Li, Christopher I Hunter, Si Zhe Xiao, Robert L Davidson, Nicole Nogoy, Laurie Goodman

In the era of computation and data-driven research, traditional methods of disseminating research are no longer fit-for-purpose. New approaches for disseminating data, methods and results are required to maximize knowledge discovery.

The “long tail” of small, unstructured datasets is well catered for by a number of general-purpose repositories, but there has been less support for “big data”. Outlined here are our experiences in attempting to tackle the gaps in publishing large-scale, computationally intensive research.

GigaScience is an open-access, open-data journal aiming to revolutionize large-scale biological data dissemination, organization and re-use. Through use of the data handling infrastructure of the genomics centre BGI, GigaScience links standard manuscript publication with an integrated database (GigaDB) that hosts all associated data, and provides additional data analysis tools and computing resources.

Furthermore, the supporting workflows and methods are also integrated to make published articles more transparent and open. GigaDB has released many new and previously unpublished datasets and data types, including urgently needed data to tackle infectious disease outbreaks, cancer and the growing food crisis.

Other “executable” research objects, such as workflows, virtual machines and software from several GigaScience articles have been archived and shared in reproducible, transparent and usable formats.

With data citation producing evidence of, and credit for, its use in the wider research community, GigaScience demonstrates a move towards more executable publications. Here data analyses can be reproduced and built upon by users without coding backgrounds or heavy computational infrastructure in a more democratized manner.

URL : Experiences in integrated data and research object publishing using GigaDB

DOI : http://link.springer.com/article/10.1007/s00799-016-0174-6

Advancing research data publishing practices for the social sciences: from archive activity to empowering researchers

Authors : Veerle Van den Eynden, Louise Corti

Sharing and publishing social science research data have a long history in the UK, through long-standing agreements with government agencies for sharing survey data and through the data policy, infrastructure, and data services supported by the Economic and Social Research Council (ESRC).

The UK Data Service and its predecessors developed data management, documentation, and publishing procedures and protocols that stand today as robust templates for data publishing.

As the ESRC research data policy requires grant holders to submit their research data to the UK Data Service after a grant ends, setting and promoting standards has been essential in raising the quality of the research data being published. In the past, all received data were processed, documented, and published for reuse in-house.

Recent investments have focused on guiding and training researchers in good data management practices and skills for creating shareable data, as well as a self-publishing repository system, ReShare. ReShare also receives data sets described in published data papers and achieves scientific quality assurance through peer review of submitted data sets before publication.

Social science data are reused for research, to inform policy, in teaching and for methods learning. Over a 10-year period, responsive developments in system workflows, access control options, persistent identifiers, templates, and checks, together with targeted guidance for researchers, have helped raise the standard of self-published social science data.

Lessons learned and developments in shifting the publishing of social science data from an archivist's responsibility to a researcher-led process are showcased, as inspiration for institutions setting up a data repository.

URL : Advancing research data publishing practices for the social sciences: from archive activity to empowering researchers

DOI : doi:10.1007/s00799-016-0177-3

Sociological and Economic Analysis of Crowdfunding: Drivers and Critiques in the Case of Journalism (2010-2015)

Author : Guillaume Goasdoué

The crowdfunding system connects individual contributors with media outlets that seek funds for four main reasons: rescue, diversification, creation, and one-off projects.

How does the (symbolic, social) capital accumulated by project leaders affect the scale and outcome of fundraising campaigns? In what ways is the social dimension of the mechanism exploited by this type of fundraising?

First, we answer these questions by discussing the limitations of the international literature. Next, we present our data for the news sector, and then we turn more specifically to the social drivers of the phenomenon.

Finally, we conclude with aspects related to the work of seeking visibility. In doing so, we critique some received ideas and set out various forms of inequality among the media outlets that use this system.

URL : https://ticetsociete.revues.org/2154