Report on Integration of Data and Publications :

“Scholarly communication is the foundation of modern research, where empirical evidence is interpreted and communicated as published hypothesis-driven research. Many current and recent reports highlight the impact of advancing technology on modern research and the consequences this has on scholarly communication. As part of the ODE project, this report sought to coalesce current thought and opinions from numerous and diverse sources to reveal opportunities for supporting a more connected and integrated scholarly record. Four perspectives were considered: those of the Researcher who generates or reuses primary data, Publishers who provide the mechanisms to communicate research activities, and Libraries & Data Centres who maintain and preserve the evidence that underpins scholarly communication and the published record. This report finds the landscape fragmented and complex, where competing interests can sometimes confuse and confound requirements, needs and expectations. Equally, the report identifies clear opportunity for all stakeholders to directly enable a more joined-up and vital scholarly record of modern research.”

URL : http://www.libereurope.eu/sites/default/files/ODE-ReportOnIntegrationOfDataAndPublication.pdf

Digitize Me, Visualize Me, Search Me : Open Science and its Discontents

[…] Digitize Me, Visualize Me, Search Me takes as its starting point the so-called ‘computational turn’ to data-intensive scholarship in the humanities.

The phrase ‘the computational turn’ has been adopted to refer to the process whereby techniques and methodologies drawn from (in this case) computer science and related fields – including science visualization, interactive information visualization, image processing, network analysis, statistical data analysis, and the management, manipulation and mining of data – are being used to produce new ways of approaching and understanding texts in the humanities; what is sometimes thought of as ‘the digital humanities’.

The concern in the main has been with either digitizing ‘born analog’ humanities texts and artifacts (e.g. making annotated editions of the art and writing of William Blake available to scholars and researchers online), or gathering together ‘born digital’ humanities texts and artifacts (videos, websites, games, photography, sound recordings, 3D data), and then taking complex and often extremely large-scale data analysis techniques from computing science and related fields and applying them to these humanities texts and artifacts – to this ‘big data’, as it has been called.

Witness Lev Manovich and the Software Studies Initiative’s use of ‘digital image analysis and new visualization techniques’ to study ‘20,000 pages of Science and Popular Science magazines… published between 1872-1922, 780 paintings by van Gogh, 4535 covers of Time magazine (1923-2009) and one million manga pages’ (Manovich, 2011), and Dan Cohen and Fred Gibbs’s text mining of ‘the 1,681,161 books that were published in English in the UK in the long nineteenth century’ (Cohen, 2010).

What Digitize Me, Visualize Me, Search Me endeavours to show is that such data-focused transformations in research can be seen as part of a major alteration in the status and nature of knowledge. It is an alteration that, according to the philosopher Jean-François Lyotard, has been taking place since at least the 1950s.

It involves nothing less than a shift away from a concern with questions of what is right and just, and toward a concern with legitimating power by optimizing the social system’s performance in instrumental, functional terms. This shift has significant consequences for our idea of knowledge.

[…] In particular, Digitize Me, Visualize Me, Search Me suggests that the turn in the humanities toward data-driven scholarship, science visualization, statistical data analysis, etc. can be placed alongside all those discourses that are being put forward at the moment – in both the academy and society – in the name of greater openness, transparency, efficiency and accountability.

URL : http://livingbooksaboutlife.org/pdfs/bookarchive/DigitizeMe.pdf

An Institutional Approach to Developing Research Data Management Infrastructure

This article outlines the work that the University of Oxford is undertaking to implement a coordinated data management infrastructure. The rationale for the approach being taken by Oxford is presented, with particular attention paid to the role of each service division. This is followed by a consideration of the relative advantages and disadvantages of institutional data repositories, as opposed to national or international data centres. The article then focuses on two ongoing JISC-funded projects, ‘Embedding Institutional Data Curation Services in Research’ (Eidcsr) and ‘Supporting Data Management Infrastructure for the Humanities’ (Sudamih).

Both projects are intra-institutional collaborations and involve working with researchers to develop particular aspects of infrastructure, including: University policy, systems for the preservation and documentation of research data, training and support, software tools for the visualisation of large images, and creating and sharing databases via the Web (Database as a Service).

URL : http://www.ijdc.net/index.php/ijdc/article/view/198

Citation and Peer Review of Data: Moving Towards Formal Data Publication

“This paper discusses many of the issues associated with formally publishing data in academia, focusing primarily on the structures that need to be put in place for peer review and formal citation of datasets. Data publication is becoming increasingly important to the scientific community, as it will provide a mechanism for those who create data to receive academic credit for their work and will allow the conclusions arising from an analysis to be more readily verifiable, thus promoting transparency in the scientific process. Peer review of data will also provide a mechanism for ensuring the quality of datasets, and we provide suggestions on the types of activities one expects to see in the peer review of data. A simple taxonomy of data publication methodologies is presented and evaluated, and the paper concludes with a discussion of dataset granularity, transience and semantics, along with a recommended human-readable citation syntax.”
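To make the idea of a human-readable dataset citation concrete, here is a minimal illustrative formatter. It is not the syntax recommended in the paper: the field names, their ordering, and the sample values are assumptions chosen only to show the general shape of such a citation.

```python
# Hypothetical dataset-citation formatter; the fields and ordering below are
# illustrative assumptions, not the paper's recommended syntax.

def cite_dataset(creators, year, title, version, publisher, identifier):
    """Render a human-readable citation string for a dataset."""
    authors = "; ".join(creators)
    return f"{authors} ({year}): {title}, version {version}. {publisher}. {identifier}"

# Example with entirely made-up values:
print(cite_dataset(["Doe, J."], 2011, "Example Temperature Dataset",
                   "1.0", "Example Data Centre", "doi:10.1234/example"))
```

A formatter like this makes the granularity questions the paper raises tangible: whether `version` identifies a frozen snapshot, and whether `identifier` resolves to the whole dataset or a subset, are exactly the decisions a formal citation syntax has to settle.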

URL : http://www.ijdc.net/index.php/ijdc/article/view/181

Building an Open Data Repository: Lessons and Challenges

Author : Limor Peer

The Internet has transformed scholarly research in many ways. Open access to data and other research output has been touted as a crucial step toward transparency and quality in science. This paper takes a critical look at what it takes to share social science research data, from the perspective of a small data repository at Yale University’s Institution for Social and Policy Studies.

The ISPS Data Archive was built to create an open access digital collection of social science experimental data, metadata, and associated files produced by ISPS researchers, for the purpose of replication of research findings, further analysis, and teaching.

This paper describes the development of the ISPS Data Archive and discusses the inter-related challenges of replication, integration, and stewardship. It argues that open data requires effort, investment of resources, and planning. By itself, it does not enhance knowledge.

URL : http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1931048

Extracting, Transforming and Archiving Scientific Data :

“It is becoming common to archive research datasets that are not only large but also numerous. In addition, their corresponding metadata and the software required to analyse or display them need to be archived. Yet the manual curation of research data can be difficult and expensive, particularly in very large digital repositories, hence the importance of models and tools for automating digital curation tasks. The automation of these tasks faces three major challenges: (1) research data and data sources are highly heterogeneous, (2) future research needs are difficult to anticipate, (3) data is hard to index. To address these problems, we propose the Extract, Transform and Archive (ETA) model for managing and mechanizing the curation of research data. Specifically, we propose a scalable strategy for addressing the research-data problem, ranging from the extraction of legacy data to its long-term storage. We review some existing solutions and propose novel avenues of research.”
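The three-stage shape of such a pipeline can be sketched briefly. This is a generic extract-transform-archive illustration under assumed inputs and metadata fields, not the authors’ ETA model: the stage signatures and the checksum-based naming scheme are choices made here for the example.

```python
import hashlib
import json
import pathlib

# Generic extract/transform/archive sketch; stage contracts and metadata
# fields are illustrative assumptions, not the paper's ETA model.

def extract(source: pathlib.Path) -> bytes:
    """Extract: read the raw bytes of a legacy data file."""
    return source.read_bytes()

def transform(raw: bytes) -> dict:
    """Transform: normalise the payload and derive metadata for indexing."""
    return {
        "content": raw.decode("utf-8", errors="replace"),
        "size_bytes": len(raw),
        "checksum_sha256": hashlib.sha256(raw).hexdigest(),
    }

def archive(record: dict, store: pathlib.Path) -> pathlib.Path:
    """Archive: persist the record and its metadata for long-term storage."""
    store.mkdir(parents=True, exist_ok=True)
    target = store / f"{record['checksum_sha256']}.json"
    target.write_text(json.dumps(record, indent=2))
    return target
```

Even this toy version surfaces the challenges the abstract names: `transform` must cope with heterogeneous inputs (hence the lenient decoding), and the metadata it derives is what makes the archived record findable later.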

URL : http://arxiv.org/abs/1108.4041

Beyond the Data Deluge: A Research Agenda for Large-Scale Data Sharing and Reuse :

“There is almost universal agreement that scientific data should be shared for use beyond the purposes for which they were initially collected. Access to data enables system-level science, expands the instruments and products of research to new communities, and advances solutions to complex human problems. While demands for data are not new, the vision of open access to data is increasingly ambitious. The aim is to make data accessible and usable to anyone, anytime, anywhere, and for any purpose. Until recently, scholarly investigations related to data sharing and reuse were sparse. They have become more common as technology and instrumentation have advanced, policies that mandate sharing have been implemented, and research has become more interdisciplinary. Each of these factors has contributed to what is commonly referred to as the “data deluge”. Most discussions about increases in the scale of sharing and reuse have focused on growing amounts of data. There are other issues related to open access to data that also concern scale, which have not been as widely discussed: broader participation in data sharing and reuse, increases in the number and types of intermediaries, and more digital data products. The purpose of this paper is to develop a research agenda for scientific data sharing and reuse that considers these three areas.”

URL : http://www.ijdc.net/index.php/ijdc/article/view/163