Toward Reproducible Computational Research: An Empirical Analysis of Data and Code Policy Adoption by Journals

Journal policy on research data and code availability is an important part of the ongoing shift toward publishing reproducible computational science. This article extends the literature by studying journal data sharing policies by year (for both 2011 and 2012) for a referent set of 170 journals.

We make a further contribution by evaluating code sharing policies, supplemental materials policies, and open access status for these 170 journals for each of 2011 and 2012.

We build a predictive model of open data and code policy adoption as a function of impact factor and publisher, and find that higher-impact journals are more likely to have open data and code policies, as are journals published by scientific societies rather than by commercial publishers.
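
A minimal sketch of such a model, assuming a logistic regression of policy adoption on impact factor and publisher type (the paper's exact specification is not reproduced here); the file name and column names below are hypothetical:

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical input: one row per journal, with its policy status (0/1),
# impact factor, and publisher type ("society" or "commercial").
journals = pd.read_csv("journal_policies.csv")

# Logistic regression of open data policy adoption on impact factor and
# publisher type; an analogous model can be fit for code policy adoption.
model = smf.logit(
    "has_open_data_policy ~ impact_factor + C(publisher_type)",
    data=journals,
).fit()
print(model.summary())

In a fit of this kind, positive coefficients on impact_factor and on the society publisher level would correspond to the associations reported above.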

We also find open data policies tend to lead open code policies, and we find no relationship between open data and code policies and either supplemental material policies or open access journal status.

Of the journals in this study, 38% had a data policy, 22% had a code policy, and 66% had a supplemental materials policy as of June 2012. This reflects a striking one-year increase of 16% in the number of data policies, a 30% increase in the number of code policies, and a 7% increase in the number of supplemental materials policies.

We introduce a new dataset to the community that categorizes data and code sharing, supplemental materials, and open access policies in 2011 and 2012 for these 170 journals.

URL : http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0067111

Science as an open enterprise

“The Science as an open enterprise report highlights the need to grapple with the huge deluge of data created by modern technologies in order to preserve the principle of openness and to exploit data in ways that have the potential to create a second open science revolution.
Exploring massive amounts of data using modern digital technologies has enormous potential for science and its application in public policy and business. The report maps out the changes that are required by scientists, their institutions and those that fund and support science if this potential is to be realised.”

URL : http://royalsociety.org/uploadedFiles/Royal_Society_Content/policy/projects/sape/2012-06-20-SAOE.pdf

Digitize Me, Visualize Me, Search Me : Open Science and its Discontents

[…] Digitize Me, Visualize Me, Search Me takes as its starting point the so-called ‘computational turn’ to data-intensive scholarship in the humanities.

The phrase ‘the computational turn’ has been adopted to refer to the process whereby techniques and methodologies drawn from (in this case) computer science and related fields – including science visualization, interactive information visualization, image processing, network analysis, statistical data analysis, and the management, manipulation and mining of data – are being used to produce new ways of approaching and understanding texts in the humanities; what is sometimes thought of as ‘the digital humanities’.

The concern in the main has been with either digitizing ‘born analog’ humanities texts and artifacts (e.g. making annotated editions of the art and writing of William Blake available to scholars and researchers online), or gathering together ‘born digital’ humanities texts and artifacts (videos, websites, games, photography, sound recordings, 3D data), and then taking complex and often extremely large-scale data analysis techniques from computing science and related fields and applying them to these humanities texts and artifacts – to this ‘big data’, as it has been called.

Witness Lev Manovich and the Software Studies Initiative’s use of ‘digital image analysis and new visualization techniques’ to study ‘20,000 pages of Science and Popular Science magazines… published between 1872-1922, 780 paintings by van Gogh, 4535 covers of Time magazine (1923-2009) and one million manga pages’ (Manovich, 2011), and Dan Cohen and Fred Gibbs’s text mining of ‘the 1,681,161 books that were published in English in the UK in the long nineteenth century’ (Cohen, 2010).

What Digitize Me, Visualize Me, Search Me endeavours to show is that such data-focused transformations in research can be seen as part of a major alteration in the status and nature of knowledge. It is an alteration that, according to the philosopher Jean-François Lyotard, has been taking place since at least the 1950s.

It involves nothing less than a shift away from a concern with questions of what is right and just, and toward a concern with legitimating power by optimizing the social system’s performance in instrumental, functional terms. This shift has significant consequences for our idea of knowledge.

[…] In particular, Digitize Me, Visualize Me, Search Me suggests that the turn in the humanities toward data-driven scholarship, science visualization, statistical data analysis, etc. can be placed alongside all those discourses that are being put forward at the moment – in both the academy and society – in the name of greater openness, transparency, efficiency and accountability.

URL : http://livingbooksaboutlife.org/pdfs/bookarchive/DigitizeMe.pdf

Building an Open Data Repository: Lessons and Challenges

Author : Limor Peer

The Internet has transformed scholarly research in many ways. Open access to data and other research output has been touted as a crucial step toward transparency and quality in science. This paper takes a critical look at what it takes to share social science research data, from the perspective of a small data repository at Yale University’s Institution for Social and Policy Studies.

The ISPS Data Archive was built to create an open access digital collection of social science experimental data, metadata, and associated files produced by ISPS researchers, for the purpose of replication of research findings, further analysis, and teaching.

This paper describes the development of the ISPS Data Archive and discusses the inter-related challenges of replication, integration, and stewardship. It argues that open data requires effort, investment of resources, and planning; by itself, open data does not enhance knowledge.

URL : http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1931048