Transparency: the emerging third dimension of Open Science and Open Data

This paper presents an exploration of the concept of research transparency. The policy context is described and situated within the broader arena of open science. This is followed by commentary on transparency within the research process, which includes a brief overview of the related concept of reproducibility and the associated elements of research integrity, fraud and retractions.

A two-dimensional model or continuum of open science is considered and the paper builds on this foundation by presenting a three-dimensional model, which includes the additional axis of ‘transparency’. The concept is further unpacked and preliminary definitions of key terms are introduced: transparency, transparency action, transparency agent and transparency tool.

An important linkage is made to the research lifecycle as a setting for potential transparency interventions by libraries. Four areas are highlighted as foci for enhanced engagement with transparency goals: Leadership and Policy, Advocacy and Training, Research Infrastructures and Workforce Development.


Reproducible Research Practices and Transparency across the Biomedical Literature

There is a growing movement to encourage reproducibility and transparency practices in the scientific community, including public access to raw data and protocols, the conduct of replication studies, systematic integration of evidence in systematic reviews, and the documentation of funding and potential conflicts of interest.

In this survey, we assessed the current status of reproducibility and transparency addressing these indicators in a random sample of 441 biomedical journal articles published in 2000–2014. Only one study provided a full protocol and none made all raw data directly available. Replication studies were rare (n = 4), and only 16 studies had their data included in a subsequent systematic review or meta-analysis. The majority of studies did not mention anything about funding or conflicts of interest.

The percentage of articles with no statement of conflict decreased substantially between 2000 and 2014 (94.4% in 2000 to 34.6% in 2014); the percentage of articles reporting statements of conflicts (0% in 2000, 15.4% in 2014) or no conflicts (5.6% in 2000, 50.0% in 2014) increased.

Articles published in journals in the clinical medicine category versus other fields were almost twice as likely to not include any information on funding and to have private funding. This study provides baseline data to compare future progress in improving these indicators in the scientific literature.

URL : Reproducible Research Practices and Transparency across the Biomedical Literature

DOI : 10.1371/journal.pbio.1002333

Reproducibility, Correctness, and Buildability: the Three Principles for Ethical Public Dissemination of Computer Science and Engineering Research

« We propose a system of three principles of public dissemination, which we call reproducibility, correctness, and buildability, and make the argument that consideration of these principles is a necessary step when publicly disseminating results in any evidence-based scientific or engineering endeavor. We examine how these principles apply to the release and disclosure of the four elements associated with computer science research: theory, algorithms, code, and data. Reproducibility refers to the capability to reproduce fundamental results from released details. Correctness refers to the ability of an independent reviewer to verify and validate the results of a paper. We introduce the new term buildability to indicate the ability of other researchers to use the published research as a foundation for their own new work. This is more broad than extensibility, as it requires that the published results have reached a level of completeness that the research can be used for its stated purpose, and has progressed beyond the level of a preliminary idea. We argue that these three principles are not being sufficiently met by current publications and proposals in computer science and engineering, and represent a goal for which publishing should continue to aim. We introduce standards for the evaluation of reproducibility, correctness, and buildability in relation to the varied elements of computer science research and discuss how they apply to proposals, workshops, conferences, and journal publications, making arguments for appropriate standards of each principle in these settings. We address modern issues including big data, data confidentiality, privacy, security, and privilege. Our examination raises questions for discussion in the community on the appropriateness of publishing works that fail to meet one, some, or all of the stated principles. »


GigaDB promoting data dissemination and reproducibility Often…

GigaDB: promoting data dissemination and reproducibility :

« Often papers are published where the underlying data supporting the research are not made available because of the limitations of making such large data sets publicly and permanently accessible. Even if the raw data are deposited in public archives, the essential analysis intermediaries, scripts or software are frequently not made available, meaning the science is not reproducible. The GigaScience journal is attempting to address this issue with the associated data storage and dissemination portal, the GigaScience database (GigaDB). Here we present the current version of GigaDB and reveal plans for the next generation of improvements. However, most importantly, we are soliciting responses from you, the users, to ensure that future developments are focused on the data storage and dissemination issues that still need resolving. »


Can Tweets Predict Citations? Metrics of Social Impact Based on Twitter and Correlation with Traditional Metrics of Scientific Impact


Citations in peer-reviewed articles and the impact factor are generally accepted measures of scientific impact. Web 2.0 tools such as Twitter, blogs or social bookmarking tools provide the possibility to construct innovative article-level or journal-level metrics to gauge impact and influence. However, the relationship of the these new metrics to traditional metrics such as citations is not known.


(1) To explore the feasibility of measuring social impact of and public attention to scholarly articles by analyzing buzz in social media, (2) to explore the dynamics, content, and timing of tweets relative to the publication of a scholarly article, and (3) to explore whether these metrics are sensitive and specific enough to predict highly cited articles.


Between July 2008 and November 2011, all tweets containing links to articles in the Journal of Medical Internet Research (JMIR) were mined. For a subset of 1573 tweets about 55 articles published between issues 3/2009 and 2/2010, different metrics of social media impact were calculated and compared against subsequent citation data from Scopus and Google Scholar 17 to 29 months later. A heuristic to predict the top-cited articles in each issue through tweet metrics was validated.


A total of 4208 tweets cited 286 distinct JMIR articles. The distribution of tweets over the first 30 days after article publication followed a power law (Zipf, Bradford, or Pareto distribution), with most tweets sent on the day when an article was published (1458/3318, 43.94% of all tweets in a 60-day period) or on the following day (528/3318, 15.9%), followed by a rapid decay. The Pearson correlations between tweetations and citations were moderate and statistically significant, with correlation coefficients ranging from .42 to .72 for the log-transformed Google Scholar citations, but were less clear for Scopus citations and rank correlations. A linear multivariate model with time and tweets as significant predictors (P < .001) could explain 27% of the variation of citations. Highly tweeted articles were 11 times more likely to be highly cited than less-tweeted articles (9/12 or 75% of highly tweeted article were highly cited, while only 3/43 or 7% of less-tweeted articles were highly cited; rate ratio 0.75/0.07 = 10.75, 95% confidence interval, 3.4–33.6). Top-cited articles can be predicted from top-tweeted articles with 93% specificity and 75% sensitivity.


Tweets can predict highly cited articles within the first 3 days of article publication. Social media activity either increases citations or reflects the underlying qualities of the article that also predict citations, but the true use of these metrics is to measure the distinct concept of social impact. Social impact measures based on tweets are proposed to complement traditional citation metrics. The proposed twimpact factor may be a useful and timely metric to measure uptake of research findings and to filter research findings resonating with the public in real time.