Making Student Research Data Discoverable: A Pilot Program Using Dataverse


Supporting and curating the research data that underlie theses and dissertations gives institutions an opportunity to enhance their ETD collections.

This article describes a pilot data archiving service that leverages Emory University’s existing Electronic Theses and Dissertations (ETDs) program.

Description of program

This pilot service did three things. It tested whether Dataverse, a data repository, is an appropriate data archiving and access solution for Emory University, using research data identified in the university’s ETD repository. It developed the legal documents necessary for a full implementation of Dataverse on campus. It also expanded outreach efforts to meet the research data needs of graduate students.

This article also situates the pilot service within the context of Emory Libraries and explains how it relates to other library efforts currently underway.

Next steps

The pilot project team plans to seek permission from alumni whose data were included in the pilot to make them available publicly in Dataverse, and the team will revise the ETD license agreement to allow this type of use.

The team will also automate the ingest of supplemental ETD research data into the data repository where possible and create a workshop series for students who are creating research data as part of their theses or dissertations.



Disciplinary differences in opening research data

The management and widespread sharing of publicly funded research data have gained significant momentum among governments, funders, institutions, journals and data service providers around the world.

However, there is no ‘one-size-fits-all’ approach to open research data across academic disciplines. Different disciplines produce different types of data and have various procedures for analysing, archiving and publishing it.

This briefing paper presents the current state of open research data across academic disciplines. It describes the disciplinary characteristics that inhibit wider take-up of open research data mandates.

Additionally, it presents the current strategies and policies established by funders, institutions, journals and data service providers alongside general data policies.


Assessment of and Response to Data Needs of Clinical and Translational Science Researchers and Beyond

Objective and Setting

As universities and libraries grapple with data management and “big data,” the need for data management solutions across disciplines is particularly relevant in clinical and translational science (CTS) research, which is designed to traverse disciplinary and institutional boundaries.

At the University of Florida Health Science Center Library, a team of librarians undertook an assessment of the research data management needs of CTS researchers, including an online assessment and follow-up one-on-one interviews.

Design and Methods

The 20-question online assessment was distributed to all investigators affiliated with UF’s Clinical and Translational Science Institute (CTSI), and 59 investigators responded. Follow-up in-depth interviews were conducted with nine faculty and staff members.


Results indicate that UF’s CTS researchers have diverse data management needs that are often specific to their discipline or current research project and span the data lifecycle. A common theme in responses was the need for consistent data management training, particularly for graduate students; this led to localized training within the Health Science Center and CTSI, as well as campus-wide training.

Another campus-wide outcome was the creation of an action-oriented Data Management/Curation Task Force, led by the libraries and with participation from Research Computing and the Office of Research.


Initiating conversations with affected stakeholders and campus leadership about best practices in data management and implications for institutional policy shows the library’s proactive leadership and furthers our goal to provide concrete guidance to our users in this area.


Data Management Plan Requirements for Campus Grant Competitions: Opportunities for Research Data Services Assessment and Outreach


This study examined the effects of research data services (RDS) on the quality of data management plans (DMPs) required for a campus-level faculty grant competition, and explored the opportunities that the local DMP requirement presented for RDS outreach.


Nine reviewers each scored a randomly assigned portion of DMPs from 82 competition proposals. Each DMP was scored by three reviewers, and the three scores were averaged together to obtain the final score. Interrater reliability was measured using intraclass correlation.

Unpaired t-tests were used to compare mean DMP scores for faculty who used RDS with those who did not. Unpaired t-tests were also used to compare mean DMP scores for proposals that were funded with proposals that were not funded. One-way ANOVA was used to compare mean DMP scores among proposals from six broad disciplinary categories.
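As a rough illustration of the scoring and comparison steps described above, the following Python sketch averages three reviewer scores per DMP and computes an unpaired t statistic by hand. The scores are invented, the function names are hypothetical, and the pooled-variance (Student) form of the test is an assumption, since the abstract does not say which variant was used.

```python
from math import sqrt
from statistics import mean, variance


def final_score(reviewer_scores):
    """Average the three reviewer scores given to one DMP."""
    if len(reviewer_scores) != 3:
        raise ValueError("each DMP is expected to have exactly three scores")
    return mean(reviewer_scores)


def unpaired_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for a pooled-variance unpaired t-test.

    Assumes both groups share a common variance (Student's form).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df


# Invented reviewer scores, for illustration only.
with_rds = [final_score(s) for s in [(4, 5, 4), (5, 4, 5), (4, 4, 5)]]
without_rds = [final_score(s) for s in [(3, 2, 3), (2, 3, 3), (3, 3, 4)]]

t_stat, df = unpaired_t(with_rds, without_rds)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

The t statistic would then be compared against a t distribution with the reported degrees of freedom to obtain a p-value; a statistics package would normally handle that step.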


Analyses showed that RDS consultations had a statistically significant effect on DMP scores. Differences between DMP scores for funded versus unfunded proposals and among disciplinary categories were not significant. The DMP requirement also provided a number of both expected and unexpected outreach opportunities for RDS.


Requiring DMPs for campus grant competitions can provide important assessment and outreach opportunities for research data services.

While these results might not be generalizable to DMP review processes at federal funding agencies, they do suggest the importance, at any level, of developing a shared understanding of what constitutes a high-quality DMP among grant applicants, grant reviewers, and RDS providers.



Archives Ouvertes de la Connaissance: Promoting and Disseminating Research Data

A joint project of the Université de Strasbourg, the Université de Haute-Alsace, the Institut National des Sciences Appliquées (INSA) and the Bibliothèque Nationale et Universitaire (BNU) de Strasbourg, the Archives Ouvertes de la Connaissance will offer researchers, teaching faculty and doctoral students a service for promoting their research data.

This thesis first places the project in the context of French and European institutional repositories in order to identify its distinctive features. It then presents the issues and practical arrangements involved in formatting and disseminating the research data that the partner institutions in Alsace produce and that will be linked to the open archive.


OpenTrials: towards a collaborative open database of all available information on all clinical trials

OpenTrials is a collaborative and open database for all available structured data and documents on all clinical trials, threaded together by individual trial.

With a versatile and expandable data schema, it is initially designed to host and match the following documents and data for each trial: registry entries; links, abstracts, or texts of academic journal papers; portions of regulatory documents describing individual trials; structured data on methods and results extracted by systematic reviewers or other researchers; clinical study reports; and additional documents such as blank consent forms, blank case report forms, and protocols.
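One way to picture a schema that threads documents together by trial is sketched below in Python. This is not the OpenTrials schema: the class names, field names, document-type labels and the trial identifier are all hypothetical, chosen only to mirror the document categories listed above.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the document categories named in the text.
DOCUMENT_TYPES = {
    "registry_entry",
    "journal_paper",
    "regulatory_document",
    "extracted_data",
    "clinical_study_report",
    "consent_form",
    "case_report_form",
    "protocol",
}


@dataclass
class TrialDocument:
    doc_type: str  # one of DOCUMENT_TYPES
    source: str    # where the item was obtained (free text in this sketch)

    def __post_init__(self):
        if self.doc_type not in DOCUMENT_TYPES:
            raise ValueError(f"unknown document type: {self.doc_type}")


@dataclass
class TrialRecord:
    trial_id: str  # hypothetical identifier that threads documents together
    documents: list = field(default_factory=list)

    def add(self, document):
        self.documents.append(document)

    def by_type(self, doc_type):
        """Return every document of one type attached to this trial."""
        return [d for d in self.documents if d.doc_type == doc_type]


# Illustrative usage with made-up values.
record = TrialRecord(trial_id="TRIAL-0001")
record.add(TrialDocument("registry_entry", "example registry"))
record.add(TrialDocument("protocol", "example sponsor website"))
record.add(TrialDocument("journal_paper", "example journal"))
print(len(record.by_type("protocol")))
```

Keeping the set of document types separate from the record class is one way to leave the schema expandable: a new category only needs a new label.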

The intention is to create an open, freely re-usable index of all such information and to increase discoverability, facilitate research, identify inconsistent data, enable audits on the availability and completeness of this information, support advocacy for better data and drive up standards around open data in evidence-based medicine.

The project has phase I funding. This will allow us to create a practical data schema and populate the database initially through web-scraping, basic record linkage techniques, crowd-sourced curation around selected drug areas, and import of existing sources of structured data and documents.

It will also allow us to create user-friendly web interfaces onto the data and conduct user engagement workshops to optimise the database and interface designs.

Where other projects have set out to manually and perfectly curate a narrow range of information on a smaller number of trials, we aim to use a broader range of techniques and attempt to match a very large quantity of information on all trials. We are currently seeking feedback and additional sources of structured data.


Achieving human and machine accessibility of cited data in scholarly publications

Reproducibility and reusability of research results are important concerns in scientific communication and science policy. A foundational element of reproducibility and reusability is the open and persistently available presentation of research data.

However, many common approaches for primary data publication in use today do not achieve sufficient long-term robustness, openness, accessibility or uniformity. Nor do they permit comprehensive exploitation by modern Web technologies.

This has led to several authoritative studies recommending uniform direct citation of data archived in persistent repositories. Data are to be considered as first-class scholarly objects, and treated similarly in many ways to cited and archived scientific and scholarly literature.

Here we briefly review the most current and widely agreed set of principle-based recommendations for scholarly data citation, the Joint Declaration of Data Citation Principles (JDDCP).

We then present a framework for operationalizing the JDDCP and a set of initial recommendations on identifier schemes, identifier resolution behavior, required metadata elements, and best practices for realizing programmatic machine actionability of cited data.
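To make the idea of a citation that serves both human and machine readers more concrete, here is a minimal Python sketch. It assumes a DOI as the identifier scheme and uses the public `https://doi.org/` resolver. The metadata fields, class name and sample values are hypothetical and are not the required element set recommended in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    # Hypothetical minimal field set, chosen for illustration.
    creators: tuple
    year: int
    title: str
    repository: str
    doi: str

    def resolver_url(self):
        """Persistent link that a DOI resolver redirects to a landing page."""
        return f"https://doi.org/{self.doi}"

    def human_readable(self):
        """Citation string suitable for a reference list."""
        names = "; ".join(self.creators)
        return (
            f"{names} ({self.year}). {self.title}. "
            f"{self.repository}. {self.resolver_url()}"
        )

    def machine_readable(self):
        """Plain dictionary that software can serialize or index."""
        return {
            "creators": list(self.creators),
            "year": self.year,
            "title": self.title,
            "repository": self.repository,
            "identifier": self.resolver_url(),
        }


# Made-up dataset and DOI, for illustration only.
citation = DataCitation(
    creators=("Doe, J.",),
    year=2015,
    title="Example survey data",
    repository="Example Repository",
    doi="10.1234/example",
)
print(citation.human_readable())
```

Deriving both forms from one record keeps the printed reference and the machine-actionable metadata from drifting apart.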

The main target audience for the common implementation guidelines in this article consists of publishers, scholarly organizations, and persistent data repositories, including technical staff members in these organizations.

But ordinary researchers can also benefit from these recommendations. The guidance provided here is intended to help achieve widespread, uniform human and machine accessibility of deposited data, in support of significantly improved verification, validation, reproducibility and re-use of scholarly/scientific data.
