The Modern Research Data Portal: a design pattern for networked, data-intensive science

Authors : Kyle Chard, Eli Dart, Ian Foster, David Shifflett, Steven Tuecke, Jason Williams

We describe best practices for providing convenient, high-speed, secure access to large data via research data portals. We capture these best practices in a new design pattern, the Modern Research Data Portal, that disaggregates the traditional monolithic web-based data portal to achieve orders-of-magnitude increases in data transfer performance, support new deployment architectures that decouple control logic from data storage, and reduce development and operations costs.

We introduce the design pattern; explain how it leverages high-performance data enclaves and cloud-based data management services; review representative examples at research laboratories and universities, including both experimental facilities and supercomputer sites; describe how to leverage Python APIs for authentication, authorization, data transfer, and data sharing; and use coding examples to demonstrate how these APIs can be used to implement a range of research data portal capabilities.

Sample code at a companion web site, https://docs.globus.org/mrdp, provides application skeletons that readers can adapt to realize their own research data portals.

URL : The Modern Research Data Portal: a design pattern for networked, data-intensive science

DOI : https://doi.org/10.7717/peerj-cs.144

Interoperability and FAIRness through a novel combination of Web technologies

Authors : Mark D. Wilkinson, Ruben Verborgh, Luiz Olavo Bonino da Silva Santos, Tim Clark, Morris A. Swertz, Fleur D.L. Kelpin, Alasdair J.G. Gray, Erik A. Schultes, Erik M. van Mulligen, Paolo Ciccarese, Arnold Kuzniar, Anand Gavai, Mark Thompson, Rajaram Kaliyaperumal, Jerven T. Bolleman, Michel Dumontier

Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT).

These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not.

The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task with no scalability.

Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings.

We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles.

The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.

URL : Interoperability and FAIRness through a novel combination of Web technologies

DOI : https://doi.org/10.7717/peerj-cs.110

Enhancing the Research Data Management of Computer-Based Educational Assessments in Switzerland

Authors : Catharina Wasner, Ingo Barkow, Fabian Odoni

Since 2006, the education authorities in Switzerland have been obliged by the Constitution to harmonize important benchmarks in the educational system throughout Switzerland. The development of national educational objectives in four disciplines created an important basis for implementing this constitutional mandate.

In 2013 the Swiss National Core Skills Assessment Program (in German: ÜGK – Überprüfung der Grundkompetenzen) was initiated to investigate the skills of students, starting with three of four domains: mathematics, language of teaching and first foreign language in grades 2, 6 and 9. ÜGK uses a computer-based test and a sample size of 25,000 students per year.

A huge challenge for computer-based educational assessment is the research data management process. Data from several different systems and tools, existing in different formats, have to be merged to obtain data products researchers can utilize.

Long-term preservation has to be adapted as well. In this paper, we describe our current processes and data sources as well as our ideas for enhancing the data management.

URL : Enhancing the Research Data Management of Computer-Based Educational Assessments in Switzerland

DOI : http://doi.org/10.5334/dsj-2018-018

A Conceptual Enterprise Framework for Managing Scientific Data Stewardship

Authors : Ge Peng, Jeffrey L. Privette, Curt Tilmes, Sky Bristol, Tom Maycock, John J. Bates, Scott Hausman, Otis Brown, Edward J. Kearns

Scientific data stewardship is an important part of long-term preservation and the use/reuse of digital research data. It is critical for ensuring trustworthiness of data, products, and services, which is important for decision-making.

Recent U.S. federal government directives and scientific organization guidelines have levied specific requirements, increasing the need for a more formal approach to ensuring that stewardship activities support compliance verification and reporting.

However, many science data centers lack an integrated, systematic, and holistic framework to support such efforts. The current business- and process-oriented stewardship frameworks are too costly and lengthy for most data centers to implement.

They often do not explicitly address the federal stewardship requirements and/or the uniqueness of geospatial data. This work proposes a data-centric conceptual enterprise framework for managing stewardship activities, based on the philosophy behind the Plan-Do-Check-Act (PDCA) cycle, a proven industrial concept.

This framework, which includes the application of maturity assessment models, allows for quantitative evaluation of how organizations manage their stewardship activities and supports informed decision-making for continual improvement towards full compliance with federal, agency, and user requirements.

URL : A Conceptual Enterprise Framework for Managing Scientific Data Stewardship

DOI : http://doi.org/10.5334/dsj-2018-015

Communicating data: interactive infographics, scientific data and credibility

Authors : Nan Li, Dominique Brossard, Dietram A. Scheufele, Paul H. Wilson, Kathleen M. Rose

Information visualization could be used to leverage the credibility of displayed scientific data. However, little was known about how display characteristics interact with individuals’ predispositions to affect perception of data credibility.

Using an experiment with 517 participants, we tested perceptions of data credibility by manipulating data visualizations related to the issue of the nuclear fuel cycle based on three characteristics: graph format, graph interactivity, and source attribution.

Results showed that viewers tend to rely on preexisting levels of trust and peripheral cues, such as source attribution, to judge the credibility of shown data, whereas their comprehension level did not relate to perception of data credibility. We discussed the implications for science communicators and design professionals.

URL : Communicating data: interactive infographics, scientific data and credibility

DOI : https://doi.org/10.22323/2.17020206

Conceptualizing Data Curation Activities Within Two Academic Libraries

Authors : Sophia Lafferty-Hess, Julie Rudder, Moira Downey, Susan Ivey, Jennifer Darragh

A growing focus on sharing research data that meet certain standards, such as the FAIR guiding principles, has resulted in libraries increasingly developing and scaling up support for research data.

As libraries consider what new data curation services they would like to provide as part of their repository programs, there are various questions that arise surrounding scalability, resource allocation, requisite expertise, and how to communicate these services to the research community.

Data curation can involve a variety of tasks and activities. Some of these activities can be managed by systems, some require human intervention, and some require highly specialized domain or data type expertise.

At the 2017 Triangle Research Libraries Network Institute, staff from the University of North Carolina at Chapel Hill and Duke University used the 47 data curation activities identified by the Data Curation Network project to create conceptual groupings of data curation activities.

The results of this “thought-exercise” are discussed in this white paper. The purpose of this exercise was to provide more specificity around data curation within our individual contexts as a method to consistently discuss our current service models, identify gaps we would like to fill, and determine what is currently out of scope.

We hope to foster an open and productive discussion throughout the larger academic library community about how we prioritize data curation activities as we face growing demand and limited resources.

URL : Conceptualizing Data Curation Activities Within Two Academic Libraries

DOI : https://dx.doi.org/10.17605/OSF.IO/ZJ5PQ

Developing research data management services and support for researchers

Authors : Laure Perrier, Leslie Barnes

This mixed method study determined the essential tools and services required for research data management to aid academic researchers in fulfilling emerging funding agency and journal requirements. Focus groups were conducted and a rating exercise was designed to rank potential services.

Faculty conducting research at the University of Toronto were recruited; 28 researchers participated in four focus groups from June–August 2016. Two investigators independently coded the transcripts from the focus groups and identified four themes: 1) seamless infrastructure, 2) data security, 3) developing skills and knowledge, and 4) anxiety about releasing data.

Researchers require assistance with the secure storage of data and favour tools that are easy to use. Increasing knowledge of best practices in research data management is necessary and can be supported by the library using multiple strategies.

These findings help our library identify and prioritize tools and services in order to allocate resources in support of research data management on campus.

URL : Developing research data management services and support for researchers

DOI : https://doi.org/10.21083/partnership.v13i1.4115