Text Data Mining from the Author’s Perspective: Whose Text, Whose Mining, and to Whose Benefit?

Authors : Christine L. Borgman

Given the many technical, social, and policy shifts in access to scholarly content since the early days of text data mining, it is time to expand the conversation about text data mining from concerns of the researcher wishing to mine data to include concerns of researcher-authors about how their data are mined, by whom, for what purposes, and to whose benefits.

URL : https://arxiv.org/abs/1803.04552

Open Data, Grey Data, and Stewardship: Universities at the Privacy Frontier

Author : Christine L. Borgman

As universities recognize the inherent value in the data they collect and hold, they encounter unforeseen challenges in stewarding those data in ways that balance accountability, transparency, and protection of privacy, academic freedom, and intellectual property.

Two parallel developments in academic data collection are converging: (1) open access requirements, whereby researchers must provide access to their data as a condition of obtaining grant funding or publishing results in journals; and (2) the vast accumulation of ‘grey data’ about individuals in their daily activities of research, teaching, learning, services, and administration.

The boundaries between research and grey data are blurring, making it more difficult to assess the risks and responsibilities associated with any data collection. Many sets of data, both research and grey, fall outside privacy regulations such as HIPAA, FERPA, and PII.

Universities are exploiting these data for research, learning analytics, faculty evaluation, strategic decisions, and other sensitive matters. Commercial entities are besieging universities with requests for access to data or for partnerships to mine them.

The privacy frontier facing research universities spans open access practices, uses and misuses of data, public records requests, cyber risk, and curating data for privacy protection.

This paper explores the competing values inherent in data stewardship and makes recommendations for practice, drawing on the pioneering work of the University of California in privacy and information security, data governance, and cyber risk.

URL : https://arxiv.org/abs/1802.02953

On the Reuse of Scientific Data

Authors : Irene V. Pasquetto, Bernadette M. Randles, Christine L. Borgman

While science policy promotes data sharing and open data, these are not ends in themselves. Arguments for data sharing are to reproduce research, to make public assets available to the public, to leverage investments in research, and to advance research and innovation.

To achieve these expected benefits of data sharing, data must actually be reused by others. Data sharing practices, especially motivations and incentives, have received far more study than has data reuse, perhaps because of the array of contested concepts on which reuse rests and the disparate contexts in which it occurs.

Here we explicate concepts of data, sharing, and open data as a means to examine data reuse. We explore distinctions between use and reuse of data.

Lastly we propose six research questions on data reuse worthy of pursuit by the community: How can uses of data be distinguished from reuses? When is reproducibility an essential goal? When is data integration an essential goal? What are the tradeoffs between collecting new data and reusing existing data? How do motivations for data collection influence the ability to reuse data? How do standards and formats for data release influence reuse opportunities?

We conclude by summarizing the implications of these questions for science policy and for investments in data reuse.

URL : On the Reuse of Scientific Data

DOI : http://doi.org/10.5334/dsj-2017-008