
Dataset Evolution: Challenges of reusing data for deeper research

journal contribution
posted on 01.02.2016, 00:00 by Lisa Zilinski, Kristin Briney, Abigail Goben

Over the summer of 2014, the authors began a multi-year project investigating institutional data policies at U.S. research institutions. The criterion for inclusion in the study was that an institution be rated as having “Very High” or “High” research activity in the July 2014 Carnegie list. The authors found that 206 American universities met this criterion, of which 107 were also identified as members of the Association of Research Libraries.

Two datasets were collected. Student assistants and three researchers collected the data over a period of two months. The data points collected by the student assistants were institution type (public or private), total student population, faculty size, and research funding expenditures. The authors collected institutional data policies and information on research data services from the university websites. The policies included both Intellectual Property policies and any other policy that specifically addressed research data. Additional information about each university’s research data management services was also collected.

The authors first compiled this data to understand the current landscape of institutional data policies in the United States (Briney, Goben, & Zilinski, 2015a). The ensuing dataset (Briney, Goben, & Zilinski, 2015b) opened the door to further research, including understanding the policies, what they cover, what makes a good policy, and how institutions can move forward with policies that engage stakeholders at all levels (researchers, administration, funders, and publishers).

This data paper will describe the challenges the authors faced as they dealt with new issues in reusing their own dataset. Some of the issues to be described include fair use (releasing links to institutional policies vs. releasing full policies in the dataset) and sharing with others (not being able to share the dataset as a corpus and determining what data can be shared at what time).



