Understanding the Research Practices and Service Needs of Big Data Researchers at Carnegie Mellon University

There is no universally agreed-upon definition of big data. In the most general terms, however, big data comprises datasets whose size exceeds the ability of commonly used tools to capture, process, transfer, and manage them. The use of big data is becoming increasingly common in research enterprises. Researchers who collect and analyze big data have unique needs at each stage of the research lifecycle, from planning to publishing and communicating findings. Understanding how researchers work with big data can give librarians and information scientists deeper insight into the limitations of big data research and into researchers’ needs and challenges, which may in turn lead to better services. Libraries are actively seeking ways to learn about and assist researchers in innovative areas such as data education and data management. Accordingly, this study aims to understand researchers’ big data research practices and needs and to identify opportunities to support those needs. We focused specifically on researchers from STEAM (science, technology, engineering, arts, and mathematics) disciplines at Carnegie Mellon University (CMU) who hold postdoctoral, staff, or faculty status. At CMU, big data research spans several disciplines and conceptual areas and encompasses a wide variety of research techniques and topics, as will be discussed in this report.

CMU is one of 21 universities across the United States exploring this topic under the guidance of Ithaka Strategy and Research (Ithaka S+R). Ithaka S+R is part of ITHAKA, a nonprofit organization that supports the academic community in using digital technologies to advance research and teaching sustainably and to preserve the scholarly record. This project, titled “Supporting Big Data Research,” was launched in summer 2020 to support the organization’s goal of partnering with academic libraries to better understand faculty research support needs in the areas of big data and data science.