10.1184/R1/6397013.v1
Samridhi Choudhary, Christopher Bogart, Carolyn Rose, James Herbsleb
Modeling Productivity in Open Source GitHub Projects: A Dataset and Codebase
Carnegie Mellon University, 2020
Sociotechnical congruence, Burstiness, collaboration, productivity, GitHub, Open source, Hidden Markov Model
2020-01-21 21:11:15
Dataset
https://kilthub.cmu.edu/articles/dataset/Modeling_Productivity_in_Open_Source_GitHub_Projects_A_Dataset_and_Codebase/6397013

Contains events associated with 16,337 Python PyPI projects hosted on GitHub, from 2012 through October 2017. The data were scraped from GitHub (acceptable use per the terms of service: https://help.github.com/en/github/site-policy/github-terms-of-service). Also contains an analysis of bursts of activity within each project, performed by (1) using a hidden Markov model to identify "busy" spans of days, (2) calculating metrics of social and technical dependencies among the people and artifacts involved in each burst, and (3) calculating sociotechnical congruence by comparing the social and technical dependency networks.
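
The burst analysis described above can be illustrated with a minimal Python sketch. It is not the dataset's actual code. The record does not state the hidden Markov model's emission distribution, its parameters, or the exact congruence formula, so the two-state Poisson model, the fixed rates, the `stay_prob` value, and the pairwise-overlap ratio below are all assumptions, and the function names `busy_days`, `busy_spans`, and `congruence` are hypothetical.

```python
import math


def poisson_logpmf(k, rate):
    # log P(K = k) for a Poisson distribution with the given rate
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def busy_days(counts, quiet_rate=1.0, busy_rate=8.0, stay_prob=0.9):
    """Viterbi-decode daily event counts into states: 0 = quiet, 1 = busy.

    Assumed model: two hidden states, Poisson emissions, symmetric transitions.
    """
    if not counts:
        return []
    rates = (quiet_rate, busy_rate)
    log_stay = math.log(stay_prob)
    log_switch = math.log(1.0 - stay_prob)
    # Uniform prior over the state of the first day
    score = [math.log(0.5) + poisson_logpmf(counts[0], r) for r in rates]
    backptrs = []
    for k in counts[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            cands = [score[p] + (log_stay if p == s else log_switch) for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new_score.append(cands[best] + poisson_logpmf(k, rates[s]))
        score = new_score
        backptrs.append(ptr)
    # Trace the most likely state sequence backwards from the last day
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def busy_spans(path):
    """Collapse a 0/1 state path into inclusive (start, end) day indices of busy runs."""
    spans, start = [], None
    for i, s in enumerate(path):
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(path) - 1))
    return spans


def congruence(needed_pairs, actual_pairs):
    """Share of technically implied person pairs that also have a social tie.

    Assumed definition: |needed AND actual| / |needed|, with undirected pairs.
    Returns None when no coordination is needed.
    """
    needed = {frozenset(p) for p in needed_pairs}
    actual = {frozenset(p) for p in actual_pairs}
    if not needed:
        return None
    return len(needed & actual) / len(needed)


# Made-up daily event counts for one hypothetical project
daily_counts = [0, 1, 0, 9, 12, 7, 0, 1, 0, 0]
print(busy_spans(busy_days(daily_counts)))
```

In practice the rates and transition probabilities would be fitted to each project's event history rather than fixed by hand, and the "needed" pairs would be derived from the technical dependency network within each burst.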