10.1184/R1/6397013.v1
Samridhi Choudhary
Samridhi
Choudhary
Christopher Bogart
Christopher
Bogart
Carolyn Rose
Carolyn
Rose
James Herbsleb
James
Herbsleb
Modeling Productivity in Open Source GitHub Projects: A Dataset and Codebase
Carnegie Mellon University
2020
Sociotechnical congruence
Burstiness
collaboration
productivity
GitHub
Open source
Hidden Markov Model
2020-01-21 21:11:15
Dataset
https://kilthub.cmu.edu/articles/dataset/Modeling_Productivity_in_Open_Source_GitHub_Projects_A_Dataset_and_Codebase/6397013
Contains events associated with 16,337 Python PyPI projects hosted on GitHub, from 2012 through October 2017. Scraped from GitHub (acceptable use per the terms of service: https://help.github.com/en/github/site-policy/github-terms-of-service). Also contains an analysis of bursts of activity within each project, performed by (1) using a hidden Markov model to identify "busy" spans of days, (2) calculating metrics of social and technical dependencies among the people and artifacts involved in each burst, and (3) calculating sociotechnical congruence by comparing the social and technical dependency networks.
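As a rough illustration of steps (1) and (3), the following minimal Python sketch decodes "busy" days from daily event counts with a two-state hidden Markov model and then computes a congruence ratio. It is not the dataset's actual codebase. The Poisson emission model, the rates, the transition probability, and every function name are assumptions made for illustration. The congruence ratio shown (the share of technical dependencies that are matched by a social tie) is one common formulation and may differ from the one used for this dataset.

```python
import math

def poisson_logpmf(k, rate):
    # log P(K = k) for a Poisson distribution with the given rate
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def decode_busy_days(counts, quiet_rate=0.5, busy_rate=5.0, stay_prob=0.9):
    """Viterbi decoding for a two-state HMM (0 = quiet, 1 = busy).

    All parameter values are hypothetical defaults, not fitted values.
    """
    if not counts:
        return []
    rates = (quiet_rate, busy_rate)
    log_stay, log_switch = math.log(stay_prob), math.log(1.0 - stay_prob)
    # Uniform prior over the starting state
    score = [math.log(0.5) + poisson_logpmf(counts[0], r) for r in rates]
    back = []
    for k in counts[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            # Best previous state for landing in state s today
            cand = [score[p] + (log_stay if p == s else log_switch) for p in (0, 1)]
            best = 0 if cand[0] >= cand[1] else 1
            pointers.append(best)
            new_score.append(cand[best] + poisson_logpmf(k, rates[s]))
        score = new_score
        back.append(pointers)
    # Trace the most likely state sequence backwards
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

def busy_spans(path):
    """Return inclusive (start, end) index pairs for runs of busy days."""
    spans, start = [], None
    for i, s in enumerate(path):
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(path) - 1))
    return spans

def congruence(technical_pairs, social_pairs):
    """Fraction of technical dependencies also present as social ties."""
    needed = {frozenset(p) for p in technical_pairs}
    actual = {frozenset(p) for p in social_pairs}
    if not needed:
        return None  # undefined when there are no technical dependencies
    return len(needed & actual) / len(needed)

daily_events = [0, 0, 0, 8, 9, 7, 0, 0, 0]  # made-up counts for one project
path = decode_busy_days(daily_events)
spans = busy_spans(path)
ratio = congruence(
    [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")],  # technical dependencies
    [("b", "a"), ("c", "d"), ("a", "d")],              # observed communication
)
```

In this sketch, pairs are treated as undirected, so ("b", "a") satisfies the dependency ("a", "b"). A real analysis would estimate the HMM parameters from the data rather than fixing them by hand.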