Modeling Productivity in Open Source GitHub Projects: A Dataset and Codebase

This dataset contains events associated with 16,337 Python PyPI projects hosted on GitHub, covering 2012 through October 2017. The data was scraped from GitHub (acceptable use per its terms of service). It also contains an analysis of bursts of activity within each project, performed by (1) using a hidden Markov model to identify "busy" spans of days, (2) calculating metrics of social and technical dependencies among the people and artifacts involved in each burst, and (3) calculating sociotechnical congruence by comparing the social and technical dependency networks.
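As a rough illustration of steps (1) and (3), the sketch below decodes "busy" spans from daily event counts with a two-state hidden Markov model and then computes a simple congruence ratio. It is not the codebase's actual implementation. The Poisson emissions, the fixed rates and transition probability, the function names (busy_spans, congruence), and the sample data are all assumptions made for this example, and the ratio shown is only one common way to compare a technical dependency network with a social one.

```python
import math


def poisson_logpmf(k, rate):
    """Log-probability of observing k events under a Poisson(rate) model."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def busy_spans(daily_counts, quiet_rate=0.5, busy_rate=8.0, stay_prob=0.9):
    """Viterbi-decode a two-state HMM; return inclusive (start, end) busy spans.

    All parameters are hypothetical fixed values, not fitted to the dataset.
    """
    if not daily_counts:
        return []
    rates = (quiet_rate, busy_rate)  # state 0 = quiet, state 1 = busy
    log_stay = math.log(stay_prob)
    log_switch = math.log(1.0 - stay_prob)
    # score[s] is the best log-probability of any state path ending in state s.
    score = [math.log(0.5) + poisson_logpmf(daily_counts[0], r) for r in rates]
    backpointers = []
    for count in daily_counts[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            candidates = [
                score[p] + (log_stay if p == s else log_switch) for p in (0, 1)
            ]
            best_prev = 0 if candidates[0] >= candidates[1] else 1
            pointers.append(best_prev)
            new_score.append(candidates[best_prev] + poisson_logpmf(count, rates[s]))
        score = new_score
        backpointers.append(pointers)
    # Trace the most likely state path backwards from the final day.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    # Collapse consecutive busy days into spans.
    spans, start = [], None
    for day, s in enumerate(path):
        if s == 1 and start is None:
            start = day
        elif s == 0 and start is not None:
            spans.append((start, day - 1))
            start = None
    if start is not None:
        spans.append((start, len(path) - 1))
    return spans


def congruence(required_pairs, actual_pairs):
    """Share of technically required coordination pairs that also occur socially.

    Pairs are treated as undirected. Returns None when nothing is required.
    """
    required = {frozenset(p) for p in required_pairs}
    actual = {frozenset(p) for p in actual_pairs}
    if not required:
        return None
    return len(required & actual) / len(required)


# Made-up example data: events per day for one project, and two small networks.
counts = [0, 0, 1, 0, 9, 12, 7, 10, 0, 0, 1, 0]
technical = [("ann", "bob"), ("bob", "cy"), ("ann", "cy"), ("cy", "dee")]
social = [("bob", "ann"), ("cy", "dee"), ("ann", "dee")]
print(busy_spans(counts))
print(congruence(technical, social))
```

In practice the rates and transition probabilities would be estimated from the data rather than fixed by hand, and the dependency networks would be built from the people and artifacts touched during each burst.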