Carnegie Mellon University
Files (3):
burst_data.csv (dataset, 11.99 MB)
README.md (text, 12 kB)
raw_data.zip (ZIP archive, 2.61 GB)

Modeling Productivity in Open Source GitHub Projects: A Dataset and Codebase

Contains events associated with 16,337 Python projects published on PyPI and hosted on GitHub, covering 2012 through October 2017. The data were scraped from GitHub, which is an acceptable use under GitHub's terms of service (https://help.github.com/en/github/site-policy/github-terms-of-service). The dataset also contains an analysis of bursts of activity within each project, performed by (1) using a hidden Markov model to identify "busy" spans of days, (2) calculating metrics of social and technical dependencies among the people and artifacts involved in each burst, and (3) calculating sociotechnical congruence by comparing the social and technical dependency networks.
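The description does not state the model parameters or the exact congruence formula used, so the Python below is only a minimal sketch under stated assumptions of steps (1) and (3). For step (1) it assumes a two-state hidden Markov model (quiet vs. busy) with Poisson emissions over daily event counts, decoded with the Viterbi algorithm; the rates and the stay probability are illustrative guesses. For step (3) it uses one common formulation of sociotechnical congruence: the fraction of required coordination pairs (two people who touched the same file or technically dependent files) that also appear as actual communication links. The function names (`viterbi_busy_days`, `busy_spans`, `congruence`) and input shapes are hypothetical and are not taken from the dataset's codebase.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def poisson_logpmf(k: int, lam: float) -> float:
    """Log-probability of observing k events under a Poisson(lam) distribution."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_busy_days(counts: List[int], quiet_rate: float = 0.5,
                      busy_rate: float = 5.0, stay_prob: float = 0.9) -> List[int]:
    """Most likely state per day (0 = quiet, 1 = busy) under a two-state Poisson HMM.

    All parameter defaults are illustrative assumptions; stay_prob must lie in (0, 1).
    """
    if not counts:
        return []
    rates = (quiet_rate, busy_rate)
    log_stay, log_switch = math.log(stay_prob), math.log(1.0 - stay_prob)
    # Uniform initial distribution over the two states.
    score = [math.log(0.5) + poisson_logpmf(counts[0], r) for r in rates]
    back: List[List[int]] = []
    for k in counts[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            # Log-score of reaching state s from each possible previous state.
            cands = [score[p] + (log_stay if p == s else log_switch) for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            pointers.append(best)
            new_score.append(cands[best] + poisson_logpmf(k, rates[s]))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state to recover the full state path.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def busy_spans(path: List[int]) -> List[Tuple[int, int]]:
    """Group consecutive busy days into inclusive (start_day, end_day) spans."""
    spans, start = [], None
    for day, state in enumerate(path):
        if state == 1 and start is None:
            start = day
        elif state == 0 and start is not None:
            spans.append((start, day - 1))
            start = None
    if start is not None:
        spans.append((start, len(path) - 1))
    return spans


def congruence(touches: Dict[str, Set[str]],
               file_deps: Set[FrozenSet[str]],
               communicated: Set[FrozenSet[str]]) -> float:
    """Fraction of required coordination pairs that also appear in the social network.

    touches: person -> files they modified during the burst.
    file_deps: unordered pairs of technically dependent files.
    communicated: unordered pairs of people who actually interacted.
    Returns NaN when no coordination is required (congruence is undefined).
    """
    required = {
        frozenset((a, b))
        for a, b in combinations(sorted(touches), 2)
        if any(fa == fb or frozenset((fa, fb)) in file_deps
               for fa in touches[a] for fb in touches[b])
    }
    if not required:
        return float("nan")
    return len(required & communicated) / len(required)
```

With these default parameters, daily counts `[0, 0, 1, 0, 7, 9, 6, 0, 0, 1]` decode to a single busy span covering days 4 through 6: the isolated single-event days cost more in emission and switching penalties than they gain. A two-state model keeps the sketch small; a real analysis would typically fit the rates and transition probabilities to each project's data rather than fixing them.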

Funding

BIGDATA: Collaborative Research: F: Study of a Cyber-Enabled Social Computing Framework for Improving Practice in Online Computing Communities

Directorate for Computer & Information Science & Engineering

CIF21 DIBBs: Building a Scalable Infrastructure for Data-Driven Discovery and Innovation in Education

Directorate for Computer & Information Science & Engineering

BIGDATA: Collaborative Research: IA: OSCAR - Open Source Supply Chains and Avoidance of Risk: An Evidence Based Approach to Improve FLOSS Supply Chains

Directorate for Computer & Information Science & Engineering

History

Date

2019-12-29
