Carnegie Mellon University

Trouble with the Curve: Identifying Clusters of MLB Pitchers using Improved Pitch Classification Techniques

thesis
posted on 2013-05-01, 00:00 authored by Michael A. Pane

The PITCHf/x database, which records the location, velocity, and trajectory of every pitch thrown in Major League Baseball (MLB), has allowed the statistical analysis of MLB to flourish since its introduction in late 2006. Using PITCHf/x, pitches have been classified by hand, requiring considerable effort, or using neural network clustering and classification, which is often difficult to interpret. We use model-based clustering with a multivariate Gaussian mixture model and an adjusted Bayesian Information Criterion to determine the number of different clusters. We verify these results via cross validation, validation by prediction strength, and through visual inspection. Furthermore, we use our method to cluster pitchers into groups with similar characteristics via k-means clustering and the Fisher-wise criterion. Our method builds a strong foundation towards addressing many open MLB research questions, including preventing pitcher injury.
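
As a rough illustration of the model-selection idea in the abstract (fit Gaussian mixtures with different numbers of components and keep the one with the best information criterion), the sketch below may help. It is not the thesis's method: it uses synthetic one-dimensional "pitch speeds" rather than multivariate PITCHf/x features, and the standard BIC rather than the adjusted criterion the abstract mentions, whose form is not given here. All names (`fit_gmm_1d`, `bic`, `speeds`) and all numbers are assumptions made for the example.

```python
import math
import random


def log_norm_pdf(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def component_logs(x, weights, means, variances):
    """Log of weight * density for each mixture component at x."""
    return [math.log(w) + log_norm_pdf(x, m, v)
            for w, m, v in zip(weights, means, variances)]


def log_sum_exp(vals):
    """Numerically stable log(sum(exp(vals)))."""
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))


def fit_gmm_1d(xs, k, iters=100, var_floor=0.01):
    """Fit a k-component 1-D Gaussian mixture by EM; return its log-likelihood."""
    n = len(xs)
    srt = sorted(xs)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    mean_all = sum(xs) / n
    variances = [sum((x - mean_all) ** 2 for x in xs) / n] * k
    weights = [1.0 / k] * k

    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            logs = component_logs(x, weights, means, variances)
            total = log_sum_exp(logs)
            resp.append([math.exp(l - total) for l in logs])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-9)  # guard against empty components
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, var_floor)  # floor avoids degenerate spikes
            weights[j] = nj / n

    return sum(log_sum_exp(component_logs(x, weights, means, variances)) for x in xs)


def bic(loglik, k, n):
    """Standard BIC (lower is better); NOT the adjusted BIC used in the thesis."""
    n_params = 3 * k - 1  # k means + k variances + (k - 1) free weights
    return -2.0 * loglik + n_params * math.log(n)


# Synthetic data (assumed values): a fast cluster and a slow cluster, in mph.
rng = random.Random(0)
speeds = ([rng.gauss(93.0, 1.5) for _ in range(200)]
          + [rng.gauss(78.0, 2.0) for _ in range(200)])

bic_by_k = {k: bic(fit_gmm_1d(speeds, k), k, len(speeds)) for k in range(1, 5)}
best_k = min(bic_by_k, key=bic_by_k.get)
print("BIC by number of clusters:", bic_by_k)
print("Selected number of clusters:", best_k)
```

The penalty term grows with the number of parameters, so an extra component is kept only if it raises the log-likelihood enough to pay for itself. In a multivariate setting the parameter count would also include covariance terms.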

History

Date

2013-05-01

Advisor(s)

Andrew Thomas

Department

  • Statistics
