Kellen R. Taylor


Since the publication of “Moneyball” (Lewis, 2003), baseball organizations have increased their focus on statistical analysis. This study attempts to create a model, built from baseball statistics, that predicts a team’s wins more accurately than the Pythagorean formula. The Pythagorean formula uses a team’s actual or projected runs scored and runs allowed to project its won-loss percentage (James, 1980). Although this measure is reasonably accurate, it leaves out other traditional and newer statistical measures. Because Bill James developed the formula through experimental observation (James, 1980), the author hypothesizes that more advanced statistical analysis can produce a better model. This study applies backward elimination regression to batting, pitching, and fielding statistics from the 2005 through 2014 seasons to create a formula. The purpose of this study is to determine whether backward elimination regression produces a model that predicts wins more accurately than the Pythagorean formula. An additional forced-entry regression measures the variance accounted for by the variables in the Pythagorean formula (runs scored and runs allowed). The R² values from the two analyses were compared, as were the standard errors of the estimate (SEE) of the two equations. The results indicate that backward elimination produced a better model:

$$W = 28.723 + 0.076(\text{Runs}) + 0.148(\text{OPS+}) + 0.437(\text{Saves}) - 0.065(\text{Runs Allowed}) + 0.09(\text{ERA+}) + 1.537(\text{SO/W}) \qquad \text{(Equation 1)}$$

This model accounted for 92.7% of the variance in wins, while the Pythagorean formula variables accounted for 86.8%. The regression model also had a lower SEE (2.99) than the Pythagorean formula (4.02). The difference of roughly one win (4.02 − 2.99 = 1.03) suggests that the model created in this study predicts a team’s season win total about one game more accurately than the Pythagorean formula.
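To make the comparison concrete, the Pythagorean formula projects win percentage as RS²/(RS² + RA²), which is multiplied by games played to give wins, while Equation 1 combines its regression terms directly. The Python sketch below evaluates both. It also computes the standard error of the estimate as the square root of the residual sum of squares divided by n − k − 1, where k is the number of predictors. The function names, the 162-game default, and the sample team values are illustrative assumptions. Only the Equation 1 coefficients come from this study.

```python
import math


def pythagorean_wins(runs_scored: float, runs_allowed: float,
                     games: int = 162, exponent: float = 2.0) -> float:
    """Projected wins from the Pythagorean formula (James, 1980).

    exponent=2.0 is James's original form; later variants use other
    exponents, such as 1.83.
    """
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return games * rs / (rs + ra)


def equation1_wins(runs, ops_plus, saves, runs_allowed, era_plus, so_w):
    """Predicted wins from the study's regression model (Equation 1)."""
    return (28.723 + 0.076 * runs + 0.148 * ops_plus + 0.437 * saves
            - 0.065 * runs_allowed + 0.09 * era_plus + 1.537 * so_w)


def standard_error_of_estimate(actual, predicted, n_predictors):
    """SEE = sqrt(sum of squared residuals / (n - k - 1))."""
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(sse / (len(actual) - n_predictors - 1))


if __name__ == "__main__":
    # Hypothetical team-season values, not data from the study.
    print(round(pythagorean_wins(800, 700), 2))  # 91.75
    print(round(equation1_wins(750, 100, 40, 700, 100, 2.5), 1))  # 85.3
```

OPS+ and ERA+ are league- and park-adjusted indexes scaled so that 100 is league average, and SO/W is the strikeout-to-walk ratio. Because of that scaling, the example uses 100 for both indexes.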
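Backward elimination, the model-building method named in the abstract, starts with every candidate predictor and repeatedly removes the weakest one until all remaining predictors meet a retention criterion. The following is a minimal pure-Python sketch of that idea, not the study's actual procedure. The removal rule here is an assumption: drop the predictor with the smallest |t| while it falls below a hypothetical cutoff of 2.0. Statistical packages typically apply a p-value cutoff instead. The function names and toy data are illustrative.

```python
import math


def invert(matrix: list[list[float]]) -> list[list[float]]:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(rows: list[list[float]], y: list[float]):
    """Ordinary least squares; returns coefficients and their standard errors."""
    n, p = len(y), len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(rows, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance
    se = [math.sqrt(max(0.0, s2 * inv[a][a])) for a in range(p)]
    return beta, se


def backward_eliminate(predictors: dict[str, list[float]], y: list[float],
                       t_cutoff: float = 2.0):
    """Drop the weakest predictor until every remaining |t| >= t_cutoff."""
    names = list(predictors)
    while names:
        rows = [[1.0] + [predictors[nm][i] for nm in names] for i in range(len(y))]
        beta, se = ols(rows, y)
        # |t| for each predictor (index 0 is the intercept, so skip it).
        t = {nm: abs(beta[j + 1]) / se[j + 1] if se[j + 1] > 0 else math.inf
             for j, nm in enumerate(names)}
        weakest = min(names, key=t.get)
        if t[weakest] >= t_cutoff:
            return names, beta
        names.remove(weakest)
    return [], [sum(y) / len(y)]  # intercept-only model


if __name__ == "__main__":
    # Toy data: y depends on x1 only; x2 is built to be unrelated to y.
    x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    noise = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
    x2 = [1.0, 1.0, -1.0, -1.0, -1.0, -1.0, 1.0, 1.0]
    y = [5.0 + 2.0 * a + e for a, e in zip(x1, noise)]
    kept, coefs = backward_eliminate({"x1": x1, "x2": x2}, y)
    print(kept)  # ['x1']
```

In the toy data, x2 is constructed to be orthogonal to the intercept, x1, and the noise term, so its coefficient is zero and it is removed in the first pass. The study's real candidate pool of batting, pitching, and fielding statistics would be passed in the same way, one named column per statistic.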


Amanda Paule-Koba

Second Reader

David Tobar








Sport Administration