In my quest to create a predictive set of baseball statistics, I’ve stumbled upon a statistic by which we can assess who is the greatest batter over the last 100 years. So far, by adjusting only for bases, outs and batter age, I have Babe Ruth in front with Ted Williams hot on his heels. But Mike Trout, the only current player in the running, remains within striking distance.
The next step is to bring pitchers into the equation. To do this we need to make some modifications to our new statistic, average individual run production (IRP). The IRP is basically the number of runs the team scores during the batter’s plate appearances plus the improvement or deterioration of the game “location” over those plate appearances. This total is divided by the sum of the beginning locations (a location being the number of runs expected to be scored by the end of the half-inning) to even out the advantage enjoyed by batters who come to the plate with many runners on base.
For example, during Babe Ruth’s 8,954 plate appearances, his team scored 1,995 runs. The typical batter decreases his team’s location (i.e. increases the number of outs and decreases the number of runners on base) during a plate appearance. Ruth did too, but he decreased his team’s locations by only 597 runs over his career. The net difference, 1,398 runs, was Ruth’s individual contribution to his teams’ runs. Under the same circumstances, his teams would have scored only 1,094 runs if an average batter had been at the plate.
The formula is (R + E)/B, where R is runs scored, E is the sum of the ending locations, and B is the sum of the beginning locations. For Ruth this results in an average IRP of 1.277 or 27.7% above average.
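As a quick check of the arithmetic, the short Python sketch below uses only the career totals quoted above. It assumes the average IRP can be read as the net contribution divided by the average-batter baseline; that reading is an interpretation of the example, and the variable names are illustrative.

```python
# Career totals quoted in the text for Babe Ruth.
team_runs = 1995         # runs scored during his plate appearances
location_loss = 597      # total decrease in his team's locations, in runs
average_baseline = 1094  # runs expected with an average batter at the plate

# Net individual contribution: runs scored minus the location given up.
net_contribution = team_runs - location_loss

# Assumption: average IRP is taken here as net contribution over the baseline.
ratio = net_contribution / average_baseline

print(net_contribution)
print(round(ratio, 3))
```

The ratio comes out at about 1.278, in line with the 1.277 quoted; the small gap presumably reflects rounding in the published totals.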
Except for a home run with no one on base, scoring in baseball is a team effort. A batter reaches base during his plate appearance and is then batted in by another batter. So, crediting all the runs scored during a batter’s plate appearance to only that batter overstates his individual contribution to his team’s score.
Traditional Pitching Statistics
Early in my series of baseball statistics postings I stated that traditional batting statistics, like batting average, on-base percentage and slugging average, were based on statistical contrivances in order to produce individual performance statistics from what is fundamentally a group effort. If anything, traditional pitching statistics are even more contrived.
Starting pitchers are credited with a “win” if they pitch at least 5 innings and leave the game ahead in the score and if the rest of the team can maintain the lead to the end of the game. Relievers are credited with a “save” if they finish a game their team wins while protecting a lead, most commonly a lead of no more than three runs when they enter. Both of these statistics depend heavily on the performance of other players.
Earned run average is the number of earned runs allowed per 27 outs. Even if we ignore the problem of inconsistent error scoring, this statistic suffers from the same inaccuracy as the batting statistics: preventing runs is as much a team effort as scoring them. When a relief pitcher comes into a game, any runs scored by inherited base runners are charged to the pitcher who put them on base, usually the starter. This tends to overstate the relief pitcher’s contribution to preventing runs. On the other hand, relievers typically enter a game when the location, i.e. the number of outs and base runners, is very precarious. Under those conditions, it is harder to prevent subsequent batters from reaching base and scoring even more runs.
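For reference, the earned run average calculation described above can be sketched in a few lines of Python. The function name and the sample season line are hypothetical; only the "earned runs per 27 outs" definition comes from the text.

```python
def earned_run_average(earned_runs: int, outs: int) -> float:
    """Earned runs allowed per 27 outs (nine innings)."""
    if outs <= 0:
        raise ValueError("outs must be positive")
    return 27 * earned_runs / outs


# Hypothetical season line: 50 earned runs over 600 outs (200 innings).
era = earned_run_average(50, 600)
print(era)  # 2.25
```

Nothing in this calculation looks at the bases or outs when the pitcher entered or left, which is the gap the location-based approach is meant to fill.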
What is needed is a statistic that doesn’t rely on errors at all, gives proportional credit and blame to both starters and relievers and takes the location of the game into account.
The Pitcher Formula
Pitchers don’t typically face just one batter during a game. They face a sequence of batters. Consequently, to assess their relative performance we need to know the beginning and ending game locations for each sequence and all the runs scored in between. This allows us to calculate the number of runs the pitcher should be credited with allowing and the number of runs the average pitcher would have allowed under the same circumstances.
I’ll call the pitcher statistic the individual run allowance (IRA). Its formula is (R + E)/(B + nH), where R, E and B are defined as in the IRP formula, n is the number of outs recorded by the pitcher after his beginning inning, and H is the number of runs allowed per out by the average pitcher.
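A minimal Python sketch of the IRA formula follows. It assumes each appearance is summarized by four numbers and that the career figure is built from sums over appearances. The `Stint` class, its field names, and the sample values (including the 0.17 runs per out) are hypothetical and not taken from the actual data.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Stint:
    runs: float      # R: runs scored while the pitcher was in the game
    begin: float     # B: expected runs at the location where he entered
    end: float       # E: expected runs at the location where he left
    later_outs: int  # n: outs recorded after his beginning inning


def average_ira(stints: Iterable[Stint], runs_per_out: float) -> float:
    """Compute (R + E) / (B + n * H) over a sequence of appearances."""
    stints = list(stints)
    total_runs = sum(s.runs for s in stints)
    total_end = sum(s.end for s in stints)
    total_begin = sum(s.begin for s in stints)
    total_later_outs = sum(s.later_outs for s in stints)
    denominator = total_begin + total_later_outs * runs_per_out
    if denominator <= 0:
        raise ValueError("expected runs must be positive")
    return (total_runs + total_end) / denominator


# Two made-up starts; H = 0.17 is an illustrative league-average runs per out.
sample = [Stint(3, 0.5, 0.0, 15), Stint(1, 0.5, 0.3, 18)]
print(round(average_ira(sample, 0.17), 3))
```

If the ratio is read the same way as the IRP, a value below 1 would mean the pitcher allowed fewer runs than the average pitcher facing the same locations.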
Pitching is more specialized than batting. Pitchers are normally known as being either a starter or a reliever. So, I’ve segregated pitchers into these two categories. Table 1 lists the top 25 starting pitchers by average IRA for those who have recorded at least 3,000 career outs. Table 2 lists the top 25 relief pitchers by average IRA for those who have recorded at least 1,000 career outs.
Only two active batters made the top 25 list (i.e. Mike Trout and Joey Votto). However, the top 25 list of starters includes seven active pitchers, including the top two, Clayton Kershaw and Jacob deGrom. Nine active relievers made their top 25 list, including the top one, Craig Kimbrel. None of these three pitchers is close to retirement, so their career rankings can still change. And I have yet to adjust for age and ballpark.
Twelve of the top 25 starters pitched during the 21st century, compared with 23 of the top 25 relievers. This shows how much the role of the pitcher has changed over time. The game is now dominated by closers.
Lastly, notice that 10 of the top 25 starters are left-handed, whereas only three of the top 25 relievers are southpaws. We’ll explore this factoid in later postings.
The next step is to adjust for pitcher age and tackle the lefty vs. righty dichotomy.