6-D baseball statistics rely on 288 discrete “locations” that result from the six dimensions of baseball: outs; whether first, second and third base are occupied; balls; and strikes. This is kind of a “quantum” alternative to traditional baseball statistics. In fact, I might start calling it Quantum Baseball. I’ll explain this in greater detail in a later post.
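The 288 figure is the product of the number of values each dimension can take: 0 to 2 outs, each of the three bases empty or occupied, 0 to 3 balls and 0 to 2 strikes. A quick Python check that enumerates every combination (the constant names are illustrative, not part of any existing code):

```python
from itertools import product

# Values each dimension can take while a plate appearance is still in progress
# (a third out, fourth ball or third strike ends the state).
OUTS = range(3)      # 0, 1 or 2 outs
BASE = range(2)      # 0 = empty, 1 = occupied (used for first, second and third)
BALLS = range(4)     # 0-3 balls
STRIKES = range(3)   # 0-2 strikes

# Every combination of (outs, first, second, third, balls, strikes).
states = list(product(OUTS, BASE, BASE, BASE, BALLS, STRIKES))
print(len(states))  # 288 = 3 * 2 * 2 * 2 * 4 * 3
```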
Although the real power of this new type of statistic is that, unlike traditional baseball statistics, it is predictive, an interesting side benefit is that it helps answer the question: who is the greatest hitter in baseball history? Our initial results showed this to be a two-man race between Babe Ruth and Ted Williams, with Lou Gehrig in third place.
Several effect modifiers need to be considered to give a more refined answer to this question. I introduced age as a leading candidate in my last post. Age is particularly important because Babe Ruth and Ted Williams had very different career paths. Babe Ruth didn’t become a full-time batter until his mid-twenties and retired at age 40. Ted Williams started his career at age 21 and retired at 42, but missed several seasons in between while serving in the U.S. Marine Corps. Ruth’s best year, the best year any batter ever had, was his first as a New York Yankee, when he was only 25 years old. His productivity declined steeply in his late 30s, while Ted Williams had the best season of his remarkable career at age 39, an age at which most players are a decade past their prime.
To adjust for age, we need to select a common weighting scheme and apply it to every batter. Figure 1 shows the distribution of batter age for full-time batters from 1918 to 2019. That distribution will serve as our weighting scheme.
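Turning an age distribution like Figure 1 into weights amounts to dividing the number of full-time batter-seasons at each age by the total. A minimal sketch, assuming the distribution is available as counts by age; the function name `age_weights` and the counts are hypothetical, not the data behind Figure 1:

```python
def age_weights(season_counts: dict[int, int]) -> dict[int, float]:
    """Convert counts of full-time batter-seasons by age into weights summing to 1."""
    total = sum(season_counts.values())
    if total == 0:
        raise ValueError("season_counts must contain at least one season")
    # Sorted by age so the weights read like the histogram in Figure 1.
    return {age: n / total for age, n in sorted(season_counts.items())}


# Hypothetical counts for illustration only (not the data behind Figure 1).
counts = {25: 30, 26: 40, 27: 50, 28: 50, 29: 30}
weights = age_weights(counts)
print(weights)  # {25: 0.15, 26: 0.2, 27: 0.25, 28: 0.25, 29: 0.15}
```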
Applying a common weighting scheme to all batters would be easy if not for one significant problem. How do we give a 10% weight to a season in which a batter didn’t play? For example, Ted Williams missed the entire 1945 season. He turned 27 during that year. His IRP wasn’t zero that year. It was null. It didn’t exist.
Therefore, the next step is, in effect, to estimate what Ted Williams’ individual runs production (IRP) statistic would have been had he played baseball in his mid-twenties, and what Babe Ruth’s IRP would have been in his early twenties had he not been a pitcher at that age.
In my previous post, I postulated that baseball batters share a common trajectory over their careers. They improve from their rookie season, reach their prime somewhere in the middle and slowly decline as they approach retirement. Figure 2 shows the result of this model specification.
This arc says that batters typically reach their prime between the ages of 27 and 29. Most retire by their late 30s. Only extraordinary batters are good enough to play into their 40s.
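The arc in Figure 2 is used later as a set of relative IRP-by-age percentages. One way to get them is sketched below, assuming the percentages are the league-wide curve divided by its average over the prime ages 27 to 29. That normalization is our assumption, not a documented detail of Figure 2, and the curve values are made up:

```python
PRIME_AGES = (27, 28, 29)  # prime ages read off Figure 2


def relative_percentages(mean_irp_by_age: dict[int, float]) -> dict[int, float]:
    """Express a league-wide age curve relative to its average over the prime ages."""
    prime_level = sum(mean_irp_by_age[a] for a in PRIME_AGES) / len(PRIME_AGES)
    # 1.0 means "as productive as a typical prime season".
    return {age: v / prime_level for age, v in mean_irp_by_age.items()}


# Hypothetical rise-peak-decline curve, not the fitted values in Figure 2.
curve = {22: 0.91, 25: 1.00, 27: 1.04, 28: 1.05, 29: 1.03, 33: 0.99, 38: 0.88}
rel = relative_percentages(curve)
```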
The steps for calculating a batter’s age-adjusted average IRP are:
- Estimate a batter’s yearly average IRP for his missing years
- Calculate a weighted average IRP using the common weights illustrated in Figure 1.
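Step 2 is an ordinary weighted average once every weighted age has a value. A sketch, assuming the weights and the batter’s IRPs (observed or estimated) are stored as dictionaries keyed by age; `age_adjusted_irp` and the numbers are hypothetical:

```python
def age_adjusted_irp(irp_by_age: dict[int, float], weights: dict[int, float]) -> float:
    """Step 2: weighted average of yearly IRPs using the common age weights.

    irp_by_age must cover every weighted age, with missing seasons already
    filled in by step 1 (estimated values).
    """
    missing = sorted(set(weights) - set(irp_by_age))
    if missing:
        raise ValueError(f"no observed or estimated IRP for ages {missing}")
    total_weight = sum(weights.values())
    return sum(w * irp_by_age[age] for age, w in weights.items()) / total_weight


# Hypothetical inputs.
weights = {27: 0.5, 28: 0.3, 29: 0.2}
batter = {27: 1.30, 28: 1.20, 29: 1.10}
print(round(age_adjusted_irp(batter, weights), 3))  # 1.23
```

Raising an error for an uncovered age, rather than silently skipping it, mirrors the point made above: a missing season is null, not zero, so it has to be estimated before the average is taken.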
First, we identify every season of the batter’s career in which he had at least 300 plate appearances. This removes seasons in which the player didn’t play enough to establish his batting ability at that age. Then, using the relative IRP-by-age percentages from Figure 2, we estimate the average IRP for each missing season as the average of the observed yearly IRPs weighted by the relative percentage at that age.
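Step 1 could be sketched as follows. The 300-plate-appearance cutoff comes from the text; the estimator is our reading of “the average of the observed yearly IRPs weighted by the relative percentage at that age”: divide each kept season’s IRP by its relative percentage to get a prime-equivalent value, average those values using the relative percentages as weights (which reduces to the sum of IRPs over the sum of percentages), then scale by the percentage at each missing age. The function name and sample data are hypothetical:

```python
MIN_PA = 300  # minimum plate appearances for a season to count


def fill_missing_seasons(
    seasons: list[tuple[int, int, float]],
    rel: dict[int, float],
    ages: list[int],
) -> dict[int, float]:
    """Step 1: return an IRP for every age in `ages`, estimating the gaps.

    seasons: (age, plate_appearances, irp) for each observed season
    rel:     relative IRP-by-age percentages (prime = 1.0), as in Figure 2
    """
    kept = {age: irp for age, pa, irp in seasons if pa >= MIN_PA}
    if not kept:
        raise ValueError("no season with at least MIN_PA plate appearances")
    # Prime-equivalent level (assumed ratio estimator): the rel-weighted mean of
    # irp / rel, which simplifies to sum(irp) / sum(rel).
    level = sum(kept.values()) / sum(rel[age] for age in kept)
    return {age: kept[age] if age in kept else level * rel[age] for age in ages}


# Hypothetical batter: the age-27 season has too few plate appearances.
rel = {25: 0.90, 26: 0.95, 27: 1.00, 28: 1.00}
seasons = [(25, 600, 1.08), (26, 650, 1.14), (27, 120, 0.50)]
filled = fill_missing_seasons(seasons, rel, ages=[25, 26, 27, 28])
print({a: round(v, 3) for a, v in filled.items()})
# {25: 1.08, 26: 1.14, 27: 1.2, 28: 1.2}
```

Because the level is on a prime-relative scale, a gap at a prime age (relative percentage 1.0) is filled with the batter’s level itself.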
For example, Ted Williams missed the entire 1945 season, a year in which he would have been in his prime. For the 17 seasons he did play, his weighted IRP (relative to his prime years) was 1.292, or 29.2% greater than the IRP for the average batter. Because 1945 falls at a prime age, no further age adjustment is needed: if Ted Williams had played that year, we estimate that his IRP would have been 1.292.
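In equation form, writing $L$ for a batter’s prime-relative level and $r(a)$ for the Figure 2 percentage at age $a$, and assuming $r(27) = 1$ because 27 falls in a batter’s prime (an assumption about how Figure 2 is normalized):

$$
\widehat{\mathrm{IRP}}_{1945} = L_{\text{Williams}} \times r(27) = 1.292 \times 1.00 = 1.292
$$

The 29.2% figure is simply $1.292 - 1 = 0.292$ expressed as a percentage.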
I’ll illustrate how this calculation works for Ruth and Williams. Figure 3 shows the estimated and actual average IRP values over Babe Ruth’s career. The orange dots represent the actual IRPs for seasons in which he had at least 300 plate appearances. [The season when he was age 24 is missing due to incomplete data.] The blue dots are the estimated values based on the age trajectory from other batters and Ruth’s actual IRPs.
Notice that the orange dot at age 25 corresponds to an IRP of 1.406. That’s a record: the highest observed for any batter since 1918. Also notice that at age 32, Ruth’s average IRP was 1.304. For almost any other batter that would have been a career best, but for Ruth it was only an average year in terms of IRP. It also happens to be the year he hit 60 home runs, an awe-inspiring record that stood for 34 years. That is further evidence that home runs are overrated as a measure of hitting ability.
Figure 4 shows the actual vs. estimated average annual IRPs by age for Ted Williams. Williams’ actual IRPs are in green and his estimated IRPs are in red. Notice that the highest green dot appears at age 39. Williams’ IRP that year was 1.352, one of the highest annual IRPs ever recorded and doubly impressive since fewer than one percent of batters even play at that age.
Also notice that at age 23 Williams’ average IRP was 1.328: excellent, but only the third best of his career. That was the year he batted .406, a feat frequently cited as one of the greatest achievements in baseball history. This is further evidence that batting average is a woefully inadequate statistic for measuring batting ability.
If we put the estimated age trajectories for both players in the same graph, we get Figure 5. The two trajectories are nearly identical; Ruth just barely edges out Williams.
Figure 6 summarizes the age-adjusted average IRPs for the top 25 batters plus Pete Rose. Ruth, Williams and Gehrig still top the list. However, at 0.4%, the difference between Ruth and Williams is barely visible. Ty Cobb jumps from 47th according to on-base plus slugging (OPS) to 4th according to age-adjusted IRP. This is especially impressive since his average is based on only one fourth of his lifetime plate appearances, all of which came after his physical prime. If I had more complete data on Cobb, I might conclude that he was the greatest batter ever, but we’ll probably never know.
As expected, the two famous batters whose career averages suffered because they played well into their 40s benefited the most from age adjustment. Carl Yastrzemski jumped from 184th according to OPS to 24th according to age-adjusted IRP. Pete Rose jumped from 532nd to 88th by the same measures. Not bad for a self-proclaimed singles hitter.
Also as expected, the two active players on the list declined slightly in rank after age adjustment. Mike Trout fell from 6th to 8th and Joey Votto fell from 16th to 17th. At 37 this year, Votto is past his prime, but he is still the highest-ranked batter not born in the U.S. (he’s from Canada).
Greatest of All Time and Still in His Prime?
I’ll end this post with something for you to ponder. Only seven batters rank higher than Mike Trout. All of them batted left-handed except Mickey Mantle, a switch-hitter, and Rogers Hornsby, a right-handed batter for whom we have only partial data and who played during an era of exceptionally weak pitching. This means that Mike Trout is quite possibly the greatest right-handed batter ever to play baseball, and, turning 29 this August, he’s still in his prime. And get this: since most pitchers are right-handed, batting right-handed is a disadvantage. Once I adjust for handedness and pitching quality, Mr. Trout might just be revealed as the greatest batter of all time, period.
I know of only a handful of all-time greatest players in their respective sports who played during my lifetime. Michael Jordan, Tiger Woods and Serena Williams are three that come to mind. I regret that I failed to see any of them play in person during their primes when I had the chance. Now that I realize Trout’s place in baseball history, the coronavirus pandemic prevents me from seeing him in person.
If you ever have the chance, you really should make an effort to see him in the flesh. It’s something we can tell our grandkids about.