The Ultimate Baseball Question

How does one assess individual production within a group activity, like manufacturing? There’s overhead and sunk costs, the law of diminishing marginal product and substitutability of labor and capital to contend with. Accountants and economists have struggled with this problem since the invention of money itself.

So have baseball statisticians, which brings me to today’s topic. How can we measure an individual player’s contribution to his team’s score? It’s not just the runs he scores himself, because that often relies on who batted him in. And it’s not just the runs he bats in, because that depends on who batted previously.

To solve this riddle, statisticians invented many of the terms and concepts we associate with the fundamental parts of baseball today. For example, the idea of a “base hit” has nothing to do with the design or execution of the game of baseball. If the batter puts the ball into play on the ground and beats the ball to first base, he is safe. Whether it’s a “single” or an “error” or a “fielder’s choice” is irrelevant.  These designations are merely statistical contrivances to facilitate measuring the productivity of an individual batter.

With that history in mind, we need to construct an individual batting statistic that is congruent with the goals of this study, that is, to predict scoring based on the six discrete dimensions discussed in previous posts plus several effect modifiers, like who is batting, pitching, etc. Even if the traditional individual batting performance measures, e.g. batting average (BA), on-base percentage (OBP) and slugging percentage (SLP), did not suffer from many flaws, they would not serve this purpose well. So, a completely new type of statistic is called for.

Traditional Batting Statistics

The flaws of these statistics are as well-known as their many proposed remedies. BA weights all base hits equally but ignores bases on balls and advancing base runners (i.e. sacrifice bunts and fly balls). OBP counts bases on balls but still counts a single as much as a homerun. Like BA, SLP only counts base hits but does weight doubles more than singles and so on. However, the weights (i.e. 4 for a homerun, 3 for a triple, etc.) are arbitrarily derived.

Modern statistics, such as Pete Palmer’s and John Thorn’s Linear Weights or Weighted Runs Created (wRC), improve upon the traditional batter performance measures, yet still rely on the same flawed contrivances, like base hits and sacrifice flies. Adding OBP and SLP together (aka OPS) is also a popular remedy, but this literally compounds the flaws rather than eliminates them.
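To make the flaws described above concrete, here is a minimal Python sketch of the traditional formulas. The `BattingLine` class, its field names and the two batters are made up for illustration. The formulas are the standard textbook definitions, not anything taken from the data behind this post.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Season counting stats for one batter (field names are illustrative)."""
    at_bats: int
    singles: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    hit_by_pitch: int = 0
    sac_flies: int = 0

    @property
    def hits(self) -> int:
        return self.singles + self.doubles + self.triples + self.home_runs

    def batting_average(self) -> float:
        # Every hit counts the same; walks are ignored.
        return self.hits / self.at_bats

    def on_base_percentage(self) -> float:
        # Walks count, but a single still counts as much as a homerun.
        times_on_base = self.hits + self.walks + self.hit_by_pitch
        chances = self.at_bats + self.walks + self.hit_by_pitch + self.sac_flies
        return times_on_base / chances

    def slugging_percentage(self) -> float:
        # Fixed weights of 1, 2, 3 and 4 bases per hit.
        total_bases = (self.singles + 2 * self.doubles
                       + 3 * self.triples + 4 * self.home_runs)
        return total_bases / self.at_bats

    def ops(self) -> float:
        return self.on_base_percentage() + self.slugging_percentage()


# Two made-up batters: one hits only singles, the other only homeruns.
singles_hitter = BattingLine(at_bats=500, singles=150, doubles=0, triples=0,
                             home_runs=0, walks=50)
slugger = BattingLine(at_bats=500, singles=0, doubles=0, triples=0,
                      home_runs=150, walks=50)

for name, b in (("singles hitter", singles_hitter), ("slugger", slugger)):
    print(f"{name}: BA={b.batting_average():.3f} "
          f"OBP={b.on_base_percentage():.3f} "
          f"SLP={b.slugging_percentage():.3f} OPS={b.ops():.3f}")
```

BA and OBP cannot tell these two batters apart, while SLP rates the slugger exactly four times higher.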

A common flaw of all these statistics is that they suffer from confounding, the assignment of a spurious causal association between two variables due to missing information. Let me explain via anecdote.

When I was 13 years old, I started wearing a hat, because I thought it would look “cool”. This was before I realized that I was genetically incapable of judging what other people consider cool. My father saw me and said “Take that hat off! Don’t you know it will make you go bald?”

I thought about this for a while. Bald people wear hats to protect their bare scalps from the sun. Ergo, most bald people wear hats and most people with hair do not. My father had observed this and correctly deduced that there was a causal relationship between wearing a hat and going bald. Only, he got the causal direction wrong. Bald people are not bald because they wear hats. They wear hats because they are bald. His analysis suffered from confounding.

In baseball statistics, confounding results in a batter’s relative productivity being over or under measured. For example, some stadiums are easier to score in than others. A lot of work has gone into trying to figure out how many more homeruns Babe Ruth hit because Yankee Stadium had a short right field fence or how many fewer homeruns Willie Mays hit because he played in wintery Candlestick Park.

Adjusting for stadium effects is commendable, but the old and new statistics fail to adjust for the most reliable determinant of scoring of all, the multi-dimensional location of the ball game. A batter’s BA, OBP, and SLP all improve dramatically with runners on base. Some batters (I’m talking about you, Mickey Mantle, Willie Mays and Hank Aaron) slogged it out during an era when scoring was relatively hard to do, which I call baseball’s Death Valley Days (another 1950’s TV show reference!), while others enjoyed the bountiful 1920’s and 30’s. The average number of runners on base when the terrific trio were at the plate was 0.64, 0.63 and 0.66, respectively. These are all close to the overall average of 0.64, but each of these guys batted third in the order, so their number-of-runners-on-base averages should have been higher. The corresponding numbers for Babe Ruth, Lou Gehrig and Rogers Hornsby were 0.73, 0.78 and 0.74, about 17% higher.

Mantle, Mays and Aaron played against a pervasive headwind that is not accounted for by the traditional statistics. What is needed is a statistic that recognizes how the game is designed and played and does not rely on subjectively determined events, like errors and base hits.

What’s that you hear? Why, it’s the William Tell Overture signaling the Lone Economist coming to the rescue. Hiyo Silfur!

Individual Run Production

If you recall from previous posts, baseball has six dimensions. At the individual pitch level there are exactly 288 discrete “locations” the game can find itself in. These range from the very start of the half-inning (i.e. no balls or strikes, no outs and the bases are empty) to a full count (i.e. three balls and two strikes), two outs and the bases are loaded. Some of these locations are more propitious for scoring than others, i.e. the Baseball Red-Zone.

At the individual plate appearance (PA) level, each PA starts with a zero count (i.e. no balls and no strikes). So, when we assess the change in the team’s prospects for scoring from the beginning of one PA to the beginning of the next, we need only consider four dimensions (i.e. outs and the three bases) and 25 possible outcomes (i.e. eight configurations of the three bases (2 x 2 x 2) times three values of outs, plus one for the end of the half-inning).
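The counting in the last two paragraphs can be checked by brute force. This is only a sketch: the tuple layout and the `END_OF_HALF_INNING` marker are illustrative choices, not a representation used elsewhere in this series.

```python
from itertools import product

BALLS = range(4)      # 0, 1, 2 or 3 balls
STRIKES = range(3)    # 0, 1 or 2 strikes
OUTS = range(3)       # 0, 1 or 2 outs
# Each base is either empty (False) or occupied (True): (first, second, third).
BASES = list(product((False, True), repeat=3))

# Pitch level: all six dimensions vary.
pitch_locations = list(product(BALLS, STRIKES, OUTS, BASES))

# Plate-appearance level: the count is always 0-0, so only outs and bases vary.
pa_locations = list(product(OUTS, BASES))

# A plate appearance can also end the half-inning, which adds one more outcome.
END_OF_HALF_INNING = "end"
pa_outcomes = pa_locations + [END_OF_HALF_INNING]

print(len(pitch_locations), len(pa_locations), len(pa_outcomes))
```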

That last paragraph may be hard to understand at first, so let me explain via example. Below is a heat map of the average additional runs scored for the 24 starting locations. These figures cover the 102-year period from 1918 to 2019. My source, Retrosheet, provides data on all games played from 1932 to 2019, but from 1918 to 1931, only 75% of major league baseball games are covered. So, some of the games played by the likes of Babe Ruth and Ty Cobb are missing. But as a professional statistician, the Lone Economist has an aversion to discarding useful data. So, I left those years in.

Now suppose a batter is first up in a half-inning. The game’s “location” is the bottom right-hand cell: there are no outs and no one is on base. Under average conditions, the batting team would be expected to score 0.49 runs by the end of the half-inning. Just what impact on the team’s expected runs can the individual batter make? There are exactly five possible outcomes by the end of this PA. He can reach first, second or third safely, score a run, or be out. That’s it. None of the remaining 19 locations are possible.

If he reaches first base safely, he increases expected additional runs from 0.49 to 0.86, i.e. 0.37. Reaching second increases expected runs by 0.61 and reaching third increases it by 0.83. If he is out, expected additional runs decrease from 0.49 to 0.27 or by 0.22. A homerun doesn’t change expected additional runs at all (i.e. the next batter starts at 0.49 also), but a run is scored so that is the best possible outcome from the PA. The following table lists the possible individual run production (IRP) outcomes when there are no outs and no one is on base.

Notice that the IRP difference between a homerun and an out (1.22) is approximately double the difference between a single and an out (0.59). Remember that slugging percentage assumes this ratio is four, not two. Of course, this is true only for the special case when the bases are empty and there are no outs. But even under different conditions, a homerun is worth far less than four singles, usually less than two.
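The arithmetic for the empty-bases, no-outs case can be written out as a short sketch. The function name `irp` and the dictionary layout are illustrative. The expected-run values are the ones quoted in the text, with the values after a double and a triple backed out from the stated increases of 0.61 and 0.83.

```python
START = 0.49  # expected additional runs: no outs, bases empty


def irp(runs_scored: float, start: float, end: float) -> float:
    """Runs scored on the play plus the change in expected additional runs."""
    return runs_scored + (end - start)


# outcome -> (runs scored on the play, expected additional runs afterwards)
outcomes = {
    "reaches first":  (0, 0.86),
    "reaches second": (0, 1.10),   # 0.49 + 0.61
    "reaches third":  (0, 1.32),   # 0.49 + 0.83
    "homerun":        (1, 0.49),   # next batter starts at the same location
    "out":            (0, 0.27),
}

values = {name: round(irp(runs, START, end), 2)
          for name, (runs, end) in outcomes.items()}

for name, value in values.items():
    print(f"{name:15s} {value:+.2f}")

# Homerun versus single, each measured against making an out.
ratio = (values["homerun"] - values["out"]) / (values["reaches first"] - values["out"])
print(f"homerun / single relative to an out: {ratio:.2f}")
```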

How about when the bases are loaded and there are no outs? That’s more complicated because there are a lot more than five possible outcomes in that scenario. The exact number of possible outcomes is 24.  These include all 24 PA locations except for one, the bases loaded and two outs. If the batter hits into a double play, there can be no more than two runners left on base.  Plus, there is the outcome of the triple play which ends the half-inning. Here is a heat map of 23 of those possible outcomes.

I won’t explain the value of every cell, but I can explain a couple of examples. The top right-hand cell is the outcome where the batter either walks, is hit by a pitch, reaches first due to a fielding error or hits a single and each baserunner advances one base. One run is scored and there is no change in the expected runs specified in Figure 1, because the game is at the same location when the next batter comes to the plate. So, the IRP is one run exactly.

The bottom right-hand cell is the outcome when the batter hits a grand slam, i.e. the bases are cleared and there are still no outs. Four runs are scored, but the average additional runs from the start of the PA to the start of the next PA falls from 2.27 to 0.49. So, the IRP of that cell is 4 – 2.27 + 0.49 = 2.22.

The only possible outcome missing from Figure 3 is the triple play. If no runs are scored, the average additional runs from Figure 1 drops from 2.27 to zero.  Therefore, the IRP would be -2.27. It is possible that a run scores before the third out is recorded. In that case the IRP would be -1.27. The average value over the past 102 years is -1.63.

From Figure 3, we can see that relative to an out where no one scores (i.e. -0.70), a homerun is worth 2.92 (2.22 + 0.70) runs and a single is worth 1.7 (1.00 + 0.70) runs. The ratio of a homerun to a single is less than 2. Consequently, we can see how much slugging percentage over-values homeruns relative to singles.
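As a rough check on the comparison with slugging percentage, this sketch puts the two situations worked through so far side by side. The IRP values are the ones quoted in the text. The names `situations` and `SLUGGING_RATIO` are illustrative, and "single" in the bases-loaded case means the outcome where every runner advances exactly one base.

```python
# IRP values quoted in the text for three outcomes in two starting locations.
situations = {
    "bases empty, no outs":  {"homerun": 1.00, "single": 0.37, "out": -0.22},
    "bases loaded, no outs": {"homerun": 2.22, "single": 1.00, "out": -0.70},
}

# Slugging percentage credits 4 bases for a homerun and 1 for a single.
SLUGGING_RATIO = 4 / 1

ratios = {}
for name, v in situations.items():
    homerun_gain = v["homerun"] - v["out"]   # value of a homerun over an out
    single_gain = v["single"] - v["out"]     # value of a single over an out
    ratios[name] = homerun_gain / single_gain
    print(f"{name}: homerun {homerun_gain:.2f}, single {single_gain:.2f}, "
          f"ratio {ratios[name]:.2f} (slugging assumes {SLUGGING_RATIO:.0f})")
```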

Average IRP

The above discussion establishes the basis for our new statistic. Every time a batter comes to the plate, he is at one of the 24 locations. What he does with this opportunity depends on his ability and chance. He is credited with any runs that score from his plate appearance, i.e. Runs Batted-In plus any runs scored due to fielding errors (RBI+), and the change in the game location.

For example, suppose the game location is the bases are loaded and there is one out. According to Figure 1, expected runs are 1.57 (top row, middle column). This is a very favorable location, the third highest out of 24.

Suppose the batter hits a fly ball to right field and the runners on third and second tag up. The runner on third scores and the runner on second advances to third base. The batter is out, but one runner scores and another is closer to home. This might be considered a good outcome for the batter, but is it really?

The game location moves from 1.57 in Figure 1 to 0.51, two outs and runners on first and third. The IRP of that PA is therefore 1 (the run batted-in) + 0.51 – 1.57 (the change in the game location) = -0.06.

The negative value seems to indicate that this was not a good outcome, but we need to consider what the alternatives are to put this outcome into context. The batter could have hit a grand slam with an IRP of 2.70 (4 + 0.27 – 1.57) or into an inning-ending double play with an IRP of -1.57. Compared to that worst-case scenario, the -0.06 IRP is an improvement of 1.51 runs. An above-average batter might be disappointed with the outcome, but a below-average batter would be happy to hit the fly ball to right field.
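The three outcomes just discussed can be lined up in a small sketch. The lookup table holds only the expected-run values quoted in the text, plus zero for the end of the half-inning. The keys and the `"END"` marker are illustrative.

```python
# Expected additional runs by location, using only values quoted in the text.
EXPECTED_RUNS = {
    (1, "loaded"): 1.57,
    (2, "first and third"): 0.51,
    (1, "empty"): 0.27,
    "END": 0.0,   # third out: nothing more can score in this half-inning
}


def irp(runs_scored, start, end):
    """Runs scored on the play plus the change in game location."""
    return runs_scored + EXPECTED_RUNS[end] - EXPECTED_RUNS[start]


start = (1, "loaded")
# outcome -> (runs scored, location when the next batter comes up)
outcomes = {
    "sacrifice fly": (1, (2, "first and third")),
    "grand slam":    (4, (1, "empty")),
    "double play":   (0, "END"),
}

values = {name: round(irp(runs, start, end), 2)
          for name, (runs, end) in outcomes.items()}
worst = min(values.values())

for name, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:14s} IRP={value:+.2f}  vs. worst case={value - worst:+.2f}")
```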

The creators of the traditional statistics didn’t have a good solution to measuring this outcome. They labeled it a “sacrifice” and excluded it from batting average and slugging percentage, thus violating a cardinal tenet of statistical analysis to count all useful information. On-base percentage is even worse. A sacrifice fly is counted in the denominator, so it is just as bad as a strikeout. And it counts double plays the same as single outs.

We take the sum of a batter’s IRPs and divide it by his number of plate appearances to calculate the average IRP. Although I used Figure 1 to explain the IRP concept, actual IRPs should be calculated using annual averages. So, when calculating Barry Bonds’ IRP in 2004, for example, I used the average runs for each game location in 2004.
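Here is one way the averaging step might look in code, assuming each plate appearance is stored with its season so the matching annual table can be looked up. Everything in this sketch is hypothetical: the record layout, the location labels and every number in both tables are placeholders, not values from Retrosheet or from the figures.

```python
# expected_runs[year][location] -> expected additional runs (placeholder numbers)
expected_runs = {
    2003: {"0 out, empty": 0.50, "0 out, 1st": 0.87, "1 out, empty": 0.27, "END": 0.0},
    2004: {"0 out, empty": 0.52, "0 out, 1st": 0.90, "1 out, empty": 0.28, "END": 0.0},
}

# One made-up batter's plate appearances: (year, start, end, runs scored on the play)
plate_appearances = [
    (2004, "0 out, empty", "0 out, 1st", 0),     # reaches first
    (2004, "0 out, empty", "1 out, empty", 0),   # makes an out
    (2004, "0 out, empty", "0 out, empty", 1),   # homerun
    (2003, "0 out, empty", "1 out, empty", 0),   # an out, valued with the 2003 table
]


def irp(year, start, end, runs):
    """IRP of one plate appearance, valued with that season's table."""
    table = expected_runs[year]
    return runs + table[end] - table[start]


def average_irp(pas):
    """Sum of IRPs divided by the number of plate appearances."""
    return sum(irp(*pa) for pa in pas) / len(pas)


print(f"average IRP over {len(plate_appearances)} PAs: "
      f"{average_irp(plate_appearances):.4f}")
```

Keying the table by year means the same out costs slightly different amounts in 2003 and 2004, which is the point of using annual averages.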

Notice there is no reliance on base hits, errors, sacrifice flies or fielder’s choices. All plate appearances count. Nothing is excluded.  Batters that hit into double and triple plays are fully penalized. Batters who advance base runners are given proportional credit.

From an economist/statistician viewpoint, the beauty of this statistic is that it adheres to the “adding up” constraint. When devising a system of equations (in this case each batter’s statistic represents one equation), the sum of the individual parts should equal the total. By the way this statistic is defined, at the end of the year the sum of all players’ RBI+ will be equal to the sum of runs scored during the season, and the sum of all players’ changes in game location will offset that total almost exactly, so the IRPs themselves sum to approximately zero.

Career Average IRP vs. OPS

Figure 4 lists the top 25 players by average IRP. Only players with at least 3,000 plate appearances during the 1918-2019 time-span are ranked. For comparison’s sake, the right-hand column ranks each player’s OPS (on-base percentage plus slugging percentage) statistics.

The first thing to note is how similar the two rankings are. The top seven players are the same in both rankings. Babe Ruth, Ted Williams and Lou Gehrig are at the top of both lists. Several other familiar names also appear in both Top 25 lists: Joe DiMaggio, Mickey Mantle, Stan Musial, Willie Mays, etc.

The next things to notice are the players that fare much better with this new ranking as compared to that by OPS. Hank Aaron rises from 33rd by OPS to 22nd by average IRP. Ty Cobb jumps from 47th to 15th. And this was for only some of his games played after his prime years.

There are a few players who fare relatively poorly and are not shown in Figure 4. For example, Vladimir Guerrero drops from 27th by OPS to 121st by average IRP. Alex Rodriguez drops from 32nd to 47th.

RBI+ vs. Change in Game Location

For anyone who thinks this is simply an RBI per plate appearance statistic, let’s break average IRP into its two separate parts, RBI+ and the change in game location (ΔGL). In the aggregate, RBI+ equals the negative value of ΔGL. Figure 4 shows this to be 0.118 vs. -0.117. This happens because when runners on base are batted in, the game location usually becomes less favorable. Therefore, players who tend to have above average RBI+ per plate appearance will have below average ΔGL.
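One way to see why the two parts offset each other in the aggregate is to follow a single half-inning. The changes in game location telescope: whatever happens in between, they sum to minus the expected runs at the start of the half-inning. The half-inning below is made up, and the 0.10 for two outs and the bases empty is an assumed placeholder. The other values are the ones quoted earlier in the text.

```python
# A made-up half-inning: (RBI+ on the play, expected runs before, expected runs after)
half_inning = [
    (1, 0.49, 0.49),   # leadoff homerun: same location for the next batter
    (0, 0.49, 0.27),   # first out
    (0, 0.27, 0.10),   # second out (0.10 is an assumed value, not from the text)
    (0, 0.10, 0.00),   # third out ends the half-inning
]

rbi_plus = sum(runs for runs, _, _ in half_inning)
delta_gl = sum(after - before for _, before, after in half_inning)
total_irp = rbi_plus + delta_gl

print(f"RBI+ = {rbi_plus}, sum of changes in game location = {delta_gl:+.2f}, "
      f"IRP total = {total_irp:+.2f}")
```

The location changes add up to -0.49 no matter how the half-inning unfolds. Since 0.49 is itself the average number of runs scored per half-inning, over enough half-innings the RBI+ total and the ΔGL total cancel, which matches the 0.118 vs. -0.117 split.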

Hank Greenberg is an extreme example of this relationship. He has the highest RBI+ average over the last 102 years, but ranks only 1,312th (out of 1,598) in average ΔGL. I doubt that it is just a coincidence that he also had the highest average number of runners on base (0.82) when he came to bat. The 102-year average of this statistic is 0.64.

Despite his high number of runners on base, Greenberg did not enjoy the highest average game location when he came to bat. He ranked 30th in this department, below the king of cleanup hitters himself, Lou Gehrig.

Greenberg was known as a slugger, not for his speed around the bases. He was called Hammerin’ Hank before Aaron went by that nom-de-guerre.  So, a high average RBI+ and a low average ΔGL might be a marker for a power hitter. If so, then Gehrig, DiMaggio, Ramirez, McGwire and Aaron fit this description.

But what about Bonds and Mantle? They rank 127th and 164th, respectively, in average RBI+ and 36th and 33rd in average ΔGL. If those two weren’t power hitters, then nobody was. So, the power vs. average divide does not explain this relationship.

But to prove that a high average RBI+ does not always result in a low average ΔGL (and vice-versa), look at Babe Ruth and Ted Williams. Ruth’s average RBI+ was second only to Greenberg’s, but his average ΔGL was ranked 34th. Ted Williams had the 9th best average RBI+, but his average ΔGL was ranked 10th. Those two guys were great no matter the situation.

Who Was the Greatest Batter?

This is the ultimate question for a baseball statistician. However, the objective of 6-D statistics is to change the analytical focus (i.e. perspective) from comparing players’ abilities to predicting outcomes during a game. It is therefore ironic that a by-product of this refocusing is a statistic that attempts to answer the ultimate question. The Lone Economist admits that he would love to discover some new nugget of information that sheds light on the answer.

Average IRP is just the start. I plan to look at many other factors that affect baseball productivity. But before I end this post, I want to address an obvious source of confounding that even average IRP suffers from. I am referring to the low ranking of batters who played during the 1960’s (e.g. Mantle, Mays and Aaron) relative to those of the 1920’s and 1930’s (e.g. Ruth, Gehrig, and Hornsby) and 1990’s and 2000’s (e.g. Bonds, Ramirez and McGwire).

A change in the way baseballs were manufactured and the banning of the spitball in 1920 likely inflated the batting statistics of the 20’s and 30’s. Performance-enhancing drugs fueled the scoring surge of the 90’s and 2000’s.  So how can we reduce the effects of these confounding missing factors and level the playing field?

Notice that the average beginning game location was lower for Mantle, Mays and Aaron (i.e. 0.471, 0.459, and 0.464) than for any other member of the Top 25 club except for Mike Trout, the only active player to make the list. This happened not because they were on poor-hitting clubs, but because they played during a poor-hitting era.

One way to correct for this difference is to calculate the average IRP as the percentage change from the average beginning game location. Figure 5 does this and recalculates the rankings. Notice that Mays and Aaron rise from 18th and 22nd, respectively, to 13th and 14th. Mickey Mantle jumps from 9th to 4th and Mike Trout climbs in the rankings from 11th to 6th. And I’m happy to see Frank Robinson, Dick Allen and Willie Stargell climb into the Top 25. The traditional statistics never treated these great hitters fairly.
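A minimal sketch of that adjustment, assuming "percentage change" means average IRP divided by average beginning game location. The function name and both batters are hypothetical. The 0.471 is the beginning game location the text gives for Mantle, while the 0.060 average IRP and the 0.550 are made-up numbers chosen only to show the effect.

```python
def relative_irp(average_irp: float, average_start_location: float) -> float:
    """Average IRP as a percentage of the average beginning game location."""
    return 100 * average_irp / average_start_location


# Two hypothetical batters with the same raw average IRP in different eras.
low_scoring_era = relative_irp(average_irp=0.060, average_start_location=0.471)
high_scoring_era = relative_irp(average_irp=0.060, average_start_location=0.550)

print(f"low-scoring era:  {low_scoring_era:.1f}%")
print(f"high-scoring era: {high_scoring_era:.1f}%")
```

With identical raw averages, the batter who started from less favorable locations comes out ahead once the era is taken into account.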

But we still have far to go to answer the ultimate question.

Published by TheLoneEconomist

I am a PhD economist who studies just about anything and proudly specializes in nothing.

4 thoughts on “The Ultimate Baseball Question”

  1. I like the personal touch. Did your father really tell you not to wear a hat? I don’t think I can picture you in a hat.

    I notice Mel Ott in the last table. Anyone who does crosswords knows he was a Giant.

    I’ll have to take your word about the statistical parts.

