A long time ago near the start of my career as an economist,
my mentor at that time sat me down to tell me one word. “Specialize,” he advised.
I outwardly acknowledged his wisdom while secretly knowing that if
specialization was necessary for success in our profession, then I was doomed.
Since that conversation I’ve published peer-reviewed
articles on a bizarrely diverse set of subjects. I started with an article on how
to accurately measure consumer welfare in the face of non-linear prices. I then
segued to the effects that unexpected changes in wheat harvests have on net
exports in Australia. Next I published evidence that professional baseball
players with long-term contracts don’t work as hard as other players. I
followed that up with an oft-cited article that proved gamblers do not randomly
select lottery numbers even though they would be better off if they did. My
seemingly random list of articles also includes subjects such as the effect of
physician fee capitation on consumer satisfaction, a test of the arbitrator
exchangeability hypothesis using final-offer arbitration data from Major League
Baseball, and how to estimate personal consumption using a statistical modeling
technique called “instrumental variables”.
The closest I have come to a “specialty” is a series of
articles dealing with cancer care, including cervical cancer in the
Vietnamese-American population, the incidence of colorectal cancer by bowel
section and the effect of distance to provider on the incidence of breast
cancer. I’ve also published articles about the effectiveness of an
employer-sponsored weight-management program and the rate of moral hazard
caused by health insurance.
I should mention that I published these articles and much more while working at four different universities, a state agency, a county government and four different consulting companies. I’ve testified as an expert witness in both state and federal courts of law on things as unrelated as the government’s efforts to eradicate an infectious plant disease, antitrust behavior in the stretch limousine market and hospital payment rates under Workers’ Compensation. Twenty-five years after earning my PhD in economics, I went back to school to earn a master’s degree in health services and completed my post-doctoral training at a well-known cancer research center.
Someone unfamiliar with the economics profession might be impressed with these accomplishments, but I can assure you that other economists are not. When they see my long array of seemingly random academic pursuits, they invariably wrinkle their noses in bewilderment. That mentor who advised me in my youth to specialize stopped talking to me nearly twenty years ago when he realized I was a lost cause.
I would like to claim that all this was part of some well-reasoned plan, but the truth is that I struggle with a restlessness that I am incapable of controlling. I have learned to embrace my weirdness. Ironically, my inability to specialize makes me special.
Hence the name of my blog: The Lone Economist. I plan to pursue a data-driven analysis of topics about which I am well-versed, such as Medicare-For-All vs. the Public Option, and a new type of baseball statistic.
Of course, given my history, who knows what I might talk about next.
In my last post, I stated that extrapolation is how we fill in gaps in the data created by the Curse of Dimensionality. It is a blending of observation and theory, a tradeoff between accuracy and simplicity. As much art as it is science, its partial reliance on subjective judgement can make it susceptible to manipulation. That is why Mark Twain once said, “There are three kinds of lies: lies, damned lies and statistics.” Four, if you count economic impact studies.
It is, therefore, with great care and transparency that one should approach extrapolation. Economists, in particular, are guilty of creating arcane statistical models that no one outside our profession can understand, much less believe.
This is the reason why medical journals tend to have a low tolerance for studies that rely on complex statistical modeling. The more complex the model, the more difficult it is to be sure one is drawing the correct inference. Unlike economics, medical research is an experimental science. A well-designed experiment need not rely on a confusing array of equations.
For statistical analyses, the medical profession has, in effect, adopted the KISS principle: keep it simple, stupid. In keeping with this principle, the Lone Economist designed the 6-D baseball statistics using simple ratios, like runs per plate appearance, and a relatively simple rule for extrapolation.
I divided the observed data into six dimensions, resulting in 288 separate game locations. But every baseball fan knows there are many other bits of information by which to predict outcomes. There is the identity of the batter, the pitcher, and the stadium. There are even individual characteristics of the player that are useful predictors, such as handedness (i.e. the side of the plate from which the batter swings) and age. Some players might be better at night, some during daylight. The list is quite lengthy.
The Curse of Dimensionality prevents the data from being subdivided into all possible combinations of predictors, so extrapolation must be used for the non-dimensional factors. I call them effect modifiers.
The objective of 6-D Baseball is to predict what happens next during a live game. I want to make it easy for the casual spectator to know when a team is most likely to score without resorting to a hand calculator. The first step is to look at the separate effect of the current batter.
In an earlier post, I introduced the Individual Run Production (IRP) statistic. This statistic provides a value by which we can assess the scoring potential during a plate appearance for each of the 288 game positions. Although it can be used to compare batters across time, its main purpose is to measure the likelihood of scoring in a live game.
Every batter has two objectives. The first is to drive in runs during his plate appearance. The second is to set up the situation for the next batter. Consequently, the individual run production statistic (IRP) is the sum of two components: the runs scored during the plate appearance (RBI) and the value of the change in the game location.
The average or expected number of runs scored until the end of the half-inning is a function of the IRPs of the current and subsequent batters. In equation form it looks like this:
I’ll explain what these equations mean, one by one.
L represents “location” and refers to the count of balls and strikes, the disposition of each base and the number of outs. Specific values are six digits long in the following order: outs, 3rd, 2nd, 1st, Balls, and Strikes. For example, at the start of each half-inning, there are no outs, the bases are empty and the count of balls and strikes are 0 and 0. Therefore the value of L would be 000000. If there were two outs, a man on 1st and the count of balls and strikes was three and two, the value of L would be 200132.
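As a quick illustration, the location coding described above can be sketched in a few lines of Python (the function name is my own, not from the post):

```python
def location_code(outs, third, second, first, balls, strikes):
    """Encode a game location L as a six-digit string in the order
    outs, 3rd, 2nd, 1st, balls, strikes (a base digit is 1 if occupied)."""
    return f"{outs}{third}{second}{first}{balls}{strikes}"

print(location_code(0, 0, 0, 0, 0, 0))  # start of a half-inning: "000000"
print(location_code(2, 0, 0, 1, 3, 2))  # 2 outs, man on 1st, full count: "200132"
```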
Hb(L) is the average number of runs scored by the end of the half-inning when the game location is L. He(L) is the average number of runs scored after the current plate appearance.
Equations 1 and 2 say that Hb(L) is simply the average number of runs scored during the current PA, R(L), plus He(L), all the runs scored after the current PA.
Equation 3 says that the average number of runs scored when the current batter is j and the location is L depends in part on Hb(L) times the ratio of R(L), the average runs scored during the current PA regardless of the batter, to Hb(L). This ratio is denoted s(L) and is divided by A(s), the overall average value of s regardless of L. Since there are 288 values of L, there are 288 values of s(L). They range from 116.3% for L = 211130 to 3.1% for L = 000002. The overall average value of s(L), A(s), is 35.8%.
The last component of equation 3, sj, is specific to batter j. It is the batter’s average ratio of runs scored during the PA to Hb(L), the first part of the IRP statistic.
Equation 4 determines the average number of runs scored after the PA, the situation the current batter sets up for the next batter. d(L) is the average ratio of He(L) to Hb(L) at location L. A(d) is the average value of d(L) over all values of L and dj is the average ratio for player j. d(L) ranges from 160.2% for L = 200030 to 12.5% for L = 211102. The value of A(d) is 64.9%.
Equations 5 – 8 show that the expected number of runs scored is a function of the current location, L, and the components of the IRP’s for the current and subsequent batters.
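For readers who prefer code to equations, here is a minimal Python sketch of the logic just described. Only A(s) = 35.8% and A(d) = 64.9% come from the post; the function name and the sample s(L) and d(L) values are hypothetical stand-ins for the real lookup tables.

```python
A_S = 0.358  # A(s): overall average of s(L), from the post
A_D = 0.649  # A(d): overall average of d(L), from the post

def expected_runs(Hb_L, s_L, d_L, s_j, d_j):
    """Batter-adjusted expected runs to the end of the half-inning at L."""
    runs_during_pa = Hb_L * s_L * (s_j / A_S)  # equation 3: scale R(L) by batter j
    runs_after_pa = Hb_L * d_L * (d_j / A_D)   # equation 4: scale He(L) by batter j
    return runs_during_pa + runs_after_pa      # equations 1-2: Hb = R + He

# A perfectly average batter (s_j = A(s), d_j = A(d)) reproduces Hb(L):
Hb = 0.49          # expected runs at the start of a half-inning
s, d = 0.10, 0.90  # hypothetical split between during-PA and after-PA runs
print(round(expected_runs(Hb, s, d, A_S, A_D), 2))  # 0.49
```

The sanity check is that when s_j and d_j equal their league-wide averages, the adjustment factors cancel and the formula collapses back to Hb(L).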
Next time I’ll provide some illustrative examples using the IRP’s of active batters.
Napoleon once said, “An army marches on its stomach”. Or was that Frederick the Great? I get my 18th century warmongers mixed up. No matter, his meaning was that the quantity and quality of food for one’s troops is of critical importance when fighting a war.
In the case of the Lone Economist, the quantity and quality of data — preferably free and publicly-available, not every masked vigilante can be as loaded as Bruce Wayne — is the vital fuel for crusading against bias, confounding and all other manner of inappropriate statistical inference. And don’t even get me started about post-randomization, sub-group analyses!
Fortunately, baseball (and for that matter healthcare) generates a huge volume of free data. I have data, courtesy of Retrosheet, covering 173,947 MLB games that date back to 1918. These games include 13,561,443 plate appearances by 13,196 different batters. I even have data on every pitch thrown since 1988, all 21,734,609 of them.
And yet, it isn’t enough.
Every masked vigilante has his or her nemesis and the Lone Economist is no exception. Mine is the Curse of Dimensionality. My fists clench at the mere thought of it. Curse you Curse of Dimensionality!
Can one curse a curse?
What am I saying? I’m a masked vigilante. I can do anything.
This curse refers to a data scarcity problem encountered often in statistical analysis. No matter how much data one possesses, dividing the data into even a small number of dimensions will quickly exhaust the supply.
For example, I have identified only six dimensions of baseball games: outs, three bases, and called balls and strikes. Since each of these dimensions has but a few discrete values, there are just 288 possible “locations” a ballgame can find itself in.
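The arithmetic behind those 288 locations can be verified in a few lines of Python: three values of outs, two states for each of the three bases, four ball counts and three strike counts.

```python
from itertools import product

# Every possible game "location": outs x base states x balls x strikes.
locations = list(product(
    range(3),                # outs: 0, 1, 2
    (0, 1), (0, 1), (0, 1),  # 3rd, 2nd, 1st base occupied?
    range(4),                # balls: 0-3
    range(3),                # strikes: 0-2
))
print(len(locations))  # 3 * 2 * 2 * 2 * 4 * 3 = 288
```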
Whenever the count is three balls and fewer than two strikes and the bases are loaded, the pitcher is at a distinct disadvantage. He can’t afford to throw another pitch outside the strike zone. The batter knows this and can count on the next pitch being where it can be hit hard. It’s known as a cripple pitch. The worst cripple pitch from the pitcher’s perspective comes when there are no strikes and no outs. Put these two situations together and we get the Ultimate Cripple Pitch (UCP).
For the average batter facing the average pitcher, the number of runs scored from that point until the end of the half-inning is 2.88, higher than any other game location. So, if the batter were say Mike Trout, the best hitter in the major leagues today and arguably the sixth best hitter over the last hundred years, the average number of runs scored from that location would be even higher. Right? I imagine this nightmare scenario has caused many pitchers to wake up screaming in the night.
There’s only one problem with this assessment. Mike Trout has never faced a UCP and he probably never will. Out of over 21 million pitches thrown since 1988, only 781 of them have been UCPs. That’s less than four thousandths of one percent.
If I added just one more dimension to my list of six, e.g. the identity of the batter, a huge number of gaps would appear in the data. Are we then to conclude that if Mike Trout ever finds himself in a UCP situation, it is completely unknown what is likely to happen next? Is he just as likely to strike out as the average batter?
Of course not. Just because it has never happened before, doesn’t mean we know nothing about what is likely to happen. We know how well lesser hitters do in that situation and we know how well Mike Trout does in situations less advantageous to the batter. It doesn’t take a crystal ball to conclude that the average pitcher would be in extremely deep doodoo.
Well, Curse of Dimensionality, the Lone Economist has a silver bullet with your name on it and it’s called “extrapolation”.
Although, if the bullet is called “extrapolation”, shouldn’t that be written on it? These mythology idioms can be so confusing.
I’ll explain how extrapolation works in my next post.
My last post was devoted to answering the ultimate baseball question: who was the greatest hitter? I plan to answer similar questions in the future, such as ‘who was the greatest pitcher?’ and so forth. But all of that is a side excursion from our main path, predicting what happens next in a live baseball game.
Anticipation of what is about to happen next is a primary cause of interest in observing a sporting event. Most spectator sports consist of numerous small skirmishes between the opposing sides. Winning skirmishes leads to winning more significant contests, battles. And winning battles leads to winning the war, the game. For American football, seeking a first down is a skirmish, a series of downs is a battle. For baseball, a plate appearance is a skirmish and an inning is a battle.
For today’s post, I want to return to the concept of a baseball Red-Zone. In a previous post, I showed a heat map of the 288 locations of a baseball game that can occur within its six dimensions: balls, strikes, outs and three bases. That Red-Zone was calculated at the inning level or in other words for battles. When I say I want to predict what happens next, I mean by the end of the plate appearance and even on the next pitch.
Consequently, a baseball Red-Zone would apply to a plate appearance as well as an inning. To see how this is done, look at Figure 1, the half-inning scoring heat map below. Illustrating six dimensions in a two-dimensional space is hard to do. I used one axis to measure outs and balls (the east-west dimension) and the other axis to measure bases and strikes (the north-south dimension). This resulted in a 12 by 24 matrix of 288 cells.
Figure 1. Half-Inning Scoring Zones (1988-2019)
Note: The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”. Retrosheet provides play-by-play data on most MLB baseball games from 1918 to 1931 and all games since 1932; however, data on every pitch in every MLB game dates back only to 1988.
Notice that there are three rows for each combination of occupied bases and eight rows for each value of strikes. There are four columns for each value of outs and three columns for each value of balls. This juxtaposition of dimensions is dictated by the need to produce a rectangular matrix.
What is not dictated though is the order in which the axes are sorted. The north-south axis is sorted first by bases and then by strikes. The east-west axis is sorted first by outs and then by balls. I could have sorted them differently, but I chose to sort them in this way to achieve a visual effect.
I wanted to illustrate two facts: the more balls and the fewer strikes that are called, the greater the potential for scoring; and the fewer the outs and the more runners on base (especially runners closer to home), the greater the potential for scoring. Since outs and bases dominate the determination of scoring at the inning level, it was necessary to sort by those dimensions first. This led to a clear picture of three relatively intact scoring zones: a blue zone (i.e. low-scoring) in the southwestern cells, a red zone (i.e. high-scoring) in the northeastern cells and a yellow zone that clearly delineates them.
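The row and column indexing behind that 12-by-24 grid can be sketched as follows. The function and the base-configuration index are my own conventions, inferred from the text, not code from the post.

```python
def cell(outs, bases, balls, strikes):
    """Map a game location to its (row, col) in the 24-row by 12-column
    heat map. `bases` is an index 0-7 over the eight base configurations;
    rows sort first by bases then strikes, columns first by outs then balls."""
    row = bases * 3 + strikes  # 8 base configs x 3 strike counts = 24 rows
    col = outs * 4 + balls     # 3 out counts x 4 ball counts = 12 columns
    return row, col

print(cell(0, 0, 0, 0))  # first cell: (0, 0)
print(cell(2, 7, 3, 2))  # last cell: (23, 11)
```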
If, instead, I had sorted the axes primarily by balls and strikes, the picture would have looked like Figure 2.
Figure 2. Half-Inning Scoring Zones Resorted (1988-2019)
The relationships between balls vs. strikes and outs vs. bases are the same as in Figure 1. But due to the different sorting of the axes, this is harder to discern from the heat map.
The Plate Appearance Red-Zone
For long-time baseball fans, it should come as no shock to learn that when it comes to determining the outcomes of plate appearances, balls and strikes — not outs and bases — dominate. What might come as more of a surprise though, is the relationship between positive outcomes and outs vs. bases.
Figure 3 shows a heat map of the average plate appearance outcomes for each of the 288 baseball game locations. The north-south axis is sorted first by strikes and then by bases. The east-west axis is sorted by balls and then by outs. If you recall from my last post, a plate appearance outcome is the number of runs
scored during the PA (RBI+) plus the change in game location (ΔGL). The outcomes range from negative 0.31 to positive 0.62 and average zero. The lower the value, the bluer the background and the higher the value, the redder the background.
A matrix of 288 numbers can be very tedious to contemplate, so Figure 4 presents a trichotomized version that drops the Arabic numbers. Although the axes of Figure 4 are sorted in the same way as the axes in Figure 2, the color pattern in Figure 4 is similar to that of Figure 1. Blue cells cluster around the southwestern region and red cells cluster around the northeastern region. Yellow cells form a diagonal from the northwest to the southeast.
The pattern confirms the theory that positive outcomes generally increase with the count of balls and the number of runners on base and decrease with the count of strikes and the number of outs, but there are several exceptions. For example, according to this theory, the worst location for the batter should be two outs, no runners on third, second or first, no balls and two strikes (location 200002). According to Figure 3 however, the worst location for an individual batter is one out, the bases loaded, no balls and two strikes (111102). The reason is that with two strikes and no balls, the batter can’t afford to not swing at the next pitch unless it is far outside the strike zone. This makes a strike out or even worse, a double play, quite likely. Since the bases are loaded, the opportunity cost of that outcome would be very high.
Another exception is that this theory predicts location 011130 (i.e. no outs, the bases loaded, three balls and no strikes) would be the best location for the batter. But the best outcome is at 211130 (i.e. two outs) instead. As to why, I can only conjecture. When there are two outs, a double-play is not possible. But I suspect the main reason is that with two outs, the batter is less likely to try for a homerun and therefore is more likely to put the ball into play or walk in a run.
From this matrix of plate appearance outcomes, we can estimate the impact an individual batter has, based on his past history, as well as other effect modifiers like who is pitching, the stadium in which the game is played, etc.
How does one assess individual production within a group activity, like manufacturing? There’s overhead and sunk costs, the law of diminishing marginal product and substitutability of labor and capital to contend with. Accountants and economists have struggled with this problem since the invention of money itself.
So have baseball statisticians, which brings me to today’s topic. How can we measure an individual player’s contribution to his team’s score? It’s not just the runs he scores himself, because that often relies on who batted him in. And it’s not just the runs he bats in, because that depends on who batted previously.
To solve this riddle, statisticians invented many of the terms and concepts we associate with the fundamental parts of baseball today. For example, the idea of a “base hit” has nothing to do with the design or execution of the game of baseball. If the batter puts the ball into play on the ground and beats the ball to first base, he is safe. Whether it’s a “single” or an “error” or a “fielder’s choice” is irrelevant. These designations are merely statistical contrivances to facilitate measuring the productivity of an individual batter.
With that history in mind, we need to construct an individual batting statistic that is congruent with the goals of this study, that is, to predict scoring based on the six discrete dimensions discussed in previous posts plus several effect modifiers, like who is batting, pitching, etc. Even if the traditional individual batting performance measures, e.g. batting average (BA), on-base percentage (OBP) and slugging percentage (SLP), did not suffer from many flaws, they would not serve this purpose well. So, a completely new type of statistic is called for.
Traditional Batting Statistics
The flaws of these statistics are as well-known as their many proposed remedies. BA weights all base hits equally but ignores bases on balls and advancing base runners (i.e. sacrifice bunts and fly balls). OBP counts bases on balls but still counts a single as much as a homerun. Like BA, SLP only counts base hits but does weight doubles more than singles and so on. However, the weights (i.e. 4 for a homerun, 3 for a triple, etc.) are arbitrarily derived.
Modern statistics, such as Pete Palmer’s and John Thorn’s Linear Weights or Weighted Runs Created (wRC), improve upon the traditional batter performance measures, yet still rely on the same flawed contrivances, like base hits and sacrifice flies. Adding OBP and SLP together (aka OSP) is also a popular remedy, but this literally compounds the flaws rather than eliminates them.
A common flaw of all these statistics is that they suffer from confounding, the assignment of a spurious causal association between two variables due to missing information. Let me explain via anecdote.
When I was 13 years old, I started wearing a hat, because I thought it would look “cool”. This was before I realized that I was genetically incapable of judging what other people consider cool. My father saw me and said “Take that hat off! Don’t you know it will make you go bald?”
I thought about this for a while. Bald people wear hats to protect their bare scalps from the sun. Ergo, most bald people wear hats and most people with hair do not. My father had observed this and correctly deduced that there was a causal relationship between wearing a hat and going bald. Only, he got the causal direction wrong. Bald people are not bald because they wear hats. They wear hats because they are bald. His analysis suffered from confounding.
In baseball statistics, confounding results in a batter’s relative productivity being over or under measured. For example, some stadiums are easier to score in than others. A lot of work has gone into trying to figure out how many more homeruns Babe Ruth hit because Yankee Stadium had a short right field fence or how many fewer homeruns Willie Mays hit because he played in wintery Candlestick Park.
Adjusting for stadium effects is commendable, but the old and new statistics fail to adjust for the most reliable determinant of scoring of all, the multi-dimensional location of the ball game. A batter’s BA, OBP, and SLP all improve dramatically with runners on base. Some batters slogged it out during eras when scoring was relatively hard to do – I call this baseball’s Death Valley Days (another 1950’s TV show reference!), and I’m talking about you, Mickey Mantle, Willie Mays and Hank Aaron – while others enjoyed the bountiful 1920’s and 30’s. The average number of runners on base when the terrific trio were at the plate was 0.64, 0.63 and 0.66, respectively. These are all close to the overall average of 0.64, but each of these guys batted third in the order. Their number-of-runners-on-base averages should have been higher. The corresponding numbers for Babe Ruth, Lou Gehrig and Rogers Hornsby were 0.73, 0.78 and 0.74, about 17% higher.
Mantle, Mays and Aaron played against a pervasive headwind that is not accounted for by the traditional statistics. What is needed is a statistic that recognizes how the game is designed and played and does not rely on subjectively determined events, like errors and base hits.
What’s that you hear? Why, it’s the William Tell Overture signaling the Lone Economist coming to the rescue. Hi-yo, Silver!
Individual Run Production
If you recall from previous posts, baseball has six dimensions. At the individual pitch level there are exactly 288 discrete “locations” the game can find itself in. These range from the very start of the half-inning (i.e. no balls or strikes, no outs and the bases are empty) to a full count (i.e. three balls and two strikes), two outs and the bases are loaded. Some of these locations are more propitious for scoring than others, i.e. the Baseball Red-Zone.
At the individual plate appearance (PA) level, each PA starts with a zero count (i.e. no balls and no strikes). So, when we assess the change in the team’s prospects for scoring from the beginning of one PA to the beginning of the next, we need only consider four dimensions (i.e. outs and the three bases) and 25 possible outcomes (i.e. eight configurations of the three bases (2 x 2 x 2) times three values of outs, plus one for the end of the half-inning).
That last paragraph may be hard to understand at first, so let me explain via example. Below is a heat map of the average additional runs scored for the 24 starting locations. These figures cover the 102-year period from 1918 to 2019. My source, Retrosheet, provides data on all games played from 1932 to 2019, but from 1918 to 1931, only 75% of major league baseball games are covered. So, some of the games played by the likes of Babe Ruth and Ty Cobb are missing. But as a professional statistician, the Lone Economist has an aversion to discarding useful data. So, I left those years in.
Now suppose a batter is first up in a half-inning. The game’s “location” is at the bottom right-hand cell, there are no outs and no one is on base. Under average conditions, the batting team would be expected to score 0.49 runs by the end of the half-inning. Just what impact on the team’s expected runs can the individual batter make? There are exactly five possible outcomes by the end of this PA. He can reach first, second or third safely; score a run or be out. That’s it. None of the remaining 19 locations are possible.
If he reaches first base safely, he increases expected additional runs from 0.49 to 0.86, i.e. 0.37. Reaching second increases expected runs by 0.61 and reaching third increases it by 0.83. If he is out, expected additional runs decrease from 0.49 to 0.27 or by 0.22. A homerun doesn’t change expected additional runs at all (i.e. the next batter starts at 0.49 also), but a run is scored so that is the best possible outcome from the PA. The following table lists the possible individual run production (IRP) outcomes when there are no outs and no one is on base.
Notice that the IRP difference between a homerun and an out (1.22) is approximately double the difference between a single and an out (0.59). Remember that slugging percentage assumes this ratio is four, not two. Of course, this is true only for the special case when the bases are empty and there are no outs. But even under different conditions, a homerun is worth far less than four singles, usually less than two.
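The arithmetic is easy to check. Using the expected-run values quoted above for the bases-empty, no-out location, a short Python verification (the helper function is my own naming):

```python
def irp(rbi, before, after):
    """IRP = runs batted in plus the change in expected additional runs."""
    return rbi + (after - before)

# Bases empty, no outs: expected additional runs start at 0.49.
single  = irp(0, 0.49, 0.86)  # batter reaches first: +0.37
out     = irp(0, 0.49, 0.27)  # batter is out: -0.22
homerun = irp(1, 0.49, 0.49)  # run scores, location unchanged: +1.00

# A homerun is worth roughly two singles here, not four as SLP assumes:
print(round((homerun - out) / (single - out), 2))  # ~2.07
```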
How about when the bases are loaded and there are no outs? That’s more complicated because there are a lot more than five possible outcomes in that scenario. The exact number of possible outcomes is 24. These include all 24 PA locations except for one, the bases loaded and two outs. If the batter hits into a double play, there can be no more than two runners left on base. Plus, there is the outcome of the triple play which ends the half-inning. Here is a heat map of 23 of those possible outcomes.
I won’t explain the value of every cell, but I can explain a couple of examples. The top right-hand cell is the outcome where the batter either walks, is hit by a pitch, reaches first due to a fielding error or hits a single and each baserunner advances one base. One run is scored and there is no change in the expected runs specified in Figure 1, the game is at the same location when the next batter comes to the plate. So, the IRP is one run exactly.
The bottom right-hand cell is the outcome when the batter hits a grand slam, i.e. the bases are cleared and there are still no outs. Four runs are scored, but the average additional runs from the start of the PA to the start of the next PA falls from 2.27 to 0.49. So, the IRP of that cell is 4 – 2.27 + 0.49 = 2.22.
The only possible outcome missing from Figure 3 is the triple play. If no runs are scored, the average additional runs from Figure 1 drops from 2.27 to zero. Therefore, the IRP would be -2.27. It is possible that a run scores before the third out is recorded. In that case the IRP would be -1.27. The average value over the past 102 years is -1.63.
From Figure 3, we can see that relative to an out where no one scores (i.e. -0.70), a homerun is worth 2.92 (2.22 + 0.70) runs and a single is worth 1.7 (1.00 + 0.70) runs. The ratio of a homerun to a single is less than 2. Consequently, we can see how much slugging percentage over-values homeruns relative to singles.
The above discussion establishes the basis for our new statistic. Every time a batter comes to the plate, he is at one of the 24 locations. What he does with this opportunity depends on his ability and chance. He is credited with any runs that score from his plate appearance, i.e. Runs Batted-In plus any runs scored due to fielding errors (RBI+), and the change in the game location.
For example, suppose the game location is the bases are loaded and there is one out. According to Figure 1, expected runs are 1.57 (top row, middle column). This is a very favorable location, the third highest out of 24.
Suppose the batter hits a fly ball to right field, the runners on third and second tag up. The third base runner scores a run and the runner on second advances to third base. The batter is out, but one runner scores and another is closer to home. This might be considered a good outcome for the batter, but is it really?
The game location moves from 1.57 in Figure 1 to 0.51, two outs and runners on first and third. The IRP of that PA is therefore 1 (the run batted-in) + 0.51 – 1.57 (the change in the game location) = -0.06.
The negative value seems to indicate that this was not a good outcome, but we need to consider the alternatives to put this outcome into context. The batter could have hit a grand slam with an IRP of 2.7 (4 + 0.27 – 1.57) or hit into an inning-ending double play with an IRP of -1.57. Compared to that worst-case scenario, the -0.06 IRP is an improvement of 1.51 runs. An above-average batter might be disappointed with the outcome, but a below-average batter would be happy to hit the fly ball to right field.
The creators of the traditional statistics didn’t have a good solution for measuring this outcome. They labeled it a “sacrifice” and excluded it from batting average and slugging percentage, thus violating a cardinal tenet of statistical analysis: count all useful information. On-base percentage is even worse. A sacrifice is counted in the denominator, so it is treated just as badly as a strike out. And it counts double plays the same as single outs.
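The sacrifice-fly comparison reduces to one line of arithmetic. Here is a sketch using the IRP definition (runs batted in plus the change in expected additional runs) and the expected-run values quoted in the text:

```python
def irp(rbi, before, after):
    # IRP = runs batted in plus the change in expected additional runs
    return rbi + (after - before)

BASES_LOADED_ONE_OUT = 1.57  # expected runs at the start of the PA

sac_fly     = irp(1, BASES_LOADED_ONE_OUT, 0.51)  # 2 outs, men on 1st and 3rd
grand_slam  = irp(4, BASES_LOADED_ONE_OUT, 0.27)  # 1 out, bases empty
double_play = irp(0, BASES_LOADED_ONE_OUT, 0.00)  # half-inning over

print(round(sac_fly, 2), round(grand_slam, 2), round(double_play, 2))
# -0.06 2.7 -1.57
```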
We take the sum of a batter’s IRPs and divide it by his number of plate appearances to calculate the average IRP. Although I used Figure 1 to explain the IRP concept, actual IRPs should be calculated using annual averages. So, when calculating Barry Bonds’ IRP in 2004, for example, I used the average runs for each game location in 2004.
Notice there is no reliance on base hits, errors, sacrifice flies or fielder’s choices. All plate appearances count. Nothing is excluded. Batters that hit into double and triple plays are fully penalized. Batters who advance base runners are given proportional credit.
From an economist/statistician viewpoint, the beauty of this statistic is that it adheres to the “adding up” constraint. When devising a system of equations – in this case each batter’s statistic represents one equation – the sum of the individual parts should equal the total. By the way this statistic is defined, at the end of the year the sum of all players’ IRPs will be equal to the sum of runs scored during the season.
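The adding-up property comes from the fact that the ΔGL terms telescope. A small sketch, using hypothetical game-location values for one half-inning, shows that the sum of IRPs equals runs scored plus the net change in game location from the start of the half-inning to its end:

```python
# Sketch of the "adding up" constraint. The game-location values
# below are hypothetical; only the telescoping structure matters.

def irp(rbi, gl_before, gl_after):
    return rbi + (gl_after - gl_before)

# (rbi, game location before the PA, game location after the PA)
half_inning = [
    (0, 0.50, 0.88),  # leadoff single
    (0, 0.88, 0.51),  # fly out, runner holds
    (1, 0.51, 0.30),  # double scores the runner
    (0, 0.30, 0.00),  # inning-ending out
]

total_irp = sum(irp(rbi, b, a) for rbi, b, a in half_inning)
runs_scored = sum(rbi for rbi, _, _ in half_inning)

# The intermediate game locations cancel, leaving
# total IRP = runs scored + (final GL - initial GL)
assert abs(total_irp - (runs_scored + 0.00 - 0.50)) < 1e-9
print(round(total_irp, 2))  # -> 0.5
```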
Career Average IRP vs. OPS
Figure 4 lists the top 25 players by average IRP. Only players with at least 3,000 plate appearances during the 1918-2019 time span are ranked. For comparison’s sake, the right-hand column ranks each player by OPS (on-base percentage plus slugging percentage).
The first thing to note is how similar the two rankings are. The top seven players are the same in both rankings. Babe Ruth, Ted Williams and Lou Gehrig are at the top of both lists. Several other familiar names also appear in both Top 25 lists: Joe DiMaggio, Mickey Mantle, Stan Musial, Willie Mays, etc.
The next thing to notice is the players who fare much better under this new ranking than under OPS. Hank Aaron rises from 33rd by OPS to 22nd by average IRP. Ty Cobb jumps from 47th to 15th, and that is with data covering only the games he played after his prime years.
There are a few players who fare relatively poorly and are not shown in Figure 4. For example, Vladimir Guerrero drops from 27th by OPS to 121st by average IRP. Alex Rodriguez drops from 32nd to 47th.
RBI+ vs. Change in Game Location
For anyone who thinks this is simply an RBI-per-plate-appearance statistic, let’s break average IRP into its two separate parts: RBI+ and the change in game location (ΔGL). In the aggregate, RBI+ equals the negative of ΔGL; Figure 4 shows this to be 0.118 vs. -0.117. This happens because when runners on base are batted in, the game location usually becomes less favorable. Therefore, players who tend to have an above-average RBI+ per plate appearance will have a below-average ΔGL.
Hank Greenberg is an extreme example of this relationship. He has the highest RBI+ average over the last 102 years, but ranks only 1,312th (out of 1,598) in average ΔGL. I doubt that it is just a coincidence that he also had the highest average number of runners on base (0.82) when he came to bat. The 102-year average of this statistic is 0.64.
Despite his high number of runners on base, Greenberg did not enjoy the highest average game location when he came to bat. He ranked 30th in this department, below the king of cleanup hitters himself, Lou Gehrig.
Greenberg was known as a slugger, not for his speed around the bases. He was called Hammerin’ Hank before Aaron went by that nom-de-guerre. So, a high average RBI+ and a low average ΔGL might be a marker for a power hitter. If so, then Gehrig, DiMaggio, Ramirez, McGwire and Aaron fit this description.
But what about Bonds and Mantle? They rank 127th and 164th, respectively, in average RBI+ and 36th and 33rd in average ΔGL. If those two weren’t power hitters, then nobody was. So, the power vs. average divide does not explain this relationship.
But to prove that a high average RBI+ does not always result in a low average ΔGL (and vice-versa), look at Babe Ruth and Ted Williams. Ruth’s average RBI+ was second only to Greenberg’s, but his average ΔGL was ranked 34th. Ted Williams had the 9th best average RBI+, but his average ΔGL was ranked 10th. Those two guys were great no matter the situation.
Who Was the Greatest Batter?
This is the ultimate question for a baseball statistician. However, the objective of 6-D statistics is to change the analytical focus (i.e. perspective) from comparing players’ abilities to predicting outcomes during a game. It is therefore ironic that a by-product of this refocusing is a statistic that attempts to answer the ultimate question. The Lone Economist admits that he would love to discover some new nugget of information that sheds light on the answer.
Average IRP is just the start. I plan to look at many other factors that affect baseball productivity. But before I end this post, I want to address an obvious source of confounding that even average IRP suffers from. I am referring to the low ranking of batters who played during the 1960’s (e.g. Mantle, Mays and Aaron) relative to those of the 1920’s and 1930’s (e.g. Ruth, Gehrig, and Hornsby) and the 1990’s and 2000’s (e.g. Bonds, Ramirez and McGwire).
A change in the way baseballs were manufactured and the banning of the spitball in 1920 likely inflated the batting statistics of the 20’s and 30’s. Performance-enhancing drugs fueled the scoring surge of the 90’s and 2000’s. So how can we reduce the effects of these confounding missing factors and level the playing field?
Notice that the average beginning game location was lower for Mantle, Mays and Aaron (i.e. 0.471, 0.459, and 0.464) than for any other member of the Top 25 club except for Mike Trout, the only active player to make the list. This happened not because they were on poor-hitting clubs, but because they played during a poor-hitting era.
One way to correct for this difference is to calculate the average IRP as the percentage change from the average beginning game location. Figure 5 does this and recalculates the rankings. Notice that Mays and Aaron rise from 18th and 22nd, respectively, to 13th and 14th. Mickey Mantle jumps from 9th to 4th and Mike Trout climbs from 11th to 6th. And I’m happy to see Frank Robinson, Dick Allen and Willie Stargell climb into the Top 25. The traditional statistics never treated these great hitters fairly.
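Here is a sketch of that era adjustment in Python. The beginning-game-location figures are the ones quoted above for Mantle, Mays and Aaron; the average-IRP numbers are hypothetical placeholders, since the post reports rankings rather than the underlying values.

```python
# Era adjustment sketch: re-express average IRP as a percentage of
# the average beginning game location. Beginning-GL values are from
# the text; average-IRP values are hypothetical placeholders.

players = {
    # name: (average IRP [hypothetical], average beginning game location)
    "Mantle": (0.065, 0.471),
    "Mays":   (0.060, 0.459),
    "Aaron":  (0.058, 0.464),
}

adjusted = {
    name: avg_irp / begin_gl
    for name, (avg_irp, begin_gl) in players.items()
}

# Players from low-scoring eras (small beginning GL) get a boost
# relative to the raw average-IRP ranking.
for name, value in adjusted.items():
    print(name, round(value, 3))
```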
But we still have far to go to answer the ultimate question.
In a previous post, I stated that goal-line sports, like football and soccer, are basically one-dimensional. The closer the ball is to the goal, the greater the chance of scoring. If I calculated the probability of scoring as a function of distance to the goal, I imagine it would look something like this:
I believe this simplicity helps explain the broad appeal these games have for the general public.
The scoring in a baseball game is not so simple to predict or to illustrate, however. The chance of scoring is a function of six discrete variables or dimensions, not just one. These are the counts of balls and strikes, the dispositions of the three bases and the number of outs. Since there are four values for balls, three for strikes, two for each of the three bases and three for outs, there are 288 (i.e. 4 × 3 × 2 × 2 × 2 × 3) discrete values or locations that determine the probability of scoring during a half-inning.
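The count of 288 discrete game locations is easy to verify by enumeration:

```python
# Enumerate every discrete game location from the six dimensions
# described above: balls, strikes, three bases, and outs.
from itertools import product

balls = range(4)        # 0-3 balls
strikes = range(3)      # 0-2 strikes
base = (False, True)    # each base empty or occupied
outs = range(3)         # 0-2 outs

locations = list(product(balls, strikes, base, base, base, outs))
print(len(locations))  # -> 288
```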
Thinking in six dimensions is hard enough. But illustrating six dimensions in a two-dimensional space is even harder. For example, if the probability of scoring a touchdown were a function of even one more dimension than the distance to the goal, the above graph would need a third axis, one perpendicular to the two axes that already exist. In short, a three-dimensional drawing would be necessary.
But how do we illustrate five more dimensions when we can only see a total of three? The answer is to rely on another aspect of our visual senses: color. Borrowing the convention of the heat map, we can associate scoring probability with temperature, running from blue through green, yellow and orange to red. Blue is for cold (low scoring) and red is for hot (high scoring).
Balls vs. Strikes
If you recall from an earlier post, the average runs scored per inning is almost exactly one. So, at the start of each half-inning, when the balls-strikes count is 0-0, the average runs scored by the end of the half-inning is one half of a run.
Each additional ball should favor the batter and each additional strike should favor the pitcher. Do the data bear this out? Look at the following color-coded table (aka heat map) and see.
These numbers cover years 1988 to 2019. The upper left-hand cell of the table represents the average additional runs scored by the end of the half-inning when the balls-strikes count is 0-0. This value is near the middle of the range from 0.39 to 0.73, so it has a yellow color. The highest value, when there are 3 balls and no strikes, is colored red. Conversely, the lowest value, when there are no balls and 2 strikes, is colored blue. The axes are ordered so that the highest values are in the upper right-hand cells and the lowest values are in the lower left-hand cells.
From this table we can infer three conclusions:
Each additional ball favors the batter
Each additional strike favors the pitcher
Balls and strikes have a measurable, but small causal impact on scoring.
To find a larger causal impact on scoring, we look at outs and bases.
Outs and Bases
Here is a heat map of the average additional runs scored by the end of the half-inning for the 24 unique values of outs and bases.
Again, the axes are ordered so that the higher values are in the northeast cells and the lowest are in the southwest cells. Clearly, average runs decrease with the number of outs and increase with the number of runners on base and when they are closer to home.
Another conclusion is that the range is much greater for outs and bases (0.10 to 2.24) than it is for balls and strikes (0.39 to 0.73). In other words, outs and bases dominate balls and strikes in the determination of runs scored.
The Baseball Red-Zone
It is now time to combine all six dimensions into one 288-cell heat map.
Notice the familiar shading from the blue end of the spectrum to the red end as we move in a northeasterly direction. Also notice that the range is even greater, 0.06 to 2.88.
The detail of this heat map can be useful, if overwhelming. For example, it shows that with no outs, a runner on first and a 0-0 count, average runs are 0.88. Directing the batter to lay down a sacrifice bunt to advance the runner to second base would drop average runs to 0.72. So, under average conditions, this would be a bad decision. However, when the batter is below average, like a pitcher at bat in the National League, it can be a good decision.
The normal spectator will find this heat map tedious. So, I trichotomized it into three scoring zones: cold (blue), medium (yellow) and hot (red).
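A minimal sketch of that trichotomization follows. The cut points below are hypothetical; the post does not state the exact thresholds separating the cold, medium and hot zones.

```python
# Trichotomize average runs into three scoring zones.
# The thresholds are hypothetical, for illustration only.

def zone(avg_runs, low=0.4, high=1.0):
    """Map an average-runs value to a cold/medium/hot zone."""
    if avg_runs < low:
        return "cold"    # blue
    elif avg_runs <= high:
        return "medium"  # yellow
    return "hot"         # red

# The extremes of the 288-cell heat map and a middling value
print(zone(0.06), zone(0.50), zone(2.88))  # -> cold medium hot
```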
This is the baseball red-zone for the average batter facing the average pitcher in the average ballpark and so on. The 3-zone heat map for an above average batter would have more red cells and fewer blue ones. I imagine the red-zone for Mike Trout would be very large, unless he is facing Clayton Kershaw.
My mission is to determine the red-zones for all batters, pitchers, stadiums, etc. This is just the beginning of a long and interesting road.
Productivity is measured by rates, i.e. quantities of outputs relative to quantities of inputs. So, our first step is to choose the appropriate outputs and inputs. Let’s start with the choice of outputs.
For all types of prediction, there is usually one main outcome. For healthcare, it’s death. When a new cancer drug is tested, we want to know how many lives it will potentially save (the output) versus the lives lost if resources are diverted from some other use (the input). But death is thankfully a rare event and a clinical trial that uses death as the main outcome could take years to complete.
This is why so many clinical trials of potentially life-saving cancer drugs choose an intermediate outcome, like the progression of the disease, rather than death. If they waited for all the trial participants to die before concluding that the drug is safe and effective compared to a placebo, several decades might pass.
From the baseball spectator’s perspective, the main outcome of interest is obviously which team wins the game. But we don’t want to wait until the end of the ball game to see who won. We want to anticipate which team is likely to win using intermediate outcomes. For baseball fans, there are two intermediate outcomes that build upon each other: reaching base and scoring.
Reaching base is not the team’s end objective, but it is correlated with scoring runs, which in turn is correlated with the end objective: winning the game. A game of baseball typically lasts three hours. To maintain spectator interest over such a long time span, fans must be able to appreciate when their side is about to win a small contest (the plate appearance) that may lead to winning a larger contest (scoring during the inning) that may lead to winning the game.
The little/medium/big contest strategy for maintaining spectator interest is not unique to baseball. For example, American football’s small contest is making a first down. If the team achieves that, then it might win the medium contest (score a touchdown) and ultimately the game.
Each output measure is associated with an input measure in order to calculate the rate of production. The input measure for reaching base is the plate appearance. For scoring, the input measure is the team’s side of the inning, aka the half-inning.
Runs Per Inning
A rate is the ratio between the amount of output and the amount of input. For example, from 1918 to 2019, a 102-year time span, 1,545,462 runs were scored by major league baseball teams in 1,598,551 innings. That comes to almost 1 run per inning (RPI) (i.e. 0.967/inning).
So, if you went to an MLB game tomorrow, would you expect nine runs to be scored? If both teams were more or less average, the answer is yes. But that would be something of a lucky guess, because when it comes to scoring per inning, MLB is highly episodic. The present day just happens to be in concordance with the long-run average. Here is a graph of yearly runs per inning from 1918 to 2019. Notice that there have been many peaks and valleys, but the latest year was close to the long-run average of one run per inning.
Note: The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.
This graph makes RPI look highly variable by year, but that is more a result of my choice of the upper and lower bounds of the vertical axis than its actual variability. If I had chosen the bounds of the vertical axis to be 3 and 0, for example, the graph would have looked like this.
Relying on how the graph looks can lead to mistaken inferences. The standard deviation, a measure of variability, is only 0.089, less than one tenth of the mean RPI. But what makes runs per inning episodic is the serial correlation of the annual figures, not their standard deviation.
What I mean by serial correlation is the correlation of consecutive deviations from the mean. In other words, annual runs per inning are not independent random events. A higher than average year is likely to be followed by another higher than average year. The serial correlation coefficient for this time series (represented by the Greek letter “rho”, ρ) is 0.733. This is a high value for this statistic. If annual RPI were independent, ρ would be close to zero instead of close to 1.
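For readers who want the definition in concrete form, here is a minimal lag-1 serial correlation calculation. The series below is synthetic; the ρ of 0.733 quoted above comes from the actual annual RPI figures.

```python
# Lag-1 serial correlation (rho): correlation of consecutive
# deviations from the mean, as described in the text.

def lag1_serial_correlation(xs):
    mean = sum(xs) / len(xs)
    dev = [x - mean for x in xs]
    num = sum(dev[i] * dev[i + 1] for i in range(len(xs) - 1))
    den = sum(d * d for d in dev)
    return num / den

# A persistent series (each year near the last) yields a
# strongly positive rho; a synthetic example:
persistent = [1.0, 1.1, 1.15, 1.1, 1.0, 0.9, 0.85, 0.9, 1.0, 1.1]
print(round(lag1_serial_correlation(persistent), 2))
```

An alternating series, by contrast, yields a strongly negative ρ, and independent draws give a value near zero.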
The broad takeaway from this graphic is that the 1920’s, 30’s and 90’s were epochs of relatively high-scoring, while the 1960’s experienced relatively low-scoring, but that there is no time trend. The time trend coefficient is practically zero. Annual runs per inning that deviate from the long-run average tend to regress to the mean in the following years. This has led to a fairly stable RPI value for over a century.
Probability of Reaching Base
Since 1918, there have been 13,266,945 plate appearances, which resulted in the batter reaching base 4,426,673 times. The rate is therefore 0.334. This means that for the last century the odds of the batter reaching base have been 1 to 2. Conversely, the odds of getting out have been 2 to 1.
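These rates are straightforward to verify from the counts quoted above and earlier in the post:

```python
# Reaching-base rate and odds, from the plate-appearance counts above
plate_appearances = 13_266_945
reached_base = 4_426_673

rate = reached_base / plate_appearances
print(round(rate, 3))  # -> 0.334

# Odds of reaching base vs. getting out: roughly 1 to 2
odds = reached_base / (plate_appearances - reached_base)
print(round(odds, 2))  # -> 0.5

# Runs per inning, from the season totals quoted earlier
runs, innings = 1_545_462, 1_598_551
print(round(runs / innings, 3))  # -> 0.967
```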
This might seem high, since a .300 batting average is considered quite good, but keep in mind this is not the batting average and it isn’t even the on-base percentage. Both traditional statistics exclude reaching base by fielding error, and batting average also excludes bases on balls and hit by pitch. From the spectator’s perspective, the credit or blame for why the batter reached base is irrelevant. A walk is as good as a single and a two-base error is as good as a double.

Like RPI, the probability of reaching base (PRB) varies from year to year. The following graph illustrates the vicissitudes of PRB from 1918 to 2019. It looks quite similar to the graph of RPI. And just like RPI, it is serially correlated with multi-year peaks and valleys, but no long-term time trend. In fact, PRB’s variability is less than that of RPI (SD = 0.012, ρ = 0.83).
A Time Trend for Baseball
Before I end this post, I want to show that there is at least one long-run time trend in MLB. The following graph illustrates homeruns per plate appearance since 1918. It shows that in 1918, only 0.39% of plate appearances resulted in a homerun. In 2019, that percentage was 3.63%, a nearly tenfold increase.
Some see this as progress. The Lone Economist does not. I’ll explain why in a later post.
My plan is to introduce a new class of statistics that are not only statistically valid (unlike some traditional baseball statistics), but also heighten interest in watching the games. These posts will be rather technical and not intended for the statistically uninitiated.
I hope to someday publish a book easily understandable to the average baseball fan. For now, think of these posts as the technical appendix of that future book.
The Perspective of Baseball Statistics
A while ago I wrote about the importance of being mindful of one’s choice of perspective when analyzing the economics of the U.S. healthcare system (It’s the Perspective Stupid). Many public health experts claim they have the indisputable weight of science on their side when they compare U.S. health outcomes to those of other countries. Yet when we see that they have chosen a purely global perspective – as opposed to an individual perspective, for example – their patina of objectivity is revealed to be largely subjective, something less than pure science.
Most analysts are unaware of the choice of perspective they have made. It is one of the reasons partisan arguments rage on and on without resolution. Both sides think they are talking about the same thing when they aren’t even talking about the same plane.
The subjective and often unconscious choice of perspective affects other areas of quantitative analysis as well. Almost all baseball statistics assume the perspective of an individual player. Player statistics, like earned run average (ERA) for pitchers and on base percentage (OBP) for batters, have been standards for over a hundred years. The Sabermetrics revolution in the 1970’s greatly refined and expanded these types of measures which ultimately led to their use in the management of teams (e.g., the Moneyball phenomenon) and fantasy sports leagues.
However, the player perspective is not the only perspective one can choose. One could choose to take the spectator perspective. From the spectator’s perspective, the drama and entertainment generated during a game is from the suspense of not knowing which team will win until the end. This is what makes the game fun to watch. When the spectator anticipates that a team is about to score, interest is at its highest and maximum attention is paid. At other times, spectator attention wanes. That is why close games are more interesting than lopsided games, because even when one team is likely to score during a lopsided game, it is unlikely to make any difference in which team wins. There is little suspense in the game’s outcome. From the spectator’s perspective, the most interesting statistic is the one that predicts what will happen next, i.e. how likely is the batter going to reach base or the batting team to score.
In other sports, one doesn’t need fancy statistics to figure this out; it’s obvious. In baseball, not so much. To see what I am getting at, consider the argument that there are only two types of team sports. There is baseball and its variations, like softball and cricket, and then there are the goal-line sports: football, soccer, basketball, hockey, rugby, polo (regular and water), lacrosse, ultimate frisbee, and quidditch.
O.K., that last one only exists in Harry Potter novels. But their commonality is obvious. These are just variations of the same game. Each team tries to put an object in a goal on either side of a rectangular space within a fixed time limit. Anticipating when a team is likely to score is quite simple. The closer the object is to the goal, the more likely the team is going to score. Besides time, these sports are one-dimensional.
I believe this simplicity is key to the popularity of the goal-line sports. Every American football fan knows their team is likely to score when they have the ball within 20 yards of their opponent’s goal line (aka the Red-Zone). They don’t need sophisticated math skills or knowledge of complex rules to appreciate the situation.
Baseball, however, is not so simple. A baseball in play can go anywhere within the ballpark and even out of it. Scoring doesn’t depend on the location of the ball and there is no clock.
In contrast to one-dimensional goal-line sports, baseball is multi-dimensional and consequently much harder to predict. It is generally understood that the closer the runner is to home the more likely he is to score, other things the same. However, is the team more likely to score when the bases are loaded or when only third and second are occupied?
Each additional ball that the home plate umpire calls is believed to improve the batter’s chances of reaching base and each additional strike should have the opposite effect, but what is the net effect of a full count (i.e. 3 balls and 2 strikes)?
Then there are the effect modifiers, such as the innate abilities of the pitchers and batters, the proportions of the ballpark, and the size and location of the home plate umpire’s strike zone.
Only long-time students of the game are likely to have the experience and knowledge to make informed estimates of these scoring possibilities. This series of posts provides a systematic explanation of how to sort through all the facts, figures and formulas, along with a list of simple statistics, observable throughout the game, that indicate when a batter is likely to reach base and when a team is likely to score. In short, it identifies a Red Zone for baseball.
The emergency fiscal and monetary policies recently announced remind me of the saying: Generals always fight the last war and economists always fight the last recession. The Great Recession of 2007-2009 was a demand shock recession. The banking system collapsed which required large purchases of bonds by the Federal Reserve and fiscal stimulus to prop up aggregate demand and employment.
But the Covid-19 recession is of an entirely different nature. It is a supply shock recession which was caused by a disruption in the productive capacity of the US and our trading partners, especially China. Although temporary, this supply shock will likely last for several months.
Flooding the markets with cash will prevent the banks from insolvency and it might allow stock market speculators to cover their margin calls, but that won’t stock grocery store shelves or fill our ports with loaded ships.
With so much new money chasing fewer goods I would expect inflation to finally return from its long slumber. Fighting the oil supply shock of 1973-74 with expansionary monetary and fiscal policy was a major cause of inflation in the 1970’s. Once started, inflation is a difficult malady to eradicate in an economy.
A better policy would be to attempt to ameliorate the temporary disruption to our supply chain with rationing of essential goods. I know that solution sounds bad (because it is), but it’s better than doing nothing and even better than throwing money at a drop in real output.
We’re still talking about the high price of drugs and how to lower them while not decreasing incentives for research and development. There is a mechanism for achieving this that was explored over a hundred years ago by an Italian engineer, statistician and economist named Vilfredo Pareto.
Government economic policies often contemplate exchanges of wealth where one side benefits at the expense of the other, such as a wealth tax or land reform. One of Pareto’s contributions was that markets are forums for mutually beneficial exchange, the special case of transactions where both sides benefit.
Naturally, involuntary wealth redistribution is difficult to achieve politically. Reallocations where at least one person benefits and no one loses are always preferable. Under the right conditions, a market will solve an economic problem through only mutually-beneficial exchanges.
The policy implication is that if the right conditions do not exist, create them in order to find the politically easy way to solve the economic problem.
The economic problem with the international pharmaceutical industry is that people are dying who could be saved at a cost of only a few dollars. For a market to function well, the price of the good must equal the marginal cost of producing it (P = MC). The missing condition is exactly that: the marginal cost of producing a drug is far below the price necessary to make its development feasible.
The existence of unexploited mutually beneficial exchanges is a clear sign that a condition for an efficient market solution is missing. Such is the case with today’s pharmaceutical industry.
Today’s Pharmaceutical Industry
To see how this might work today, let’s establish some broadly summarized facts.
There are several patented drugs that fit the following general description.
Of all the people world-wide who suffer from a treatable condition, US residents represent no more than 10%.
The multinational pharmaceutical company that has patent protection for a drug that effectively treats the condition sets a price that will maximize the return on its investment, a price much greater than the marginal cost of production.
Private and public insurers in the US — representing the majority of US residents with the condition — pay the high price.
Payers in many other countries decide that at such a high price they can save more lives by not purchasing the patented drug at all or rationing a limited quantity of it.
Each point makes perfect sense separately. It is only the final result that is a real head scratcher: a large percentage of the people in the world with a treatable condition do not receive the effective drug even though the marginal cost of producing it is almost zero.
Many prescription drugs sold by pharmacies fit this description. For example, there is Harvoni, a treatment for hepatitis C, an often-fatal infection, produced by Gilead Sciences. In 2017, Medicare and Medicaid (M&M) paid more than $3.7 billion to treat 50,000 beneficiaries. M&M pays for the treatment for any beneficiary diagnosed with hepatitis C. Even more U.S. residents with hepatitis C are covered by private insurers. These insurers have similar policies and pay similar prices. In contrast, the UK’s National Health Service (NHS) spent a comparable amount per beneficiary, but for only 10,000 beneficiaries. There are 210,000 people in the UK diagnosed with hepatitis C.
Revlimid (generic name: Lenalidomide) is another example. M&M paid over $3.5 billion to treat over 40,000 beneficiaries for this cancer fighting drug in 2017. The NHS covers this drug for multiple myeloma, but only for patients who cannot take the current thalidomide-based standard of care or are not able to have a stem cell transplant. Coverage is denied when used to treat myelodysplastic syndrome (MDS), a rare blood disorder. Consequently, only 2,100 patients are covered for this drug in the UK.
Clinically administered drugs, usually injected into the patient in a clinical setting, also fit this pattern. For example, Opdivo (generic name: Nivolumab) is used to treat cancer. In 2017 M&M spent $1.6 billion to treat 33,000 patients. The UK’s NHS only approved this drug in late 2017 and only for 1,300 cancer patients.
A study published in 2009 found that 26% of cancer drugs evaluated by the NHS were either denied coverage or only covered provisionally due to a lack of effectiveness relative to cost. Another study published in 2017 concluded that the U.S. paid the highest average price for cancer drugs compared to six other countries, but the authors could find complete price information in each country for only eight patented drugs out of 99 approved by the Food and Drug Administration. Many of these drugs are very hard to obtain in these other countries because of rationing, even if their prices are relatively low.
A Pareto Solution
The world pharmaceutical market exists in what economists call a Pareto Inferior solution. This is the situation where there are mutually beneficial trades that are left unconsummated. For illustrative purposes, here is a hypothetical life-threatening medical condition and its hypothetical effective drug treatment.
There are 10,000 insured patients in the US with the medical condition. At $250,000 per patient, the producer receives $2.5 billion as compensation for the cost of developing the drug.
World-wide there are 100,000 people who suffer from this medical condition, but at such a high price, other countries either ration the drug or deny coverage entirely.
It doesn’t take high-level math skills to see that if all countries agreed to pay $25,000 for each patient, a 90% reduction in the price, the producer would still earn its $2.5 billion reward (100,000 × $25,000 = $2.5 billion). Remember that the marginal cost of producing additional units of the drug is zero.
This is a Pareto improvement because at least one person benefits and no one suffers a loss. Specifically, the U.S. treats the same number of patients, but spends less money and the rest of the world increases the number of patients treated at a favorable price while the producer is unaffected.
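The arithmetic of this Pareto improvement can be checked in a few lines, using the hypothetical figures above:

```python
# Revenue check for the hypothetical Pareto improvement above.
us_patients = 10_000
world_patients = 100_000
high_price = 250_000   # current US price per patient
low_price = 25_000     # proposed single world price

# Producer revenue is unchanged at $2.5 billion...
assert us_patients * high_price == 2_500_000_000
assert world_patients * low_price == 2_500_000_000

# ...while the US spends 90% less on the same patients...
us_spend_before = us_patients * high_price
us_spend_after = us_patients * low_price
assert us_spend_after == us_spend_before // 10

# ...and ten times as many patients world-wide are treated.
assert world_patients == 10 * us_patients
```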
The Bottom Line
Since large pharmaceutical companies are multinational monopolies, the best way to negotiate with them is through an international organization deputized to represent the participating governments of the world. A lone government simply setting the price it pays to be no higher than those paid by other countries will achieve little. The pharmaceutical giants already set their prices globally. The occasional exception to this rule will disappear if the suppliers see that a price concession to one country brings down the world-wide price.
By widening the base, the number of patients covered, this international agency could offer a deal to the pharmaceutical companies too good to refuse. A single, international price would eliminate arbitrage, the very phenomenon pharmaceutical companies fear so much. Research and development costs would be covered. The U.S. would save money and the rest of the world would save lives.
As Vilfredo Pareto might have said, vincono tutti (everybody wins).
In previous posts I asserted that the Centers for Medicare and Medicaid Services (CMS) pays much less than private insurers for the same healthcare services because price discrimination by healthcare providers is easy to achieve (here and here). In general, price discrimination gets a bad rap in the economic literature and is even illegal in many instances. But in the case of healthcare, it serves a useful purpose. This is an underappreciated strength of the US healthcare system.
Such is not the case for the pharmaceutical drug market, however. Pharmaceutical producers’ inability to prevent arbitrage severely limits their ability to price discriminate. If they could perfectly price discriminate, the prices charged to poor, uninsured people would be much lower than they are currently, perhaps only a few dollars for a round of treatments.
This is a bold claim, so I should support it with some evidence.
Price Discrimination in the Pharmaceutical Drug Market
Perfect or “first-degree” price discrimination would allow the producer to sell each unit to each customer at a different price. Each customer would pay the absolute maximum that he or she is willing and able to pay. A millionaire might pay a million dollars for a single pill; a homeless vagabond, maybe only ten.
As long as the price a customer pays is greater than the marginal cost of producing it, the transaction adds to the producer’s profit. Even if the research required to bring the drug to market cost several billion dollars, the marginal cost of producing an extra pill is only a few dollars at most. Refusing to sell a pill for anything less than $1,000, for example, lowers the producer’s profit whenever the most a customer can pay is a few dollars. Consequently, even a profit-maximizing pharmaceutical producer has an incentive to sell its drug to poor customers at a very low price. This is not just theoretical conjecture; real pharmaceutical companies do this in a limited way. Go to almost any pharmaceutical company’s website and you will see a statement that says something like, “If you are unable to afford ____, we can help”. The picture below is from the website for Harvoni, a very expensive hepatitis C treatment.
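The arithmetic behind this incentive can be sketched with a few lines of code. All the numbers below (marginal cost, list price, customer counts) are invented for illustration; they are not actual figures for Harvoni or any other drug.

```python
# Hypothetical figures: why selling below list price to customers who
# cannot pay it still raises a producer's profit.
MARGINAL_COST = 3.0   # cost of producing one more pill (a few dollars)
LIST_PRICE = 1000.0   # price charged to customers who can pay it

def profit(sales):
    """Total profit from a list of (price, quantity) sales."""
    return sum((price - MARGINAL_COST) * qty for price, qty in sales)

# Holding the line: only the 1,000 customers who can afford $1,000 buy.
hold_the_line = profit([(LIST_PRICE, 1_000)])

# Price discrimination: also sell to 10,000 poor customers at $10 each.
discriminate = profit([(LIST_PRICE, 1_000), (10.0, 10_000)])

print(hold_the_line)  # 997000.0
print(discriminate)   # 1067000.0
```

Every $10 sale clears its $3 marginal cost, so the discriminating producer pockets an extra $70,000, which is why refusing those sales is money left on the table, provided the cheap pills cannot be resold to the $1,000 customers.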
Of course, anybody can claim to be uninsured, poor and suffering from hepatitis C. Pharmaceutical companies must contend with the same information asymmetry that private insurers face: it is difficult for them to be sure their product is not being resold to customers who could otherwise pay more.
Pharmaceutical Companies Set Their Prices Globally
The “resale” problem also applies to sales to healthcare systems in other countries. It would make perfect sense for pharmaceutical companies to negotiate a different price for each country. Countries with single-payer systems could negotiate a much lower price than fractious systems like the one in the U.S. But the incentive to purchase the drug in the low-price country and resell it to the high-price country at a small markup would be very great. Consequently, pharmaceutical companies normally hold the line when negotiating with single-payer countries and refuse to lower their price.
The proposed legislation would create a maximum price, called the Average International Market (AIM) price, to aid negotiations. Drawing on the idea of an international pricing index, which Trump has said he supports but many Republicans dislike, the AIM would be the average price of a drug in six countries (Australia, Canada, France, Germany, Japan and the United Kingdom) weighted on the basis of sales volume.
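A sales-volume-weighted average is a simple calculation; the sketch below shows how an AIM price would be computed. The prices and volumes are invented for illustration, not the figures the legislation would actually use.

```python
# Hypothetical per-treatment prices (USD) and sales volumes for one drug
# in the six AIM reference countries; all figures are invented.
prices = {
    "Australia": 900, "Canada": 950, "France": 850,
    "Germany": 1000, "Japan": 1100, "United Kingdom": 800,
}
volumes = {
    "Australia": 2_000, "Canada": 3_000, "France": 5_000,
    "Germany": 6_000, "Japan": 9_000, "United Kingdom": 4_000,
}

def aim_price(prices, volumes):
    """Average price weighted by each country's sales volume."""
    total_volume = sum(volumes.values())
    return sum(prices[c] * volumes[c] for c in prices) / total_volume

print(round(aim_price(prices, volumes), 2))  # 965.52
```

The weighting means that high-volume countries (Japan in this example) pull the benchmark toward their price, so the AIM is not simply the midpoint of the cheapest and most expensive countries.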
The underlying assumption here is that these other countries pay significantly less than the US for the same drugs. But as I have asserted before, this doesn’t happen to any significant degree. Pharmaceutical companies don’t price discriminate in the way this legislation assumes they do.
Here is a passage from an article in the Guardian about the UK’s decision to ration Harvoni, the treatment for hepatitis C mentioned above:
An estimated 215,000 people in the UK have chronic hepatitis C infection (160,000 in England), which new but costly drugs can cure. Addaction, a charity that helps people overcome drug and alcohol abuse, says the decision to treat [only] 10,000 people a year is “manifestly unfair”.
The article also reveals that the cost of Harvoni ranges from £26,000 to £78,000 in the UK depending on the length of treatment. These prices are consistent with those paid in the U.S.
The marginal cost to the producer of a round of treatment for one patient is only a few dollars. The producers of Harvoni forgo profits by not selling the drug to the UK at a much lower price for the nearly 200,000 people who will not receive any treatment. They decline to do so because they fear it would lower revenues from the patients who can pay the high price, especially in the U.S.
A policy of simply paying the average paid by other wealthy countries will achieve nothing. This Democratic proposal to lower drug prices is very similar to many other proposals. It sounds good. Its proponents can crow about how they are sticking it to the big, bad pharmaceutical companies. But it is unlikely to have any real effect. And if it did have an effect, it would probably do more harm than good.
Harvoni and all the other hepatitis C drugs now on the market exist only because their producers had a realistic expectation of recouping their investments by selling at very high prices. If we arbitrarily decrease these prices, we decrease the incentive to develop new drugs.
The AIM component of the Democratic proposal would, by itself, be ineffective, but it is a step in the right direction. What’s missing is a coordinated effort by the wealthy countries to find a price that each can afford to pay, one that adequately compensates the pharmaceutical companies for their investment in research and development.
The solution to this kind of international problem was outlined long ago by an Italian genius named Vilfredo Pareto.