Productivity is measured by rates, i.e. quantities of outputs relative to quantities of inputs. So, our first step is to choose the appropriate outputs and inputs. Let’s start with the choice of outputs.
Most kinds of prediction have one main outcome. In healthcare, it’s death. When a new cancer drug is tested, we want to know how many lives it will potentially save (the output) versus the lives lost if resources are diverted from some other use (the input). But death is thankfully a rare event, and a clinical trial that uses death as the main outcome could take years to complete.
This is why so many clinical trials of potentially life-saving cancer drugs choose an intermediate outcome, like the resumption of disease progression, rather than death alone. If they waited for all the trial participants to die before concluding that the drug is safe and effective compared to a placebo, several decades might pass.
From the baseball spectator’s perspective, the main outcome of interest is obviously which team wins the game. But we don’t want to wait until the end of the ball game to see who won. We want to anticipate which team is likely to win using intermediate outcomes. For baseball fans, there are two intermediate outcomes that build upon each other: reaching base and scoring.
Reaching base is not the team’s end objective, but it is correlated with scoring runs, which is correlated with the end objective, i.e. winning the game. A game of baseball typically lasts three hours. To maintain interest over such a long time span, spectators must be able to appreciate when their side is about to win a small contest (the plate appearance) that may lead to winning a larger contest (scoring during the inning) that may lead to winning the game.
The little/medium/big contest strategy for maintaining spectator interest is not unique to baseball. For example, American football’s small contest is making a first down. If the team achieves that, then it might win the medium contest (score a touchdown) and ultimately the game.
Each output measure is associated with an input measure in order to calculate the rate of production. The input measure for reaching base is the plate appearance. For scoring, the input measure is the team’s side of the inning, aka the half-inning.
Runs Per Inning
A rate is the ratio between the amount of output and the amount of input. For example, from 1918 to 2019, a 102-year time span, 1,545,462 runs were scored by major league baseball teams in 1,598,551 innings. That comes to 0.967 runs per inning (RPI), or almost 1 run per inning.
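As a quick check on the arithmetic, the totals quoted above can be plugged into a few lines of Python. This is only a sketch: the variable names are mine, and the nine-inning figure assumes an average game of exactly nine innings.

```python
# Totals for 1918-2019 as quoted in the text (source: Retrosheet).
total_runs = 1_545_462
total_innings = 1_598_551

# A rate is output divided by input.
runs_per_inning = total_runs / total_innings

# Assumption for illustration: a game of exactly nine innings.
expected_runs_per_game = 9 * runs_per_inning

print(f"RPI: {runs_per_inning:.3f}")
print(f"Expected runs in nine innings: {expected_runs_per_game:.1f}")
```

At this rate, nine innings work out to a little under nine runs.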
So, if you went to an MLB game tomorrow, would you expect nine runs to be scored? If both teams were more or less average, the answer is yes. But that happens to be a lucky guess because when it comes to scoring per inning, MLB is highly episodic. The present day just happens to be in concordance with the long-run average. Here is a graph of the yearly runs per inning from 1918 to 2019. Notice that there have been many peaks and valleys, but the latest year was close to the long-run average of one run per inning.
Note: The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.
This graph makes RPI look highly variable by year, but that is more a result of my choice of the upper and lower bounds of the vertical axis than its actual variability. If I had chosen the bounds of the vertical axis to be 3 and 0, for example, the graph would have looked like this.
Relying on how the graph looks can lead to mistaken inferences. The standard deviation, a measure of variability, is only 0.089, less than one tenth of the mean RPI. What makes runs per inning episodic is the serial correlation of the annual figures, not their standard deviation.
What I mean by serial correlation is the correlation of consecutive deviations from the mean. In other words, annual runs per inning are not independent random events. A higher-than-average year is likely to be followed by another higher-than-average year. The serial correlation coefficient for this time series (represented by the Greek letter “rho”, ρ) is 0.733. This is a high value for this statistic. If the annual RPI values were independent, ρ would be close to zero instead of close to 1.
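To make the distinction between variability and serial correlation concrete, here is a minimal sketch in Python. It uses one common lag-1 estimator, which may not be the exact formula behind the 0.733 figure, and the two short series are made-up numbers rather than Retrosheet data. The function name `lag1_autocorrelation` is my own.

```python
from statistics import mean, pstdev

def lag1_autocorrelation(series):
    """Correlation between consecutive deviations from the series mean."""
    m = mean(series)
    deviations = [x - m for x in series]
    # Pair each year's deviation with the following year's deviation.
    numerator = sum(a * b for a, b in zip(deviations, deviations[1:]))
    denominator = sum(d * d for d in deviations)
    return numerator / denominator

# Made-up values (not real RPI data): the same numbers in two different orders.
episodic = [1.1, 1.1, 1.1, 0.9, 0.9, 0.9]
alternating = [1.1, 0.9, 1.1, 0.9, 1.1, 0.9]

for name, series in [("episodic", episodic), ("alternating", alternating)]:
    sd = pstdev(series)
    rho = lag1_autocorrelation(series)
    print(f"{name}: SD = {sd:.3f}, rho = {rho:.3f}")
```

Both toy series have the same standard deviation, yet the one with multi-year runs has a positive ρ (about 0.5) while the one that flips every year has a negative ρ (about -0.83). The ordering of the values, not their spread, is what the statistic picks up.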
The broad takeaway from this graphic is that the 1920’s, 30’s and 90’s were epochs of relatively high scoring, while the 1960’s experienced relatively low scoring, but there is no time trend. The time trend coefficient is practically zero. Annual runs per inning that deviate from the long-run average tend to regress to the mean in the following years. This has led to a fairly stable RPI value for over a century.
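One way to read “time trend coefficient” is as the slope of a straight line fitted to the annual values by least squares. The sketch below assumes that reading. The helper name `time_trend_slope` and the toy numbers are hypothetical, chosen only to show how a series can have peaks and valleys and still have a slope of zero.

```python
def time_trend_slope(years, values):
    """Least-squares slope of values regressed on years."""
    n = len(years)
    mean_year = sum(years) / n
    mean_value = sum(values) / n
    # Sum of cross-products and sum of squared year deviations.
    sum_xy = sum((y - mean_year) * (v - mean_value) for y, v in zip(years, values))
    sum_xx = sum((y - mean_year) ** 2 for y in years)
    return sum_xy / sum_xx

# Made-up values (not real data): ups and downs with no trend vs. a steady rise.
years = [2000, 2001, 2002, 2003, 2004]
no_trend = [0.9, 1.1, 1.0, 1.1, 0.9]
rising = [1.0, 3.0, 5.0, 7.0, 9.0]

print(f"no trend: {time_trend_slope(years, no_trend):.3f} per year")
print(f"rising:   {time_trend_slope(years, rising):.3f} per year")
```

The first series moves around its mean but ends where it began, so its slope is essentially zero. The second climbs by the same amount every year, and the slope recovers that amount.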
Probability of Reaching Base
Since 1918, there have been 13,266,945 plate appearances, in which the batter reached base 4,426,673 times. The rate is therefore 0.334. This means that for the last century the odds of the batter reaching base have been roughly 1 to 2. Conversely, the odds of getting out have been roughly 2 to 1.
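The step from a rate to odds is simple arithmetic: a probability compares successes to all attempts, while odds compare successes to failures. A short sketch using the totals quoted above (the variable names are mine, and every plate appearance that does not end with the batter on base is counted as an out):

```python
# Totals since 1918 as quoted in the text (source: Retrosheet).
plate_appearances = 13_266_945
times_reached_base = 4_426_673
times_out = plate_appearances - times_reached_base

# Probability: successes relative to all attempts.
prb = times_reached_base / plate_appearances

# Odds: successes relative to failures, and the reverse.
odds_of_reaching = times_reached_base / times_out
odds_of_out = times_out / times_reached_base

print(f"Probability of reaching base: {prb:.3f}")
print(f"Odds of reaching base: {odds_of_reaching:.2f} to 1")
print(f"Odds of getting out: {odds_of_out:.2f} to 1")
```

Odds of about 0.50 to 1 are the same thing as 1 to 2, which corresponds to a rate of roughly one in three, or 0.334.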
This might seem high, since a .300 batting average is considered quite high, but please keep in mind that this is not the batting average, and it isn’t even the on-base percentage. Both traditional statistics exclude reaching base by fielding error, and batting average also excludes bases on balls and hit by pitch. From the spectator’s perspective, the credit or blame for why the batter reached base is irrelevant. A walk is as good as a single and a two-base error is as good as a double.
Like RPI, the probability of reaching base (PRB) varies from year to year. The following graph illustrates the vicissitudes of PRB from 1918 to 2019. It looks quite similar to the graph of RPI. And just like RPI, it is serially correlated, with multi-year peaks and valleys but no long-term time trend. In fact, PRB’s variability is less than that of RPI (SD = 0.012, ρ = 0.83).
A Time Trend for Baseball
Before I end this post, I want to show that there is at least one long-run time trend in MLB. The following graph illustrates home runs per plate appearance since 1918. It shows that in 1918, only 0.39% of plate appearances resulted in a home run. In 2019, that percentage was 3.63%, a more than ninefold increase.
Some see this as progress. The Lone Economist does not. I’ll explain why in a later post.