In a previous post, I stated that goal-line sports, like football and soccer, are basically one-dimensional. The closer the ball is to the goal, the greater the chance of scoring. If I calculated the probability of scoring as a function of distance to the goal, I imagine it would look something like this:
I believe this simplicity helps explain the broad appeal these games have for the general public.
Scoring in a baseball game is not so simple to predict or to illustrate, however. The chance of scoring is a function of six discrete variables or dimensions, not just one. These are the counts of balls and strikes, the dispositions of the three bases and the number of outs. Since there are four values for balls (0 to 3), three for strikes (0 to 2), two for each of the three bases (empty or occupied) and three for outs (0 to 2), there are 288 (i.e. 4 × 3 × 2 × 2 × 2 × 3) discrete values or locations that determine the probability of scoring during a half-inning.
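The count of 288 can be checked by enumerating every combination. This is only a small sketch; the variable names and the tuple layout are my own choices for illustration.

```python
from itertools import product

# Possible values for each of the six dimensions (names are illustrative).
BALLS = range(4)        # 0 to 3 balls
STRIKES = range(3)      # 0 to 2 strikes
BASE = (False, True)    # a base is either empty or occupied
OUTS = range(3)         # 0 to 2 outs

# Each state is a tuple: (balls, strikes, first, second, third, outs).
states = list(product(BALLS, STRIKES, BASE, BASE, BASE, OUTS))

print(len(states))  # 288
```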
Thinking in six dimensions is hard enough, but illustrating six dimensions in a two-dimensional space is even harder. For example, if the probability of scoring a touchdown were a function of even one more dimension than the distance to the goal, the above graph would need a third axis, one that is perpendicular to the two axes that already exist. In short, a three-dimensional drawing would be necessary.
But how do we illustrate five more dimensions when we can only see a total of three? The answer is to rely on another aspect of our visual senses, color. The wavelengths of visible light range from the shortest (i.e. blue) to the longest (i.e. red). So, if we associate wavelength with scoring probability, the range is from blue to green, yellow, orange and finally red. Think of it like temperature. Blue is for cold and red is for hot.
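A blue-to-red scale like this can be sketched in a few lines. The function below is a hypothetical helper, not the color scale used for the tables in this post: it assumes a simple linear sweep of hue from blue to red, which passes through green, yellow and orange along the way. A different scale could put a different color at the midpoint.

```python
import colorsys

def heat_color(value, lo, hi):
    """Map a value in [lo, hi] to an (r, g, b) tuple, blue (cold) to red (hot).

    Assumption: hue is interpolated linearly from 240 degrees (blue)
    down to 0 degrees (red). Values outside the range are clamped.
    """
    t = (value - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))
    hue = (1.0 - t) * 2.0 / 3.0          # 2/3 is blue, 0 is red
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return tuple(round(255 * c) for c in (r, g, b))

print(heat_color(0.39, 0.39, 0.73))  # (0, 0, 255), the coldest cell
print(heat_color(0.73, 0.39, 0.73))  # (255, 0, 0), the hottest cell
```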
Balls vs. Strikes
If you recall from an earlier post, the average number of runs scored per full inning (both halves combined) is almost exactly one. So, at the start of each half-inning, when the balls-strikes count is 0-0, the average number of runs scored by the end of the half-inning is about one half of a run.
Each additional ball should favor the batter and each additional strike should favor the pitcher. Do the data bear this out? Look at the following color-coded table (aka heat map) and see.
These numbers cover the years 1988 to 2019. The upper left-hand cell of the table represents the average additional runs scored by the end of the half-inning when the balls-strikes count is 0-0. This value is near the middle of the range from 0.39 to 0.73, so it has a yellow color. The highest value, when there are 3 balls and no strikes, is colored red. Conversely, the lowest value, when there are no balls and 2 strikes, is colored blue. The axes are ordered so that the highest values are in the upper right-hand cells and the lowest values are in the lower left-hand cells.
From this table we can draw three conclusions:
- Each additional ball favors the batter.
- Each additional strike favors the pitcher.
- Balls and strikes have a measurable but small causal impact on scoring.
To find a larger causal impact on scoring, we look at outs and bases.
Outs and Bases
Here is a heat map of the average additional runs scored by the end of the half-inning for the 24 unique values of outs and bases.
Again, the axes are ordered so that the highest values are in the northeast cells and the lowest are in the southwest cells. Clearly, average runs decrease with the number of outs, increase with the number of runners on base, and increase further the closer those runners are to home.
Another conclusion is that the range is much greater for outs and bases (0.10 to 2.24) than it is for balls and strikes (0.39 to 0.73). In other words, outs and bases dominate balls and strikes in the determination of runs scored.
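For readers curious how a single cell of such a heat map is calculated, here is a minimal sketch. The records are made up purely for illustration and the tuple layout is a hypothetical one, not the format of the data behind this post. Each record notes the state before a plate appearance, the runs already scored in the half-inning, and the final runs for that half-inning. A cell is the average of the difference across all records in that state.

```python
from collections import defaultdict

# Made-up records, for illustration only (not real data).
# Each tuple: (outs, runners, runs_already_scored, final_runs_in_half_inning)
# runners is a 3-character string for first/second/third, e.g. "1--".
records = [
    (0, "---", 0, 2),
    (0, "1--", 0, 2),
    (1, "-2-", 0, 2),
    (1, "---", 2, 2),
    (0, "---", 0, 0),
    (1, "---", 0, 0),
    (2, "---", 0, 0),
]

def average_additional_runs(records):
    """Average runs scored from each (outs, runners) state to the end of the half-inning."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for outs, runners, scored, final in records:
        state = (outs, runners)
        totals[state] += final - scored   # runs still to come from this state
        counts[state] += 1
    return {state: totals[state] / counts[state] for state in totals}

table = average_additional_runs(records)
print(table[(0, "---")])  # 1.0 for this toy data
```

The same grouping works for balls and strikes, or for all six dimensions at once, by widening the state key.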
The Baseball Red-Zone
It is now time to combine all six dimensions into one 288-cell heat map.
Notice the familiar shading from the blue end of the spectrum to the red end as we move in a northeasterly direction. Also notice that the range is even greater, 0.06 to 2.88.
The detail of this heat map can be useful, if overwhelming. For example, it shows that with no outs, a runner on first and a count of no balls and no strikes, average runs are 0.88. Directing the batter to perform a sacrifice bunt, which advances the runner to second base at the cost of an out, would decrease average runs to 0.72. So, under average conditions, this would be a bad decision. However, when the batter is below average, like a pitcher at bat in the National League, it can be a good decision.
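The arithmetic behind the bunt example is short enough to spell out. The two values are the ones quoted above; the variable names are mine.

```python
# Average runs quoted in the text, both at a 0-0 count.
BEFORE_BUNT = 0.88   # no outs, runner on first
AFTER_BUNT = 0.72    # after the sacrifice bunt moves the runner to second

change = round(AFTER_BUNT - BEFORE_BUNT, 2)
percent = round(100 * change / BEFORE_BUNT)

print(change)   # -0.16
print(percent)  # -18
```

In other words, for the average batter the bunt gives up about 0.16 runs, roughly 18 percent of the runs expected before it.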
The typical spectator will find this heat map tedious. So, I divided it into three scoring zones: cold (blue), medium (yellow) and hot (red).
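One simple way to form three zones is to cut the range into equal-width thirds. That choice is an assumption made for this sketch only. Other rules, such as putting an equal number of cells in each zone, would draw the boundaries elsewhere. The function name and default limits (the 0.06 to 2.88 range of the full heat map) are also just for illustration.

```python
def zone(value, lo=0.06, hi=2.88):
    """Classify a value as 'cold', 'medium' or 'hot'.

    Assumption: the cutoffs split [lo, hi] into three equal-width bands.
    """
    width = (hi - lo) / 3
    if value < lo + width:
        return "cold"
    if value < lo + 2 * width:
        return "medium"
    return "hot"

print(zone(0.06))  # cold
print(zone(1.50))  # medium
print(zone(2.88))  # hot
```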
This is the baseball red-zone for the average batter facing the average pitcher in the average ballpark and so on. The 3-zone heat map for an above-average batter would have more red cells and fewer blue ones. I imagine the red-zone for Mike Trout would be very large, unless he is facing Clayton Kershaw.
My mission is to determine the red-zones for all batters, pitchers, stadiums, etc. This is just the beginning of a long and interesting road.