My last post was devoted to answering the ultimate baseball question: who was the greatest hitter? I plan to answer similar questions in the future, such as ‘who was the greatest pitcher?’ and so forth. But all of that is a side excursion from our main path, predicting what happens next in a live baseball game.
Anticipating what is about to happen next is a large part of what makes a sporting event interesting to watch. Most spectator sports consist of numerous small skirmishes between the opposing sides. Winning skirmishes leads to winning more significant contests: battles. And winning battles leads to winning the war: the game. For American football, seeking a first down is a skirmish and a series of downs is a battle. For baseball, a plate appearance is a skirmish and an inning is a battle.
For today’s post, I want to return to the concept of a baseball Red-Zone. In a previous post, I showed a heat map of the 288 locations of a baseball game that can occur within its six dimensions: balls, strikes, outs and three bases. That Red-Zone was calculated at the inning level or, in other words, for battles. When I say I want to predict what happens next, I mean by the end of the plate appearance and even on the next pitch.
Consequently, a baseball Red-Zone would apply to a plate appearance as well as an inning. To see how this is done, look at Figure 1, the half-inning scoring heat map below. Illustrating six dimensions in a two-dimensional space is hard to do. I used one axis to measure outs and balls (the east-west dimension) and the other axis to measure bases and strikes (the north-south dimension). This resulted in a matrix of 24 rows by 12 columns, or 288 cells.
Figure 1. Half-Inning Scoring Zones (1988-2019)
Note: The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”. Retrosheet provides play-by-play data on most MLB baseball games from 1918 to 1931 and all games since 1932; however, data on every pitch in every MLB game dates back only to 1988.
Notice that there are three rows for each combination of occupied bases and eight rows for each value of strikes. There are four columns for each value of outs and three columns for each value of balls. This juxtaposition of dimensions is dictated by the need to produce a rectangular matrix.
What is not dictated, though, is the order in which the axes are sorted. The north-south axis is sorted first by bases and then by strikes. The east-west axis is sorted first by outs and then by balls. I could have sorted them differently, but I chose this order to achieve a visual effect.
I wanted to illustrate that the more balls and the fewer strikes that have been called, the greater the potential for scoring. I also wanted to illustrate that the fewer the outs and the more runners there are on base, and the closer they are to home, the greater the potential for scoring. Since outs and bases dominate scoring at the inning level, it was necessary to sort by those dimensions first. This led to a clear picture of three relatively intact scoring zones: a blue zone (i.e. low-scoring) in the southwestern cells, a red zone (i.e. high-scoring) in the northeastern cells and a yellow zone that clearly separates them.
If, instead, I had sorted the axes primarily by balls and strikes, the picture would have looked like Figure 2.
Figure 2. Half-Inning Scoring Zones Resorted (1988-2019)
The relationships between balls vs. strikes and outs vs. bases are the same as in Figure 1. But because the axes are sorted differently, those relationships are harder to discern from the heat map.
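The two layouts can be sketched in a few lines of Python. This is a minimal sketch, not the code behind the figures: the names `BASES`, `grid_axes` and the `primary` argument are hypothetical, and the direction in which each axis runs (for example, whether empty bases sit at the north or the south edge) is an assumption. Only the nesting order of the sort keys comes from the text.

```python
from itertools import product

OUTS = range(3)      # 0, 1 or 2 outs
BALLS = range(4)     # 0 to 3 balls
STRIKES = range(3)   # 0 to 2 strikes
# Each base state is (third, second, first), 1 = occupied; 8 combinations.
BASES = list(product((0, 1), repeat=3))


def grid_axes(primary):
    """Return (row_labels, col_labels) for the 24-row by 12-column heat map.

    primary="outs_bases": rows nested by bases then strikes, columns by
    outs then balls (the Figure 1 nesting).
    primary="count": rows nested by strikes then bases, columns by balls
    then outs (the Figure 2 nesting).
    Row labels are (bases, strikes); column labels are (outs, balls).
    """
    if primary == "outs_bases":
        rows = [(bases, s) for bases in BASES for s in STRIKES]
        cols = [(o, b) for o in OUTS for b in BALLS]
    elif primary == "count":
        rows = [(bases, s) for s in STRIKES for bases in BASES]
        cols = [(o, b) for b in BALLS for o in OUTS]
    else:
        raise ValueError(f"unknown primary sort: {primary!r}")
    return rows, cols


rows, cols = grid_axes("outs_bases")
print(len(rows), len(cols), len(rows) * len(cols))  # 24 12 288
```

Both calls return the same 288 cells; only their positions on the grid change, which is why the zones look intact in one figure and scattered in the other.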
The Plate Appearance Red-Zone
For long-time baseball fans, it should come as no shock to learn that when it comes to determining the outcomes of plate appearances, balls and strikes, not outs and bases, dominate. What might come as more of a surprise, though, is the relationship between positive outcomes and outs vs. bases.
Figure 3 shows a heat map of the average plate appearance outcomes for each of the 288 baseball game locations. The north-south axis is sorted first by strikes and then by bases. The east-west axis is sorted by balls and then by outs. If you recall from my last post, a plate appearance outcome is the number of runs scored during the PA (RBI+) plus the change in game location (ΔGL). The outcomes range from negative 0.31 to positive 0.62 and average zero. The lower the value, the bluer the background and the higher the value, the redder the background.
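As a worked illustration of that definition, here is a minimal sketch. It assumes that each game location carries a single numeric value and that ΔGL is the value of the location after the plate appearance minus the value before it. The function names and the sample records are hypothetical and are not Retrosheet data.

```python
from collections import defaultdict


def pa_outcome(runs_scored, gl_before, gl_after):
    """Outcome of one plate appearance: RBI+ plus the change in game location.

    Assumption: gl_before and gl_after are numeric values attached to the
    game locations before and after the plate appearance.
    """
    return runs_scored + (gl_after - gl_before)


def mean_outcome_by_location(records):
    """Average the outcomes recorded against each location.

    records: iterable of (location_code, runs_scored, gl_before, gl_after).
    """
    totals = defaultdict(lambda: [0.0, 0])  # location -> [sum, count]
    for location, runs, before, after in records:
        totals[location][0] += pa_outcome(runs, before, after)
        totals[location][1] += 1
    return {loc: total / n for loc, (total, n) in totals.items()}


# Made-up records for illustration only.
sample = [
    ("111102", 0, 1.0, 0.5),   # no run scores and the location value drops
    ("111102", 1, 1.0, 1.25),  # one run scores and the location value rises
]
print(mean_outcome_by_location(sample))  # {'111102': 0.375}
```

Each cell of the heat map would then hold one such average, one per location.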
A matrix of 288 numbers can be tedious to contemplate, so Figure 4 presents a trichotomized (three-color) version that drops the numbers. Although the axes of Figure 4 are sorted in the same way as the axes in Figure 2, the color pattern in Figure 4 is similar to that of Figure 1. Blue cells cluster around the southwestern region and red cells cluster around the northeastern region. Yellow cells form a diagonal from the northwest to the southeast.
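One way to produce such a three-color version is to cut the averages at their terciles. The text does not say which cut points were used for Figure 4, so the tercile rule, the function name `trichotomize` and the sample values below are all assumptions made for illustration.

```python
from statistics import quantiles


def trichotomize(values):
    """Map each location's average outcome to 'blue', 'yellow' or 'red'.

    values: dict of location label -> average outcome.
    Assumption: the two cut points are the terciles of the values.
    """
    low_cut, high_cut = quantiles(values.values(), n=3)
    colors = {}
    for location, value in values.items():
        if value <= low_cut:
            colors[location] = "blue"      # lowest third
        elif value >= high_cut:
            colors[location] = "red"       # highest third
        else:
            colors[location] = "yellow"    # middle third
    return colors


# Made-up averages for six hypothetical locations.
sample = {"A": -0.3, "B": -0.1, "C": 0.0, "D": 0.1, "E": 0.2, "F": 0.6}
print(trichotomize(sample))
```

Fixed thresholds (for example, a band around zero) would be an equally plausible choice and would change which cells end up yellow.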
The pattern confirms the theory that positive outcomes generally increase with the count of balls and the number of runners on base and decrease with the count of strikes and the number of outs, but there are several exceptions. For example, according to this theory, the worst location for the batter should be two outs, no runners on third, second or first, no balls and two strikes (location 200002). According to Figure 3, however, the worst location for an individual batter is one out, the bases loaded, no balls and two strikes (111102). The reason is that with two strikes and no balls, the batter cannot afford to take the next pitch unless it is far outside the strike zone. This makes a strikeout or, even worse, a double play quite likely. Since the bases are loaded, the opportunity cost of that outcome would be very high.
Another exception is that this theory predicts location 011130 (i.e. no outs, the bases loaded, three balls and no strikes) would be the best location for the batter. But the best outcome is at 211130 (i.e. two outs) instead. As to why, I can only conjecture. When there are two outs, a double play is not possible. But I suspect the main reason is that with two outs, the batter is less likely to try for a home run and is therefore more likely to put the ball into play or walk in a run.
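The six-digit location codes used above can be decoded mechanically. The digit order below (outs, third, second, first, balls, strikes) is read off the examples in the text. The `GameLocation` type and the `parse_location` helper are hypothetical names introduced only for this sketch.

```python
from typing import NamedTuple


class GameLocation(NamedTuple):
    outs: int
    third: int    # 1 if a runner is on third, else 0
    second: int
    first: int
    balls: int
    strikes: int


def parse_location(code):
    """Decode a six-digit location code such as '111102'."""
    if len(code) != 6 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected six digits, got {code!r}")
    loc = GameLocation(*(int(ch) for ch in code))
    # Valid ranges: 0-2 outs, 0/1 per base, 0-3 balls, 0-2 strikes.
    if loc.outs > 2 or max(loc[1:4]) > 1 or loc.balls > 3 or loc.strikes > 2:
        raise ValueError(f"digit out of range in {code!r}")
    return loc


print(parse_location("111102"))
# GameLocation(outs=1, third=1, second=1, first=1, balls=0, strikes=2)
```

Decoding 200002 and 211130 the same way gives two outs with the bases empty on an 0-2 count, and two outs with the bases loaded on a 3-0 count.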
From this matrix of plate appearance outcomes, we can estimate the impact an individual batter has, based on his history, as well as the impact of other effect modifiers such as who is pitching and the stadium in which the game is played.