MLB Strikeout Distribution Analysis


As you settle into your seat, the crack of the bat echoing around the stadium, the unmistakable energy of a Major League Baseball game surrounds you. Among the multitude of plays, hits, and runs, there’s one particular event that stands out in today’s era of baseball: the strikeout. Whether you’re a seasoned baseball aficionado or a curious newcomer, prepare to embark on a journey through baseball’s most electric three-pitch sequence.

With the ever-growing number of player props in both daily fantasy games and sports betting, understanding the distribution of strikeouts is essential. Let’s dig into the statistics, distribution theory, and strategies that make up MLB’s strikeout narrative.

The Model That Wasn’t Used

Modeling this strikeout distribution did not come without its challenges. The first thing I looked at was whether the Poisson distribution could be used in this model. Is the Poisson suitable? The short answer is: no!

The long answer: The Poisson distribution is often used to model the number of times an event occurs within a fixed interval of time or space. It assumes that these events are rare and occur independently of each other. For example, the Poisson distribution is sometimes used to model the number of phone calls an office receives in an hour, or the number of accidents occurring on a stretch of road in a day. So, why doesn’t a Poisson work? A few reasons. First, strikeouts do not occur independently of each other, which breaks the first assumption of the Poisson model. Second, there is no fixed interval, since the number of batters a pitcher faces changes from start to start, which breaks a second assumption of the model. Plotting Poisson expectations against real-life results shows a high level of underdispersion, which happens when data exhibit less variation than the model predicts. This was clearly not the way, so I moved on.
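To make "underdispersion" concrete, here is a minimal Python sketch of the usual check: the dispersion index, which is the variance divided by the mean. A Poisson always has an index of 1, so a value below 1 means less spread than a Poisson allows. This is not the plot or the data described above. The batter count and strikeout rate are illustrative assumptions, chosen only to show that a count built from a fixed number of chances comes out underdispersed.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events under a Poisson with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def dispersion_index(pmf):
    """Variance divided by mean for a pmf given as a list indexed by count."""
    mean = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    return var / mean


# Illustrative assumptions, not values taken from the article.
N_BATTERS = 25   # batters faced in one start
K_RATE = 0.22    # chance that any one batter strikes out

binom = [binomial_pmf(k, N_BATTERS, K_RATE) for k in range(N_BATTERS + 1)]
# Poisson with the same mean, truncated far enough out that the tail is negligible.
poisson = [poisson_pmf(k, N_BATTERS * K_RATE) for k in range(60)]

print(dispersion_index(poisson))  # very close to 1
print(dispersion_index(binom))    # equals 1 - K_RATE, so below 1
```

The same `dispersion_index` idea applies to a sample of real game logs: compute the sample variance and the sample mean of strikeouts per start and compare their ratio to 1.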

The Distribution Model

The solution that gives the most accurate results can be found in the Beta-Binomial. In the realm of statistics, the Beta-Binomial distribution elegantly captures the intricacies of observing binary outcomes with underlying uncertainty in probability. At its core, this distribution can be thought of as a blend between the Binomial and the Beta distributions.

Imagine flipping a coin multiple times. If the probability of heads is the same on every flip, the number of heads follows a Binomial distribution. But what if our coin’s fairness is uncertain, and the actual probability of landing heads can itself vary? Enter the Beta distribution: a continuous distribution capturing the uncertainty in the probability of a binary event. When we mix the Binomial distribution, which captures the number of successes in a fixed number of trials, with the Beta distribution, which models the variability in success probability, we get the Beta-Binomial distribution.
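The mixture described above has a closed-form probability mass function, P(X = k) = C(n, k) * B(k + alpha, n - k + beta) / B(alpha, beta), where B is the Beta function. Here is a minimal Python sketch of it using only the standard library. The trial count and the two shape parameters are illustrative assumptions, not the fitted values behind the chart in this article.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b), via log-gamma for stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(X = k) when X ~ Binomial(n, p) and p ~ Beta(alpha, beta)."""
    log_p = (math.log(math.comb(n, k))
             + log_beta(k + alpha, n - k + beta)
             - log_beta(alpha, beta))
    return math.exp(log_p)


# Illustrative assumptions, not fitted parameters.
N_TRIALS = 25            # chances for a strikeout (batters faced)
ALPHA, BETA = 4.0, 14.0  # shape of the uncertainty around the strikeout rate

pmf = [beta_binomial_pmf(k, N_TRIALS, ALPHA, BETA) for k in range(N_TRIALS + 1)]
mean = sum(k * q for k, q in enumerate(pmf))

print(round(sum(pmf), 6))  # probabilities over 0..N_TRIALS add up to 1
print(round(mean, 3))      # matches N_TRIALS * ALPHA / (ALPHA + BETA)
```

Smaller values of `ALPHA + BETA` at the same ratio mean more uncertainty about the underlying rate, which widens the distribution relative to a plain Binomial.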

Application of Strikeout Data to the Beta-Binomial

Now that we’ve gotten all of the technical stuff out of the way, here is what our strikeout distribution chart looks like…

Strikeout Distribution
Median K’s  Mean K’s  0       1       2       3       4       5       6       7       8       9       >9
3           3.224     5.98%   14.48%  19.7%   19.68%  15.94%  11.01%  6.66%   3.59%   1.74%   0.76%   0.46%
3.5         3.721     3.73%   10.62%  16.68%  18.97%  17.31%  13.34%  8.95%   5.32%   2.82%   1.35%   0.91%
4           4.194     2.35%   7.69%   13.68%  17.43%  17.67%  15.03%  11.06%  7.17%   4.14%   2.14%   1.64%
4.5         4.686     1.44%   5.36%   10.75%  15.29%  17.17%  16.08%  12.97%  9.17%   5.76%   3.23%   2.79%
5           5.162     -       -       8.25%   12.97%  16%     16.38%  14.37%  11.02%  7.48%   4.52%   4.41%
5.5         5.652     -       -       6.11%   10.59%  14.31%  15.99%  15.25%  12.68%  9.3%    6.06%   6.7%
6           6.13      -       -       -       8.42%   12.4%   15.03%  15.51%  13.91%  10.98%  7.69%   9.65%
6.5         6.618     -       -       -       6.47%   10.36%  13.62%  15.19%  14.69%  12.48%  9.39%   13.44%
7           7.098     -       -       -       -       8.43%   11.95%  14.36%  14.93%  13.61%  10.97%  17.96%
7.5         7.585     -       -       -       -       6.63%   10.14%  13.11%  14.64%  14.32%  12.36%  23.33%
8           8.065     -       -       -       -       -       8.37%   11.61%  13.9%   14.55%  13.43%  29.29%
8.5         8.551     -       -       -       -       -       6.69%   9.95%   12.76%  14.29%  14.11%  35.9%
9           9.033     -       -       -       -       -       -       8.28%   11.36%  13.59%  14.33%  42.84%
9.5         9.517     -       -       -       -       -       -       6.68%   9.79%   12.52%  14.09%  50%

There are a few very important things to note about this table. First and foremost, I’ve split “expected strikeouts” into two columns: median K’s and mean K’s. With a skewed distribution like this, it’s incredibly important to understand the difference. For instance, if a pitcher averages 4.194 K’s per game (his mean), then his over/under (his median) is exactly 4. So, when analyzing numbers on your own, keep the distinction between a pitcher’s mean and median strikeouts in mind.

Second, you’ll notice that values are missing starting at a median of five, and that more are missing as the median K’s go higher. There is one element of this distribution that the Beta-Binomial doesn’t accurately capture: the low strikeout totals don’t fit the distribution curve. For example, at a median of 9.5 strikeouts, the model expects 0 strikeouts to occur 0.01% of the time (one in 10,000). This simply isn’t the case, as pitchers can get injured, get pulled early, etc.
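One way to see what "median" means in the table is to split a row into under, push, and over probabilities at a given line. The Python sketch below does that for the median 3 and median 3.5 rows, copied from the table. The function name and the list layout (index = strikeout total, last entry = the ">9" bucket) are my own choices for illustration.

```python
# Rows copied from the table; index k is the probability of exactly k strikeouts,
# and the last entry (index 10) is the ">9" bucket.
ROW_MEDIAN_3 = [0.0598, 0.1448, 0.1970, 0.1968, 0.1594, 0.1101,
                0.0666, 0.0359, 0.0174, 0.0076, 0.0046]
ROW_MEDIAN_3_5 = [0.0373, 0.1062, 0.1668, 0.1897, 0.1731, 0.1334,
                  0.0895, 0.0532, 0.0282, 0.0135, 0.0091]


def under_push_over(row, line):
    """Split a table row into P(under), P(push), P(over) for a line up to 9.5."""
    under = sum(p for k, p in enumerate(row) if k < line)
    push = sum(p for k, p in enumerate(row) if k == line)
    over = sum(p for k, p in enumerate(row) if k > line)
    return under, push, over


# Whole-number line: under and over balance, and the rest is the push.
print(under_push_over(ROW_MEDIAN_3, 3))

# Half-strikeout line: no push is possible, so under and over are each about 50%.
print(under_push_over(ROW_MEDIAN_3_5, 3.5))
```

The mean is higher than the median in every row because the long right tail of big strikeout games pulls the average up without changing where the 50/50 point sits.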

Applying a Strategy to Daily Fantasy or Sports Betting

Generally, a bettor has to win around 55% of the time to be profitable on the daily fantasy apps and about 54% of the time at sportsbooks (30-cent lines). As a general rule, if your projection differs from the sportsbook’s line by 0.5 strikeouts or more, it is a worthwhile bet. Obviously, this assumes that your projection is more accurate than the sportsbook’s. That is much easier to accomplish with player props, as sportsbooks and daily fantasy apps can have a hard time keeping up with all the events and players.
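The 0.5-strikeout rule can be checked against the table with a little arithmetic. The Python sketch below assumes a 30-cent line is priced at -115 on both sides (the article does not state the exact prices) and that your projection matches the median 5 row while the sportsbook posts 4.5. The break-even formula is the standard conversion from American odds to a required win probability.

```python
def break_even_prob(american_odds):
    """Win probability needed to break even at the given American odds."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


# Median 5 row from the table, cells for 5, 6, 7, 8, 9 and ">9" strikeouts.
ROW_MEDIAN_5_FROM_5_UP = [0.1638, 0.1437, 0.1102, 0.0748, 0.0452, 0.0441]

# Assumption: the sportsbook line is 4.5 and the price is -115 on the over.
p_over_4_5 = sum(ROW_MEDIAN_5_FROM_5_UP)
needed = break_even_prob(-115)
edge = p_over_4_5 - needed

print(f"P(over 4.5) if the true median is 5: {p_over_4_5:.4f}")
print(f"Break-even at -115:                  {needed:.4f}")
print(f"Edge:                                {edge:.4f}")
```

Under these assumptions the over wins a little more than 58% of the time against a break-even of roughly 53.5%, which is why a half-strikeout gap clears the bar, provided the projection really is the more accurate number.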

Hope that this helps in your journey to become a smarter sports bettor! Good luck!