The Z Files: It's Early, But ...

Updated on August 6, 2020 1:21PM EST

It's early, but …

Those words don't belong together. To be fair, I'm paraphrasing Tom Tango, who once said the same about small sample sizes.

Unfortunately, due to the nature of the truncated season and the need to make roster decisions, many are basing analysis on, "It's early, but …"

One approach many recommend for basing decisions more on data and less on a whim is the notion of stabilization points. The premise is that a change in skill level shows up in the numbers at different rates for different metrics. The common threshold is 50 percent. That is, the stabilization point is the sample size at which the metric is driven half by skill, half by perceived luck.

Since the outcomes are half skill, regression is necessary when determining the new skill level. For instance, the stabilization point for a batter's strikeout rate is usually reported to be 60 plate appearances. Ergo, a hitter's new strikeout rate is the average of what was initially projected and the mark after 60 trips to the dish, a mere 12-15 games – essentially two weeks into the season.
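The arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the calculation, not a recommended method. The function name and the example hitter are hypothetical, and the weighting for sample sizes other than the stabilization point (observed weight of n / (n + S)) is the commonly used shrinkage form, an assumption not spelled out in this column.

```python
def regress_to_projection(observed_rate, sample_size, projected_rate, stabilization_point):
    """Blend an observed rate with a preseason projection.

    The observed rate gets weight n / (n + S), where n is the sample size
    and S is the stabilization point. At n == S the result is a plain
    50/50 average of the two rates.
    """
    if sample_size < 0 or stabilization_point <= 0:
        raise ValueError("sample_size must be >= 0 and stabilization_point > 0")
    weight_observed = sample_size / (sample_size + stabilization_point)
    return weight_observed * observed_rate + (1 - weight_observed) * projected_rate


# Hypothetical hitter: projected for a 25% strikeout rate,
# but striking out 30% of the time through his first 60 plate appearances.
new_rate = regress_to_projection(0.30, 60, 0.25, 60)
print(f"{new_rate:.3f}")  # 0.275
```

With 60 plate appearances and a 60-PA stabilization point, the two inputs are weighted equally, which is the "average of what was initially projected and the mark after 60 trips" described above.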

When this concept was first introduced over 10 years ago by Russell Carleton, then known as Pizza Cutter, I jumped all over it, using the stabilization points of all pertinent metrics to drive my rest-of-season projections. Man, I felt like I was on the cutting edge of next-level analysis.

The same process is being utilized today. The problem is this isn't a humblebrag decreeing I've been doing this for a dozen years. I stopped several years ago when Carleton explained we're misusing the concept of stabilization points. In fact, I've been talking about this ever since, yet there are still some prominent analysts championing the application of stabilization points without noting the flaw.

Carleton has explained that while a metric may be 50 percent skill at its stabilization point, the skill level may not be the same in the next "X" plate appearances, or balls in play – whatever is used as the measuring stick. All the stabilization point conveys is that the performance over "X" is half real. It doesn't mean the level will be the same for each "X". The computation described above assumes the player's skill level is repeated every "X".

It may be those utilizing stabilization points understand this but don't do an adequate job conveying it and/or aren't transparent with their method. My advice is twofold. First, if you have a solid grasp of statistics, spend some time researching stabilization points -- while I'm not afraid to admit I have some knowledge of the subject and usually understand enough to apply advanced statistical research, conducting said research isn't my strong suit. I'm good with some of the more basic applications; I'm limited with advanced methods. Second, if you prefer to go by a trusted analyst, at least decide if they appear to have a handle on the described flaw and adjust accordingly.

To that end, in the spirit of transparency, I still incorporate stabilization points in my rest-of-season projections. Even though a batter may not own (or half own) his strikeout rate after 60 plate appearances, it seems to me a change in contact skills becomes real sooner than a change in metrics with longer stabilization points. As such, my rest-of-season projection engine for hitters and pitchers utilizes regression of current skill to the expected rate, but the extent of the regression is softened so that the expected skill carries more of the weight for a longer period than strictly dictated by the 50/50 split at the stabilization points. Admittedly, my regressions are empirical, but the relative degrees are fueled by the stabilization points.
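One way to picture a softened regression is to stretch the stabilization point by a multiplier, so the observed rate earns its weight more slowly. The sketch below is purely illustrative. The `softening` parameter, its default of 1.5, and the function names are hypothetical, and the actual empirical weights used in the projection engine described above are not given here.

```python
def softened_weight(sample_size, stabilization_point, softening=1.5):
    """Weight given to the observed rate.

    softening == 1.0 reproduces the strict 50/50 split at the stabilization
    point. Values above 1.0 (a hypothetical knob) keep more weight on the
    expected rate for longer.
    """
    if softening < 1.0:
        raise ValueError("softening below 1.0 would speed regression up, not soften it")
    if sample_size < 0 or stabilization_point <= 0:
        raise ValueError("sample_size must be >= 0 and stabilization_point > 0")
    return sample_size / (sample_size + softening * stabilization_point)


def rest_of_season_rate(observed, sample_size, expected, stabilization_point, softening=1.5):
    """Blend observed and expected rates using the softened weight."""
    w = softened_weight(sample_size, stabilization_point, softening)
    return w * observed + (1 - w) * expected


# Strikeout rate with a 60-PA stabilization point, 60 PA in the books.
print(softened_weight(60, 60, softening=1.0))  # strict split
print(softened_weight(60, 60, softening=1.5))  # softened: observed rate counts for less
```

Because the multiplier scales each metric's own stabilization point, metrics that stabilize quickly still move faster than those that stabilize slowly, which keeps the relative ordering the stabilization points imply.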

Changing the subject, a common "It's early, but …" topic is the 2020 hitting landscape, specifically how the ball is playing. Personal research indicates it takes 350 games to get a read, and after Wednesday's play, 166 are in the books, so we're not quite halfway there. Furthermore, the research is based on each team playing close to the same number of games, and the delays incurred by the Marlins, Cardinals and Phillies, among others, could be skewing the data.

As such, the following is simply informational, not meant to be an interpretation as to how the rest of the season will play out. As David Letterman used to say in introducing Stupid Pet Tricks, "This is only an exhibition. This is not a competition. Please, no wagering."

Here is some data from 2017-2019. What is shown is the monthly home run rate, expressed as homers per plate appearance.

Season	Mar/Apr	May	Jun	Jul	Aug	Sep	Full Season
2019	3.44%	3.58%	3.67%	3.74%	3.85%	3.55%	3.63%
2018	2.82%	3.09%	3.07%	2.99%	3.15%	2.98%	3.02%
2017	3.08%	3.28%	3.53%	3.25%	3.47%	3.14%	3.29%

Note the remarkably consistent gap over the three-year period. The full-season rate is 0.19 to 0.21 percentage points higher than the March/April rate in each of the past three seasons. Therefore, had the season played out normally, on around April 21, or 350 games in, the home run rate at that time would have been about 0.2 percentage points lower than the expected level for the entire 2020 campaign.
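The gap between the full-season and March/April rates can be checked directly from the table. A short Python sketch, using only the figures shown above (the variable names are just for illustration):

```python
# Home run rate per plate appearance, in percent: (Mar/Apr, full season)
hr_rates = {
    2019: (3.44, 3.63),
    2018: (2.82, 3.02),
    2017: (3.08, 3.29),
}

# Full-season rate minus March/April rate, in percentage points
gaps = {
    season: round(full_season - mar_apr, 2)
    for season, (mar_apr, full_season) in hr_rates.items()
}

for season, gap in gaps.items():
    print(f"{season}: full season is {gap:.2f} points above Mar/Apr")

average_gap = sum(gaps.values()) / len(gaps)
print(f"average gap: {average_gap:.2f} points")
```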

Currently, after 166 games the HR rate is 3.2 percent, about halfway between 2017 and the happy fun ball from last year. Even if we were at 350 games, comparing to the above is problematic. The main reason March/April numbers are lower is temperature. This is supported by the lesser drop in September.

The obvious fix is comparing to July and August and assuming a drop next month. However, the unbalanced nature of the schedules and the way the clubs held a three-week summer camp before returning to action render the comparison sketchy, at best. It's really not an apples-to-apples scenario.

Let's look from another perspective. Research first published by Mike Podhorzer from FanGraphs shows home run rate correlates very well to average fly ball distance. Please note this is average fly ball distance, not average home run distance. This quirk is what makes the finding so illuminating.

Here's data from July 24 through August 5 over the last four seasons.

Season	HR%	Avg FB Dist	Games
2020	3.20%	316.9 ft.	164
2019	3.70%	324.2 ft.	174
2018	3.30%	318.4 ft.	176
2017	3.30%	319.0 ft.	176

There have been a dozen or so fewer games this season, but the total is adequate to compare without the difference being an issue. Using the same time frame should mitigate, if not eliminate, temperature and the like from the equation. What's left is how the ball is playing, along with the unbalanced nature of the early schedule and the unique nature of playing with summer camp as the main preparation. The latter two leave any conclusion as speculation. It may appear as if the ball isn't benefiting from the reduced wind resistance seen last season, but the drop in distance could also be due to the fact that, Aaron Judge notwithstanding, batters' timing isn't there yet and they aren't making the same authoritative contact they will once they've caught up to the pitchers.
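As a rough illustration of the relationship between home run rate and average fly ball distance, the Pearson correlation can be computed on the four rows in the table above. Four data points are far too few to prove anything, so treat this as a sketch of the calculation rather than a replication of the original research. The helper function is written out by hand for clarity.

```python
import math

# July 24 through August 5 of each season, taken from the table above
hr_rate = [3.20, 3.70, 3.30, 3.30]          # 2020, 2019, 2018, 2017 (percent)
avg_fb_dist = [316.9, 324.2, 318.4, 319.0]  # feet


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson_r(hr_rate, avg_fb_dist)
print(f"correlation between HR rate and average fly ball distance: {r:.3f}")
```

A value near 1 here is consistent with the finding that the two move together, but it says nothing about why distance is down in 2020.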

Something not considered is that the 2020 data is devoid of pitchers' batting performance. This is relevant, since both the HR rate and average fly ball distance should increase without pitchers stepping into the batter's box.

Here is more evidence pitching is way ahead of hitting, or that batters have been unduly unlucky.

Season	K%	BB%	BABIP	ERA
2020	23.80%	9.30%	0.277	4.12
2019	23.00%	8.50%	0.296	4.51
2018	22.30%	8.50%	0.293	4.15
2017	21.60%	8.50%	0.297	4.36

The fact that strikeouts are up and homers are down affects batting average, but it shouldn't influence BABIP (batting average on balls in play). Can a BABIP more than 15 points below normal be solely due to the entire league having a streak of bad luck, or is there something else in play?
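One back-of-the-envelope way to frame the luck question is a z-score that treats every ball in play as an independent trial with the prior three-year BABIP as the true hit probability. The sketch below is hedged accordingly. The number of balls in play is a placeholder assumption, not a figure from this column, and real balls in play are not independent coin flips.

```python
import math


def babip_z_score(observed, baseline, balls_in_play):
    """Z-score of an observed BABIP versus a baseline under a simple binomial model."""
    if balls_in_play <= 0:
        raise ValueError("balls_in_play must be positive")
    standard_error = math.sqrt(baseline * (1 - baseline) / balls_in_play)
    return (observed - baseline) / standard_error


# Full-season BABIP for 2019, 2018 and 2017, from the table above
prior_babip = [0.296, 0.293, 0.297]
baseline = sum(prior_babip) / len(prior_babip)

# ASSUMPTION: hypothetical count of balls in play so far in 2020;
# swap in the real total before drawing any conclusion.
ASSUMED_BALLS_IN_PLAY = 8000

z = babip_z_score(0.277, baseline, ASSUMED_BALLS_IN_PLAY)
gap_points = (baseline - 0.277) * 1000
print(f"baseline {baseline:.3f}, gap {gap_points:.0f} points, z = {z:.1f}")
```

A large negative z-score under these simplifications would argue against luck alone, but it cannot separate the ball, the pitching, or the effect of summer camp.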

Here is the BABIP from the last three years from July 24 to August 5:

Season	BABIP
2019	0.297
2018	0.299
2017	0.297

This may be the same time frame, but by this point in prior campaigns, hitters had found their groove.

Admittedly, the best research would be rolling 166-game slices over the past few years to determine if there was any slice around .277. In lieu of that, here is the data from a comparable number of games beginning each season:

Season	BABIP
2019	0.290
2018	0.288
2017	0.285

At minimum, there's evidence to suggest hitters' contact out of the gate isn't as solid as later in the season. The same could easily be true now, and even exaggerated.

As such, I'm personally taking any analysis conducted on what's happened to date with a grain of salt. There's no way to quantify the effect of summer camp instead of a standard array of Grapefruit and Cactus League games. Be it the ball, or quality of pitching, it's too early -- no ifs, ands or buts about it.