I Resurrected a Dead Relief Pitcher Stat (Part One: A Re-Introduction to Goose Eggs and gWAR)
In April 2017, Nate Silver of FiveThirtyEight, the poll analysis, politics, economics, and occasional sports blogging wing of the Disney World Empire, published an article on the creation of a new baseball stat called the Goose Egg.
The Goose Egg serves the purpose of being a replacement for the outdated “save” stat, which had long been determined to be the benchmark for reliever success, and the statistical basis for the “closer” role.
The save stat certainly doesn’t hold the same weight in modern baseball discourse that it did even ten or fifteen years ago, so for those of you who need a refresher course, these are the stipulations for the save stat, according to MLB.com:
A save is awarded to the relief pitcher who finishes a game for the winning team, under certain circumstances. A pitcher cannot receive a save and a win in the same game.
A relief pitcher recording a save must preserve his team's lead while doing one of the following:
Enter the game with a lead of no more than three runs and pitch at least one inning.
Enter the game with the tying run in the on-deck circle, at the plate or on the bases.
Pitch at least three innings.
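For anyone who thinks better in code, here is a rough Python sketch of the save rule as quoted above. The class, the field names, and the way "tying run in the on-deck circle" is turned into arithmetic are all my own assumptions for illustration, not anything official from MLB.

```python
from dataclasses import dataclass


@dataclass
class ReliefAppearance:
    finished_game: bool    # he recorded the final out for his team
    team_won: bool
    got_the_win: bool      # a pitcher cannot get a win and a save in one game
    kept_the_lead: bool    # his team never lost the lead while he pitched
    lead_on_entry: int     # runs ahead when he entered
    runners_on_entry: int  # baserunners when he entered
    outs_recorded: int     # 3 outs = one inning


def tying_run_on_deck_or_closer(lead: int, runners_on: int) -> bool:
    # Assumption: the tying run is on base if lead <= runners, at the plate
    # if lead == runners + 1, and on deck if lead == runners + 2.
    return lead <= runners_on + 2


def is_save(a: ReliefAppearance) -> bool:
    if not (a.finished_game and a.team_won and a.kept_the_lead):
        return False
    if a.got_the_win or a.lead_on_entry < 1:
        return False
    short_lead_full_inning = a.lead_on_entry <= 3 and a.outs_recorded >= 3
    tying_run_close = tying_run_on_deck_or_closer(
        a.lead_on_entry, a.runners_on_entry
    )
    three_innings = a.outs_recorded >= 9
    return short_lead_full_inning or tying_run_close or three_innings
```

Under this sketch, a reliever who mops up the last three innings of a ten-run blowout gets a save, while one who pitches a clean ninth with a four-run lead and the bases empty does not.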
Right away, you might notice a couple stipulations that don’t make sense. Specifically the first and third ones.
Enter the game with a lead of no more than three runs and pitch at least one inning.
While this might seem alright to some, after all, crediting a reliever with protecting a narrow lead is kind of the point, it does prompt a question: What even are the highest leverage situations for a reliever to come into? Well, thanks to Tom Tango’s Leverage Index, we actually know.
Leverage Index tells us that a pitcher coming into the seventh inning or later with either a tie game or a lead of no more than two runs is facing the highest leverage situation (or a “goose situation”). Leading by three runs isn’t actually seen as that serious of a situation until it gets really out of hand.
Oh, and this?
Pitch at least three innings.
It feels a little silly to credit a long reliever with protecting a large lead over three innings of low leverage ball. And the stat as a whole hinging on the reliever finishing the game ignores the fact that sometimes, the batters a reliever is slated to face in the bottom of the ninth aren’t necessarily going to be the opposition’s best. If I’m facing, oh let’s say the 2016 Toronto Blue Jays, there is a massive difference between facing the bottom three of the order (Justin Smoak, Kevin Pillar, Ezequiel Carrera) and facing the top three (Devon Travis, Josh Donaldson, Edwin Encarnación). If I were, say, 2016 Baltimore Orioles manager Buck Showalter in a crucial do-or-die game, I would bring in my elite closer Zach Britton to face those elite hitters instead of holding out for the save and calling in, let’s say, a bad starting pitcher such as Ubaldo Jiménez. Just a hypothetical experiment.
The save is, frankly, kind of a garbage stat because it fails at what it sets out to do: Measure relief pitcher value. Brad Hand led MLB with 16 saves in 2020 despite being quite mediocre overall. Never mind the damage it did to managers’ brains for decades, convincing them to hold on to their best relief pitchers to wait out a potential save situation instead of using them in the highest leverage situations possible. What would be more useful is a stat detailing how well relief pitchers perform in actual high leverage situations. Enter the Goose Egg.
The Goose Egg is named for Hall of Fame relief pitcher Rich “Goose” Gossage, who enjoyed tremendous success in the late 70s and 80s as a “fireman”, a reliever who would come into pressure situations to quash them, i.e. to “put out the fires”. Indeed, as of the original article’s publishing in 2017, Gossage actually led MLB in career Goose Eggs.
He’s also only less insane and shitty of a person than Curt Schilling by the thinnest of possible margins, but that’s neither here nor there.
But what actually constitutes a “Goose Egg”? A Goose Egg occurs when, as Silver puts it:
A relief pitcher records a goose egg for each inning in which:
It’s the seventh inning or later;
At the time the pitcher faces his first batter of the inning:
His team leads by no more than two runs, or
The score is tied, or
The tying run is on base or at bat
No runs (earned or unearned) are charged to the pitcher in the inning and no inherited runners score while the pitcher is in the game; and
The pitcher either:
Records three outs (one inning pitched), or
Records at least one out, and the number of outs recorded plus the number of inherited runners totals at least three.
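Silver's definition translates fairly directly into code. The following is a minimal Python sketch of one way to check a single inning of relief work against those rules. The class and field names are made up for illustration, and treating "the tying run is on base or at bat" as "the lead is no bigger than the inherited runners plus one" is my own reading of the rule.

```python
from dataclasses import dataclass


@dataclass
class InningStint:
    inning: int                    # inning number (7 means the seventh)
    lead: int                      # his team's lead at his first batter of the
                                   # inning (0 = tied, negative = trailing)
    inherited_runners: int         # runners on base at that moment
    outs_recorded: int             # outs he got in this inning
    runs_charged: int              # earned plus unearned runs charged to him
    inherited_runners_scored: int  # inherited runners who scored on his watch


def is_goose_situation(s: InningStint) -> bool:
    if s.inning < 7 or s.lead < 0:
        return False
    narrow_or_tied = s.lead <= 2
    # Assumption: the tying run is on base or at bat when the lead is at
    # most the number of runners on base plus the batter.
    tying_run_up = s.lead <= s.inherited_runners + 1
    return narrow_or_tied or tying_run_up


def is_goose_egg(s: InningStint) -> bool:
    if not is_goose_situation(s):
        return False
    if s.runs_charged > 0 or s.inherited_runners_scored > 0:
        return False
    full_inning = s.outs_recorded == 3
    cleaned_up_mess = (
        s.outs_recorded >= 1
        and s.outs_recorded + s.inherited_runners >= 3
    )
    return full_inning or cleaned_up_mess
```

So a clean eighth with a one-run lead counts, as does coming in with the bases loaded and two out in the seventh and getting the last out, while a clean ninth with a three-run lead and nobody on does not.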
The goose egg is superior to the save because it’s applied in consistent high leverage situations, in which the opposition is only a handful of mistakes at most away from tying the game or taking the lead. A save can still technically be won by a closer who comes in with a four-run lead and promptly gives up back-to-back home runs, walks four consecutive batters, and gives up three terrifying lineouts. There isn’t as much margin for error for Goose Eggs.
Of course, there’s a companion stat for the opposite, when a reliever can’t get a handle on a high-leverage situation. Enter the “broken egg”. Or if you prefer, a “blown goose”.
A relief pitcher records a broken egg for each inning in which:
He could have gotten a goose egg if he’d recorded enough outs;
At least one earned run is charged to the pitcher; and
The pitcher does not close out the win for his team.
This does create some situations where the last relief pitcher gives up a run but still gets the final out of the game and secures the win for his team. In that case, he doesn’t get rewarded with a goose egg, but he doesn’t get handed a blown goose either. Ditto if a pitcher gives up an unearned run at any point.
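To make the three possible outcomes concrete, here is a small Python sketch that sorts one inning of relief work into a goose egg, a broken egg, or neither. The function name and its parameters are hypothetical. It assumes you have already worked out whether the inning was a goose situation and whether the pitcher recorded enough outs to qualify.

```python
def classify_inning(
    goose_situation: bool,   # seventh or later, tied or narrow lead, etc.
    enough_outs: bool,       # met the outs requirement for a goose egg
    earned_runs: int,        # earned runs charged to the pitcher
    unearned_runs: int,      # unearned runs charged to the pitcher
    inherited_scored: int,   # inherited runners who scored
    closed_out_win: bool,    # he got the final out of a win
) -> str:
    if not goose_situation:
        return "neither"
    # Broken egg: an earned run is charged and he does not close out the win.
    if earned_runs >= 1 and not closed_out_win:
        return "broken egg"
    # Goose egg: nothing scores at all and he records enough outs.
    nothing_scored = (
        earned_runs == 0 and unearned_runs == 0 and inherited_scored == 0
    )
    if nothing_scored and enough_outs:
        return "goose egg"
    # Everything else falls in the gap: an unearned run, an earned run while
    # still closing out the win, or a clean outing that was too short.
    return "neither"
```

For example, `classify_inning(True, True, 1, 0, 0, True)` lands on "neither", which is the closer who gives up a run but still nails down the win.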
You can take goose eggs and divide them by “goose opportunities” (goose eggs plus blown geese) to get a conversion rate (“goose percentage”), which usually hovers around 75% from year to year. Once you get a given player’s conversion rate, the league-average conversion rate, and their team’s park factor, you can calculate yet another stat: gWAR (Goose Wins Above Replacement, not the cult shock metal band from the 90s). Because if there’s one thing baseball discourse needed, it was another goddamn version of WAR.
The formula for gWAR is described by Silver as follows:
GWAR = .52 * (GOPP) * (pitcher’s GPCT – replacement-level GPCT)
In the formula, GOPP is goose opportunities (goose eggs + broken eggs) and GPCT is goose percentage (goose eggs divided by goose opportunities).
Replacement-level GPCT, which adjusts for park and league effects, is calculated as follows:
Replacement-level GPCT = league GPCT + .105 – .0014 * PPF
… where league GPCT is the leaguewide goose percentage (that is, for the American League or the National League, rather than for the major leagues combined) and PPF is the Baseball-Reference.com pitching park factor for the pitcher’s home stadium.
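Here is Silver's formula as a short Python sketch, mostly to show how the pieces fit together. The function names are my own, and the sample numbers in the usage note are made up rather than taken from any real pitcher. It assumes PPF is on the Baseball-Reference scale where 100 is a neutral park.

```python
def goose_pct(goose_eggs: int, broken_eggs: int) -> float:
    """GPCT: goose eggs divided by goose opportunities."""
    opportunities = goose_eggs + broken_eggs
    if opportunities == 0:
        raise ValueError("no goose opportunities, so GPCT is undefined")
    return goose_eggs / opportunities


def replacement_gpct(league_gpct: float, ppf: float) -> float:
    """Replacement-level GPCT = league GPCT + .105 - .0014 * PPF."""
    return league_gpct + 0.105 - 0.0014 * ppf


def gwar(goose_eggs: int, broken_eggs: int,
         league_gpct: float, ppf: float) -> float:
    """GWAR = .52 * GOPP * (pitcher's GPCT - replacement-level GPCT)."""
    gopp = goose_eggs + broken_eggs
    if gopp == 0:
        return 0.0  # no opportunities means no value added or lost
    pitcher_gpct = goose_pct(goose_eggs, broken_eggs)
    return 0.52 * gopp * (pitcher_gpct - replacement_gpct(league_gpct, ppf))
```

With a neutral park (PPF of 100) and a league GPCT of .750, replacement level works out to .715. A hypothetical reliever with 30 goose eggs and 5 broken eggs would have a GPCT of about .857, and `gwar(30, 5, 0.75, 100)` comes out to roughly 2.6.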
So with gWAR, we have an advanced stat that shows how much value a reliever has provided for their team in high leverage situations, rewarding them for repeated usage and success in those situations, while park-adjusting the results for a slightly more accurate assessment of how well they pitched.
Goose eggs and gWAR have a higher correlation with Win Probability Added (WPA) than saves, which is a pretty good indicator that they should stick around while saves should go straight to the garbage. And it looked for a while like FiveThirtyEight would make a concerted effort to push it as a legitimate, respectable baseball stat, with updates in May and August 2017 to that season’s goose egg and gWAR standings, with appropriate time taken to gush over Kenley Jansen’s insane season.
And then we promptly never heard from gWAR again. Not a single honk. While FiveThirtyEight and Silver were busy debating how Kirsten Gillibrand or Pete Buttigieg or whoever the fuck were polling among the ever crucial “STEMlord warmonger” voting demographic, goose eggs faded from whatever tiny niche of popular baseball consciousness that they occupied.
Except for mine. Because I truly have nothing better to do with my time.
I’ve played with the idea of keeping track of gWAR myself basically each season since I read the original article. Each season, I would start with a spreadsheet and, with my rudimentary-at-best math skills and Excel knowledge, manually insert values for each regular season game of Major League Baseball. And, inevitably, I would fall off at some point. Until 2020 when, thanks in huge part to the shortened season, I was able to keep track of the whole ride. Here are the results.
For 2021, I decided to try to make it through a full season. And, despite a few frustrating afternoons spent catching up on weeks’ worth of box scores that I had been too lazy to record the day of, I accomplished just that! Throughout the next couple weeks, as the postseason rages on in the background, we here at JAYSLAM are going to be taking a look at the results in several posts. The first will take a look at the Toronto Blue Jays specifically because I have a brand to keep up. The second and third ones will look at the American and National Leagues respectively, with my spreadsheets for each being available in the articles, and with the NL article being exclusive to paid subscribers.
Oh right, I knew I forgot something!
If you’ve been enjoying Jayslam, want to see more of it, or want to support its continued existence, consider getting a paid subscription! Paid subscriptions get you access to exclusive posts such as the NL one in a week or two.
Before we get into it though, I would be remiss if I didn’t mention some flaws and problems I have with the goose egg as a stat. For all their positive aspects, the goose family of stats is not the definitive stat for relief pitchers. I would point towards WPA and Fangraphs’ related Shutdown/Meltdown stats (which do basically the same thing as goose eggs) as being better encapsulations of a relief pitcher’s value. SIERA and Baseball Prospectus’ DRA (Deserved Run Average) are probably my favourite stats for measuring any pitcher’s ability to prevent runs. But none of these are perfect either.
To say that Tyler Chatwood (0.7 gWAR) was a better reliever in 2021 than Adam Cimber (0.0 gWAR) because of their gWARs is not only probably factually suspect, but it shuts out a lot of the nuances of what makes these pitchers good or otherwise. And the same goes for a lot of the stats I mentioned above. For a holistic look at each pitcher’s particularities, it’s always best (and more fun, really) to look at multiple different stats instead of one supposedly all-encompassing one.
Goose eggs don’t take into account the fact that, as Tango’s graphs show, you can have a high leverage situation before the seventh inning. Coming into a tie game with the bases loaded in the sixth inning is most certainly a high leverage situation, but unfortunately, these instances are excluded from the goose egg and are instead better encapsulated in Fangraphs’ Shutdown and Meltdown. Also, much like the save, the goose egg does not take the opponent’s strength into account.
My biggest problem with the goose egg is that while it does a fine job at showing you what has happened in terms of pure results, it’s not actually predictive, and can’t tell you anything about how well a pitcher actually performed. If I come into the game with the bases empty and with a one-run lead, walk the bases loaded, and get three line drive outs that the fielders can only stop with the most catlike of reflexes, I may still get the goose egg, but I don’t believe it would be accurate to say that I actually pitched well. For those purposes, it’s best to look at Baseball Savant to get the most in-depth look possible at how the pitcher’s velocity, spin rates, and hard contact were looking, as well as whether they were striking hitters out or walking the field.
Lastly, keep in mind that all the goose egg stats compiled were done so manually by me. As such, they are very susceptible to human error. While I like to think that I was meticulous enough to avoid major mistakes, I can’t guarantee that one hundred percent.
Now let’s goose it up.