WAR and the 2009 Mariners
A few facts you’ve probably heard tossed around the last few days:
1. The Mariners won 85 games
2. The M’s were outscored by 52 runs
3. Their pythag record, based on the above run differential, was just 76-86.
People have extrapolated all kinds of things from those facts. One of the more popular explanations for those three facts is that the team isn’t as good as their final record indicates, and that the extra wins are either due to unrepeatable good luck or the bountiful harvest of unquantifiable team chemistry. I’m here to say that both of those explanations are bunk.
Run differential is a decent tool when it’s used correctly and its limitations are understood. Good teams have large positive run differentials, bad teams have large negative run differentials, and teams in the middle are generally mediocre. It’s pretty hard to succeed without regularly outscoring your opponents, as should be pretty obvious, and no team is good enough to consistently win enough one or two run games to overcome a lack of talent. So, you can generally do a pretty decent job of projecting a team’s Win-Loss record based on RS-RA. No team was more than +/- 9 wins away from their pythag record this year, for instance.
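For readers who haven’t seen the pythag formula written out, here is a minimal sketch, assuming the classic Bill James form with an exponent of 2 (many analysts use 1.83 instead; the post doesn’t say which version produced the 76-86 figure, so treat the exponent as an assumption). The function names are just for illustration, and the run totals in the usage lines are hypothetical, not the Mariners’ actual 2009 numbers.

```python
def pythag_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Pythagorean expected winning percentage: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def pythag_record(runs_scored: float, runs_allowed: float,
                  games: int = 162, exponent: float = 2.0) -> tuple[int, int]:
    """Convert the expected winning percentage into a whole-game W-L record."""
    wins = round(games * pythag_win_pct(runs_scored, runs_allowed, exponent))
    return wins, games - wins


# Hypothetical inputs, not real team totals.
print(pythag_record(700, 700))                  # (81, 81): even run differential
print(pythag_record(650, 702))                  # outscored by 52 runs, lands below 81 wins
print(pythag_record(650, 702, exponent=1.83))   # same runs, alternative exponent
```

Note that the formula only sees the season totals; it has no idea how those runs were spread across individual games.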
That said, run differential is certainly not a perfect estimator of a team’s real abilities. Run totals can be skewed heavily by performance with men on base, either by hitters coming through in the clutch or pitchers stranding runners. In general, teams do not have a real ability to be significantly better or worse than you would expect in these situations based on their overall production, but over the sample of one season, teams’ performance can vary enough to affect their run totals. And that affects their pythag record, even though it’s not a real indicator of talent.
This can have a real impact on how people see a team overall. Not surprisingly to anyone who watched regularly, the Mariners were the worst hitting team in baseball with runners on base this year, hitting .254/.317/.396 with a man on. They were also worst in baseball at hitting with runners in scoring position, coming in at just .234/.312/.358. As a team, the M’s hit better with the bases empty than in pretty much any scenario where they had a chance to drive in a run (which is not normal, if you’re wondering), and while that’s frustrating to watch, it’s not an indicator that the M’s were really the worst offense in baseball – they were just a bad offense that failed in the clutch more often than they should have. And that skews their runs scored total down, which pushes their pythag record down, and, well, you get the idea.
Anyway, that’s a lot of words used to say that pythag is not luck independent, and shouldn’t be used as the be-all, end-all determinant of how “good” a team really was. In fact, we have a better way of measuring team talent level, and it’s one you should probably be familiar with by now – Wins Above Replacement.
You’ve seen us talk about WAR a lot. It’s the best measure of player value that we have, and while it’s not perfect, it’s pretty darn good. It sums up a player’s offensive and defensive value, accounts for time on the field, and puts the total on a scale above the production a pretty good Triple-A player could be expected to offer for the league minimum. At the team level, it’s the total of all the wins added by players on the roster throughout the year. And, because all of the inputs used in the formula are context-free, it doesn’t know anything about the timing of specific events and isn’t affected by “clutch” performances in the way that run totals are.
WAR is a better indicator of talent level than pythag. And you know what WAR thought the M’s record “should be” this year? 83-79. The M’s got 21 wins from their position players (mostly thanks to their league best defense) and 16 wins from their pitching staff. Based on the calculations from FanGraphs, a replacement level team this year would have won ~46 games, so add those 46 wins to the 37 extra that the M’s got, and you have 83 expected wins.
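The WAR-based expected record above is just addition on top of a replacement-level baseline. A minimal sketch, assuming the ~46-win replacement baseline cited from FanGraphs and rounding to whole wins; the helper name is hypothetical:

```python
def war_expected_record(replacement_wins: float, position_war: float,
                        pitching_war: float, games: int = 162) -> tuple[int, int]:
    """Replacement-level baseline plus team WAR, rounded to a W-L record."""
    wins = round(replacement_wins + position_war + pitching_war)
    return wins, games - wins


print(war_expected_record(46, 21, 16))  # (83, 79): the 2009 figures above
print(war_expected_record(46, 10, 10))  # (66, 96): the 2008 figures a commenter gives
```

Unlike the pythag calculation, nothing here depends on when runs were scored or allowed, which is the point of the comparison.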
In other words, the M’s weren’t a 76 win team that got really lucky or willed themselves to 10 extra wins through their harmony and hugging. They were a team that played well enough to finish two games over .500 and actually finished four games over .500. There’s nothing to explain. The M’s finished right about where we’d have expected them to, given how well they hit, caught, and pitched.
Don’t let the Pythag Police try to convince you that there’s an inevitable massive regression coming because the M’s outperformed their pythag. Their pythag underperformed their actual offensive level, and once you adjust for all of the facts that could be considered “luck” (not just some of them, as pythag does), you have to conclude that the M’s were basically the team that their final record indicates. There is certainly still a lot of work to do to get this team to be a playoff contender, but in order to know how much work needs to be done, you have to start from the right foundation. And that foundation is not pythag record.
Thanks, Derek. This helps me straighten out differences between actual record and pythag record.
That said–and knowing you’re not a fan of pythag record–has anyone tried to determine if there are commonalities between teams that significantly exceed or fall short of their pythag record? I’m only asking because the Mariners have done this in two out of the last three years, so it’s been on my mind. Your explanation convinces me it’s most likely a coincidence, but has anyone out there looked over the historical data for teams where the pythag record is way off and offered any explanations?
And by Derek, I meant Dave, of course. Sorry.
The difference between actual record and pythag record is almost always due to a deviation from normal distribution of run scoring.
For instance, a team that wins a bunch of close games and loses a bunch of blowouts will outperform their pythag record. That’s been the M’s recipe in 2007 and 2009. Their record in one run games was terrific, but they never blew anyone out, so their wins didn’t inflate their RS in the same way that their losses inflated their RA.
The opposite is also true. This year’s Indians are a great example. Won a bunch of games by big margins, but lost a bunch of close games. As such, they vastly underperformed their pythag.
In general, teams don’t really have control over the distribution of their run scoring. They can’t stop scoring in one game to save more runs for the next. Over a big enough sample, this would even out, but 162 games is not a big enough sample to get rid of all the noise.
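To see how the distribution of runs, not just the totals, opens that gap, here is a toy sketch with a made-up eight-game stretch of close wins and blowout losses (the pattern described above for 2007 and 2009). It assumes the exponent-2 pythag formula, and the games and the function name are hypothetical.

```python
def actual_vs_pythag(games: list[tuple[int, int]], exponent: float = 2.0) -> tuple[float, float]:
    """Return (actual win pct, pythag win pct) for a list of (runs_for, runs_against) games."""
    wins = sum(1 for runs_for, runs_against in games if runs_for > runs_against)
    total_for = sum(runs_for for runs_for, _ in games)
    total_against = sum(runs_against for _, runs_against in games)
    actual = wins / len(games)
    pythag = total_for ** exponent / (total_for ** exponent + total_against ** exponent)
    return actual, pythag


# Hypothetical stretch: five 3-2 wins and three 1-8 losses (18 runs for, 34 against).
stretch = [(3, 2)] * 5 + [(1, 8)] * 3
actual, pythag = actual_vs_pythag(stretch)
print(f"actual {actual:.3f}, pythag {pythag:.3f}")  # actual 0.625, pythag 0.219
```

Flip the pattern (blowout wins, close losses) and the actual record falls below pythag instead, which is the Indians case described above.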
Dave, I’ll ask something I asked in the context of 3rd order pythag in an earlier thread: using WAR, what was the expected winning % of the 2007 team?
I ask to illuminate your point about the remaining work to be done, as I think this team is at minimum probably two position/offensive players and a pitcher away from being a team that can reasonably be expected to be a contender.
The M’s were a 79 win team, using WAR, in 2007. This team was clearly better than that team, not even accounting for the differences in age.
So when you say we “should” be 83 wins by WAR, are you leveraging relief innings? I think the Mariners have quite an imbalance between their high leverage guys and low leverage guys. This could skew WAR measurements (unless you already adjusted for them).
Awesome writeup. This makes me feel better about the team.
(And thanks to Joser, who has pointed out the limitations of using straight Pythag several times that I have seen.)
Leverage is included in Pitcher WAR.
Thanks, Dave. To follow up, do teams that win loads of one-run games but lose loads of blowouts tend to have anything in common? As I see it, lallaihei’s explanation above–that the M’s have a strongly pronounced difference in quality between our high-leverage and low-leverage relievers–might explain why we win close games and get far, far behind in less-than-close games. I wonder if that’s true for other teams that outperform their pythag records? Or am I just looking for patterns in what is actually randomness?
Bad offenses?
(Actually, I’d be interested in the answer, but it seems reasonable that this would be the common factor).
Thank you, Dave. It’s a relief to see some levelheaded analysis which shows that the dreaded (and too often used) term “regression” isn’t waiting to pounce just around the corner.
While I agree fully that “team chemistry” wasn’t responsible for “extra wins” (which wins you show weren’t there to begin with), I think it still hasn’t been proved that some players weren’t performing better (i.e. getting higher WAR) due to a more positive attitude in the clubhouse and cohesiveness with teammates.
So are you planning on doing your “2010 plan” any time soon, Dave?
I wonder if some of your ideas might fall on something other than deaf ears this offseason.
Well, I’m not sure this is a useful kind of statement. For one thing, there’s a difference between a happy clubhouse having an effect, and a happy clubhouse having a LARGE effect. If clubhouse dynamics add only 3-4 runs over the course of an entire season, it’s something that’s negligible. If you want to assert the effect, you really have to be able to measure and quantify it.
And more importantly, you have to be able to generate these effects reliably and consistently. I think this is easier said than done, as what makes Player A happier and more productive is not the same for Player B and is counterproductive for Player C. And it just might work one year and not the next. Might just be easier to get Player D, who has a higher OBP and SLG.
This agrees very closely with the BP (if I may mention them) Adjusted Standings where they sort by 3rd order wins, which is another more in depth look at what drives team wins and losses.
I wonder if certain teams tend to over-perform consistently compared to their underlying stats. The Angels always seem to.
So you’re going to maintain your position until someone manages to prove a negative?
Dave, any comments regarding #14 in the Bill James Primer: When a team improves sharply one season they will almost always decline in the next.
I would say: Generalizations and adages are much less useful than good information.
“Dear NASA, I know you’ve done the calculations and testing to confirm that this rocket will escape the Earth’s gravity, but do you have any comments regarding the old saying what goes up, must come down?”
Bad offense, good defense, and a rotation with a couple of bad starters? When the decent starters take the hill backed up by the good defense, their pop-gun offenses usually can’t run away with the game even though the other team can’t score many runs. And when the gascans are out there, there’s just no way for the bats to catch up.
I was tempted to look at the records for close vs blowouts depending on who the starter was, but then figured it would be small-sample-size-theater.
I’m not Dave, but my comment is that saying the 2009 M’s are due for a decline because they were 25 games better than the 2008 M’s is about as valid as saying the 2009 Angels are due for a decline because they were 22 games better than the 2009 A’s. This year’s M’s are an almost entirely different team than last year’s.
58% of M’s plate appearances (and a similar amount of innings played on defense) and 39% of starts in 2009 were by guys not on the 2008 roster. The 2009 M’s were not the 2008 M’s suddenly playing better and due for some regression to the mean. They were a different team with a higher level of talent.
One thing to remember is that 9.5 of those wins were provided by Washburn, Bedard, Beltre and Branyan, who range from almost certainly to quite likely not to be here next year. If Felix, Gutierrez and Ichiro regress (Felix may well get better, but the other two cannot be counted on to match their incredible 2009’s) there are a whole lot of wins to find somewhere else and not a whole heap of obvious places currently on the 40-man to find it.
When you look at the Mariners this year, they were basically a .500 team. To take the team to the next level, you need another 10 wins over current WAR:
Catcher – Moore/Johnson (At best equal to 2009)
First base – Likely less production, as even if you re-sign Branyan he won’t be as good (downgrade)
Second base – If it’s Lopez (equal), if it’s Tua (downgrade)
Third base – Should be an upgrade if Beltre re-signs; if it’s Tua (downgrade), Hall (downgrade), Figgins (upgrade)
Shortstop – all options equally bad (equal)
RF – Ichiro (At best equal, likely down from 2009)
CF – Death to flying things – (equal)
LF – Saunders (equal)
DH – Is a question mark, but I gotta think it will be a more productive position (upgrade)
SP – Likely equal but possible downgrade – King down a bit, RRS up a bit, Bedard/Washburn’s numbers are gone and it will be tough for Morrow/Snell to match them and the number 5 will be below average no matter who they choose.
RP – equal
So as I see it, if the M’s can sign a quality 3B (Beltre or Figgins), they are a 77-83 win team, just like this year. I just don’t see where the Mariners will get the 10 wins they need, as they don’t have an extra $30 million to try to buy them, and the likely free agents are mostly right-handed hitters who don’t hit at Safeco.
I still think the M’s are two years away from being a 90-95 win team. They need to focus on rebuilding, and by then the last of the Bavasi contracts will be over, so you can really have a $100M payroll and win.
“Shortstop – all options equally bad (equal)”
Really, Do you know how really AWEFUL Betencourt was for 63 games?
Oops! Awful!
once you adjust for all of the facts that could be considered “luck”
This is a bit of an overstatement. Certainly no metric can adjust completely for luck, but I get the larger point that WAR is better than the Pythagorean Theorem of Baseball.
I think too many people here are overestimating how good the Mariners were this year.
Don’t get me wrong, it will definitely be a daunting task to improve on this season, but to pretend like this team has limited areas for improvement is utterly ridiculous.
1. Yuniesky Betancourt played half a season for the Mariners as their starting Shortstop, and could very well have been the worst regular in baseball for 2009.
2. Adrian Beltre, Russell Branyan, Erik Bedard and Ryan Rowland-Smith all spent significant time on the DL.
3. Griffey and Sweeney were the regulars at DH.
4. Morrow was the closer for a not-insubstantial amount of time.
Certainly this team was able to achieve quite a bit, and I have no doubt that they earned every one of their wins, but there are clear improvements to be made for a creative and motivated GM.
Excellent article, thanks! I was definitely under the impression before that the Mariners really should have won only ~78ish wins this year, but got lucky, and that we would need some roster improvements next year just to get into the 80s.
How did the team perform in 2007 and 2008 in terms of WAR? Would this have predicted that the 2007 team had been lucky and would be expected to regress in 2008?
Thanks!
Dave mentioned this earlier in the comments.
As for 2008, they were a 66 win team using WAR. (Assuming I’m doing it right, 46 wins at replacement level, +10 wins from pitching, +10 wins from batting)
You can find this information on Fangraphs (Select year, then batters or pitchers and sort by WAR)
@ Seattleken,
Some of your points are well taken, though I think the overall analysis is rather ham-fisted. To wit:
C: Johjima/Johnson/Burke (1.0/0.4/-0.4) was worth +1 WAR this season (leaving out Quiroz who played very little and at replacement level). I think it’s a bit early to say one way or another what kind of WAR Moore might have next year, but if Johnson stays at +0.4 (and there is room there, obviously, for some improvement), Moore would have to be a +0.6 player assuming he and RoJo are the tandem next year. The jury will have to stay out until Moore plays a full season, but given his skills it seems he certainly has the ability to produce more value than that.
1B: I’m not certain on what basis you expect Branyan to regress. Yes, he had the best WAR of his career and the third best on the 2009 M’s, but this was also the first time he’d played a full season (not counting the injury) as a starter. His plate discipline and batted-ball numbers for 2009 ended up being pretty much in line with his career averages, so assuming he comes back more or less healthy, I don’t see how he rates as a downgrade.
3B: Pretty much anyone other than Beltre is going to be a downgrade defensively, and just because Figgins had a typical UZR/150 for him at 3B, there is no way, NO WAY, he is a +6.0 WAR player again like he was last year. In fact, the +2.9 WAR Beltre had this year is pretty much right in line with Figgins’ career average over the six seasons in which he’s had significant playing time, INCLUDING this year’s number. On that basis alone, Figgins would be a push, and others have raised the issue that the M’s probably don’t need another singles-hitting pop-gun in this lineup.
I’ve read several comments like this over the past several days… Is everyone really that down on Jack Wilson? Assuming his injuries aren’t a sign of things to come, his 3 year WAR line is 2.6/1.6/2.0… I don’t get the lack of optimism from some of the people commenting here.
I believe that Geoff Baker made a comment on one of Mitch’s shows over the last couple weeks that came down to:
“Scouts/people believe that the Mariners played so hard and in so many one-run games that it isn’t sustainable year over year, because you can’t expect a team to always treat every game like it’s a must-win”
I am 99% assuming this is just another one of his random opinions he passes off as fact. However, I was wondering if there is any future impact to a team playing so many one run games in one year. Does the team not stay as mentally tough as Baker claims?
Both the gloom and the sunshine are warranted. The M’s are another +10 WAR away from being a solid playoff contender, and they have another 10 WAR or so out, or potentially out, the door after the season (Beltre, Branyan, Bedard, Washburn mostly). So, they need to find 20 or so WAR in the offseason to be a legit playoff team (assuming regressions and progressions for the guys who stay balance, which seems reasonable).
That 20 WAR has to fit into roughly seven open* slots (1B, 3B, LF, DH, 3xSP) plus whatever upgrades Zduriencik can make over the returning guys (2B and SS mostly). That’s almost +3 WAR per position. Hmmm…
The gloomy side is that isn’t going to be easy – that’s finding half a team worth of above average players.
The bright side is we have two legit All-Star outfielders (one entering his prime) and one legit candidate for best-pitcher-in-baseball in the rotation. Those three positions should give us +15 or better WAR, and that’s a hell of a core to build around.
It’s not easy to build a championship caliber MLB team – it ain’t tiddlywinks at that level. Luckily we have a GM who is good at his job and that’s reason to be optimistic.
* 1B and 3B count as open, though re-signing Beltre or Branyan is certainly an option for filling them.
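The per-slot figure in this comment is easy to check. A minimal sketch using the commenter’s own round numbers (about 10 WAR to reach contention, roughly 10 WAR walking out the door) and the seven open slots listed; it ignores the 2B/SS upgrades also mentioned, which would spread the load further. Variable names are illustrative only.

```python
# Open slots as listed in the comment: 1B, 3B, LF, DH, and three starting pitchers.
open_slots = ["1B", "3B", "LF", "DH", "SP", "SP", "SP"]

war_gap_to_contender = 10.0  # "another +10 WAR away from being a solid playoff contender"
war_leaving = 10.0           # departing value: Beltre, Branyan, Bedard, Washburn, mostly

per_slot = (war_gap_to_contender + war_leaving) / len(open_slots)
print(f"{per_slot:.2f} WAR per open slot")  # 2.86 WAR per open slot
```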
Note that all of his positive WAR comes from defense. Many of the people “down” on Jack Wilson seem to be looking only at his offensive numbers, and evidently don’t believe a run saved is equal to a run earned. I’m not sure why so many people think offensive production has to come from the SS (the Jeter/Nomar/A-Rod/etc. “super SS” era is almost a decade gone by now), but that seems to be what is skewing the opinions of some. Yeah, the team needs offense and it would be nice to get more from every position, but trading superb defense for maybe average offense is a fool’s bargain, especially with this pitching staff.
It’s probably also true that Wilson played here so briefly before he got injured that many people don’t have a sense of his defensive worth (and don’t look at / believe the defensive stats) so the offensive part of his portrait is the only one they’re seeing.
You know, I’ve read several articles elsewhere that credit Wakamatsu and “chemistry” for enabling the team to outplay its Pythagorean projection. But when you look at this:
…couldn’t you use that as an argument against Wakamatsu and chemistry? I mean, wouldn’t a good coach have found a way to make a team score clutch runs? Would the team maybe have been a little more “clutch” if they weren’t having so much fun?
(For the record, I’m playing at devilish advocacy here: looking at third order wins — which is the more accurate version of the Pythagorean projection — the team was a “real talent” 83-79, just two wins less than their real record, so there’s nothing really to “explain”… except maybe why so many commentators don’t use 3rd order wins. Oh, yeah: it’s harder to explain and it would mean they wouldn’t have a hook on which to hang an article.)
For what it’s worth, calculation by completely different methods also concludes that, based on what happened between the white lines, the Mariners “should have” won 83 games.
The so-called “Pythagorean” method is a second-order predictor, meaning it works with actual runs; third-order methods rely on projections for runs, then apply a games-won formula of some sort. It is dangerous to rely on second-order methods to judge a team’s performance.