Running the 2008 season a hundred times
The results of my last couple sim seasons have been bothering me, since they diverged so far from general opinion and from what I thought they would be. I decided to look at this in a lot more depth. I used the ZiPS projections for 2008 and SG’s quite useful RunDMB program, cranked up the USSM Labs Comp-u-matik 2000, and went at it.
Then I put together a likely M’s lineup, cheated a little by turning Betancourt’s defensive rating up, and ran a hundred seasons. This took a day.
Your 2008 simulated Mariners:
Average record: 77-85
Average runs scored: 716
Average runs allowed: 759
Number of times they won the division: 6
Number of times they won the wild card: 0
Best season: 93-69
Worst season: 59-103
Best offense: 804 runs
Worst offense: 607 runs
Best run prevention: 674 runs allowed
Worst run prevention: 866 runs allowed
Wins within one standard deviation of average: 71-84
Runs scored within one standard deviation of average: 674-757
Runs allowed within one standard deviation of average: 717-801
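A rough way to read that one-standard-deviation band for wins: if every game were an independent coin flip at the team's average winning percentage (a simplifying assumption of mine, not how DMB actually simulates games), chance alone would spread a 162-game season out by several wins. A minimal Python sketch of that back-of-the-envelope check:

```python
import math

GAMES = 162
AVG_WINS = 77  # the average simulated record was 77-85


def binomial_win_sd(avg_wins: float, games: int = GAMES) -> float:
    """Standard deviation of season wins if each game is an independent
    trial with win probability avg_wins / games (an assumption)."""
    p = avg_wins / games
    return math.sqrt(games * p * (1 - p))


sd = binomial_win_sd(AVG_WINS)
low, high = AVG_WINS - sd, AVG_WINS + sd
print(f"sd = {sd:.1f} wins, band = {low:.0f}-{high:.0f}")
```

Under that assumption the spread comes out to a bit over six wins either side of 77, which is about the width of the band the hundred seasons produced.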
The division favorite was not the Angels but the torn-down Athletics, 47% to 42%, and Texas won the division almost as often as the M’s. The A’s-Angels thing is as much a shock as anything. The general analyst-on-TV-or-radio opinion seems to be that it’s all about the M’s-Angels, but Oakland fields the best pitching/defense combination in the AL and their offense is decent too.
Added: since this seems to be causing a lot of hostility, I’ll explain part of what’s going on here. The projection set doesn’t include known, current injuries — so when you look at a depth chart and you see that a bunch of their starters are likely to be guys like Saarloos or even Greg Smith because so many will be down for the start of the season, that’s the rub — these weren’t run with them starting off injured for x days and then coming back. You can make your own assessment of how important that is to the outcomes, but they only play two series against the A’s for five games in the first two months of the year. So even if you want to throw out their season finish, the M’s aren’t going to move up substantially in the win column by taking more games from Oakland.
Back to the M’s, though.
Here’s some graphs of the distributions:
That’s a scatter with smoothed lines.
And for the people who complain about graphs without absolute axis bounds:
And by request, the bar graph
Here’s the cumulative probability:
27% of seasons were 74 wins or under
50% of seasons were 76 wins or under.
77% of the seasons were 82 wins or under
95% of the seasons were 86 wins or under
99% of the seasons were 91 wins or under
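For anyone who wants to build the same kind of cumulative table from their own runs, here is a minimal Python sketch. The win totals in `sample` are placeholder numbers I made up for illustration, not the actual output of the hundred seasons, and `share_at_or_under` is a hypothetical helper name.

```python
from bisect import bisect_right


def share_at_or_under(win_totals, threshold):
    """Fraction of simulated seasons that finished with `threshold` wins or fewer."""
    ordered = sorted(win_totals)
    return bisect_right(ordered, threshold) / len(ordered)


# Placeholder win totals for ten simulated seasons (NOT the real sim data).
sample = [59, 70, 74, 75, 76, 78, 80, 82, 86, 93]

for cutoff in (74, 76, 82, 86, 91):
    share = share_at_or_under(sample, cutoff)
    print(f"{share:.0%} of seasons were {cutoff} wins or under")
```

Sorting once and using `bisect_right` keeps ties at the cutoff inside the "or under" group, which is what the wording of the list implies.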
I did a projection post back in January. It was pre-Bedard-trade, of course, but it’s still an interesting contrast. I guessed at ~795 runs scored and ~780 runs allowed. Compared to what I got simulating, that’s too pessimistic on the run prevention and way, way too optimistic on the runs scored. Clearly, replacing the fifth starter with Bedard and tossing Sherrill aside makes a difference in run prevention, but it’s interesting that the overall difference wasn’t all that large; I’d have thought it’d be a lot more than 20 runs. But considering that the trade meant a pretty large defensive downgrade in right plus a whole in the bullpen, it’s reasonable.
The defense is baaaaaaaaaaaaaaaaaaaaaaaad. Even assuming Wilkerson is healthy and passable in right, Ibanez makes that offense bad, and the infield doesn’t make up for it, dragged down by Sexson’s awfulness. And remember, I turned Betancourt’s defense up. The porous defense particularly hits Washburn and Silva as you’d expect. As Dave told me, they should consider running Reed out in right field when Washburn starts just to keep Washburn’s head from exploding in frustration after giving up the seventh double of a game.
The other interesting thing to note is that DMB, in running through the whole season and simulating every game, does take into account the rotation matchups (which, as I’ve noted, don’t in practice have that much effect). So pushing everyone back a spot doesn’t help.
It’s the runs scored that really hurts. Here’s where the ZiPS differed significantly from the numbers I came up with using “random guesses, hunches, wishcasting, and general skullduggery” but which were largely three-year averages:
– Ichiro’s down a little on OBP and off on SLG
– Shaves Beltre by 5 points of OBP and 10 points of SLG
– ZiPS is down on Lopez (.302/.369 versus my .320/.400)
– the Wilkerson ZiPS is lower than my Jones guess
– didn’t give Sexson as much of a bounce (.323/.441 vs my .330/.460)
– down on Johjima (.316/.405 vs my .325/.425)
Please note that when I disagreed with what an excellent projection system came up with, I was high every time, sometimes by a lot. Ponies for everyone!
Since I ran the season, I’ve stared at the results and tried to find a reason to not write this post. Something that would invalidate the results, or that would give me an excuse to change something and go back to do it all over with even more pro-Mariner assumptions. But there isn’t. The ZiPS projections have been excellent in the past, and if you use the PECOTA projections the team is just as bad.
If you start from last year’s team and make adjustments, it’s easy to come up with another five, ten, sixteen wins. But starting from scratch, using reasonable assumptions, the picture looks much bleaker.
Comments
187 Responses to “Running the 2008 season a hundred times”
it’s more a recognition of the fact that “intangibles”, by the very nature of the word, are hard to identify (and might possibly even be misleading compared to tangibles).
Exactly. You could say that
Results = Skill + Intangibles + Luck.
Intangibles are things that may impact the outcome, but we don’t know how to measure them. They’re different than mere luck because they’re not the result of random chance, but since we don’t know how to measure them, they are indistinguishable from mere luck. The instant we figure out how to measure an intangible, it is no longer an intangible but a skill. Defensive Range is something that made that transition over the last quarter century. It used to be that Range was an intangible – some people had “good range” the way some people are “clutch hitters” – based on anecdotal evidence. The lack of data in the early days of SABRE had some people saying patently stupid things like “Defense doesn’t matter.” They said that because, as Manjini correctly noted, some people will ignore things they don’t understand (just like some antediluvian folks discount stats today because they don’t understand the math). Defensive metrics were basically useless 20 years ago, pretty much limited to Fielding Pct. And while some people ignored them and focused on lumbering on-base and HR machines, other people claimed defense obviously mattered and, since it couldn’t be measured, proved statistical analysis was badly flawed. And yet still others looked for ways to better measure defense, and so were born range factors and zone ratings in their myriad forms and glory. They’re still not perfect, but defensive range is now a measurable skill and not an intangible. Things have progressed so far that many of us here argue trading Jones was bad precisely because of his defense. I think that twenty years ago, a “stat-based website” (er, or the pre-Internet equivalent – I guess a Xeroxed newsletter, written in WordStar no doubt) arguing the merits of a trade based on defense would have been quite remarkable.
I suspect some of the intangibles Manjini mentioned will someday be proven to be actual skills with measurable impact. Others will remain essentially superstitions (the result of humans trying so very hard to see patterns where there are none and to feel in control of random chance). And some will probably fall into the category of “very rare skills”, skills that a handful of extraordinary players actually have (e.g. Ichiro may have the ability to locate his hits, John Olerud had the ability to decrease the error rate of his infielders) but that most MLB players – even average HOFers – completely lack, or lack to such a degree as to make no real difference in results.
If this is your first, third, fifth, or five hundredth post here, you have not earned the right to be condescending to ANYONE.
We humans, equipped with brains that love nothing more than identifying patterns (even when none really exist) find all sorts of patterns in the results random chance produces.
That line stands out more than others, but I could go on and quote your entire post, because all of it is something about which I can say “I wish I would have said that.”
Mostly, I wish I had the ability to say it in such a compelling and eloquent way. And, that more people that practice baseball analysis would understand some of the concepts presented there.
Wilkerson played first over Sexson against righties and I threw a different righty with a better defensive rating — LaHair? I forget — in against lefties.
LaHair’s a lefty; perhaps you used Morse?
I think that Patterson was also waiting while this mess sorted out, fits the defensive mold the M’s really need right now in a corner OF, is projecting in James, CHONE, Marcel, etc. to have a similar offensive season as they’re projecting for Jones (with far more SB’s.) So what if the payroll ended up at about $135M, right?
OT for this thread, but the other one’s closed. Derek, way to go on being a delegate!
I’ll allow that OT because Yayyy!
I’ve run a couple other sims with the all-defense team I got 93 wins with… they’re pretty ugly. The average run saving is huge — I saw a -100 off the average of the initial — but they cannot score to save their life. I think my 93 wins was partly a fluke and mostly me managing all the games while nursing delicious beers.
Given some underlying lack of trust in the team W-L statistical projections, does anyone have historical data for analysis (i.e. ZIPs 2006 predicted AL West finish vs. actual)? The simple analogy is the sports columnist’s pre-season “10 predictions for team X” to be re-visited at season’s end.
Taking a look at historical projections would bolster the argument for their accuracy as well as help refine & improve the methodology.
First, more props for Derek the delegate…at my precinct it was a wild scene…my middle school teaching background came in handy as caucus chairman to prevent a Zimmer-esque brouhaha.
Second, I’ll echo the appreciation for all who ran sims…fun reading.
Third, a few interesting notes on Patterson: Wilkerson and Boras notwithstanding.
*Hit .294, OPS .737 after moved to 2 hole.
*Quit hitting for the fences…improved his AB/K from a brutal 4.1 to 7.1 last year
*After only hitting .240 career vs. lefties, hit .310 in 142 ABs last year
*.341 in 44 ABs at Safeco
I think the Defensive/Managed season is like putting a manager who runs the numbers on one team, while all of their opponents have no such edge or equality. I do think that what DMZ did would probably have more positive effect on the win totals than the moves most Managers make in game, but when the actual Manager isn’t likely to make most of the same decisions and the other team doesn’t have a manager at all it isn’t going to show anything of predictive value. It’s still interesting that 93 wins were accrued with Jimerson, Reed, Wilkerson and others getting a good deal of ABs with that though.
You can go look at diamond-mind.com and look for their past season runs (lot higher sample sizes). They also do an annual article for ESPN, I believe, that should be easily available. Use your Google-fu.
DMZ
Sometimes you just have to put your $$$ (or even a nice bottle of single malt) where your projections are… I’ll take the over on the projected wins and spot you a game so you can sleep at night throughout the long season.
Seriously, if you had to bet your life, who here, among this very educated crowd would take the ‘under’ on the win total?
I know if you use inferential statistics in science, stock trading or business – when something smells funny { “AL West Champ… Your Fremont Athletic’s!”} you have got to be a little bit skeptical of your model, data and/or methodology.
[we’ve hit our quota of “you hate the Mariners” posts for today, try again later]
[we’ve hit our quota of “you hate the Mariners” posts for today, try again later]
Ok, so I know I am probably going to get blasted for this, but I want to hope against hope for a little bit. I agree this trade hurts us offensively and defensively, but I do think there is a chance that the M’s surprise us.
I played college baseball, and we were pretty good. We had an ace my freshman year that was absolutely dominating. We were a very good hitting team and he mowed us down every time we hit live. Once, in a fall ball game he threw a 15 strike out, 5 inning perfect game.
I know that this is a stats oriented web site, and I agree whole heartedly with the analysis of USSM, but I also know that on the days our ace threw, we were real loose. We knew that we only needed to get a run or two and we could win. Most of the time, we put up huge offensive numbers because there was no pressure to produce, we knew Kenny(our ace) would keep us in the game.
I am hoping that the M’s will play this way when Bedard and Felix pitch. Hopefully their confidence in these two will allow them to stop trying to hit 8 run HR’s, and just play ball. Remember half of this game is 90% mental.
In support of Dave and Derek:
As a former “USSM is so pessimistic!” type poster, I think that before arguing with Dave and Derek, one must first consider that they have historically been correct about the implications and potential consequences of the Mariners’ FO decisions. When Dave and DMZ respond to someone, such as manjini, who thinks that they are simply pessimists who use numbers to their advantage, this is pretty much a waste of their time. I am not saying we should squash all arguing and discussion, because differing opinions are good. However, it is unfair to continually raise the idea that Dave and Derek are just grumpy statheads who hate the current M’s situation. If they weren’t die-hard fans, why would they spend so much of their time committed to expressing their opinions and posting on this blog? For example, there are plenty of aspects of the M’s which they love: Felix, Beltre, the young guys in the farm system, Ichiro, Betancourt, and so on. It’s OK to be an optimistic fan, but don’t depend on the intangibles such as veteran leadership, two aces (with crappy overall defense) and clubhouse chemistry to get the M’s into the playoffs.
Thank you Dave and Derek, for TRYING to get your points across. This blog is awesome!!!
Some of the newbie drivel here reminds me of a story from a few years back. Anybody, please correct the particulars, I have only a vague memory of the details. DMZ wrote a piece for the Weekly, which laid out the case for the M’s being a sub-.500 team. The predictable result was that some fans got all riled up and one wrote a letter to the editor calling DMZ “Dumbsteg” and stating unequivocally that the M’s would seriously outperform the projection. What happened? DMZ’s projection was pretty close to the final W-L record and the “Dumbsteg” guy, to the best of my knowledge, never wrote a follow-up letter to the editor admitting what a dumbass he was.
I predict a repeat of this phenomenon in 2008.
Also, before last season, there was a PECOTA forecast that the White Sox would win 72 games. A Chicago columnist, Dave van Dyck, wrote a column, dripping with contempt for the “surreal world of computers” that made the prediction. His tone reminded me of Geoff Baker’s. What was the White Sox record in 2007, you ask? 72-90.
<i> and </i> for italics
<blockquote> and </blockquote>for an indented quote should do it.
Just a random thought. For the sake of argument, let’s say the M’s win it all this year but then fall into the crapper the next three years, would it be worth it?
Uhmmmm…30+ years with no title… If the M’s could take a WS and then have 3 down years I think it’s worth it. If anybody thought that’s really what was going on there would not be as much complaint. I know I’d take a WS and 4 down years as a cycle I’d be pretty happy with. 2 WS every 10 years is pretty good no matter what the other years look like.
169 – that seems to be the goal of the front office, but there seems to be a consensus among USSM posters (my opinion included also) that the front office hasn’t come anywhere near creating a team with realistic playoff chances. They’ve potentially put the team “into the crapper” in a couple years without boosting the present to the playoffs, let alone the World Series.
If there’s a Bedard extension, I retract the “crapper” statement, but with no extension, I think that’s exactly where they’ll be.
With respect to everyone who has played baseball below the professional level: It’s not the same. The variation in skill from top to bottom below professional ball is huge. The fact that there can even be a five-inning, fifteen-strikeout perfect game says all you need to know: that pitcher is on a different plane than the people he’s pitching to. Differences like that level out by the minor leagues; by the major leagues, everybody who has made it (Willie Bloomquist, Horacio Ramirez, everybody) was a star wherever they came from. Unless you’ve played in the major leagues, or somewhere close, you really have no idea what that level of competition is like. In particular, if you think you know more about major league baseball than what years of statistical analysis say, then you are, I’m sorry, making more of your own experience than it’s worth.
172- go back and read my post again. I didn’t say I knew more than years of statistical analysis, here is what I said, “Ok, so I know I am probably going to get blasted for this, but I want to hope against hope for a little bit.”
I still think it is highly unlikely that the M’s challenge the Angels, but since I am an M’s fan and I want them to win, my only hope is to grasp at straws. Obviously, Bedard isn’t going to be throwing any 15 K, 5 inning perfect games, but Major Leaguers are people, and who is pitching does make a difference. Enough of a difference? I doubt it.
Also, I totally agree with you, Major Leaguers are on a totally different level. My freshman year I played against Ryan Franklin when he was at Seminole. I led off with a bunt hit and stole second. Our next hitter doubled, and our 3 hitter singled to drive him in.
The pitching coach came out and lit him up. I heard words that I didn’t even know existed. Franklin was untouchable the rest of the game. He struck me out the next at bat on a slider that totally baffled me, and then he got me on a fastball that I had no shot at. They came back to beat us 3-2 and we didn’t have another base runner.
In my opinion, to say that he was a star, doesn’t begin to tell the story.
Derek,
The same advice you once gave Corco:
Ride a bike, get a date.
Ryan Franklin? I heard once that he was from Spiro, Oklahoma.
*strangling noises*
John —
I ran the sims overnight, so it’s not as if my continuous time investment was a day — it was a couple hours getting everything working and tweaked properly, and then I went and read a book while it did its thing.
I appreciate the thought, though.
of course, with the M’s in recent years, when they felt they only needed to score ‘a run or two’, that’s just what has happened.
And then they needed to score that one extra run … and didn’t 🙂
Well put together post and info.
BUT – there’s one piece of additional information that would be EXTREMELY helpful in judging the accuracy/utility of the systematic approach.
What did the exact same approach predict for the 2007 Ms?
The biggest failure I see from many analysts is an inability (or unwillingness) to post BOTH projections and previous results.
Mind you – it’s quite possible for any system to have five consecutive “hits” and then be wildly off the 6th year. The question then is attempting to discern WHY it was off, (was it something systemic – or something organic with a team that the system simply didn’t/doesn’t account for).
I recall almost every pre-season pick for the Ms in 2007 being extremely negative. And I suspect that if you were told ahead of time that the defense would drop from 13th in DER to 27th, the projections would’ve been even more drastically negative.
Well, actually, I thought a lot of the 2007 projections were in the 81-84 win area, with some handful showing the team battling the Angels. That doesn’t make the actual results quite that remarkable.
Derek- Thanks for all the hard work, bought you a beer! Also, when you were running all those M’s sims did you take a look at Adam’s numbers in Baltimore? I got almost the exact same results you did in my DMB seasons.
Dave, congrats on being a delegate! This was my third caucus and it’s always fun to get involved at this level.
I saw a guy on the side of the road with a sign, “Will toil with numbers for beer.” Now I can put a name to the face.
What is the estimation error of these simulations? (i.e. what is the 1 standard dev measure across all win number forecasts–that is, if you took all pre-2008 projections for all teams using the same assumption set, and took the difference between that and actual number of wins, what is the std. dev. of that set of numbers? I would assume the mean is statistically identical to zero.)
Please? Bonus beers if author answers; simulated knuckle-touch for anyone else.
BTW am I the only one to notice “whole” in right field in the OP? I had to read it three times to make sure because Derek rarely misspells.
Also, “Ibanez makes that offense bad, and the infield doesn’t make up for it” should be “Ibanez makes that outfield bad, and the infield doesn’t make up for it”.
178 –
“I recall almost every pre-season pick for the Ms in 2007 being extremely negative.”
I honestly don’t remember this, but perhaps your definition of ‘extremely negative’ differs. I know the Hardball Times had the M’s at 82-80, I saw a number of 80-85 win projections, many at 83 wins or so (isn’t that what USSM had? I don’t remember). None of that sounded terribly pessimistic, and indeed, it wasn’t far off. Eyeballing the ZiPS numbers, they were perhaps a bit lower than that, but I haven’t run the DMB seasons.
Clearly, the drop in DER/team defense has a lot to do with the surprisingly low projections for 2008, but I don’t think there were many 75 win projections last year.
Why is it that nobody seems to want to run the numbers for 2007? I’ve been trying to find somewhere that retained the pre-season projections for 2007 using ZIPS and DMB, but so far, no luck. (I’ve found some of the individual team projections for 2007 – and frankly, the few I’ve looked at were often WILDLY out of whack.)
I think it’s completely fair for DMZ to say – “if you don’t like the results, then talk to the system.”
Well, I have ZERO sense for how accurate the system *HAS BEEN*, except for a few comments of “well, I think so and so was projected last season.” Memories are imperfect. I happen to remember almost exclusively negative pre-2007 projections – most projecting FEWER wins for the Ms in 2007 than they had in 2006. But, I didn’t log them, so maybe I just happen to remember more negative projections than there were – or maybe they just had more interesting commentary.
But, all that aside – for ANY projection system, I want to see how it has done in the past. The easiest way to get this picture is to plug in 2007 projections and look at 2007 results.
If it projected the Ms runs differential correctly and a 79-83 record, then maybe I buy into this year’s projections a bit more. If it was off by 50-60 runs (either way), or wildly off-target for Cleveland or Detroit or Oakland, then that would be helpful as well in adjusting my personal sense of how useful the sims actually are.
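One common way to tie a run differential to a record is Bill James's Pythagorean expectation. To be clear, that formula is brought in here as an outside yardstick; nothing above says DMB or ZiPS uses it, and the exponent of 2 is just the classic form. A small Python sketch, fed with the simulated averages from the post (716 scored, 759 allowed):

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from runs scored and allowed (classic Pythagorean form).

    The exponent of 2 is the textbook choice and an assumption here.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


wins = pythagorean_wins(716, 759)
print(f"{wins:.1f} expected wins")
```

That lands at roughly 76 wins, in line with the 77-85 average record, so the same check could be pointed at any past projection where both the projected runs and the actual record are known.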
I HAVE found the 2007 player projections for the 2007 Ms. Here are the actual results (OPS) – followed by ZIPs projections for 2007.
POS – Player – REAL – ZIPS (diff)
CA – Johjima – .755 – .771 – (+16)
1B – Sexson — .694 – .814 – (+120)
2B – J Lopez – .629 – .737 – (+108)
3B – A Beltre- .801 – .779 – (-22)
SS – YuBet — .726 – .697 – (-29)
LF – Ibanez — .831 – .800 – (-31)
CF – Ichiro — .827 – .779 – (-48)
RF – Guillen – .813 – .776 – (-37)
DH – J Vidro – .775 – .726 – (-49)
(ERA)
SP1 – Felix —- 3.92 – 3.71 (-.21)
SP2 – Washburn – 4.32 – 4.45 (+.13)
SP3 – Batista — 4.29 – 4.62 (+.33)
SP4 – HoRam —- 7.16 – 5.13 (-2.03)
SP5 – Weav/Piner 6.20 – 5.10 (-1.10) * Weaver not on the projection – so Pineiro was the projected primary #5.
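The hitter differences are easy to recompute. This is a minimal Python sketch using only the REAL and ZIPS columns from the table above; the `diff_points` helper and the variable names are mine, and the sign convention (projection minus actual, so a negative number means the player beat his projection) follows the table.

```python
# (player, actual 2007 OPS, ZiPS-projected OPS), copied from the table above
hitters = [
    ("Johjima", 0.755, 0.771),
    ("Sexson", 0.694, 0.814),
    ("Lopez", 0.629, 0.737),
    ("Beltre", 0.801, 0.779),
    ("Betancourt", 0.726, 0.697),
    ("Ibanez", 0.831, 0.800),
    ("Ichiro", 0.827, 0.779),
    ("Guillen", 0.813, 0.776),
    ("Vidro", 0.775, 0.726),
]


def diff_points(actual, projected):
    """Projection minus actual, in points of OPS (thousandths)."""
    return round((projected - actual) * 1000)


diffs = {name: diff_points(actual, proj) for name, actual, proj in hitters}
underprojected = [name for name, d in diffs.items() if d < 0]
mean_diff = sum(diffs.values()) / len(diffs)

for name, d in diffs.items():
    print(f"{name:<11}{d:+5d}")
print(f"underprojected: {len(underprojected)} of {len(diffs)}")
print(f"mean difference: {mean_diff:+.1f} points")
```

Six of the nine hitters come out underprojected, but the average across all nine is slightly positive, because the two big misses pull hard the other way.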
From MY perspective, these projections paint a picture of an unreliable system, which is PRONE to underprojection (at least for Ms hitters). There were two players who each hit WILDLY under their previous norms. I understand why those projections were off, and I wouldn’t expect any system to do better.
But 6 out of the 7 “normal” hitters for the season are underprojected. I could take this as a case of park effect perhaps being too freely used – except for the pitching — the #5 slot really doesn’t apply. And HoRam was an utter disaster.
But, the thing that baffles me is Wash/Batista. Two pitchers with LONG histories, were both negatively projected (compared to reality) *DESPITE* the fact that the team DER fell from 13th to 27th between 2006 and 2007.
===
For 2008, the team has *ONE* batter at an age where there is legitimate concern of decline based on age – (Ibanez).
Age-declines typically become “significant” at age 36.
So, given the 2007 system CONSISTENTLY underprojected for players having “typical” seasons, I would expect that the 2008 projections are likely to have a similar bias. This would lead me to expect the 2008 run production projections for said system to be generally lower than what the actual will be.
I believe that the fact Seattle managed to post a 104 team OPS+ in 2007 *DESPITE* having two players MASSIVELY under projection, (and zero players massively over expectation) is an indication that inherently, the club WAS (and is) a slightly above average hitting team. They should, therefore, continue to produce slightly above average runs.
As for runs allowed – the team had roughly 350 innings from HoRam/Weaver/Baek/Fear that were significantly worse than replacement level. (280 runs in those innings). These will be replaced by Silva and Bedard. If you use NOT the 2007 numbers – but the 2006 numbers for Silva/Bedard, (when Silva posted a career WORST 5.94 ERA), they allowed 222 runs in 376 innings. That’s a 60 run improvement easily, if you shave off the extra 26 innings. That’s not optimized – that’s specifically examining worst-case scenario for Silva – and “only” using Bedard’s 2006 production.
If you use their career 162-game averages, you end up with 179 runs allowed in 365 innings. THAT would be a 100 run improvement – again – NOT optimized to paint a rosier than reasonable picture.
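The arithmetic in those two scenarios can be laid out explicitly. This Python sketch uses only the figures quoted above; scaling each Silva/Bedard total down to 350 innings at the same runs-per-inning rate is my reading of "shave off the extra innings", so treat it as an assumption.

```python
def runs_over(innings_target, runs, innings):
    """Scale a runs-allowed total to a different innings count at the same rate."""
    return runs * innings_target / innings


# 2007 innings and runs from the four pitchers being replaced (figures from the comment)
BAD_INNINGS, BAD_RUNS = 350, 280

# Silva + Bedard combined: (runs allowed, innings), as quoted in the comment
scenarios = {
    "2006 seasons": (222, 376),
    "career 162-game averages": (179, 365),
}

for label, (runs, innings) in scenarios.items():
    scaled = runs_over(BAD_INNINGS, runs, innings)
    saved = BAD_RUNS - scaled
    print(f"{label}: {scaled:.0f} runs over {BAD_INNINGS} IP, {saved:.0f} fewer than 2007")
```

Scaled that way, the savings come out a little above the round 60 and 100 quoted, so those two figures are on the conservative side of the same math.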
The only pitcher on the club with “danger-zone” age decline concerns is Batista. Of course, the club still has Morrow, who holds potential to be an excellent #6 starter and stop-gap fill-in for any injuries.
Sherrill’s departure might hurt some, yes. But Seattle threw roughly 50 more RELIEF innings than most other MLB clubs last season. If they get an extra 50 innings out of Bedard/Silva (likely – given they replace the Ho-mess that were the 4/5 slots in the rotation), then whatever negative impact is realized by said departure “should” be minimized.
In the end, even if ZIPS/DMB is the “best” projection system available – it doesn’t mean it is good. But, thus far, I have zero sense for how good it may or may not be.
So, given the 2007 system CONSISTENTLY underprojected for players having “typical” seasons, I would expect that the 2008 projections are likely to have a similar bias.
Looking at just the M’s is a poor sample. Surveys of ZiPS’ overall predictions found them quite good.
There’s also an assumption there that posting a higher-than-expected result in 07 means that they should continue to produce more than expected. If you think it’s underpredicting M’s because of park effects, that might be true, but I don’t see a lot of evidence that ZiPS is missing because of park.
Hey guys,
Sorry for being a jerk here in the comments. Believe it or not, I really didn’t mean to be. But reading back… well, I tried to make a point, but yeah… it got too personal. I apologize to Derek in particular. I’ll think my posts through more in the future.
Now here’s a legit question and forgive me if this is answered somewhere that I missed:
How does a sim like this factor in all 25 men on the team? Given that we don’t know who will fill out 4-6 of the spots yet (though your current post makes some good stabs at it), does the sim just take into account the starting nine, rotation, and key bullpen pitchers? I doubt it makes that much difference, but just curious.
The A’s plans are to use Justin Duchscherer as a starter this year.
I think the 2008 rotation for the A’s will be something like Blanton, Harden (while he lasts), Gaudin, Duchscherer, DiNardo.