Defensive Stats In The Times
This got lost in the shuffle with all of yesterday’s news, but Baker wrote a long story on defensive statistics and their use in evaluating fielding prowess. If you haven’t read it, it’s worth checking out.
Baker did a ton of work on this piece. He spent 45 minutes on the phone with me, and I’m a sidebar to the real story. He also talked to Dewan, Lichtman, and Blengino, and he made sure he had as much information as he could get. He does a pretty good job giving an overview of what +/-, UZR, and PMR do without making the barrier for reading the article too high. It’s certainly an entry-level piece, but that’s where 99% of the Times readers are at.
Baker also wrote a supplemental blog post where he adds a little context to the article, explaining that he just didn’t have the space to fit in every angle. Defensive statistics are complicated, and he’s right – you just can’t sum them up in 2,000 words without leaving out a lot of stuff.
One of the things I told Geoff in our phone conversation is that I tend to look at the advanced defensive statistics much like ERA. ERA measures pitching ability, teammates’ fielding ability, scorer bias, park effects, and luck. It includes a bunch of stuff that pitchers have no control over, but most casual fans have no problem accepting ERA as a measure of pitcher value. Likewise, UZR and +/- measure range and instincts, but also positioning (is that the player or the coach?), the pitcher’s ability (not all balls in play are equally catchable), scorer bias (line drive or fly ball?), park effects (do fly balls in Safeco’s LF hang up longer than in other parks?), and luck (did you get a bunch of weakly hit ground balls that were easier plays than they looked on paper?).
We’ve moved beyond ERA for evaluating pitchers because we have better metrics now – things like FIP strip out a lot of the non-pitcher stuff and give us a better tool for just evaluating the pitching aspect of run prevention. We’ll have better things than current UZR and +/- shortly, once Hit F/x is introduced and we have speed/trajectory/hang time of batted balls. But, until then, UZR and +/- are better than anything else out there right now, and they’re good enough as long as the issues with the stats are accounted for.
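For anyone who wants to see the ERA/FIP difference in concrete terms, here is a minimal Python sketch. The FIP formula is the commonly published one; the constant of 3.10 is an assumed ballpark value (it is normally recalculated for each league and season), and the pitcher line is invented purely for illustration.

```python
def era(earned_runs: float, innings: float) -> float:
    """Earned runs allowed per nine innings."""
    return 9.0 * earned_runs / innings


def fip(hr: int, bb: int, hbp: int, k: int, innings: float,
        constant: float = 3.10) -> float:
    """Fielding Independent Pitching: built only from homers, walks, HBP, strikeouts.

    The constant is an assumption here; it is usually set each season so
    that league-average FIP lines up with league-average ERA.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / innings + constant


# Hypothetical pitcher line, invented for illustration only.
line = dict(earned_runs=90, innings=180.0, hr=20, bb=50, hbp=5, k=150)

pitcher_era = era(line["earned_runs"], line["innings"])
pitcher_fip = fip(line["hr"], line["bb"], line["hbp"], line["k"], line["innings"])

print(f"ERA {pitcher_era:.2f}, FIP {pitcher_fip:.2f}")
```

Note that nothing about the defense, the park, or the official scorer enters the FIP calculation, which is the sense in which it strips out the non-pitcher stuff.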
The shortcomings that Baker (and Ibanez) point to are real, and that’s why we talk about players’ fielding in terms of ranges. We say that a guy is a +10 to +20 fielder, for instance, and we won’t argue any number in that interval. He might be +11 or +19 – we’re not sure. There’s noise in the data. We know that, and we account for it.
Do you ever hear people do this with ERA, though? The same noise (probably more, actually) exists in ERA, but you never hear someone say that a pitcher is a 4.25 to 4.75 ERA guy. They’ll say he has a 4.36 ERA, and that’s what’s used to evaluate his past performance – the noise is ignored and the entirety of the number is attributed to the pitcher.
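A rough way to see how much noise hides behind a two-decimal ERA is to simulate it. The sketch below assumes a pitcher whose true level is exactly 4.50 runs per nine, treats runs in each inning as a Poisson draw (a simplification; real scoring is clumpier, so real noise is probably larger), and models only luck, with no defense, park, or scorer effects. All numbers are made up for illustration.

```python
import math
import random
import statistics


def poisson(mean: float, rng: random.Random) -> int:
    """Draw a Poisson-distributed count using Knuth's method (fine for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_season_era(true_runs_per_nine: float, innings: int,
                        rng: random.Random) -> float:
    """Simulate one season inning by inning and return the resulting ERA."""
    runs = sum(poisson(true_runs_per_nine / 9.0, rng) for _ in range(innings))
    return 9.0 * runs / innings


rng = random.Random(2009)  # fixed seed so the run is repeatable
eras = [simulate_season_era(4.50, 180, rng) for _ in range(2000)]

mean_era = statistics.mean(eras)
spread = statistics.stdev(eras)

print(f"average simulated ERA: {mean_era:.2f}")
print(f"season-to-season standard deviation: {spread:.2f}")
print(f"lowest and highest seasons: {min(eras):.2f} to {max(eras):.2f}")
```

Under these assumptions the same pitcher, with the same talent, posts noticeably different ERAs from one simulated season to the next, which is the argument for talking about him as a range rather than a single number.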
In most ways, I think people like us who are pushing the value of advanced defensive metrics have been far more honest about the quality of the metrics we’re using than those who reject our numbers as not good enough but cling to Batting Average, RBIs, and ERA.
Current advanced fielding data, such as UZR, should be looked at much like you do ERA. It’s got some problems, no doubt, but we don’t have a FIP for fielding just yet, so until we do, it’s a good enough proxy if you acknowledge the flaws. We’ll have a FIP-like fielding metric soon enough. Until then, UZR is better than anything else out there, and using it to make decisions will lead to more correct decisions than ignoring it entirely.
Comments
24 Responses to “Defensive Stats In The Times”
I loved that he even wrote the article — seriously unimaginable just five years ago — and that he tackled a real controversy: Ibanez. I can’t blame him for not getting into the Griffey thing. Though it needs to be gotten into.
What I didn’t love is the impression I got from the article, that stats geeks are trying hard, but ultimately Ibanez is right — there’s too much noise, and the numbers don’t mean ANYTHING. Obviously, Ibanez is going to be defensive about some guys saying he can’t play, but the article, to me, came close to suggesting that he was right and the stats guys are just wrong.
I didn’t get that sense, although I suppose you could. I thought the take-home message was more that Ibanez has a point, which is fair enough. Although it is a little awkward to be introducing newfangled defensive statistics, which much of his audience is not familiar with, by starting off with their flaws.
I just appreciate Baker taking the time to research this, especially after his talk with Bill James. It seems like he’s responding to some of the criticisms of that piece, and really trying to learn about defensive metrics. That’s awesome, good on ya, Baker.
Baker writing about advanced fielding metrics? I never thought I’d see the day. Good on him.
ERA is a very apt analogy when you lay it out like that. I really enjoyed Baker’s article, not because it taught me a lot but because it was nice to have the ideas laid out for the statistically uninclined. If there were one thing that I could have written into the article, it would be that while these statistics are prone to a certain degree of subjectivity and variance and so forth, that error can be smoothed greatly by looking a) at three-year averages and b) at different systems and what they each have to say. This is the most important point to be made about how to look at these stats, and while Baker alludes to it, he does not lay it out explicitly, which is a shame.
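As a rough illustration of that smoothing idea, here is a small Python sketch that averages three seasons within each system and then looks at where the systems agree. The ratings are hypothetical numbers invented for the example, and the plain unweighted mean is an assumption; a more careful version would weight by innings played.

```python
from statistics import mean

# Hypothetical runs-saved ratings for one outfielder, invented for illustration.
ratings = {
    "UZR": {2006: -12.0, 2007: -20.0, 2008: -10.0},
    "+/-": {2006: -15.0, 2007: -25.0, 2008: -8.0},
}

# Step a) smooth each system over three years.
three_year_by_system = {
    system: mean(by_year.values()) for system, by_year in ratings.items()
}

# Step b) compare the systems and report a range instead of a single number.
low = min(three_year_by_system.values())
high = max(three_year_by_system.values())
consensus = mean(three_year_by_system.values())

for system, value in three_year_by_system.items():
    print(f"{system}: {value:+.1f} runs per season over three years")
print(f"range across systems: {low:+.1f} to {high:+.1f} (midpoint {consensus:+.1f})")
```

Any single season in the made-up data swings by ten runs or more, while the three-year figures from the two systems land within a couple of runs of each other.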
I think it’s good to see Baker at least thinking about the numbers and putting them into context. Before advanced fielding stats started becoming available, we were measuring fielding with very poor tools. It was as if we were only measuring pitching based on winning percentage.
The Ibanez stuff, though, is awfully nit-picky. It’s like arguing over whether the economy is in very bad shape or very, very bad shape. Even the anecdotal evidence demonstrated that Ibanez wasn’t a good fielding LF.
Well, wait. A newspaper columnist writing about fielding metrics… in the newspaper? That’s something I wasn’t expecting. But I’m not surprised it is Baker. For all the guff he gets around here (and only sometimes deservedly) he really works his ass off, not just generating reams of text (and photos on his blog) but also trying to improve his understanding of his field. A lot of sports reporters, by the time they make it to “columnist,” have pretty much petrified in place. They know what they know, and now they just want to repeat it out loud to everyone else. They think they have earned the role of being that opinionated guy at the end of the bar for their entire readership. See Murray Chass and countless others.
But Baker is not like that (and neither is Stone) and he should be given a lot of credit for not taking the easy way out and writing yet another Player Fights For Roster Spot spring training story (or producing anything resembling Jim Street’s output).
If newspapers want to stay relevant this is the kind of work they need to be doing. Getting the game results and the quotes from the after-game press conference is trivial for anyone with an internet connection. It’s a lot harder to stumble into a readable introduction to something you didn’t know about (because if you don’t know about it, you aren’t looking for it in the Googleverse).
If this article means one more person in a bar not looking at me blankly when I mention defensive stats, or one less new commenter here saying something inane, Baker has done us all a service. Hopefully he’ll continue and expand this from time to time, addressing some of the other points folks here have raised.
Yes, but the timing is going to be tricky. For a lot of fans, baseball is something that hasn’t started yet for 2009, so the Griffey Honeymoon is still in its early days. And we don’t know how much he’s actually going to play in the field. While we might like Baker to get out there and lay the groundwork early (“Griffey isn’t the player you remember, and he doesn’t belong in the field, and here’s why”) all that will really do is earn him a lot of enmity among his readership.
If I was him, I’d be planning on doing that article (and maybe even compiling notes for it) but keeping it in my back pocket until there’s some triggering event that makes it seem suddenly “relevant” — whether that’s Griffey complaining about DHing too much, or some comment from someone else about Griffey’s limitations in the field, or Griffey having to rest his knees, or whatever.
Unless Wakamatsu makes Griffey DH as much as we hope, the controversy is certainly coming. But I don’t know that Baker has much to gain by being the guy to trigger it.
I appreciated the effort, which was a far cry from the usual “VORP is a funny name; bloggers in mothers’ basements” stuff you get from some of the real old school guys. He clearly did a lot of reporting and he clearly spent a lot of time thinking about fielding.
Unfortunately, the Raul angle at least gave the impression that this was a bit like the creationism debate, i.e., unless new fielding stats are perfect in every way, then Raul’s subjective assertion that he’s a good fielder is entitled to exactly as much weight as these “flawed” stats. There are no defensive metrics that show Raul to be a good fielder, and no one needs defensive metrics to show he is a bad one. You just have to watch.
There are no park effects involved in how long it takes to run to a spot where a ball is hit. You don’t have to have played in the major leagues to know that Raul does that more slowly than Endy Chavez and that the difference in their speed and reaction time is material to whether a ball is caught or is a double. With all due respect, any serious fan can, if he or she chooses, make an educated guess as to whether a batted ball on television is likely to be caught, and will be right a sufficient amount of time to trust his or her judgment. The ones that look like trouble but are nonetheless caught are being caught by presumptively good outfielders and the ones that are not caught but should have been are being mishandled by presumptively bad outfielders. Over time, the difference between the good and the bad makes a pattern and becomes clear. Refinements like Hit F/X will help us make fine distinctions and have statistics that can be compared across ballparks and seasons, but we really don’t need a lot more data to know in a general sense who is more like Ozzie Smith and who is more like Ozzie Nelson in the field.
Don’t get me wrong, the single best catch I ever witnessed at Safeco was by a very young Raul in 2000. I didn’t even realize he was out there; I thought it was Stan Javier until I checked the scoreboard. It was up there with anything Ichiro has ever done. But Ichiro still has speed and athleticism in the outfield, while Raul is left with comparisons to Manny, Pat Burrell and Adam Dunn. “Better than a water buffalo” is not much of a compliment.
I didn’t get that impression at all. Quite the opposite in fact. I mean, Baker’s always been a fantastic beat writer and always willing to learn and adapt, but it seems like lately he’s just been on the defensive after having some crap thrown his way. To me, this article was almost like he turned the corner and started really getting behind these kinds of advanced stats. You can tell he put a ton of effort into it, and I’m guessing he’s getting pretty intrigued by it on a personal level. I just think it’s great he’s even getting the discussion out to the masses. Between this, and the content he provides on his blog… he really is head-and-shoulders above most of his peers.
Two things occurred to me while reading this.
The first is that the ERA comparison made me realize I’ve always tended to think of it flexibly if I thought of it in an evaluative sense, as in “he’s a 4 – 4.5 guy.” Which is enough for me, most of the time.
The second is that one thing definitely lacking in defensive statistics is the precise results recorded by ERA. We can see precisely how bad Silva was last year (or how bad the team was with him on the mound), and there’s no +/- to fudge it. But there’s no solid, accepted number, even with noise, we can apply to Ibañez. Unfortunately.
Hopefully as stats evolve we’ll be able to say whether Raul was “even worse than” or “almost as bad as” Pat Burrell [or your bellwether glove butcher of choice].
The problem with the Griffey thing — and traditional in-the-clubhouse reporting in general — is, the instant Baker publishes a statistical critique of Griffey’s defense, no matter how mild, he instantly loses his clubhouse access. They might let him in, but they sure as hell won’t ever talk to him again. So I can see why he would avoid that, assuming he would even consider doing such a thing, which isn’t obviously true.
Sometimes Outside Baseball is the right thing.
The second is that one thing definitely lacking in defensive statistics is the precise results recorded by ERA.
The entire point of the comparison was that ERA doesn’t provide precise results.
The entire point of the comparison was that ERA doesn’t provide precise results
I guess I meant that the preciseness of the number, and its being widely accepted, offers (me at least) a certain psychological satisfaction. Also maybe because it’s so easily calculated, the noise inherent in the inputs seems less important than the lack of “transparency” in more complicated figuring. Speaking in terms of making defensive stats more mainstream.
The entire point of the comparison was that ERA doesn’t provide precise results.
ERA does provide a precise result for the number of earned runs scored per nine innings pitched. It’s a flawed stat for many purposes, including evaluating how good a pitcher was or for evaluating his future value. But, if you set aside for a moment the earned vs. unearned issue, it does provide a fairly precise measure of what happened when a given pitcher was on the mound.
I don’t have the sense that the fielding stats currently measure past results that precisely. I could be wrong though.
People look at ERA as a “hard” number. It is completely objective, built off “hard” data. Irrefutable. It’s simple, easy to understand, and it’s just one number, and it’s been around for ages. If 100 billion people computed it, they would get the same thing. Unfortunately, that just scratches the surface, and those who believe in it don’t want to look deeper.
What is an “earned run”? It is a run that is credited by the subjective “official scorer”. If he decides something is an error, it’s an error. Was he wrong? Was he right? No one argues this (or if they do it’s forgotten) because he’s the “official” scorer. What about a double play or fielder’s choice? Was that the work of the pitcher or the defense (either good choice or bad; good play or bad)?
ERA is built on just as much subjective opinion as the premier defensive metrics, but because “earned runs” are recorded by only one person, the completely human “official” scorer, people who compute ERA will always come up with the same answer as the other 100 billion people. This gives the illusion that it’s objective, hard, and irrefutable.
Some could argue that the “official” scorer is a small sample size and too much power in one hand, and that using several different defensive metrics to get an idea of a player’s true ability would be more accurate than basing your opinion on one stat essentially created by one person’s opinion.
However to a lot of fans, Baseball is a Religion celebrated in the summer at Cathedrals that seat thousands; as such it’s hard to break Dogma. (Obviously I could be completely wrong; but that’s how I look at it.)
Typos are my bane.
I don’t have the sense that the fielding stats currently measure past results that precisely.
This is more what I was trying to get at. If ERA is to FIP as, say, UZR is to [future defensive stat], I just wish that our flawed defensive stats were as good as our flawed pitching stats. Ibañez’s -10.4 UZR/150 just fails to describe his awful defense to me as accurately as Silva’s 6.46 ERA describes his awful pitching. I mean those sixes are just squatting there like toads.
Fine — as long as you stick with it being a “measure of what happened” and not a measure of the pitcher. And that’s the other problem with ERA (aside from the false sense of knowledge the two digits after the decimal implies). People attach ERA to pitchers, but it’s not a purely individual stat. And if you’re going to call it a team stat, you might as well just go by runs per 9 and stop worrying about the earned/unearned part altogether.
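To put numbers on the earned/unearned point, here is a short Python sketch comparing ERA with plain runs per nine. The season totals are hypothetical, invented for illustration; the only thing being shown is that a single scoring decision moves ERA while runs per nine stays put.

```python
def per_nine(runs: float, innings: float) -> float:
    """Scale a run total to a nine-inning rate."""
    return 9.0 * runs / innings


# Hypothetical season, invented for illustration only.
innings = 150.0
runs_allowed = 110
unearned_runs = 10  # depends entirely on the official scorer's error calls

ra9 = per_nine(runs_allowed, innings)
era = per_nine(runs_allowed - unearned_runs, innings)

# Suppose the scorer had ruled one more play an error, making one more run unearned.
era_one_more_error = per_nine(runs_allowed - (unearned_runs + 1), innings)

print(f"RA/9: {ra9:.2f}")
print(f"ERA:  {era:.2f}")
print(f"ERA with one extra error call: {era_one_more_error:.2f}")
```

The runs scored are the same in every case; only the bookkeeping changes.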
Yes, it’s a poor evaluative stat for determining the value of a pitcher or predicting future performance, but it’s obviously a relatively precise measure of what actually happened. It’s no different in that sense to say… simple counting stats like the number of hits a player had in a given year. Hits are determined by many factors other than an individual player’s performance, but it would be a nonsense to refer to a player as having “between 180 and 195 hits”. We know how many hits the player had, irrespective of how many he should have had if it corresponded to his true talent level.
If ERA has any purpose at all, then it’s simply as a measure of what actually happened while the pitcher was on the mound. It should never be used as a measure of how well a pitcher performed.
joser: I agree completely with your last comment. ERA is an often misused stat, and runs per 9 is better at measuring the one thing that ERA can be useful for.
I just thought it was excessive to say ERA isn’t precise. It does precisely measure something (earned runs scored per nine innings). It just doesn’t precisely measure what many people think it does (e.g., how good a pitcher was).
And, if there is a fielding analog of a descriptive stat like runs per nine, a fielding stat that isn’t predictive, I’d be interested in it. My sense though is that all of the advanced fielding stats have too much subjectivity built into them to be as precisely descriptive. Perhaps the subjectivity is currently unavoidable, but it’s still there (I think).
I have disagreed with Baker in the past for some shoddy analysis, but he deserves a long round of applause for the amount of work he put into introducing defensive metrics to the masses. I didn’t think I would see a baseball article this well-researched and thought-provoking in the Sunday paper for a long, long time. Reading it on my work break was an incredible pleasure, and I can only hope it will open up people’s eyes to the importance of defense (such as the quote he slipped in, “a run saved is just as good as a run scored”. Dave, Tango and MGL have been beating that drum for years now, but to hear that in a mainstream article is incredibly liberating).
In short, Baker deserves congratulations. While not a perfect article (as the comments above me have pointed out), it is an incredible step forward in advancing the public discourse on defense. We have a special gift in Geoff Baker, let’s not run him out of town yet…:)
Yes, ERA precisely measures “earned runs” which are the subjective opinion of the human official scorer. That makes ERA as flawed/subjective as the defensive metrics or potentially more so.
The only precise stat in baseball is home runs, and even then there may be a little bit of human error (was it a foul, fan interference, or a HR that was called a foul?)
Ignoring the defensive metrics because they are subjective while accepting the other stats because they are objective or precise is illogical. I’m not saying ERA can’t be useful, but that it’s nowhere near as useful as people believe it is, and if one can accept it then they should be able to accept the defensive metrics.
Don’t confuse “precise” with “accurate”. ERA is precise to three digits — very precise indeed. Its accuracy, however, is something like +- a run either way in a typical setting.