Fielding statistics and defense
I've been thinking about defense, given the team's recent woes. Dave wrote a long article on evaluating defense a while back that still holds up nicely, and I randomly came across an interesting post I thought I'd pass along: "Comparison of Fielding Statistics," which compares 2006 data from six different stats and comes to some interesting conclusions about their utility.
I have some quibbles with the piece's logic in places, specifically the comparison of stat "features" that leads to this conclusion:
So, based on that table, I would have to say that UZR and PMR have the best methodologies, with a nod to the Fans data because they can provide such unique insights into player skill.
The problem is that this doesn't evaluate methodologies at all. If I came up with a defensive metric called Random Runs that claimed to be built on hit-location data, zones, ball type, batter handedness, ballpark adjustments, and player skill types, and I did all of those things horribly, that's not a better system than something that does fewer things the right way, even though it would check off all those boxes.
The particularly interesting thing is the easy-to-scan graphs of system-to-system results. In the 2006 data, the correlations are statistically significant but nowhere near as strong as what you see from offensive contribution measures.
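(If you want to see what that kind of comparison looks like mechanically, here's a minimal sketch with made-up numbers standing in for two systems' runs-saved figures; none of this is from the linked post.)

```python
from statistics import correlation  # Pearson's r; Python 3.10+

# Made-up runs-saved figures for the same six players under two systems.
system_a = [12.0, -5.0, 3.0, 8.0, -11.0, 0.0]
system_b = [9.0, -2.0, -4.0, 10.0, -6.0, 2.0]

r = correlation(system_a, system_b)
print(f"r = {r:.2f}")  # positive and meaningful, but well short of the
                       # agreement you'd see between two offensive measures
```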
It all goes to reinforce something I've been saying for years: recognize that defensive tools are still pretty rough, but by looking at a couple of them you can get a pretty good idea of how good a particular player is with the glove.
Comments
Interesting.
True, although team defense is a lot easier to evaluate with the percentage of balls in play turned into outs, which still beats going through each player individually and trying to figure out who is hurting the team and by how much.
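As a rough sketch of that team-level rate (defensive efficiency), assuming you just have hits allowed on balls in play and total balls in play:

```python
def team_der(balls_in_play: int, hits_on_bip: int) -> float:
    """Defensive efficiency: the share of balls in play turned into outs."""
    return (balls_in_play - hits_on_bip) / balls_in_play

# e.g., a team that allowed 1,350 hits on 4,400 balls in play:
print(f"{team_der(4400, 1350):.3f}")  # 0.693
```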
The defensive stats would need to be detailed to the point where they capture the speed of the ball off the bat; the arc of the ball leaving the plate; its expected landing coordinates (X, Y); and the coordinates of the defensive players involved at the moment of contact (not at the pitch, since players are moving during the pitch). If you could successfully measure all of that on every ball put into play, you'd be able to determine what range a player really has, just in getting to the ball.

But that's just the first piece. Then you'd have to determine the speed of the throw, whether it went to the optimal base (did you get the lead runner or not?), whether there was a delay while the fielder checked a runner at third before going to first, the time from fielding the ball to releasing the throw, and so on. The speed of the batter and the other runners would also come into play, along with each runner's coordinates at the time of the pitch and at contact, and on and on.
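To make that wish list concrete, a hypothetical record for a single ball in play might look something like this; the field names are purely illustrative, not from any real tracking system:

```python
from dataclasses import dataclass

# Python 3.10+ for the "X | None" annotations.
@dataclass
class BattedBallEvent:
    """One ball in play, with the tracking detail described above."""
    exit_speed_mph: float                                   # speed off the bat
    launch_arc_deg: float                                   # arc leaving the plate
    expected_landing_xy: tuple[float, float]                # projected (X, Y) landing spot
    fielder_xy_at_contact: dict[str, tuple[float, float]]   # positions at contact, not at the pitch
    runner_xy_at_contact: dict[str, tuple[float, float]]    # batter and runners at contact
    throw_speed_mph: float | None                           # None if no throw was made
    throw_target_base: int | None                           # was it the optimal base?
    transfer_time_s: float | None                           # glove-to-release time
```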
Until then, we’ll have to live with some people thinking Derek Jeter is/was a great defensive shortstop.
Another good point there is the importance of using stats derived from both sources, STATS and BIS.
If you look at UZR, PMR, and Plus/Minus, average the three, and give yourself a margin of error of +/- five runs, you'll be fine. They're not perfect, but viewed through that prism, they're just fine.
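As a sketch, that rule of thumb is about as simple as it sounds (the five-run band is from the comment above; the inputs are made up):

```python
def fielding_estimate(uzr: float, pmr: float, plus_minus: float) -> tuple[float, float]:
    """Average the three systems and return the +/- five-run band."""
    avg = (uzr + pmr + plus_minus) / 3
    return avg - 5.0, avg + 5.0

low, high = fielding_estimate(7.2, 12.0, 4.5)  # made-up inputs
print(f"somewhere between {low:+.1f} and {high:+.1f} runs")
```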
Hi David,
Thanks for the link.
In response to your critique, you're right that there is an unstated assumption that none of those systems is making substantial logical errors in its implementation of each of those criteria. In my previous piece, which is linked from the page you linked, I walked through the logic of most of those systems in some detail, so I think this is a defensible assumption. Once you accept it, the argument is simply that more information will generally be better (or at least not worse), and that shouldn't be controversial.
Nevertheless, my primary intention with that chart was not so much to provide a mechanism of evaluation of the various systems, per se, but rather to provide a quick reference of the key differences between the various systems I was about to compare quantitatively. I think it does that quite well.
-j
I think that's one of the key findings of this and other studies comparing fielding systems (Michael Humphreys' three-part series last year found the same thing, and was if anything more convincing). And it's something that's not always widely appreciated.
Another key point is that the Fans’ Scouting Report holds up quite well vs. the objective measures despite being based on a radically different sort of data. That’s another thing that isn’t always appreciated.
-j
For now, I'll just assume that anyone who wins a Gold Glove isn't likely to be the best defensive player at his position, but rather had a great year at the plate or made a few highlight-reel plays (does Ichiro really win a GG as a rookie if he doesn't make the throw in Oakland?).
Jim Edmonds might be the best example of this: the guy was always out of position, playing too shallow, and had to make fantastic grabs running down balls that most CFers would have gotten to in three steps.
What?! You don't think Rafael Palmeiro deserved a Gold Glove in 1999, when he played all of 28 games at first base?! The nerve! 🙂
-j
I think Dave’s “give yourself a margin of error” point is the most important one. There’s a good chance there’s no such thing as a perfect defensive stat; not everything is knowable. But you can get reasonably close, and reasonably close is usually good enough. It’s the same with hitting stats or anything else, really; people pretend that a player with a VORP of 28.1 is “better” than a player with a VORP of 27.9, but it’s not really a useful distinction. 28.1 vs. 11.5 (or 73.5) is.
I do have another rough and ready defensive guide I use — it’s not as accurate as UZR, but it’s a lot easier to figure, and you don’t even need a computer nearby. Just look at the uniform — if it says “Mariners” on the front, probably not so good, plus or minus 1.5 Beltres.
Even a sad panda cannot deny the truth of this statement (though Ichiro! might).
I’m Derek. He’s Dave.
Sorry, you'd think I'd know that, given that I've read your book and subscribe to your blog. -j
Woo hoo!
We can subscribe? Why didn’t I know this?
You got grandfathered in under the old subscription rate, I think.
If not, I’ll be happy to send you a bill.
We can subscribe? Why didn’t I know this?
Derek didn’t tell you? Huh. We all got emails the last few days with a Paypal link in it. Weird.
15 – you mean you don’t charge a monthly fee to every reader? Deerrreekk!!!!
Jinaz – thanks, this is great stuff. I caught the Humphreys piece in THT, but missed this somehow. Nice work!
To me, jinaz's study reinforces some skepticism about defensive metrics, especially at certain positions (i.e., the outfield; this was discussed more in Michael Humphreys' article, I think). The variance between the two data sources often results in the kind of weirdness we see with Ichiro: UZR thinks he's the worst in the league, while RZR and PMR think he's good. It's nice when they all line up to a degree, and then you can use a rough average (along with the error bars Dave talked about), but for a lot of players you just can't.
To me, that's NOTHING like what we have with offensive metrics. wOBA/OPS/GPA, whatever: the different weights applied to different skills will result in Player A coming out on top in one metric and not in another. But there's nothing *approaching* the situation we have here, in which the metrics, collectively, can tell us nothing about Ichiro's defensive value.
Tiger Tales compiled a whole bunch of defensive stats to rank 2007 fielders. Good stuff. jinaz has an interesting write-up regarding catcher defense, too.
I’ve been sending checks every month to Derek’s exiled Nigerian uncle. When he gets his throne back I’ll be rich, so really it’s just a loan. But that was my second best idea, my best idea is to bring Griffey back!!!11!!!
I’m pretty sure I speak for everybody when I say I hope jinaz will be sticking around.
#20
Dude, that’s a scam. The checks should be going to Dave’s exiled Nigerian uncle.
M's have the best defense in the AL West. Steve Phillips told me at the start of the season, so I know it's true. We don't need to bother with these fancy numbers. You just take the simple formula: sweat(dirt+grass)^2 = Defensive Grit.
@15 – bloglines, baby.
@18 – thanks. I agree that we should be skeptical about defensive metrics, but that’s part of the reason I (and Dave, and Sean Smith) advocate taking an average of the best available defensive statistics. If there’s agreement, you’ll get a solid number +/-. If there’s disagreement, you’ll estimate league-average or so, which is a good baseline to estimate anyone’s performance.
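A quick sketch of that logic; the spread reporting is an illustrative addition, not anything from the comment above:

```python
def blend_systems(runs: list[float]) -> tuple[float, float]:
    """Average several systems' runs estimates and report their spread."""
    avg = sum(runs) / len(runs)
    spread = max(runs) - min(runs)
    return avg, spread

# Agreement: a solid number with a tight spread.
print(blend_systems([8.0, 11.0, 9.5]))    # (9.5, 3.0)
# Disagreement: the average drifts toward league average (zero),
# and the wide spread flags the extra uncertainty.
print(blend_systems([-35.0, 33.0, 4.0]))  # (~0.67, 68.0)
```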
@19 – thanks for the link on the catcher work. It’s admittedly very incomplete, but it gives us a baseline to work from.
-j
I share the skepticism on defensive stats, but I think if you take averages of several measures, as jinaz and I did last winter (thanks, Justin, for doing it first and making my job easier), you'll get some good information. The players who do well on all systems are probably very good fielders, those who do poorly on all systems are probably bad fielders, and the average will reflect that. Those players who do well on some systems but not others will end up in the middle of the pack, which I think is probably appropriate in most cases.
Plus, I was pleased to see that the correlation between the fan fielding survey and numerical measures (Justin’s study) was reasonably strong. That gives me a little more confidence that the measures are working.
noble of you to refrain from rickrolling that Paypal link
One of the other valuable things you can get from something like this is finding systematic biases.
I know a couple of years ago I was able to look at a couple of systems and figure out that whatever one of them was doing at one position, it was just wrong; compared to all the other systems, it might as well have been randomly ordering players.
That’s extremely useful information when you’re doing evaluation.
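A sketch of one way to run that check, assuming each system is just a mapping from player to rating; the names and the cutoff are illustrative:

```python
from statistics import correlation  # Pearson's r; Python 3.10+

def looks_random(suspect: dict[str, float],
                 others: list[dict[str, float]],
                 cutoff: float = 0.2) -> bool:
    """True if a system's ordering at a position barely tracks the consensus.

    Correlate the suspect system's ratings with the average of the other
    systems for the same players; an r near zero means it might as well
    be ordering players at random.
    """
    players = sorted(suspect)
    xs = [suspect[p] for p in players]
    ys = [sum(o[p] for o in others) / len(others) for p in players]
    return abs(correlation(xs, ys)) < cutoff
```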
"If there's disagreement, you'll estimate league-average or so, which is a good baseline to estimate anyone's performance."
“Those players who do well on some systems but not others will end up in the middle of the pack which I think is probably appropriate in most cases.”
I think that’s true for a great many players; the variance is fairly tight and centered around zero.
But I'm still concerned about the huuuge variance seen in a number of OFs. I'm just not comfortable averaging a -35 and a +33, or whatever Ichiro's actual numbers are. Same w/Grady Sizemore. One of those numbers paints both players as hideously overrated once you factor in defense, and the other paints both as MVP candidates. When we're trying to get reasonable estimates of team defense, or OF defense specifically, that's pretty important. Again, there's zero doubt that what jinaz and tiger337 have done here is helpful and moves the ball down the field (and since I'm lavishing praise, I can't believe we haven't even talked about Justin's fielding translations with THT data; I use them a lot). I know Derek's said that if a system doesn't work for one guy, that's not reason enough to discard it. That seems fairly reasonable.
And yet, as M's fans, it matters a hell of a lot whether Ichiro is a lifesaver or an anchor in CF. It matters a lot if park factors or something else will make *every* RF (except Ichiro) look like crap. The team can't keep hemorrhaging runs through poor DER, and it can't keep trying out new OFs only to see each one turn out worse than his predecessor. These data should clearly help the FO make decisions about who might perform well, and yet we've all been surprised. I don't think anyone thought a healthy Jose Guillen would be one of the worst RFs in the league, and I know I didn't think Wilkerson would be at least as bad.
In summary: we need to quantify/explore the BIS/STATS differences, especially for the OF, and we need to figure out how much natural variation to expect from year to year in these numbers. We're getting closer, but we're still at a point where we need two or three years of data from multiple sources, and then we have to hope those sources don't contradict each other. That's tough for a team to use to really inform a decision on extending, say, Wlad or Yuni a few years ago.