Accepting Randomness
I try to avoid linking to things I write in other places, because I don’t want to become a self-promoter and I figure most of you probably know how to find my stuff at FanGraphs, WSJ, or ESPN anyway. But I’m making an exception today, because my afternoon post over at FG is the kind of thing that I would have written here a few years ago, and I think it has quite a bit of relevance to the 2010 Mariners season. The post is entitled Accepting Randomness. Here are the first few paragraphs, and you can read the rest over there.
Most of the conversations about the Dan Haren trade boil down to how a person feels about pitcher evaluation. There are clearly still a lot of people that simply believe that whatever happens is the pitcher’s responsibility, so if he gives up a bunch of hits and some home runs, he’s doing something wrong and that should be held against him. High BABIP or HR/FB rates are evidence of throwing too many hittable pitches, or that his stuff has deteriorated, or that his command isn’t as good as it was, or some other explanation that we haven’t yet figured out. But, whatever it is, it’s definitely something, and it’s definitely real.
These opinions are generally held because of the outright refusal to accept randomness. The idea that something could happen repeatedly, without cause, is very hard for a lot of people to swallow. But it’s true, and it’s a very important concept to buy into when trying to project the future performance of baseball players. Random happens.
Awesome post, Dave! I have been trying to convince a co-worker for months that BABIP is affected by luck more than skill.
Great post, Dave.
I think people’s tendency to resist the idea that luck is at work is because there are so many potential causes for variation in results. These include randomness, but also could include minor injury, loss of velocity, mechanical issues leading to poor location/command, etc. I would think that randomness is at play in almost every case, but that in very few is it the only cause of variation, so people want to find and analyze what other issues may be at play and discuss whether they are controllable or correctible, or not.
The problem is, as you point out, we can’t know how much randomness, or any other potential cause, is at play.
To me, the takeaway of your post is the last paragraph:
Yes, thank you. Because I for one am clinging to the belief that Jack Z actually went through a pretty good process this off-season and has had terrible luck when it comes to the actual performances of his position players so far. And I would prefer to continue clinging.
Here’s an insightful piece by someone who has pretty good results to show for his process, but still acknowledges that the process is the only part he can control. The outcomes will be severely affected by luck.
I agree with the general point you’re trying to make, Dave, but you are falling into a common pseudo-scientific fallacy when you say things like “The idea that something could happen repeatedly, without cause, is very hard for a lot of people to swallow”.
First, nothing can happen without a cause. The cause might not be apparent or might be practically impenetrable, but it must be there or else we have creation ex nihilo on our hands, and at an absurdly mundane level. Secondly, randomness or chance cannot be that cause. The idea of chance and statistical methods are merely predictive measures used to help understand situations in which we don’t have an exhaustive (or sufficiently exhaustive) understanding of the causes at work. If we knew the complete litany of kinetic forces acting on a coin when it is flipped, we’d know precisely how it would land and wouldn’t need to talk about chance. All of the events you mention certainly have distinct causes. The problem is that the causes are frequently not discernible, and if we don’t know the causes we don’t know if the effect is predictive. Thus we rely on statistical metrics, not because they describe fundamental operations of reality, but because they help us predict the effects when those fundamental operations are unknown.
Hi Dave,
I actually appreciate it when you link to your other work. I don’t have time to read through a bunch of websites, so it’s really nice when you tell us about Mariners-related stuff you’ve written elsewhere (or that other ussmariner writers have written elsewhere).
I did see your 40-most-tradeable player series over on Fangraphs – I had a Mariners-related question about it. If one was doing this list from the viewpoint of a specific team – say the Mariners with their quite atypical stadium – would the order of the list change much/at all?
Thanks for all the work you and the other authors do here,
Robert
Hey ScottieDawg —
Thanks for the link to DePodesta’s blog. I found one particular comment particularly telling:
As tough as a good process/bad outcome combination is, nothing compares to the bottom left: bad process/good outcome.
This, to me, is a perfect description of the M’s 2006 and 2007 seasons, and unfortunately it led to a number of stupid decisions that will haunt the team for years to come (Choo, Jones, Asdrubal Cabrera, maybe Chris Tillman, the list goes on). While at some point we will all have to stop blaming Bavasi for the organization’s current shortcomings, there can be little doubt that Z is still cleaning up that mess. The reason we have Bradley is that he did the smart thing in trying to fix a dead-loss situation inherited from Bavasi by turning it into a potentially productive switch-hitting bat with some power. The fact that it hasn’t worked out seems to fit entirely DePodesta’s “Bad Break” category. And, it should be noted, Z has managed to restore some org depth while at the same time trying to do things that will at least make the big league club more bearable to watch.
By the same token (and I take to heart Dave’s admonitions some weeks ago about venting at Wakamatsu), I do think 100 games in there are issues on this team for which the present administration bears responsibility, and I think it is high time they acknowledge the mistake (bad process) and move on. No one, including the guardians of this site, really liked the Figgins/Lopez switch in spring training (though various rationalizations were advanced as to why it might work). “Belief systems” aside, I wonder if it isn’t time to accept that the poor performance at the plate and in the field (of Figgins especially) is more than randomness, since our 2B doesn’t look very comfortable playing 2B…?
I agree, Dennisss. I appreciate what GarForever is saying, as well – there are certainly instances of bad process/bad result in this front office’s offseason performance – but I think the vast majority of the decisions the M’s made were the result of good process, with a dizzyingly high rate of bad results. That sucks, but I’d still take that over the bad process/good result front office types.
I’ll second the thanks for posting the DePo blog post, scottiedawg. It wasn’t anything we haven’t seen here, but it is nice to see it acknowledged by somebody who has actually run a Major League team.
This is intended to be consonant with everything you say.
Just because something has a cause doesn’t mean that the cause isn’t random. Hypothetically, suppose Dan Haren has been getting hit hard because of the location of his release relative to his shoulder. That’s really it: it’s why he’s been getting hit hard this season, and if he had done otherwise, he wouldn’t have been hit hard. However, the fact that he’s releasing where he is might be random, in the sense that it could change at any moment, and Haren won’t be hit hard anymore.
Even if there’s a cause for everything that’s happened to Dan Haren this season, that doesn’t mean that you should think it’s going to be a part of a steady trend.
This is a problem that has been talked about for centuries. This is nothing new. It is nice that Dave makes the connection to baseball for people, I suppose.
You should at least put up links on the sidebar to your stuff over at WSJ & ESPN.
Great post. I would be inclined not to accept randomness in this particular case. It is not difficult to find crazy-chance happenings after the fact. Had we discussed the odds of those 14 straight NFL coin flips going one way BEFORE they happened, I would be much more blown away.
Without getting too philosophical, there can be 16000 things happening every day that each have a 1/16000 chance of going a particular way. So the odds of something crazy happening seem pretty good, all things considered (I think you made good mention of this with your 5 pitching month piece).
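To put rough numbers on that intuition, here is a minimal sketch in Python. It assumes the flips are fair and that the 16000 long shots are independent of one another, neither of which the comment actually states, so treat the output as illustrative only.

```python
# Chance that 14 fair coin flips in a row all land one pre-named way.
p_streak = 0.5 ** 14  # 1 / 16384

# Chance that at least one of n independent long shots comes in,
# when each has probability 1/n (n = 16000 is the commenter's figure;
# independence is an assumption made for this sketch).
n = 16000
p_at_least_one = 1 - (1 - 1 / n) ** n

print(f"14 straight flips: 1 in {round(1 / p_streak)}")
print(f"at least one 1-in-{n} event among {n} tries: {p_at_least_one:.3f}")
```

Under those assumptions the streak is a 1-in-16384 shot, yet the chance that some 1-in-16000 event happens on a given day comes out to about 63%.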
With so much attention on Haren already, I would be very surprised if his performance has been due to random bad luck. I think he is still a good pitcher, but I think hitters got to him for a reason. But I could be wrong and often am! Love this discussion. We could go on for years!
Seconded for what Conor said. I frequent USS Mariner and generally not the other sites. I’d prefer to know what I’m missing out on by coming here.
Max Planck and Werner Heisenberg would disagree with you. As would the entire scientific establishment.
I’m not saying that baseball performance has all that much to do with quantum mechanics, but the flawed assumption you’re starting from is the same one that scientists had wrong for centuries until quantum theory became generally accepted. Randomness certainly can be the cause of a particular result, at least at the level of what we can observe, measure, and control.
14 consecutive coin flips? That’s amazing!
The problem, I think, lies in defining “random”. For the purposes of baseball performance, “random” has to mean unpredictable and uncontrollable.
Clearly, the outcome of the coin flips is entirely deterministic–it is the confluence of the starting condition of and forces applied to the coin, and the chemical impulses in the calling player’s brain that led him to make the prediction he did.
But those forces cannot be controlled sufficiently accurately as to control the fall of the coin, nor known precisely enough to allow for prediction.
A phenomenon can be both deterministic and “random” at the same time.
Dave
I agree with Conor, you should link to your other pieces in the sidebar.
My background is physics rather than sociology, but I do know that using a random process can often model the behavior of a large enough group fairly well, even though the individuals do not necessarily behave randomly. The danger is when you assume “since the large process is random, every smaller chunk of it must be as well”.
To use a flawed analogy for illustrative purposes – BABIP in aggregate may very well fit a random model because of the size of the population involved; however that doesn’t mean that short term swings in an individual’s BABIP are always random (I realize that’s not what Dave is saying anyway). Maybe the guy hurt his leg and is slower getting to first than when he’s healthy (not random). Or, maybe he really is on an extended unlucky streak (random). But with enough individuals lumped together, circumstances such as injuries should average out, assuming getting injured is a more or less random occurrence (so we can’t effectively use this concept for, say, anything involving Jack Wilson or anyone playing alongside Yuniesky Betancourt).
Did Dave get ripped off?
That’s why they flip the coins.
Good stuff from all involved, as usual.
I’d recommend a book: “The Drunkard’s Walk: How Randomness Rules Our Lives”, by Leonard Mlodinow.
I’ve worked for a couple of decades as a consulting statistician, and probably the one thing I can really say about people and statistics is: people don’t have the first goddam clue about statistics. Their brains just don’t deal with ’em. Mlodinow uses as a terrific example the “Monty Hall” problem, and points out that the overwhelming majority of people don’t get that they’re wrong about it, even after you explain it to them.
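For anyone who still doesn't buy the answer to that problem, a quick simulation tends to be more persuasive than an argument. This is a minimal sketch of the standard three-door setup; the function name, trial count, and seed are arbitrary choices made for illustration, not anything taken from the book.

```python
import random

def monty_hall_win_rate(switch, trials=100_000, seed=0):
    """Fraction of games won when the player always switches or always stays."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)   # door hiding the car
        pick = rng.randrange(3)  # player's first choice
        # Host opens a door that is neither the player's pick nor the car.
        # When two doors qualify, this takes the lower-numbered one, which
        # does not change the win rate of either strategy.
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # Move to the one door that is neither picked nor opened.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += pick == car
    return wins / trials

print(f"stay:   {monty_hall_win_rate(switch=False):.3f}")
print(f"switch: {monty_hall_win_rate(switch=True):.3f}")
```

Staying should win about a third of the time and switching about two thirds, which is exactly the result most people refuse to believe.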
The same thing is true in baseball about, oh, what, batting average? ERA? Things everyone is sure mean lots and lots, but which actually mean not very much. It’s no wonder there’s an uphill fight in getting people to accept numerical approaches with actual predictive value: our brains just don’t work that way.
A funny thing about people’s perception of randomness: it seems like there’s an uncanny valley of sorts, where people want to assign causes to moderately low-probability events, but are willing to accept pure luck on either side of the valley.
For example, Ichiro’s career batting average is .331. If he hits .340 or .315 in a season, most people accept that as the impact of randomness. On the other side of the valley is Asdrubal Cabrera turning an unassisted triple play in 2008. There have been 14 of those recorded in all of MLB history, about 1 per 10 years. It’s an extremely low probability event, and nobody tries to claim that, for that one play, Cabrera somehow raised his true talent level of defense a couple of orders of magnitude.
Or Cammy’s 4 HR game back in the Good Old Days. He’s a decent hitter, and has power, but people don’t insist he did something different that day. It was just his day, things went well. Maybe he was seeing the ball really well, but people accept that as part of the randomness of life.
It’s stuff in the middle that they want an explanation for.
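The Ichiro half of that example can be checked with a back-of-the-envelope binomial calculation. The sketch below treats every at-bat as an independent trial at his .331 career rate and assumes 680 at-bats in a season; both are simplifications introduced here, not figures from the comment.

```python
import math

CAREER_AVG = 0.331  # from the comment above
AT_BATS = 680       # assumed round number for one full season

# Standard deviation of a season batting average if every at-bat were an
# independent trial with hit probability CAREER_AVG (a simplification).
sd = math.sqrt(CAREER_AVG * (1 - CAREER_AVG) / AT_BATS)
print(f"one standard deviation is about {sd:.3f}")

for season_avg in (0.340, 0.315):
    z = (season_avg - CAREER_AVG) / sd
    print(f"{season_avg:.3f} is {z:+.2f} standard deviations from {CAREER_AVG:.3f}")
```

On those assumptions one standard deviation is roughly 18 points of batting average, so both .340 and .315 sit within a single standard deviation of .331, comfortably inside what luck alone would produce.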
@Catherwood
I second that. “The Drunkard’s Walk” is a fantastic book that illuminates society’s misunderstanding of probabilities (especially conditional) and statistics.
@georgmi
The problem, I think, lies in defining “random”. For the purposes of baseball performance, “random” has to mean unpredictable and uncontrollable.
I think we needed this. When flipping a coin there are obviously forces at work that could be measured with the proper technology: how hard it is flipped, where it lands, at what angle it lands, etc. But putting together all those variables is like trying to predict a molecule’s drunkard’s walk. You really can’t do it. Entering a baseball season or a game, there are enough variables going on, with the pitcher, his fielders, and the opposing batters, that finding the true cause of something can be virtually impossible. Because the results, perhaps BABIP or HR/FB, follow a random pattern (even if that pattern is the result of a bunch of other small random variables: see central limit theorem), they should be modeled as such: a random pattern that, all else equal, regresses toward the player’s mean (or possibly league mean).
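The regression point can be seen in a toy simulation. The sketch below assumes every pitcher has exactly the same true BABIP of .300 and faces 250 balls in play per half-season; these numbers are made up for illustration, and real pitchers do differ somewhat in talent.

```python
import random

TRUE_BABIP = 0.300   # assumed: every simulated pitcher has identical talent
BALLS_IN_PLAY = 250  # assumed balls in play per half-season
PITCHERS = 2000

rng = random.Random(42)

def half_season():
    """Observed BABIP over one half-season of pure chance."""
    hits = sum(rng.random() < TRUE_BABIP for _ in range(BALLS_IN_PLAY))
    return hits / BALLS_IN_PLAY

# Each entry holds one pitcher's (first-half BABIP, second-half BABIP).
seasons = [(half_season(), half_season()) for _ in range(PITCHERS)]

# Take the 10% of pitchers with the worst (highest) first-half BABIP.
seasons.sort(key=lambda s: s[0], reverse=True)
unlucky = seasons[: PITCHERS // 10]

first = sum(s[0] for s in unlucky) / len(unlucky)
second = sum(s[1] for s in unlucky) / len(unlucky)
print(f"unluckiest 10%, first half:  {first:.3f}")
print(f"same pitchers, second half: {second:.3f}")
```

Even though nothing about these simulated pitchers changes, the group that looked worst in the first half lands back near .300 in the second, which is what regression toward the mean looks like when luck is the only thing at work.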