Monday, September 12, 2011

Wins and WAR and MVPs

I got back-to-back emails last week from Brilliant Readers that pretty much sum up my frustration with so many gripes about baseball statistics. The first was written to defend the pitcher's win. I was actually glad to get that because I obviously did not do a good enough job making the irony clear in my In Praise of Pitcher Wins (Sort Of) post last week. Maybe if I had titled it "In Praise Of Pitcher Wins (Not Really)" or "In Praise of Pitcher Wins (LOL; I Spend 99.2% of the Article Ripping Pitcher Wins)" the point might have been clearer. But I heard from so many people who actually thought I was PRAISING pitcher wins, and they wanted me to know pitcher wins are not a good way to measure pitchers and that they're a waste of time and all those things I have spent 500,000 words on over the last few years. Ah, well, when you hear from that many people missing it, then it's the writer's fault. I obviously did not get the point across well enough.

But some Brilliant Readers did get my intention, and one in particular made a well-reasoned argument that wins, flawed as they are, do tell us with pretty decent accuracy whether a pitcher is good or not, especially over a long career. OK, put that thought away for a minute.

The next email was from another Brilliant Reader who had myriad complaints about WAR. This too was well-reasoned, and it made the point that WAR is far from perfect, that the formulas used by Baseball Reference and Fangraphs are quite different, and that it's ridiculous to take the human element out of baseball analysis and simply determine who is the best player by the decimal points of WAR.

It was good to read those back-to-back, because in just two emails I felt like I had seen the arc. The first BR wants too little from stats. The second BR expects way too much.



Take the wins email. It is absolutely true that if a pitcher wins 20 games in a season, he almost certainly had an excellent season. It is even more true that if a pitcher wins 250 games in a career -- 300 games in his career, even more -- he will have had an excellent career, no exceptions. But, with all due respect: So what? How is this useful? Do we really have that hard a time determining whether a pitcher is "good" or "bad?" This feels to me like when they have one of those contests where you have to guess how many jelly beans are in the jar. Sure, you could guess "a lot." You probably won't win the contest.

If a pitcher wins a lot of games over a career, yes, he's good. It's also true that if a pitcher pitches a lot of innings, he's good. If he gets a lot of starts, he's good. Heck, since 1901, 10 of the top 11 in hits allowed are in the Hall of Fame, and the only one not in the Hall of Fame is Tommy John, who has a great Hall of Fame case. The point is that we should expect a whole lot more from our baseball stats than vague "He's good" tips. Take these 10 pitchers:

-- Bob Gibson
-- Sandy Koufax
-- Pedro Martinez
-- Herb Pennock
-- Jack Morris
-- Jamie Moyer
-- Juan Marichal
-- Bret Saberhagen
-- Jim Kaat
-- Jerry Reuss

Let's rank them by wins and see what we get:

1. Jim Kaat (283)
2. Jamie Moyer (267)
3. Jack Morris (254)
4. Bob Gibson (251)
5. Juan Marichal (243)
6. Herb Pennock (241)
7. Jerry Reuss (220)
8. Pedro Martinez (219)
9. Bret Saberhagen (167)
10. Sandy Koufax (165)

OK, is that how you would have lined them up for Game 7 of a World Series? No, probably not. Now, how about we rank them by WAR and see where that gets us:

1. Bob Gibson (85.6)
2. Pedro Martinez (75.9)
3. Juan Marichal (64.0)
4. Bret Saberhagen (54.7)
5. Sandy Koufax (54.5)
6. Jamie Moyer (47.3)
7. Jim Kaat (41.2)
8. Jack Morris (39.3)
9. Herb Pennock (36.9)
10. Jerry Reuss (33.1)

Well, that might not be perfect … but that gets us a lot closer to what our heart and mind tell us, doesn't it? The biggest problem with wins, it seems to me, is not that they tell us inaccurate things. It is that, so often, they don't tell us anything at all.
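For anyone who wants to see exactly how far apart the two yardsticks land, the re-ranking can be done mechanically. This is a minimal Python sketch, not anything Baseball Reference or Fangraphs publishes: it assumes only the career wins and WAR figures copied from the two lists above, and the names `pitchers` and `rank_by` are hypothetical, chosen for illustration.

```python
# Career (wins, WAR) for the ten pitchers, copied from the two lists above.
# The dictionary and helper names are hypothetical, for illustration only.
pitchers = {
    "Bob Gibson": (251, 85.6),
    "Sandy Koufax": (165, 54.5),
    "Pedro Martinez": (219, 75.9),
    "Herb Pennock": (241, 36.9),
    "Jack Morris": (254, 39.3),
    "Jamie Moyer": (267, 47.3),
    "Juan Marichal": (243, 64.0),
    "Bret Saberhagen": (167, 54.7),
    "Jim Kaat": (283, 41.2),
    "Jerry Reuss": (220, 33.1),
}


def rank_by(index):
    """Return pitcher names sorted from highest to lowest on one field
    (index 0 = wins, index 1 = WAR)."""
    return sorted(pitchers, key=lambda name: pitchers[name][index], reverse=True)


by_wins = rank_by(0)
by_war = rank_by(1)

# Positive "move" means the pitcher ranks higher by WAR than by wins.
for name in by_war:
    move = by_wins.index(name) - by_war.index(name)
    print(f"{name:16} wins rank {by_wins.index(name) + 1:2}  "
          f"WAR rank {by_war.index(name) + 1:2}  ({move:+d})")
```

Run as written, Koufax climbs five places (10th by wins, 5th by WAR) and Kaat drops six (1st by wins, 7th by WAR), which is the gap the two lists are pointing at.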

But, then there's the other side of the argument. Look at the WAR list again. Bret Saberhagen is .2 WAR better than Sandy Koufax. Would anyone -- including Bret and his family -- make the argument that Sabes was a better pitcher than Koufax or had a better career? No. Of course not. Would you definitely want Jamie Moyer to pitch Game 7 over Jack Morris? We know Morris had a pretty good Game 7. And this is a major argument that people want to make against WAR in its various forms: that it is not flawless, that it is not accurate to the 10th or 100th of the decimal point, that its components (particularly the defensive components) are desperately flawed, and thus it allows people to regurgitate the garbage-in, garbage-out argument that helped make O.J. Simpson a free man (for at least a little while).

It is striking to me that wins could have such a low bar and WAR such a high one: that wins could still be of some use because -- to use a GPS analogy -- they can generally locate where Philadelphia might be, while WAR is of no use because it might tell you there's a traffic jam on the Schuylkill Expressway when that was cleared up like TEN MINUTES AGO.

Of course WAR in its various forms is not perfect, not even near-perfect. There's a substantial margin of error involved; I think everybody knows that. But is perfection even the point? Take this year's MVP balloting. According to Baseball Reference, Jose Bautista leads the American League in WAR. This has led people to argue whether or not Bautista deserves to be the MVP -- after all, his Blue Jays really have never been in contention, and WAR is not a perfect stat (Jacoby Ellsbury leads the league in WAR according to Fangraphs) and so on.

But I would say that the fact people are even having this argument suggests that WAR is accomplishing something. In 2006, Grady Sizemore might have been the best player in the American League -- he led the league in what people are now calling rWAR (Baseball Reference WAR) and fWAR (Fangraphs WAR) -- but he got pretty close to 0.0 consideration for MVP. I think now people would give him a bit more respect. The point is that WAR often does exactly what I think statistics should do. It challenges. It offends. It forces people to think rather than act automatically based on old and sometimes outdated thoughts. And it pushes people to defend their thinking, to show their work, which I think is good.

I wouldn't want people to use WAR lazily, to put any emphasis on a two-tenths-of-a-point difference, to just blindly follow. But, to be blunt about it, I really don't think that's much of a problem. People who go to the effort to use WAR, to understand it, generally seem to know that it's a tool, like all statistics. It's adaptable. You can use your own defensive observations pretty easily, just as an example. And it will get better. It will get more precise. Even now I would argue it gets us much closer to something real than blunt instruments like wins or RBIs or batting average. I'll say this: If it ever gets to the point where people just start using WAR blindly, without critical thought, then it will be past time to find the next thing.

Continuing on the Bautista theme: I think WAR will prevent us from getting an egregious MVP choice for quite a long time. I could be wrong. But based on WAR, there have been seven players in baseball history who contributed 4 WAR or less and won the MVP award, and I don't think that's as likely to happen now. These, to me, are the egregious choices. You could always argue that one guy deserved the MVP over another, but assuming they both had surpassing seasons, then it's only that: a fun argument. Was Ted Williams better than Joe DiMaggio in 1941? Fangraphs has Teddy Ballgame with 11.9 WAR to DiMaggio's 10.6. But a 10.6 WAR is a great, great season. DiMaggio was a deserving MVP. Start there. Now we can argue whether Williams deserved it more (I think so).

Point is, I don't think we will get an MVP at 4.0 WAR or less for a while.

Here is a list of those:

1979: Willie Stargell (2.3 rWAR; 2.8 fWAR)
-- The ultimate "soul of a team" winner, Stargell shared the MVP award with Keith Hernandez even though Pops only played in 126 games. His WAR wasn't even close to that of his teammate Dave Parker, who might be in the Hall of Fame had he won his second MVP award and been viewed as the driving force of the "We Are Family" Pirates.

1992: Dennis Eckersley (3.0 rWAR)
-- I've written about this before … for a long while, American League voters had this fetish about giving closers MVP awards. Rollie Fingers got one. Willie Hernandez got one. Eck got one. Eck's is probably the most egregious because he might not have even been the best relief pitcher in his own division, 39 of his 51 saves came in games won by two runs or more (heck, five of them were won by FOUR runs or more), and he only pitched 80 innings.

1987: Andre Dawson (2.7 rWAR, 3.7 fWAR)
-- His WAR is so low because of two things that hardly seemed to matter in 1987: he didn't walk (Dawson had a .327 on-base percentage), and his big power numbers were in many ways an illusion of context (at home, in the friendly confines of Wrigley Field, Dawson hit .332/.373/.668; on the road he hit .246/.288/.480). The writers thought they were making a breakthrough by giving a hitter on a last-place team the MVP, and I applaud those sentiments. They just happened to give it to the wrong guy that year. (I also mean this as no knock on Dawson, incidentally, who in other years, like 1981, 1982 and 1983, had a real MVP case.)

1996: Juan Gonzalez (2.8 rWAR, 3.7 fWAR)
-- Here's a good example of a voting catastrophe that, I honestly believe, would never happen in 2011 because of WAR. Sure, there would be people who might argue for Juan Gone because of his 47 homers and 144 RBIs. But it just seems utterly unthinkable to me now that anyone would have voted for him over Ken Griffey (9.7 WAR), Alex Rodriguez (9.4 WAR) or Chuck Knoblauch (8.8 WAR). I just don't think the voters as a collective would do that now.

1974: Jeff Burroughs (3.6 rWAR, 3.9 fWAR)
-- Or this … it astounds me how often American League voters have given this award to a Texas player who did not have that good a season.* Well, it astounds me how often American League voters have given this award to Texas players period. Burroughs. Juan Gone. Juan Gone again. I-Rod. A-Rod. Hamilton. That's amazing, isn't it? The Rangers only reached their first World Series LAST YEAR. There were few great hitting choices in 1974, which is why Burroughs won. Reggie Jackson certainly could have won the award. Bobby Grich could have won it, and maybe then people would have given him more Hall of Fame consideration. But this should be said: Burroughs had a very good offensive year; it's his defensive numbers that shatter his overall WAR. So if you believe those numbers to be overstated, this is not as egregious a choice as others.

*I should say here that while I think WAR will prevent us from getting egregious MVP choices, I could be wrong. There is a contingent suggesting that Michael Young should be an MVP candidate even though his rWAR and fWAR are both less than 4 and he's slugging .406 on the road this year and he's mostly a designated hitter. So if he indeed proves to be an MVP contender, I will have to concede that I'm wildly overstating the effects of WAR.

1950: Jim Konstanty (3.6 rWAR)
-- Well, the voters felt like they HAD to give the MVP to someone on the Phillies. After all, that was the year of the Whiz Kids, the year the Phillies -- after three decades of pain -- finally won a pennant. But picking Konstanty was just kind of nutty. Oh, he had a good year as a reliever and occasional starter, but come on. Here's something interesting. As Sept. 23 began, the Phillies were 7 games up with 11 games to play. Konstanty would pitch in six of those 11 games. And here's how he did:

-- Sept. 23: Pitched scoreless inning in 3-2 loss.
-- Sept. 25: Gave up two runs in bottom of the eighth in 5-3 loss.
-- Sept. 26: Came in with Phillies up 5-4 and gave up two runs, though Phillies came back to win.
-- Sept. 27: Gave up losing run in 8-7 loss.
-- Sept. 28: Pitched two scoreless innings in 3-1 loss.
-- Sept. 30: Gave up four runs in loss that pulled Brooklyn within one game.

Your 1950 MVP! I'm thinking that had WAR been around, his teammate Robin Roberts might have won the 1950 MVP award. And while someone else might have been a better choice -- Eddie Stanky and Jackie Robinson and Stan Musial all had great years -- Roberts, at least, had an MVP-caliber year.

2006: Justin Morneau (3.8 rWAR, 4.0 fWAR)
-- I think (hope?) this will be the last really misguided MVP choice. I guess people were talking about advanced metrics like WAR in 2006, but their effects were still quite muted. This was the year that Sizemore might have been the best player in the American League but there were other more prominent choices who would have been better than Morneau. I personally thought the MVP was Morneau's teammate Joe Mauer. But the guy who finished second in the voting was someone you may have heard of, a guy named Derek Jeter, and there was the expected uproar out of New York when Morneau won. Thing is: I think those guys in New York were right. If it came down to Jeter and Morneau, I think Jeter was a clear winner as his 6.3 WAR suggests.
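The seven winners above can be screened against the 4.0 cutoff the same way. This is a minimal Python sketch that assumes only the rWAR and fWAR figures quoted in this post (None where no fWAR is given); the `low_war_mvps` table and `at_or_below` helper are hypothetical names for illustration. It only confirms that each listed winner sits at or below 4.0 on every figure given; it does not search baseball history for others.

```python
# (year, player, rWAR, fWAR) for the MVP winners listed above.
# None marks an fWAR figure the post does not give.
low_war_mvps = [
    (1979, "Willie Stargell", 2.3, 2.8),
    (1992, "Dennis Eckersley", 3.0, None),
    (1987, "Andre Dawson", 2.7, 3.7),
    (1996, "Juan Gonzalez", 2.8, 3.7),
    (1974, "Jeff Burroughs", 3.6, 3.9),
    (1950, "Jim Konstanty", 3.6, None),
    (2006, "Justin Morneau", 3.8, 4.0),
]

THRESHOLD = 4.0  # "4 WAR or less"


def at_or_below(threshold, rows):
    """Keep winners whose rWAR, and fWAR where given, are both <= threshold."""
    return [
        row for row in rows
        if row[2] <= threshold and (row[3] is None or row[3] <= threshold)
    ]


selected = at_or_below(THRESHOLD, low_war_mvps)

# Print from lowest rWAR to highest.
for year, name, rwar, fwar in sorted(selected, key=lambda r: r[2]):
    fwar_text = "n/a" if fwar is None else f"{fwar:.1f}"
    print(f"{year} {name:17} rWAR {rwar:.1f}  fWAR {fwar_text}")
print(len(selected), "winners at or below", THRESHOLD, "WAR")
```

All seven pass, with Morneau's 4.0 fWAR sitting right on the line; tightening the cutoff to 3.0 would leave only Stargell and Eckersley.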

This year, we have a great MVP race … in both leagues, really. The American League gives us a great hitter on a non-contending team (Bautista), a dominating pitcher (Justin Verlander), a speedy outfielder (Ellsbury), a force-of-nature second baseman (Dustin Pedroia), a Yankees centerfielder with power and speed (Curtis Granderson), a Red Sox first baseman who is an artist at the plate (Adrian Gonzalez). All of them are having great years. All of them are worthy candidates. Based on WAR, maybe you can throw in an underrated player from Texas (Ian Kinsler) and a Detroit masher (Miggy Cabrera). It's great.

And in the National League, rWAR says that Matt Kemp is clearly having the best individual season. I don't have a National League MVP vote, but I would expect the voters to weigh that heavily. Kemp has been absurdly good in 2011. But Justin Upton has had a phenomenal year -- fWAR has Upton and Kemp separated by an irrelevant margin -- Troy Tulowitzki has been exceptional, Ryan Braun was my preseason MVP choice and he's putting up an MVP-type year, and those Phillies pitchers are pretty awesome. The guy I like watching is Joey Votto. I saw him play three games in Chicago, and it reminded me: he was the runaway MVP last year, and he's almost EXACTLY as good this year. In fact, because he is scoring a bit better on defense, he will probably have an even higher WAR this year than last.

And this gets at exactly why I love WAR. It makes things more fun. Should we be a slave to it? Of course not. But, is that even necessary to say? I mean: should we be a slave to anything?

48 comments:

  1. Thank you, Joe, these are the best posts!

    Well, these and your posts about family.

  2. VER-LAN-DER
    M V P
    Case closed.

    (Tigers fan, natch.)

  3. Good article as always. Just one thing I'd like to point out though. There is always a traffic jam on the Schuylkill Expressway; it's just a matter of how much of one.

  4. "Kemp has been absurdly good in 2011. But Justin Upton has had a phenomenal year -- fWAR has Upton and Kemp separated by an irrelevant margin..."

    Meanwhile, Upton isn't in the top ten for rWAR.

    One of the problems with using WAR is which version to use. BB-Ref and Fangraphs have some bizarre differences of opinion.

  5. I'll also note that Upton's OPS is 300 points higher at home than on the road, which may have something to do with the aforementioned difference in WAR.

  6. You are right: WAR is just a tool, not the be-all and end-all. I use a stat that equates to runs above average (originally bastardized from runs created and linear weights, and expanded from there). Pitcher numbers are derived from the offensive stats put up against them, not from pitching stats. In some cases it falls in line with WAR, but sometimes it does not.

    I think trying to convert to wins muddies up the works for the uninitiated, and makes players look closer together than they are. Some of the positional adjustments are arbitrary as well.

    For all the Verlander people out there (and he is my favorite pitcher): he most definitely deserves the Cy Young, but I don't think he has had a good enough season for the MVP. I think for a pitcher to win the MVP it has to be clear, and I don't think that is the case. The clear cases usually happen in big offensive years, where a pitcher's numbers are even better than they look. (Think Pedro.) In a relatively pitching-friendly year like this, the great hitter is probably having a better year than the great pitcher.

  7. I have a theory which I hope isn't right. If Kershaw pitches in September like he pitched in August he wins the Cy Young - top in SO, IP, ERA, 2nd in WHIP.

    I think the writers will, in the back of their mind, discount Kemp because one winner from such a train wreck of a team (I'm a huge fan) is enough.

    Terrible logic - but a human response.

  8. Joe, you're as close to perfect as it gets, but please don't encourage the Jeteraters, even when they're right.

  9. Get real Joe, traffic jams on the Schuylkill Expressway are never cleared up. Just sort of paved over.

  10. My issue with WAR is not so much that I want it to be more accurate than it is (I do, but am willing to accept that statistics suggest things more than they tell us an exact story)—my issue is that WAR is one of the few statistics that actually attempts to make a value judgment on a player for you. WAR is not allowing me to do the analysis, it’s analyzing something for me. And it would be a little lazy of me, I think, to accept it wholesale as a measure of a player.

    Take slugging percentage: SLG is a dispassionate measure of how hard a player hits the ball, of how many bases he tallies. It is utterly objective. WAR is someone’s recipe for judging the VALUE of a player, and different cooks have what appear to be quite different recipes. It’s Skyline Chili vs. Texas Chili.

    It’s okay to take Skyline’s interpretation of Chili as a standard of some kind and measure other chilis against it. It’s not an un-useful exercise. The issue I have with WAR is more in how people use it. I see a whole lot of writers now using WAR as an objective measure—that is, not throwing in other ingredients to their arguments surrounding various players. Adam Jones either is or is not a very good player based on WAR—that’s it. The argument stops there. My response: Oh yeah, well that’s Tom Tango’s recipe (is he the progenitor of fWAR?). Which is all well and good, I like Tom Tango, and his valuation is well-reasoned, and should absolutely be factored into the argument, but it’s still just one datapoint.

  11. I personally like pitcher wins because it's kind of a reward for good pitching performances. For example, I like that Verlander has 22 wins. He's having a great year, he deserves 22 wins.

    But other than the satisfaction I get seeing a good pitcher reach 20 wins, I find the usefulness of wins as a comparative stat is minimal.

  12. As a dyed-in-the-wool Yankees fan who is in love with the idea of Joe DiMaggio as much as anything in baseball: Ted Williams probably should have been the MVP in 1941. I also think it's easy enough to understand why he wasn't.

    1) New York finished 17 games ahead of Boston for the pennant.
    2) DiMaggio, for various reasons, generally got more love than Williams.
    3) At the time, the streak probably seemed more impressive than hitting .400. The hitting streak record was nearly 50 years old at that point, and DiMaggio blew by it. Hitting .400 had been a fairly frequent occurrence in the teens and 20s, which weren't as far removed. Who could know that no one would hit .400 again? (Though no one has really challenged the hit streak either.)

  13. When people look back at the Morneau MVP, they never mention how much of the reason he was considered is that he turned around a bad season for himself and the Twins.

    Through June 8 in Seattle, he was hitting .235/.295/.444. The Twins were in 4th place, 11.5 games behind Detroit, 10 behind the Sox, and seven under .500.

    In Seattle, he had apparently gone out partying with some old friends and afterwards, had some sort of a come-to-Jesus meeting with Gardy.

    From that point on, he played in every game and hit .364/.414/.616. The team went 70-33 (a 110-win pace). I don't know how to find fractional-season WAR, but I'd assume it was negative for the first 2.3 months of the year, and thus higher than 4 for the last 102 games.

    Being a Twins fan, it sure felt like Morneau was reasonably in the MVP conversation because he had turned his season around and in doing so, turned the team's as well. There was a narrative there behind the numbers, which is why we do have voters and don't just give the award to the highest WAR.

    I wonder if Morneau had been injured and hadn't come back until June 9th, yet put up the same numbers from that point, whether people would be so dismissive of the year he had. (If someone has the partial-year WARs, please post them.)

  14. I don't fully understand the Verlander for MVP arguments. Is his performance that much greater than CC's, Weaver's, or Haren's? A Cy Young vote for any of these 4 seems perfectly justifiable. Verlander's season just doesn't appear to be so dominant, head-and-shoulders above his peers (a la Greinke '09), as to warrant MVP consideration.

    As much as the Cy Young wins for Greinke and Felix in '09 and '10 are celebrated as victories for the WAR-crowd over fans of pitcher wins, I am skeptical that much progress has been made. Felix and Weaver had nearly identical seasons last year, yet Weaver never seemed to gain any real consideration. And Verlander's MVP chances are clearly being supported by his surpassing of 20 (and 25) wins - rather arbitrary totals that are nice to look at.

  15. For all of the Verlander-for-MVP talk (and deservedly so with a 7.7 rWAR, 6.4 fWAR), there's an argument that he's not the MVP of the Tigers. Alex Avila is having far and away the best season for AL catchers (5.4 rWAR, 5.3 fWAR) and Miggy Cabrera's not doing too shabby either (5.7 rWAR, 5.6 fWAR).

  16. I'm open to consideration/education about every possible statistic, especially advanced ones that make it possible to "compare" dissimilar players - but my chief hesitation at embracing WAR whole-heartedly is that I have NO IDEA how it is calculated. I get what it's supposed to represent - how many more wins a team would have compared to if that player were replaced by an "average" player - but HOW do you arrive at this magical figure?

    I GET "OPS" - it combines on-base percentage AND slugging percentage, so I can "see" how the speedy jackrabbit centerfielder (say… Michael Bourn) compares to the crushing first baseman (say… Ryan Howard) – and I can even calculate it myself.

    WAR is much more mysterious…

    I picked Bourn and Howard as off-the-top-of-my-head examples of the two archetypes I mentioned (speedy CF, crushing 1B). By OPS – a statistic more advanced than ol’ fashioned batting average, or even sabermetric-loving on-base percentage, Howard seems to have a significant advantage over Bourn this year: .849 to .749

    But by WAR Bourn becomes the “more valuable” player: 5.0 to 2.9

    And it’s not all just defense. Gold-glove CF Bourn should be expected to “save” a few games, but when I look up ONLY Offensive WAR… Bourn still has the advantage: 3.6 to 2.6

    So – I “get” what Wins Above Replacement is supposed to represent… but do I believe it in my bones? Not yet.

  17. http://www.insidethebook.com/ee/index.php/site/comments/how_to_calculate_war/

  18. My biggest objection to WAR for HOF (not necessarily MVP) is that it's a counting stat. An average pitcher who lasts forever will rack up some big counting stats (e.g. Moyer). But if used as Joe suggests, as one of an array of valuable tools used to gauge how deserving a player is, it offers useful insight. Moyer, for example, has 12 seasons (12!) with ERA+ under 100, but only 4 seasons with WAR under 0, so if ERA+ is accurate in comparison, WAR needs some work (unless Moyer is one of the best fielders and hitters of all time, and with zero GG and an OPS+ of -9, he's not). So there are 8 seasons where below average Moyer still earned some WAR. That suggests B-Ref's WAR is too friendly to pitchers. As for Verlander's candidacy, he's not even leading the league in ERA+ (Beckett) or ERA (tied with Weaver) so if it's not absolutely clear except by the flawed stat of WAR that he's the best pitcher, then he can't be the MVP, unless you eliminate Bautista, Granderson, Pedroia, Ellsbury, Gonzalez, etc. from consideration.

    Were I to pick an AL MVP, I'd go Bautista, in part because there are so many worthy candidates on contenders, which to me tend to cancel each other out. But I'd still have to wait until the end of the season, in case the Rays or Angels make the playoffs.

  19. One simply cannot gloss over the elephant in the room on the WAR issue - THERE IS NO AGREEMENT ON HOW WAR IS CALCULATED!!!!! I'm all for advanced stats but how can anyone ignore that itsy, bitsy problem? And it's not like they vary a little, they can vary a lot - you mentioned Kemp & Upton in the NL. Per fangraphs, they are neck and neck at 7.1 & 7.0. Per b-ref, they are 8.6 (Kemp) and 4.7 (Upton). That's not a small difference.

    Batting average is a bizarre stat no doubt, but at least everyone agrees on how it's calculated. . . .

  20. "So there are 8 seasons where below average Moyer still earned some WAR. That suggests B-Ref's WAR is too friendly to pitchers."

    Wins Above Replacement. Not Wins Above Average. Also, ERA+ and WAR assess starters and relievers differently; the former makes no distinction, and the latter does.

    A recent zero-WAR season of full length was Derek Lowe's 2009, when he had an 88 ERA+. Or this year, with Liván Hernández, at 86.

    That's assumed to be about what you'd get from a bargain-binner AAA starter, if you actually gave him 200 IP in which to be Eh.

  21. My dad always used to say, "Figures lie and liars figure."

  22. I think WAR is a great tool to use. My problem is, a lot of people use WAR as the be all, end all, and will listen to NO arguments to the contrary. It's annoying because with those people, it doesn't allow debate. The answer to everything is "His WAR is 0.2 higher" and that's the end of that.

  23. I can't take any stat that people don't know how to calculate seriously. The fact that BBRef and Fangraphs have different WAR values is a huge, huge black mark against WAR and one that deserves more attention. You can hate wins and RBI but at least we all agree on how much each player has.

  24. WAR! Huh...Good God y'all
    What is it good for?
    Absolutely nothing
    Say it again

    -via James Brown (who must not have been a sabermetrician)

  25. @BoatDoc...That's not a common sentiment, but one thing about WAR, it does not compare players to an "average" player but to a "replacement level" player. A replacement level player is substantially worse than an average major leaguer.

  26. @KyleLitke...who are these people you speak of? Can you give me an example of someone using a .2 difference in WAR as the be-all and end-all of an argument?

    My point, obviously, is that I think you're arguing against a straw man. I'm not aware of anyone who uses WAR in that manner.

  27. "You can hate wins and RBI but at least we all agree on how much each player has."

    Well, because the scorer decided what was what. Like, nothing's ever going to be as beautifully simple as Runs Scored, but who cares?

    Fangraphs' WAR is generally more rewarding to excellent performances and Rally/B-R views them more marginally.

    I've seen Joe just add the two "competing" figures together, which I've always liked.

  28. The key reasons for the difference end up being the relative strengths of replacement level and its application to position.

  29. The creator of WAR did say it's meant to be taken in three-year samples, no?

  30. "Moyer, for example, has 12 seasons (12!) with ERA+ under 100, but only 4 seasons with WAR under 0"

    To expand on what Ebassan said, average ERA+ is by definition 100. But an average player will have approximately 2.0 WAR. There is no contradiction here.

  31. I love stats, loved them before I ever heard of Bill James and sabermetrics. I’ve tried to embrace them as much as possible but frankly, after a while they become so convoluted as to have almost no value to me. WAR is like that for several reasons:

    1) The difference between rWAR and fWAR confuses me and gives me little confidence that either is accurate. (I was also told there is a third source of WAR, although I haven’t been able to find it.)

    2) I have no idea how to calculate WAR on my own. Is there a formula like ERA to just plug stats into?

    3) My understanding is that WAR includes a number of arbitrary values, like park effects, value of various fielding positions and the contribution of a replacement player.

    4) WAR is of no use below the major league level. I coach high school baseball and I can incorporate stats like on base percentage and slugging percentage, WHIP and other stats to evaluate players, but WAR is useless to me.

    Perhaps, Joe, you could write a column about how WAR is calculated and how to read the differences in the two versions.

    Also, @adam, while I have no doubt that James Brown may have sung his own version, the song WAR was popularized by Edwin Starr.

  32. Interesting take, Joe (and it works nicely in conjunction w/ the IATMS vs Rob Neyer discussion last week). My only issue (and Rob said something to similar effect) is with the statement: "People who go to the effort to use WAR, to understand it, generally seem to know that it's a tool, like all statistics. It's adaptable. You can use your own defensive observations pretty easily, just as an example."

    In reading the comments on this blog, and on IATMS, Rob Neyer, and Beyond the Boxscore, it's clear to me that a lot of people don't really understand WAR, including those that understand what it's trying to accomplish. Even some Brilliant Readers (and I include myself in that group) aren't 100% comfortable w/ knowing precisely how it works. And that's a bit troubling to me.

    As for my own personal issues w/ WAR, I don't like the range of uncertainty (let's say 15%), as well as the flukiness that can affect defensive stats on a year-to-year basis (both Matt Kemp and Jacoby Ellsbury have a significantly better UZR this year than last, so do you dock them a little bit?), and it becomes too broad for me.

    I agree that no one is arguing that a 0.2 difference is the end-all, be-all to a debate. But if we're using a 15%+ variance, by the end of the year we're talking about a 1.0+ difference.

  33. "WAR! Huh...Good God y'all
    What is it good for?
    Absolutely nothing
    Say it again

    -via James Brown (who must not have been a sabermetrician)"

    Clever, but that's not James Brown. It doesn't even sound like him.

  34. "Well, because the scorer decided what was what."

    Yep. So we all know what the number is, unlike WAR, where people just throw their own numbers in and come up with their own results.

  35. @Vidor, others -

    As I understand it, WAR's development wasn't just "throw their own numbers in." You should follow the link Ebessan put above and read a little about it. Here's the really short version:

    Each event on a ball field produces value, either for the offense or defense. Over time, people have measured those values by counting up everything that's ever happened in every baseball game for which we have data - 90+ years at least. So, in the 100,000 situations where a man singled to lead off an inning, how many runs scored, total? How many on two-out hits? What if you followed it with a double? A walk? Etc. etc.

    Based on the results of what actually happened in baseball games, under a variety of conditions and scoring environments, we know that the average single contributes x number of runs per 100 events, and so forth. Add up the expected run values of every event, and you have a general idea of how many runs a player's bat has contributed. That, in essence, is WAR.
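    A minimal sketch of the summing step this comment describes: multiply each event a hitter produced by its average run value and add everything up. The run values below are hypothetical placeholders, not the tables Baseball-Reference or FanGraphs actually use, and this covers only the batting piece; a full WAR figure layers further adjustments (baserunning, defense, position, replacement level) on top.

```python
# Hypothetical run values: runs added (or, for outs, subtracted) per event.
# Placeholder numbers for illustration only. Real tables are derived from
# historical play-by-play data and shift with the run environment.
HYPOTHETICAL_RUN_VALUES: dict[str, float] = {
    "single": 0.47,
    "double": 0.77,
    "triple": 1.04,
    "home_run": 1.40,
    "walk": 0.31,
    "out": -0.27,
}


def batting_runs(event_counts: dict[str, int],
                 run_values: dict[str, float] = HYPOTHETICAL_RUN_VALUES) -> float:
    """Add up count * run value for every event type a hitter produced."""
    total = 0.0
    for event, count in event_counts.items():
        if event not in run_values:
            raise KeyError(f"no run value for event type {event!r}")
        total += count * run_values[event]
    return total


# A made-up season line, purely for illustration.
season = {"single": 150, "double": 30, "triple": 5,
          "home_run": 25, "walk": 60, "out": 400}
print(round(batting_runs(season), 1))  # 44.4
```

    Raising an error on an unknown event type, rather than silently skipping it, keeps a missing run value from quietly shrinking the total.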

    The discrepancies between bbr and fangraphs come in the adjustments, IIRC - calculating the less-clear-cut things like baserunning and defensive value.

    That doesn't mean you have to like it, but to pretend it's all just made-up guesswork is incorrect.

  36. This comment has been removed by the author.

  37. "The discrepancies between bbr and fangraphs come in the adjustments, IIRC - calculating the less-clear-cut things like baserunning and defensive value."

    In other words, guesses.

  38. Okay, WAR. WAR's a perfectly A-OK metric to me, sure. But remember that's all it is. Remember Offensive W-L Pct? Linear Weights? Win Shares? All of them had their day, all are flawed in some way, and all have their critics. It's important never to over-rely on anything like this; WAR is not a magic bullet or the be-all and end-all.

  39. Great post!

    I'm confused about the point you're trying to make with the list of Rangers MVPs, though: "Burroughs. Juan Gone. Juan Gone again. I-Rod. A-Rod. Hamilton." Even with the Rangers not reaching their first World Series until last year, 4 of those 6 awards still went to players on Rangers playoff teams, which is not surprising in the least.

    And I'm assuming you're only talking about Burroughs and Juan Gone as the Texas players who "did not have that good a season"? I assume you weren't implying that I-Rod (8th in rWAR, 4th in fWAR), A-Rod (1st in rWAR & fWAR), or Hamilton (3rd in rWAR, 1st in fWAR) were Texas players who didn't have that good a season.

    And actually, WAR notwithstanding, I still think it's a little weird to say Juan Gonzalez or Jeff Burroughs "didn't have that good a season." Burroughs, in a pretty weak hitting class, was 3rd in the league in OPS. And Gonzalez, ignoring the lens of steroids (as people were doing at the time), still had a great season both years, though he definitely didn't deserve the award in '96. It just seems hard to apply your phrase "didn't have that good a season" to any of the guys you listed.

    Sean wrote: "(both Matt Kemp and Jacoby Ellsbury have a significantly better UZR this year than last, so do you dock them a little bit?)"

    In 2009, Matt Kemp won a Gold Glove on the basis of many truly spectacular plays. His athleticism was more than enough to overcome lack of experience and poor route taking in the outfield, and he had a positive UZR number (for the only time in his career).

    Last year, it was fairly well documented that he was dating Rihanna, and that or something else affected every element of his game. He was not only sluggish in the field, but his base running was also slower. He went from 34 steals and 8 times caught to 19 steals and 15 times caught. His UZR went from a career norm of around -1.3 (average of 2007-2009) to -25.7.

    This year seems to have demonstrated that 2010 was an outlier. He is more focused on the basepaths, with 38 steals versus 9 times caught, having worked a lot with Davey Lopes on stealing and running the bases. His UZR defense has returned to career norms: not the positive outlier of 2009 or the negative one of 2010, but a number that fits within the range of his other career numbers. Or maybe in 2010 Manny was no longer patrolling left field, so Kemp was no longer getting as many of Manny's balls as he had in 2009. Now Kemp is adjusting to a real left fielder (mostly Tony Gwynn) and regressing to his norm.

    I see no reason to think that Kemp's UZR rating is skewed this year. I wonder why the Dodgers don't use Gwynn in CF and Kemp in LF in games they both start, since it's fairly clear that Kemp needs strong messages to work on his game. Maybe they should have John Shelby and Kemp analyze tape of every play Bourjos makes to try to get Kemp to see the difference.

    Similarly, Ellsbury this year does not come close to Ellsbury's 2008, so there's no reason to think this is other than a good season for him.

  41. 1974 was a weird case, in the AL at least. If the season had ended a month early, Luis Tiant would have been in the running for the MVP: he was 20-8 for a team that had a thin starting staff and no one besides Yaz having that good a year for Boston. We all know what happened next: an 8-game losing streak, 5 shutouts in 8 games, a 7-game lead turned into a 7-game deficit, and a 3rd-place finish come October.

    Jeff Burroughs got it because he hit over .300, led the league in RBI, and played on a semi-Cinderella team. There wasn't any one guy you'd have given it to outright: Grich? Bando? Carew? Jax? Dick Allen was having another big year but tanked on the Chisox w/ a few weeks to go, so you knew he was out. I think Burroughs was overrated that year, but he wasn't a BAD pick.

  42. WAR calculations are only mathematical methods of performance scouting. They differ in their answers because they have different "opinions" on different considerations of value, in the same way that flesh-and-blood scouts will occasionally have differing opinions on a player's skills. Were any of us the head of an MLB team's scouting division, who among us would ask for perfect agreement from the team's scouts?

  43. "Were any of us the head of an MLB team's scouting division, who among us would ask for perfect agreement from the team's scouts?"

    Maybe not, but nor do we pretend that a scout's evaluation is a statistic that we can represent with numbers.

  44. "Maybe not, but nor do we pretend that a scout's evaluation is a statistic that we can represent with numbers."

    The 20-80 scale that scouts commonly use suggests otherwise.

  45. Favorite stat: Ryan Howard leads MLB with 112 RBI and has 1.7 WAR. Will he get any MVP votes?

  46. Here's something I don't quite get. WAR calculates wins above replacement. Replacement level varies by position. Thus, it seems that a replacement level 2nd baseman, for example, is worth fewer total wins than a replacement level 1st baseman.
    Replacement-level players are not worth 0 wins. They are at 0 WAR, but not 0 wins. Baseball-Reference sets a replacement-level team at a .320 win pct, or 52 wins. Thus, if all positions are equal, each of the 9 positions on the field is worth about 5.8 wins.
    But of course all positions are not equal, so a replacement 1st baseman is worth more wins than a replacement 2nd baseman. What if, just as an example, a replacement level 1st baseman is worth 6.8 wins and a replacement level 2nd baseman is worth 4.8 wins?

    If such is the case, and I don't know how to figure out the actual win value of a replacement level player, then how do we come up with the idea that a 2nd baseman with a WAR of 5.0 is worth more than a 1st baseman with a WAR of 4.0?

    The sum total of wins (not above replacement, just wins) for the 1st baseman would be higher than for the 2nd baseman, and thus the 1st baseman would be more valuable.

    I understand that it's easier to replace a good hitting 1st baseman with another good hitting 1st baseman than it is trying to do the same thing with 2nd baseman. But it seems to me, when it comes to total value, the replacement value is worth something, isn't it?

    What am I missing?
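    The arithmetic behind this question can be checked in a few lines: .320 over 162 games is about 51.8 wins (the "52" above), and an even nine-way split is about 5.8 wins per position. This is a minimal sketch; the 6.8 and 4.8 per-position baselines are the commenter's invented examples, not real Baseball-Reference figures, and "total wins" here simply means that hypothetical baseline plus WAR.

```python
GAMES = 162
REPLACEMENT_WIN_PCT = 0.320  # replacement-level team, per the comment's Baseball-Reference figure

replacement_team_wins = REPLACEMENT_WIN_PCT * GAMES  # about 51.84, which the comment rounds to 52
even_split = replacement_team_wins / 9               # about 5.76 per position, i.e. "about 5.8"

# The commenter's invented per-position baselines (NOT real Baseball-Reference values).
HYPOTHETICAL_BASELINE = {"1B": 6.8, "2B": 4.8}


def total_wins(position: str, war: float) -> float:
    """The commenter's 'total wins': assumed replacement baseline plus WAR."""
    return HYPOTHETICAL_BASELINE[position] + war


print(round(replacement_team_wins, 1), round(even_split, 1))  # 51.8 5.8
print(round(total_wins("2B", 5.0), 1), round(total_wins("1B", 4.0), 1))  # 9.8 10.8
```

    Under those made-up baselines, the 4.0-WAR first baseman comes out ahead on "total wins," which is exactly the tension the commenter is asking about.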

  47. I haven't scrolled through all comments, but on the question of Texas Rangers MVPs, it seems like voters give a little extra credit to players on a team that isn't always in the postseason. See Josh Hamilton and Joey Votto last year.

    Looking back at the Pudge and Juan Gone MVPs, they were all won in years the Rangers made the postseason. Add to that an affinity for HRs and RBIs and there you go. The inflated numbers from Rangers Stadium and enhanced workout regimens made them jump out as seemingly deserving winners.

  48. @Mark Daniel - that is a good point as far as absolute wins... but the idea behind WAR is not to calculate absolute wins. Scarcity counts.

    Let's use two players as an example, and stick to "boxcar stats" (to borrow the hockey term) -

    player A is your 2b. He hits .280, 16 hr, 80 rbi.
    player B is your 1b. He hits .300, 32 hr, 110 rbi.
    both players are equally-valuable defenders and baserunners.

    Now, player B hits better, and everything else is equal, so he is worth more in absolute terms to your team. But if both of them were injured you'd actually miss the 2b more, simply because he's harder to replace. More people are capable of fielding passably at first, meaning you'll probably find a better hitter there who also will hold his own with the glove; you will struggle to find a second baseman who can hit enough while still competently fielding his more difficult position.

    So, your new 2b, C, hits only .250, 6 hr, 40 rbi. Your new 1b, D, hits only .280, 22 hr, 70 rbi. Each is roughly the same absolute amount "worse" (10 fewer homers and 40 fewer RBI), but you've lost half of your offense from the 2b position and only about 35% from the 1b position. C is also probably a worse fielder than D. Your team would therefore benefit more from A's early return than from B's. And that's what WAR is trying to calculate... your team would be five wins worse with C instead of A, and only four wins worse with D instead of B.
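    The "same absolute drop, different relative drop" point can be checked directly from the hypothetical stat lines in this comment. A minimal sketch using only home runs and RBI as a rough proxy (this is not how WAR itself measures offense, and fielding and baserunning are left out); the constant and function names are illustrative.

```python
# Commenter's hypothetical (HR, RBI) lines; fielding and baserunning left out.
STARTER_2B_A = (16, 80)
BACKUP_2B_C = (6, 40)
STARTER_1B_B = (32, 110)
BACKUP_1B_D = (22, 70)


def drop_off(starter: tuple[int, int], backup: tuple[int, int]) -> dict[str, float]:
    """Absolute and relative HR/RBI lost when the backup replaces the starter."""
    hr_s, rbi_s = starter
    hr_b, rbi_b = backup
    return {
        "hr_lost": hr_s - hr_b,
        "rbi_lost": rbi_s - rbi_b,
        "hr_share_lost": (hr_s - hr_b) / hr_s,
        "rbi_share_lost": (rbi_s - rbi_b) / rbi_s,
    }


second_base = drop_off(STARTER_2B_A, BACKUP_2B_C)
first_base = drop_off(STARTER_1B_B, BACKUP_1B_D)
print(second_base)
print(first_base)
```

    Both swaps lose exactly 10 homers and 40 RBI, but that is half of A's RBI versus about 36% of B's (40 of 110), in line with the comment's "half" and "about 35%." The sketch doesn't convert anything into wins, so the five-win and four-win figures remain the commenter's illustration.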
