My Answer to “I Don’t Like How Complicated WAR Is and How It Is Constantly Changing.” “WAR is Like GDP for Baseball”

Posted by admin on March 29, 2013

Some of the common critiques of the Wins Above Replacement framework include: 1) Why do FanGraphs and Baseball-Reference have such different numbers? 2) How can we trust it when the numbers change? 3) How can we trust it when we can't calculate it ourselves?

For the first question, our announcement today of a consistent replacement level between FanGraphs and Baseball-Reference has done a considerable amount to bring our two methodologies into alignment, at least on the question of how large a basket of WAR to hand out to players each year. Previously, FanGraphs allotted nearly 300 additional WAR because of its much lower replacement level. By meeting in the middle, we have reduced this difference to zero.

For the next two questions, I would point to a very widely quoted and very widely used statistic from economics: Gross Domestic Product (GDP), described in the Wikipedia article on Gross Domestic Product. I'm going to argue that WAR is essentially GDP for baseball.

  1. GDP is an estimate of the market value of a country's goods and services. WAR gives you an estimate of a player's overall value.
  2. Wikipedia lists three different approaches to computing GDP, all trying to come to the same value. There are at least that many different implementations of WAR trying to estimate the same value.
  3. The guidebook for computing a country's GDP runs over 600 pages and requires the collection of statistics that no one person could accumulate. The explanations of the various WAR frameworks probably run 200+ pages across all of the sites with a WAR framework, and for at least one of those sites, accumulating and preparing the data that goes into the calculations has taken years of work.
  4. Historical GDP calculations change as new information arrives, or as new industries and types of work appear in an economy and are added to existing calculations. WAR calculations change as new data becomes available from proprietary vendors, Retrosheet, or sabermetric study (such as pitch framing, pitch blocking, and PITCHf/x data).
  5. GDP is computed after the fact: we often aren't officially notified of a recession until many months after a quarter ends, and researchers may continue to revise GDP estimates years later. WAR can change retroactively as we get additional information on how a new park plays or has changed, or on how a player's quality of competition has changed.
  6. GDP is a single number giving a broad measure of economic output; for example, Great Britain's and France's GDP numbers are close enough to be indistinguishable to anyone watching their economies day by day. Likewise, Yadier Molina, David Wright, Buster Posey, Ryan Braun and Andrew McCutchen were all within half a win of one another last year, a difference so small that a case could be made for any of them as the most valuable player.
  7. GDP does not account for every aspect of economic performance such as whether the production is from rebuilding after a disaster, the economy's asset value or black market production. WAR does not account for every aspect of player value such as clubhouse persona, any value from clutch/non-clutch performances, use by the manager, salary/contract status, or likely growth or decay of skills.
  8. The people who compute and create GDP calculations are economic experts building on years and years of economic study and research. The people who compute and create WAR or WAR-like frameworks are building on and expanding years of sabermetric research by experts, some now employed directly by teams as advisors (Sean Smith, Tom Tango, Keith Woolner, Bill James) and others recognized as experts in statistics and evaluation (Nate Silver, Pete Palmer).
  9. The people who use and rely on GDP (news media, politicians, business owners) don't have a prayer of computing it; they rely on subject experts to provide well-reasoned and carefully calculated estimates of economic value. The people using WAR to estimate player value (GMs, news media, agents) don't have a prayer of calculating it either; they rely on subject experts, publicly or privately, to provide well-reasoned and carefully calculated estimates of player value.

I can certainly understand unease with using one-number estimates like WAR, but I would point out that it comes from a long line of research, thought, and process that is common throughout the social sciences.

Editor's Note: You have my permission to republish any or all of this provided you link to this original post and attribute it to me. Sean Forman

41 Responses to “My Answer to “I Don’t Like How Complicated WAR Is and How It Is Constantly Changing.” “WAR is Like GDP for Baseball””

  2. Sean Lahman Says:

    This is absolutely fantastic. Transparency is important, of course, because it allows for peer review. Smarter people can tell you if your methodology is flawed. But I've never understood the mindset that says that a metric is unworthy of acceptance if it's too complicated to calculate yourself. The complexity of WAR is a quantum leap forward for baseball analysis. The fact that we have PhD-level mathematicians and economists tackling the fundamental baseball questions is a great thing. Maybe I can't do my own WAR calculations, or figure the GDP, or decipher the science behind tomorrow's weather forecast. It doesn't mean the results aren't insightful.

  3. Dave D Says:

    I don't like stats like this because they are useless.
    People think that a newer stat means more input to the game when all it really does is make it worse.
    If you have to write 9 paragraphs to explain something, then is it really worth it?
    Batting average is hits divided by at bats. That's not even 9 words...!

    Why not come up with the BAMP stat?
    Batting Average during Moon Phases...

    At what point does all of this start to get ridiculous and worthless? What's wrong with going back to the normal stats from Cy Young's days?

    And this is from a guy that LOVES stats!

  4. Doug B Says:

    the concern I have is that WAR =/= dWAR + oWAR. Aside from stopping run scoring and scoring runs I can't think of anything else in baseball. so why can't WAR = dWAR + oWAR?

  5. Brian Says:

    Yeah I agree with Dave D. Statsy stats are just stats for the nerds who like to stat. Why not just stat if you like to stat? I went to a stat once and the nurse asked for 2 stats worth of stat, STAT!!!! I mean what the stat is up with that? In conclusion, read my stats. No new stats.

  6. admin Says:

    Thanks guys, I'm convinced.

    Doug, the position adjustment is included in both values. They are not designed to be added.

  7. Dave Hogg Says:

    Great comparison of WAR and GDP. Of course, this is also why no economist tries to provide live updates of every city's GDP.

  8. Doug B Says:

    " They are not designed to be added."

    why not? I'm not trying to sound snarky. I'm just wondering why you wouldn't want that.

    in baseball you score runs and you stop the other team from scoring runs.

    why wouldn't you want dWAR + oWAR = WAR?

  9. james h Says:

    Doug B, you can't add dWAR and oWAR to get WAR because it would adjust for position twice.

    Were the two digits after the decimal just a failed experiment? I'm glad they're gone.

  10. Charles Saeger Says:

    There's a debate over where the positional adjustment should go. Sean shows it in both spots since he likes it in hitting, and many of the rest of us want it in fielding.

  11. jr Says:

    I actually agree with Doug that the two should add up - it's just another layer of complication (and another broadside for anti-WAR people to lambast stats with). Decide if the positional adjustment goes in hitting or fielding and leave it there.

    Personally I'd put it on the fielding side since it is about defense after all - that way we can compare who truly has the better bat independent of everything else, then adjust for position and fielding ability.

  12. Drew B Says:

    Dave D and Brian:

    There is an easy solution for those, like yourselves, that don't like these advanced statistics.

    Ignore. Them. As they say, ignorance is bliss. And I'm sure you'd be filled with bliss armed with a novice understanding of the game.

    The batting average comment was all I needed to see. By that logic, a team of Juan Pierres would be a World Series contender!

  13. DavidRF Says:

    I like the positional adjustment on the hitting side. Positional adjustments are just as much about offensive scarcity as they are about defense -- with a debate as to the relative importance of those two.

    Fielding numbers have higher error bars, so it's nice to have a subtotal that doesn't include them.

    All the finer-grained contributions are in the (sortable) tables:
    RAR = Rbat + Rbaser + Rdp + Rfield + Rpos + Rrep

    Looks additive to me. It's all very transparent.
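To make the double-counting point concrete, here is a minimal Python sketch with hypothetical run values. It assumes, following the admin's reply above, that the positional adjustment (Rpos) is included in both oWAR and dWAR; it further assumes that oWAR carries the replacement runs (Rrep) while dWAR does not, and it uses a round 10 runs per win. These are illustrative assumptions, not Baseball-Reference's exact implementation.

```python
import math

RUNS_PER_WIN = 10.0  # assumption: a round runs-to-wins conversion

# Hypothetical component runs for one player (illustrative only)
components = {
    "Rbat": 20.0,    # batting
    "Rbaser": 2.0,   # baserunning
    "Rdp": 1.0,      # avoiding double plays
    "Rfield": 5.0,   # fielding
    "Rpos": 7.0,     # positional adjustment
    "Rrep": 20.0,    # replacement-level runs
}


def war(c: dict) -> float:
    # RAR = Rbat + Rbaser + Rdp + Rfield + Rpos + Rrep, converted to wins
    return sum(c.values()) / RUNS_PER_WIN


def owar(c: dict) -> float:
    # Assumed offensive view: every component except fielding
    return (c["Rbat"] + c["Rbaser"] + c["Rdp"] + c["Rpos"] + c["Rrep"]) / RUNS_PER_WIN


def dwar(c: dict) -> float:
    # Assumed defensive view: fielding plus the same positional adjustment
    return (c["Rfield"] + c["Rpos"]) / RUNS_PER_WIN


total = war(components)                      # 5.5
split = owar(components) + dwar(components)  # 6.2
overlap = components["Rpos"] / RUNS_PER_WIN  # 0.7, the Rpos counted twice
print(f"WAR={total:.1f}  oWAR+dWAR={split:.1f}  double-counted={overlap:.1f}")
assert math.isclose(split, total + overlap)
```

Subtracting the shared Rpos term once recovers WAR, which is why the two halves are not meant to be added; assigning Rpos to only one side, as several commenters suggest, would make them additive.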

  14. Tim Says:

    Your answer to how complicated WAR is was pretty complicated itself.

  15. jr Says:

    DavidRF --

    The issue is that the great majority of people aren't going to delve into tables. They're going to look at the simplest cumulative measures. If they don't add up they're going to assume either a) the stat is wrong and we're all fools, or b) there's another layer of complication they don't understand, at which point many throw up their hands. The problem is b) is actually correct - there is a layer of hidden complication. If the point is to reach out to fans who think stats are too complicated, why add to the complication?

  16. Justin Bailey Says:

    @Dave D - I just wanted to say that your post is one of the worst applications of rhetoric I have ever seen. And I read video game forums.

  17. Tools of Ignorance Says:

    @Dave D

    I respect your right to your opinion, though I believe you are merely exposing your own willing ignorance. I can't explain brain surgery in nine paragraphs either, though I don't think that disqualifies the usefulness of the practice.

    I love stats. I even have a romantic kinship with "baseball card" stats - I much more enjoyed fantasy baseball with RBIs, wins, and saves than I did with WAR, UZR and WHIP. But to show contempt for research without any investigation is like telling Columbus that the earth is flat, ca. 1493.

    Ask yourself why these new stats are "meaningless" and "worse". Then ask yourself why essentially all MLB head offices incorporate some form of Sabermetrics in their operations.

    Research. Advancement of knowledge. These are good things.

  18. MG Says:

    So please explain why dWAR + oWAR = WAR counts twice? There are two different numbers of WAR, one for defense and one for offense. How does it count twice when there are specifically two different numbers? Why would it be doubled?

  19. Tim Says:

    Aside from us not knowing how it's computed, I'm not sure it does what it claims to do either. Last year, for example, Joey Votto and Melky Cabrera both had high WAR ratings. It was 5.7 for Votto, in the top 10 in the NL, and best on his team. Cabrera was 4.7, second best on his team and he was in the top 10 at times before his suspension.
    Anyway, they both missed about 50 games, and in both cases, their team had a much better winning percentage without them, even though the guys that replaced them had lower WAR ratings. To me, that says the stat doesn't work. Same thing happened with Justin Morneau a few years ago.

  20. Jon Says:

    I get - and appreciate - that WAR is trying to put a complicated mess into one simple number. I use WAR as a vague estimate - that is, if someone is 3-4 wins better than another player, he's likely the better player. I do NOT use it for trying to split hairs over very similar players. I don't mind that it's 'complicated'. To me it's not nearly complicated enough, because it's nowhere near being a usable number other than as an estimate.

    Don't get me wrong - it's better than other 'single-number' stats, but they don't claim to be the 'all-in-one' number WAR does. That is, OBP is simply the chance a player will get on base in a plate appearance. It's extremely accurate for what it tries to measure; the same simply cannot be said for WAR.

  21. Mark the Mathematician Says:

    WAR is nonsense. @Tim sums it up perfectly in #19.

    The comparison to GDP is disingenuous and dishonest.

    Everyone, including economists, regards GDP as an order-of-magnitude approximation. GDP is also taken in a context of a whole lot of other things, which are explicitly examined.

    On the other hand, WAR purports to be EXACT. It also presents itself as THE one, god-like quantity that defines a player's value.

    You say it accounts for positional differences. That means someone like you has to make value judgments. That invalidates it as an objective metric. Unless, of course, you're the god of baseball, which just isn't possible, because a whole lot of other guys are saying the same thing about THEIR version of WAR.

    The tortured explanation, along with the condescension of WAR aficionados for the rest of us, says it all. Old military saying: if you're working too hard to justify your conclusion (and, WAR-guys, you ARE), you've got the wrong conclusion.

  22. Tim Says:

    A good comparison would be the NFL quarterback rating system. Except they do actually show you how that's computed on the NFL website, and once you realize how it's computed, you also realize that it's heavily based on touchdown to interception rate. So you might as well just look at touchdown to interception rates. Simpler is better.
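For reference, here is a sketch of the NFL passer rating formula in Python, following the commonly published four-component version: completion percentage, yards per attempt, touchdowns per attempt, and interceptions per attempt, each capped between 0 and 2.375. The function name and argument order are hypothetical, chosen only for illustration.

```python
def passer_rating(att: int, cmp: int, yds: int, td: int, ints: int) -> float:
    """NFL passer rating from game or season totals."""
    def clamp(x: float) -> float:
        # Each component is bounded to the range [0, 2.375]
        return max(0.0, min(x, 2.375))

    a = clamp((cmp / att - 0.3) * 5)    # completion percentage
    b = clamp((yds / att - 3) * 0.25)   # yards per attempt
    c = clamp(td / att * 20)            # touchdowns per attempt
    d = clamp(2.375 - ints / att * 25)  # interceptions per attempt
    return (a + b + c + d) / 6 * 100


# A flawless line reaches the 158.3 ceiling
print(round(passer_rating(att=20, cmp=20, yds=300, td=3, ints=0), 1))  # 158.3
```

Touchdown and interception rates make up two of the four components, so they weigh heavily, but completion percentage and yards per attempt also move the result.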

  23. Hurtlocker Says:

    I'm uncomfortable with the comparison to the fictional "replacement" player. I would favor a comparison to actual players, and base the performance on actual performance of the team.

  24. Jon Says:

    I don't mind the 'fictional replacement player'. It's not very likely you're going to find an average replacement at the AAA level with a sample size of 30-40 per position.

    What I don't like is the summary of the article. Basically it sums up as 'There's a lot of people that have worked on WAR for years that are smarter than you and they know what they are doing. Take our word on this.'

  25. John Autin Says:

    How's astrophysics coming along with that Unified Theory of Everything?

  26. tomfromnorton Says:

    To Doug B....You can't just add oWAR and dWAR together because they are both weighted separately (at least I think so). But that's one of the biggest gripes I have with the whole WAR statistic: it's arbitrarily decided how much offense and defense each contribute to actually winning a baseball game. I personally believe far too much weight is given to the dWAR value, and it unfairly skews the player's overall value up or down.

  27. rick Says:

    "What I don't like is the summary of the article. Basically it sums up as 'There's a lot of people that have worked on WAR for years that are smarter than you and they know what they are doing. Take our word on this.'"

    That is no worse than "I can't easily calculate it, so therefore it has no value!" And I say that as someone for whom WAR, and other statistics, are way out of my pay grade, so to speak.

    Though I think we can all agree that comparing WAR to GDP is not going to help its acceptance in the least.

  28. Mark the Mathematician Says:

    @Rick #27

    "That is no worse than 'I can't easily calculate it, so therefore it has no value!' And I say that as someone for whom WAR, and other statistics, are way out of my pay grade, so to speak."

    In other words, to be a true, legitimate baseball fan, you need to "man up" and learn WAR. Absurd.

    WAR is a fabricated quantity requiring value-judgment input. Therefore, it is not objective. It also can be calculated a half-dozen different ways, depending whose methods you're using. Therefore, it is inconsistent.

    Finally, it keeps changing. That means it's not even a contender for legitimacy, especially when its inventors keep telling us, "This is THE ultimate stat!" Then they re-work it, AGAIN, sometimes introducing variations of 20 or 30 percent, and then say, "Now, it's REALLY the ultimate stat!" Until next time.

    If this were real science, with peer review, papers, and rigorous scrutiny, these WAR guys would have been laughed out of the mathematical community a long time ago. Luckily for them, it's just baseball stats.

  29. Jon Says:

    @Rick - While you may be correct, your argument doesn't reflect what I think. I already posted what I like and don't like about WAR, and your argument is nowhere in my posts.

    @Mark - Nice post. I get what they're trying to do. They figure run differential correlates to actual wins, so they try to get the theoretical runs produced by each player, then convert them into wins. The problem is each situation is not the same. WPA takes this into account, but I've never seen a WAR take WPA into account.

    The snobbery of SABR without peer review (and sometimes not even releasing the formulas to anyone) is ridiculous.
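The runs-to-wins step Jon describes can be sketched with Bill James's Pythagorean expectation. The Python below estimates how many extra runs buy one extra expected win for a team with given runs scored and allowed. The exponent of 2 is the classic choice (about 1.83 is a common refinement), and the centered finite difference is just one illustrative way to get the marginal value; this is not the specific conversion Baseball-Reference or FanGraphs uses.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    rs_x = runs_scored ** exponent
    return rs_x / (rs_x + runs_allowed ** exponent)


def runs_per_win(runs_scored: float, runs_allowed: float,
                 games: int = 162, exponent: float = 2.0) -> float:
    """Extra runs scored needed for one more expected win (centered finite difference)."""
    step = 1.0  # one run on either side of the team's actual total
    wins_up = games * pythagorean_win_pct(runs_scored + step, runs_allowed, exponent)
    wins_down = games * pythagorean_win_pct(runs_scored - step, runs_allowed, exponent)
    wins_per_run = (wins_up - wins_down) / (2 * step)
    return 1.0 / wins_per_run


print(round(runs_per_win(700, 700), 1))                 # 8.6
print(round(runs_per_win(700, 700, exponent=1.83), 1))  # 9.4
```

For a team scoring and allowing 700 runs, this lands in the neighborhood of the commonly cited rule of thumb of roughly 10 runs per win.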

  30. Sean Lahman Says:

    "The snobbery of SABR without peer review (and sometimes not even releasing the formulas to anyone) is ridiculous."

    This has been said several times in this thread, but the full formulas are available and there has been a lot of peer review. There's been plenty of discussion and debate on the methodology -- including in this thread.

  31. Frank Says:

    I'm always amused by the Pro-WAR and Anti-WAR arguments, in that I think that both sides, as a whole, so firmly hold their beliefs that they will never change their collective minds on the subject. It's similar to evolution vs. creationism in the intractability of each side toward the views of the other. With WAR, however, there is probably some middle ground where the two sides can meet if/when each can get over their intolerance of the other side's views.

  32. tomfromnorton Says:

    Frank, I agree with your analogy, and even though I don't think the WAR statistic is perfect, it is a useful measure/tool of a player's overall ability. Looking back at some of the MVP votes of the 1970s, it's too bad the voters didn't have this type of measure, as there were a lot of horrendous votes during those years. It seems like any first baseman who led the league or was close to the top in home runs and RBIs was the automatic MVP. Having said that, I think part of the issue is the SABR people who come across as though WAR is the be-all and end-all statistic, when it's definitely a work in progress and still requires a lot of tweaking.

  33. Dave Says:

    I like how the simpletons think such a thing as a hard statistic exists. Batting average may be a simple calculation, but it ignores everything from opposing pitching quality to fielding ability to ballparks to scorer bias. The only true statement about batting average is that it is a calculation of the number of hits recorded by a scorekeeper divided by the number of at bats.

    It would be unfair to compare two players' batting averages to each other for any purpose, because you end up willfully ignoring the unknowns listed above, among others. Did one batter have better lineup protection? Was he playing in Coors, or Petco? The same is true even for simple counting stats like HR or triples; they all depend on other factors. None of this means a .350 hitter could be worse than a .200 hitter, and likewise it is extremely unlikely that a 7 WAR player is worse than a 1 WAR player. Meanwhile, no one would claim a 35 HR hitter is clearly better than a 33 HR hitter, and no one would say a 7 WAR player is clearly better than a 6.7 WAR player.

    Even 20 years ago, Strat-O-Matic baseball was pumping up Cecil Fielder's 51 HR season on his card because he didn't have the privilege of batting against atrocious Tigers pitching. And all of their stats were based on simple math, but they weren't willing to stick their heads in the sand.

    Sadly, the old, simple statistics make no effort whatsoever to account for obvious situations that everyone is aware of. Dante Bichette's Coors Field years are dismissed because he played in Coors, but for no other reason beyond that, and without even finding out how much worse he'd have been elsewhere. It doesn't matter if you're a new stats or old stats guy, we all agree "Dante Bichette is not a .340-40-128 guy" (or .340/.364/.620 if you prefer). Perhaps Dante Bichette disagrees. A .364 OBP, by the way. Hilarious.

    But instead of just stopping and saying, "I have these old stats, and I think something is wrong," WAR actually tries to figure out what he was (1.2! although a lot of that was terrible defense), and it continually strives to do better once new information is understood. If you just want to stop after "I'm pretty sure Dante Bichette wasn't that good," you're certainly welcome to. I'm just puzzled why you'd be annoyed that someone tried to figure out just how mediocre he was.

    The result is that the old stats actually become worse as time passes, simply because they are standing still. Why would you want a statistic that is missing information you know about? If you truly weren't aware of simple things like ballpark advantage, or even platoon advantage, then I guess you're welcome to stop looking. But once you know these things exist, wouldn't you want your basic stat to capture that information, so you don't have to attach the caveat to every single conversation? I realize that you may still be fearful of recent fielding stats, but the nice thing about WAR is that as these are better understood, WAR can be adjusted up or down. Meanwhile, all the Mets pitchers of the late '90s have their stats locked in with a great infield defense behind them; their old stats are lies, because they ignore a huge advantage that everyone knew about. Yes, old-statters, you're being lied to. WAR is slowly clearing the minefield.

    Pitcher wins will probably always be the best example, because it was an old stat that came with an odd set of rules. (at least most of the counting stats don't have strange rules defining them) These rules weren't terrible, but after 100 years, it became pretty clear that some of the ideas behind these rules were misleading, and managed to complicate the statistic (they may have been better off simply crediting the starter with a win if his team won the game). So you old-statters are sticking by something that is fundamentally flawed, and not even a piece of "hard" data as you prefer. Those pitchers didn't "pitch" a "win." They went out there for at least 5 innings, did some stuff, a lot of other stuff happened involving 17 other players, and then maybe they got a "win." Super.

    I'm not really sure why the old stats guys are so loyal to them. Do you still believe in earth-air-fire-water as the 4 elements? That also would get you reasonably far in everyday life, but you're probably paying someone else to fix your computer while you're out there digging that trench. Intellectual conservatism is always interesting to me, mostly because it's so selective.

    Flame away!

  34. PhilM Says:

    There's another aspect here, maybe a "third rail" -- those of us who like what WAR does and applaud its comprehensive nature but are continually frustrated by the recalculation and reassessment. I like to use WAR in my analyses -- and now I have to go back and plug in new numbers for the Nth time. Going forward, I'm planning on using just the WAR ranking -- we really can't rely on what the actual numbers are, so maybe the relative position of players is useful. And until someone can prove to me that Curt Schilling is legitimately a top-30 pitcher and almost 20% better than Carl Hubbell, I'm going to have to take WAR with a sizable grain of salt.

  35. Raker Says:

    "Sadly, the old, simple statistics make no effort whatsoever to account for obvious situations that everyone is aware of."

    All of sabermetrics start with "old stats". There are no new stats that don't involve "old stats".

    Post #28 is closest to my feelings, especially the ever changing part about the input of value judgements.

    To me, WAR is giving the sabermetric effort a bad name. There are a multitude of accurate and new mathematical ways to look at baseball that have nothing to do with WAR. It's somewhere between tiring and appalling having C-minus math students like Brian Kenny blindly spouting untruths as baseball statistical enlightenment, with poor Harold Reynolds knowing there is something wrong but unable to communicate just what it is.

    Kenny and Bill James are on the Astros SABR bandwagon and predict a 95-win season within 5 years. If that happens, it will have everything to do with having a top-3 pick as reward for 10 straight years of playing like a bad Triple-A team and nothing to do with embracing WAR.

    As for the WAR not equalling the sum of oWAR and dWAR? That would have been a plus and would have made at least hypothetical sense.

    One last point, I realize that the "replacement level" is just an arbitrary reference point but why couldn't it be a factual one like comparing a player to the MLB average?

  36. Sean Lahman Says:

    "One last point, I realize that the "replacement level" is just an arbitrary reference point but why couldn't it be a factual one like comparing a player to the MLB average?"

    It could be, but the whole concept of WAR is based on replacement level. It recognizes that because talent is not normally distributed, comparison to "average" gives a poor representation of how players contribute. Average and median are far enough apart that the majority of players would be deemed below average.

    WAA -- Wins above average -- would show that Joey Votto was worth about 40 wins last year. And we know that this doesn't correlate with reality. It would also show that as a team, the Reds were below average last year, even though they finished with 97 wins.
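The effect of the baseline choice can be sketched in Python. The example assumes, purely for illustration, that replacement level sits 20 runs per 600 plate appearances below average and that 10 runs equal one win; neither number comes from this post. It compares a slightly below-average full-time regular with a slightly above-average part-timer under both baselines.

```python
# Illustrative assumptions only, not an official implementation
REPLACEMENT_GAP_PER_600_PA = 20.0  # runs between average and replacement level
RUNS_PER_WIN = 10.0


def wins_above_average(runs_above_average: float) -> float:
    return runs_above_average / RUNS_PER_WIN


def wins_above_replacement(runs_above_average: float, pa: int) -> float:
    # Lowering the baseline to replacement level credits every plate appearance
    replacement_runs = REPLACEMENT_GAP_PER_600_PA * pa / 600
    return (runs_above_average + replacement_runs) / RUNS_PER_WIN


# Hypothetical players: (runs above average, plate appearances)
players = {
    "regular (650 PA)": (-5.0, 650),
    "part-timer (150 PA)": (2.0, 150),
}
for name, (raa, pa) in players.items():
    print(f"{name}: WAA={wins_above_average(raa):+.2f}  "
          f"WAR={wins_above_replacement(raa, pa):+.2f}")
```

Against an average baseline the regular (WAA -0.50) looks like a liability next to the part-timer (+0.20), yet against replacement level he is worth more (1.67 vs. 0.70 WAR), because a team that loses him would typically fill those 650 plate appearances with someone closer to replacement level.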

  37. Raker Says:

    "And we know that doesn't correlate with reality".

    My feelings have always been that the reason WAR is constantly changing is to try and correlate it to the WAR creator's perceptions of reality.

    For example, from a couple of years ago: "That Ben Zobrist sure is a great player; I'll bet he's worth more than Albert Pujols when you consider all the premium positions he plays." Then tweak the formula until you get just that.

    But when that formula also says Darwin Barney is a top 10 player, it's time to re-tweak it because it doesn't correlate with reality.

    Serious question, considering the fact that guys like Mike Trout and Bryce Harper were replacement players, does that in any way factor into the value setting of a replacement player?

  38. Raker Says:

    *because there is a chance that last year's replacement players were around MLB average.

  39. Xeifrank Says:

    WAR is a framework for a measurement, not an actual measurement itself. The important part, and the part that is left up to the individual, is the implementation. WAR tells you that you need to find components for baserunning, fielding, hitting and anything else you find important. The implementation may then tell you to use UZR or FSR, etc., for the fielding component, perhaps wOBA for hitting, and speed scores or something else for baserunning. There is nothing wrong with having different implementations; you just need to spell out what your implementation is. Those that bash WAR, I just don't feel sorry for them. The framework is solid. Oftentimes it is just the implementation they have a problem with. Then make your own.

  40. tomfromnorton Says:

    To Dave...very well written response until the last paragraph, no need to bring intelligence into the WAR argument, you sort of proved my point in post # 32. But I still think everything you wrote was valid. Hopefully some major tinkering will be done with WAR over the next few years (especially with defensive value, way overrated!!!) to a point that the majority of those who closely follow baseball statistics will consider it the best stat and there will be no (well, maybe a few) dissenters!

  41. Raker Says:

    @39 ...In defense of the guys who work tirelessly on WAR, I'll say that their task to find a "stat" that is all-encompassing is somewhere between daunting and impossible.

    That said, there is no partial credit here. When a "stat" includes judgement values that skew the obvious, then the professional scouts, and in some cases even the casual fans are a better source of player value than WAR.

    Sure, everyone knows all of the components that make a good baseball player (the framework of WAR), but lumping all of those components into a single stat can't be done accurately. If it's as simple as tweaking it until it looks right, then you may as well look at all of the hard data and make your own judgements, like scouts have been doing since the 1800s.

    Baseball is a game where a 10% difference is the difference between a HOFer and a minor leaguer. In some cases, just WAR adjustments alone change a player's value 20%. Unacceptable.

    There is an old adage in carpentry, "start square, finish square" that I think applies to WAR. Because they didn't start square, they constantly have to cut 87 degree correction angles to make their building appear straight.

    This wouldn't even bother me if it was acknowledged as a work in progress and not as a Holy Grail of stats that pits the enlightened against those of us who use their fingers and toes to count.