Sports Reference Blog

The Relationship between WAR and Team Wins

Posted by sean on August 8, 2012

This question came up on the SABR-L newsgroup for SABR members.

I'm almost reluctant to ask this seemingly obvious question, but I'm puzzled:

What does Wins Above Replacement Value mean in terms of team success? Or put another way, if a team consists of nothing but replacement players, or -0- value WAR players, how many games does that theoretical team win? Surely not -0- games. Nor 81.

Here was my reply:

We have a pretty exhaustive intro to the metric here.

http://www.baseball-reference.com/about/war_explained.shtml

A replacement-level team is set on our site to win 52 games in a 162-game season (roughly a .320 winning percentage), so there are about 30*162*(.500-.320) ≈ 875 wins above replacement available across all 30 teams.
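The league-wide figure is a one-line calculation. This sketch hard-codes the post's assumptions (30 teams, 162 games, a .320 replacement level); the names are illustrative, not part of any Baseball-Reference tool.

```python
GAMES = 162              # games per team per season
TEAMS = 30               # MLB teams
REPLACEMENT_PCT = 0.320  # replacement-level winning percentage used in the post


def war_pool(teams=TEAMS, games=GAMES, replacement_pct=REPLACEMENT_PCT):
    """Wins above replacement available to the whole league in one season.

    The league as a whole plays .500 ball, so each team-game leaves
    (.500 - replacement_pct) wins to be credited to players.
    """
    return teams * games * (0.500 - replacement_pct)


print(f"replacement team: {GAMES * REPLACEMENT_PCT:.1f} wins")  # 51.8, i.e. about 52
print(f"league WAR pool:  {war_pool():.1f}")                   # 874.8, i.e. about 875
```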

Here is an example conversion of WAR to team wins.  

The Phillies' batters have 9.8 WAR and their pitchers 6.1 WAR, and they've played 110 games. A replacement team would win 110*.320 = 35.2 of those games. Using WAR, we estimate the Phillies to have 9.8 (batting WAR) + 6.1 (pitching WAR) + 35.2 = 51.1 wins, compared to their 50 actual wins. 51.1 is right in between their actual 50 and their Pythagorean 52 wins.

For a team playing well, take Texas: they have played 109 games, so a replacement team would have 109*.320 = 34.9 wins. They have 13.7 batting WAR and 14.7 pitching WAR, for an estimate of 34.9+13.7+14.7 = 63.3 wins. They actually have 64, with 62 Pythagorean wins.
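Both worked examples follow the same recipe: replacement-level wins for the games played, plus batting WAR, plus pitching WAR. A minimal Python sketch, assuming the post's .320 replacement level; the function name `war_win_estimate` is illustrative, not a Baseball-Reference API.

```python
REPLACEMENT_PCT = 0.320  # replacement-level winning percentage (about 52 wins / 162 games)


def war_win_estimate(games_played, batting_war, pitching_war,
                     replacement_pct=REPLACEMENT_PCT):
    """Estimate team wins as replacement-level wins plus team WAR."""
    replacement_wins = games_played * replacement_pct
    return replacement_wins + batting_war + pitching_war


# Figures quoted in the post: (games, batting WAR, pitching WAR, actual W, Pythagorean W)
teams = {
    "Phillies": (110, 9.8, 6.1, 50, 52),
    "Texas": (109, 13.7, 14.7, 64, 62),
}

for name, (games, bwar, pwar, actual, pythag) in teams.items():
    est = war_win_estimate(games, bwar, pwar)
    print(f"{name}: WAR estimate {est:.1f}, actual {actual}, Pythagorean {pythag}")
```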

Cumulative team WAR is not constrained to equal a team's actual wins and losses, but it should match up closely with both its actual record and its Pythagorean record.

50 Responses to “The Relationship between WAR and Team Wins”

  1. tim Says:

    If the team stat doesn't correlate with team wins and losses, then it's not a good stat. I thought the whole point of it was to show "wins" above some generic replacement player who would be the same for every team.

  2. admin Says:

    It does correlate. I'm not sure I follow.

  3. Dr. Doom Says:

    @Tim, I don't understand what you're saying. This post clearly shows that the stat DOES correlate very closely to actual wins and losses. A generic team of replacement players would win (approximately) 32% of their games, by the estimates here - thus 52 games in a season. Now, you can break up those individuals by position and then figure out how much one player is above that hypothetical level.

    If you were so inclined, you could "doctor" the number, so that it corresponded exactly with team wins and losses... but in my opinion, that would be even more contrived than what actually goes on in the WAR calculation.

  4. Matt D Says:

    You cherry-picked a couple teams that sounded like they proved what you wanted them to prove.

    Using another example, the Orioles are at 3.4 batting WAR and 13.6 pitching WAR for a total of 17.

    Replacement level is 35 wins, so WAR predicts 52.

    Actual is 59 (13 percent higher than the WAR prediction). Pythagorean is 49.
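Matt D's Orioles arithmetic can be reproduced with the figures from the comment (3.4 batting WAR, 13.6 pitching WAR, a 35-win replacement baseline, 59 actual and 49 Pythagorean wins). In this sketch the "13 percent" is read as the gap between actual wins and the WAR estimate, relative to the estimate; the `pct_gap` helper is hypothetical.

```python
def pct_gap(actual, estimate):
    """Percent by which `actual` exceeds `estimate`, relative to the estimate."""
    return 100.0 * (actual - estimate) / estimate


replacement_wins = 35          # baseline quoted in the comment
team_war = 3.4 + 13.6          # batting WAR + pitching WAR
war_estimate = replacement_wins + team_war
actual, pythag = 59, 49

print(f"WAR estimate: {war_estimate:.0f}")
print(f"actual vs WAR estimate: {actual - war_estimate:+.0f} wins "
      f"({pct_gap(actual, war_estimate):.1f}%)")
print(f"actual vs Pythagorean:  {actual - pythag:+d} wins "
      f"({pct_gap(actual, pythag):.1f}%)")
```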

    WAR is a b.s. stat. Those who have not figured that out by now will do so eventually.

  5. Dr. Bread Says:

    Matt, you cherry-picked the team that is probably the biggest outlier. With their run differential, the Orioles have everyone scratching their heads. The Cardinals are doing the opposite of the Orioles: the best run differential, yet they're underperforming, so they would "prove" you correct too. However, almost every other team is pretty close to right on with the WAR calculation.

  6. Charles Saeger Says:

    Something is wrong with the replacement calculation, since the Orioles' combined Wins Above Average is -4.5, well in line with the team's Pythagorean projection. Is something being added twice?
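Charles Saeger's question can be made concrete by turning both totals into win estimates. This is a rough sketch, not Baseball-Reference's method. It assumes the Orioles had played about 109 games (inferred from the 35-win replacement baseline at .320; the thread doesn't say). It also assumes the replacement bonus is spread over a team in proportion to playing time, so team WAR minus team WAA should come out near games * (.500 - .320).

```python
REPLACEMENT_PCT = 0.320
games = 109            # ASSUMPTION: inferred from 35 replacement wins / .320
team_war = 17.0        # from comment 4: 3.4 batting + 13.6 pitching
team_waa = -4.5        # from comment 6

war_estimate = games * REPLACEMENT_PCT + team_war   # replacement baseline
waa_estimate = games * 0.500 + team_waa             # average baseline

# Under the playing-time assumption, WAR - WAA "should" equal this.
expected_gap = games * (0.500 - REPLACEMENT_PCT)
actual_gap = team_war - team_waa

print(f"wins via WAR: {war_estimate:.1f}")
print(f"wins via WAA: {waa_estimate:.1f}")
print(f"WAR - WAA: {actual_gap:.1f} vs. expected {expected_gap:.1f}")
```

Under those assumptions the WAR-based and WAA-based estimates differ by roughly two wins, which is the kind of gap the comment is asking about.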

  7. William Tasker Says:

    Excellent explanation as I too wondered about this. Glad the question was asked and answered so perfectly.

  8. Logan Says:

    Matt: If WAR is a bullshit stat, as you've claimed, then Pythagorean wins is an even bigger bullshit stat (since it's off by 10 for the Orioles, not just 7). But you don't seem to have a problem citing it anyhow.

  9. tim Says:

    Actually, it doesn't correlate. I just went to last year's stats. The team leaders in runs scored in the AL should be in the same order as the team leaders in offensive WAR, since the only offensive team stat that matters is runs scored, and they're not.

  10. admin Says:

    Really, you don't think park factors or quality of pitching faced should matter at all?

  11. Dr. Bread Says:

    nice admin....nice.

  12. TheGoof Says:

    Has anyone studied large examples of team WAR over the years? Because I suspect it is only so-so as a predictor.

  13. kzuke Says:

    tim, i think what you may be missing is that WAR (which is an individual statistic) is being used to predict team performance. and pretty accurately at that. if you can think of one individual statistic (for all players) that can do the same, i'd be really interested in hearing about it.

    fwiw, i just did a quick analysis of actual wins vs WAR (bbref and fangraphs) and actual wins vs pythagorean wins. i only did 2012 AL (which is an extremely small sample) but the results were surprising to me. the bbref WAR statistic had a better R-squared value (0.68) than both fangraphs (0.41) AND pythagorean wins (0.61). granted, none of those make the correlation statistically significant, but i think it shows that it is valuable.

  14. kzuke Says:

    doing the same analysis for all of MLB for 2011, pythagorean has a better R-squared value (still all lower than 0.95). i suppose this should be expected, as the win/run differential relationship tends to make more sense as the season goes on. regardless, the ability to predict team success from individual stats is invaluable.

  15. Ron Johnson Says:

    Just to tack on to the initial post: The standard error of pythag wins is just under 3.5 wins per 162 games. Further, team WAR doesn't perfectly match pythag wins because the offensive side of WAR has a standard error somewhere around 15 runs.

    In other words, team WAR should add up to Pythagorean wins, except that you substitute predicted runs scored for actual runs scored. This means that the standard error between team WAR and actual wins should be a tad over 4 wins.
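The step from "just under 3.5 wins" and "around 15 runs" to a combined error can be sketched by adding the two errors in quadrature. This assumes the two error sources are independent and uses the rough 10-runs-per-win conversion discussed elsewhere in the thread; both are assumptions, not part of the comment. With these inputs the combination lands near 3.8 wins, in the same neighborhood as the comment's figure. The exact number depends on the runs-per-win value and on any further error sources.

```python
import math

RUNS_PER_WIN = 10.0   # ASSUMPTION: rough rule of thumb, varies by era

pythag_se_wins = 3.5            # "just under 3.5 wins per 162 games"
offense_se_runs = 15.0          # "somewhere around 15 runs"
offense_se_wins = offense_se_runs / RUNS_PER_WIN

# Independent errors add in quadrature (square root of the sum of squares).
combined_se = math.hypot(pythag_se_wins, offense_se_wins)
print(f"combined standard error: {combined_se:.1f} wins")
```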

    That's as good as you're going to get without getting into special credits (or debits) for clutch/luck/timing.

    And to get back to #1, team WAR does correlate highly with team wins. It doesn't correlate perfectly.

    I'm not nuts about the way they get the error down on the offensive side (new weights are calculated every year in each league), but it's certainly defensible and the linear weights approach will work pretty well using generic weights.

  16. Corey Wamer Says:

    Question: How was it determined that the replacement level was 52 wins or a .320 winning percentage? Is the internal consistency "honed in" by picking a replacement level that "works"? If the replacement level is actually 40 wins (.247), 20 wins (.123), or 60 wins (.370), all the calculations are off. What was the methodology for selecting 52 wins? The fact that the .320 winning percentage causes the replacement wins plus the WAR to equal the team wins is not sufficient. That’s calculating backwards, using team actual wins to “find” the correct number; in essence, using the known quantity (actual wins) to prove the validity of the formula (actual wins). Is it 52 wins every year? If not, then is the replacement level changing yearly, or are you simply adjusting the replacement level to compensate for errors of formula construction? And how can you prove that the replacement level does change and that it is not an error in the formula?
    Treating wins as a known quantity in any formula that attempts to calculate wins is circular reasoning that will always yield results that look good on paper but are of dubious use. Just like Pete Palmer's linear weights, which use team wins to calculate the "value" of offensive and defensive stats, the sum will always add up to the team's wins. It doesn’t mean the formula has any utility.

  17. Dave Says:

    Not quite CW
    1) You could set the scale however you wished. WAR first predicts runs, then translates that to wins based on the assumed 52. You could say a replacement team would win 40; then everyone would get an appropriately higher WAR. You could also use WAA as a perfectly valid base stat, and scale everything from an 81-win team. Or invent WAN (wins above nothing). As long as the total individual WARs are reasonably predicting team wins, it's working. As the original post mentions, there are only 875 WAR to award each year, so think of it as a share system, not an absolute. So Trout is about 0.8% of the value of the MLBPA, even if ESPN might think him significantly more valuable.
    2) Wins are one of the few known quantities in baseball. There are 2430 of them every year (plus 1-game playoffs, minus rainout-abandoned games). All share systems are circular reasoning as you've described it.
    3) I too am curious about the 52 base. I would think a team truly consisting of only replacement players would do a little worse than that, although I don't watch a lot of Astros games. A more appropriate name might be WA52.
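Dave's share-system point can be checked with a little arithmetic: the league's total wins are fixed, and the baseline only decides how big the pool of credit is. This sketch uses round replacement levels mentioned in the thread (40, 52, 61, and 81 wins). A 52-win baseline gives a pool of 870 rather than the post's 875, because the post rounds 52/162 to .320. The function name is illustrative.

```python
GAMES = 162
TEAMS = 30

total_wins = TEAMS * GAMES // 2   # every game produces exactly one win


def war_pool_for_baseline(replacement_wins, teams=TEAMS, games=GAMES):
    """League-wide pool of wins above a baseline team with `replacement_wins` wins."""
    return teams * (games / 2 - replacement_wins)


print(f"total wins per season: {total_wins}")
for baseline in (40, 52, 61, 81):   # baselines mentioned in this thread
    print(f"baseline = {baseline} wins -> pool of {war_pool_for_baseline(baseline):.0f}")

# A 7-WAR player's share of the pool at the 52-win baseline.
print(f"7 WAR share at 52 wins: {7 / war_pool_for_baseline(52):.1%}")
```

An 81-win (average) baseline leaves a pool of zero, which is why wins above average sum to roughly zero across the league.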

  18. Jim Says:

    I am saddened to see this morning that ESPN has now deposited this utterly useless oWAR stat onto their web site.

    The utility of any statistic is in predicting future performance of an individual player not in comparing his mathematical output to a fictional character. Most of this crap being tracked today is for fantasy geeks, lazy sportswriters, and player agents who want to inflate the 'value' of a player to management.

    The stat should also tell you something of value about the player. Shin-Soo Choo is 8th in the AL with a 3.5. So what? Joe Mauer is right behind at 3.4. Big deal.

    Causality trumps correlation. These stats correlate a player's performance with a team's performance, but the causal proof is lacking.

  19. Dr. Doom Says:

    Okay, to everyone here: do you believe in "average?" That's the starting point. There is no "mythical" player who is exactly "average" in everything. And yet, no one is worried about comparing people to "average." "Replacement," just like "average," is a mathematical concept. If you don't like where it is, recalculate WAR for yourself, using a different baseline. Or, if you don't believe that the concept is useful, there is a WAA (Wins Above Average) tool right on this very website!!! Use that instead. I just don't understand why anyone would have a problem with WAR.

    The basic premise is this:
    The goal of baseball players is to win games, and the best players help teams win the most games.
    There is a direct correlation between runs scored/allowed and games won/lost.
    Although we cannot give complete credit to individual players for individual wins, we can give players partial credit for runs.
    Once we have assigned credit for runs, we can convert that number of runs into the number of wins it corresponds to, and assign that player a number of "wins."
    We can then compare that number to the number of wins a different player achieved, or the number of wins a hypothetical player (average, All-Star, replacement, whatever) would have won, had he played for the same amount of time.

    I ask: what about this is nonsensical? It seems extremely reasonable to me. I don't see what's objectionable - but anyone who does object to WAR, please explain what about WAR it is you don't like. I would love to know.

  20. Thomas Says:

    I'm not a fan of WAR, but I'm not anti-WAR; I just don't like it, so I don't use it.

    To answer your question, Dr. Doom: can you explain to me how one gets a WAR? My (personal) problem with it is that I can't watch someone acquire a WAR. I can watch a pitcher give up (justly or not) an earned run, or watch a batter steal a base, or hit a double, but I can't watch tonight's Phillies game and see Chase Utley gain a WAR. I also don't really understand the math behind it, or like the idea of the so-called replacement player, or average player.

    But as I stated before, I'm not one of those people who says, "I don't like it, therefore it's worthless and should be abandoned." I just choose to use, look up, and follow other stats. No big deal.

  21. Richard Chester Says:

    Reply to post #19.

    I am a WAR doubter. Would you explain exactly how Rbat is calculated?

  22. Dr. Doom Says:

    True, Thomas, you can't watch a player acquire a WAR. You also couldn't watch Cal Ripken, Jr. play 2,632 games in a row; you couldn't watch Hank Aaron hit 755 home runs - not all at once, anyway - because they accumulate over time. Sure, you may have seen a few; you may have seen components that lead to the total, but you couldn't see the whole thing. In fact, you can see the "record-breaker," but even that game/strikeout/home run/win/whatever is meaningless without each one that came before it. WAR is similar, in that what you'll see are the components.

    Roughly speaking, "10 runs equals one win." I hope that makes sense. If not, add or subtract ten runs to/from a team's total (scored or allowed), recalculate their Pythagorean record, and you'll see - ten runs (or maybe eight in the deadball era, or possibly closer to nine or nine-and-a-half nowadays) equals one win. Therefore, every time a player combines to create or prevent ten runs, he alters his team's record by one game. If that concept is clear, that's like 90% of the battle.
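The add-ten-runs experiment described above is easy to run in code. This is a sketch using the Pythagorean formula with an exponent of 1.83 (a commonly used refinement of Bill James's original exponent of 2; the choice is an assumption here). It covers hypothetical run environments in which a team scores and allows the same number of runs, and shows the runs-per-win figure drifting with the scoring level, in line with the comment's point about eras.

```python
def pythag_win_pct(runs_scored, runs_allowed, exponent=1.83):
    """Pythagorean winning percentage: RS^x / (RS^x + RA^x)."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return rs / (rs + ra)


def wins_from_extra_runs(runs_scored, runs_allowed, extra_runs=10,
                         games=162, exponent=1.83):
    """Change in Pythagorean wins when a team scores `extra_runs` more runs."""
    before = pythag_win_pct(runs_scored, runs_allowed, exponent) * games
    after = pythag_win_pct(runs_scored + extra_runs, runs_allowed, exponent) * games
    return after - before


# Hypothetical run environments (runs scored = runs allowed per season).
for runs in (600, 700, 800):
    gain = wins_from_extra_runs(runs, runs)
    print(f"{runs} RS/RA: +10 runs -> +{gain:.2f} wins (~{10 / gain:.1f} runs per win)")
```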

    The rest is pretty simple: we calculate (however you choose - you can use a run estimator, you could use OPS, you could use wOBA - each of these has a varying degree of 'accuracy') how many runs the player created, and we use a defensive metric to determine how many runs he saved. That's pretty much it.

    Now the rub is, if you stop there, you're comparing the player to zero. That may not be a problem for you. That's what Bill James does in Win Shares. The issue is that you're comparing the player at hand to zero. In other words, "this is you, compared to if we didn't play a shortstop/centerfielder/whatever at all." That's obviously not the case, so you have to compare them to some sort of a baseline. Now, you could choose average. If this player didn't play, and you put an average player in his place, how many runs would that player create/save - and thus how many wins? The problem with comparing people to average is that there aren't just a whole bunch of average players lying around, waiting to play. So you have to find out what the level of player you'd get would be - that's the replacement player.

    Now, b-ref chooses 52 wins as replacement level. That's certainly reasonable. I mean, after all, if a AAA team made up of random players on the cusp of a call-up were promoted and played a full season, how many games would they win? Certainly more than zero. Look at the Astros this year, or the Mets of 1962, or any of a host of other awful teams. Many of those teams were at or even BELOW what we'd expect a AAA team to do. Anyway, whatever you believe THAT level to be, that's what replacement is. I like the idea of a 61-win replacement team - that is, a team that's 75% of average. On the other hand, some people think that the 1962 Mets are as perfect an example of a replacement team as there ever could be, and they finished .250, so that's another reasonable guess. Anyway, once you figure out what that level is, you can reverse-engineer how many runs (and then wins) you'd expect to get out of a player (of the same position as the player you're talking about) on that team. You subtract, and voila! The remainder is the player's WAR.
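The "you subtract" step can be sketched in a few lines. Every input here is hypothetical: a player credited with 25 runs above average, a replacement-level player at the same position assumed to be 20 runs below average over the same playing time, and the rough 10-runs-per-win conversion from earlier in this comment. Changing the baseline changes the answer, which is the point of the 52-win versus 61-win discussion.

```python
RUNS_PER_WIN = 10.0  # rough rule of thumb; varies with era and park


def wins_above_baseline(player_runs, baseline_runs, runs_per_win=RUNS_PER_WIN):
    """Convert a run gap between a player and a baseline player into wins.

    Both run totals are on the same scale (here: runs relative to league
    average) and cover the same playing time.
    """
    return (player_runs - baseline_runs) / runs_per_win


player = 25.0        # HYPOTHETICAL: runs above average (batting + fielding + position)
replacement = -20.0  # HYPOTHETICAL: replacement player at the same position
average = 0.0

print(f"WAR: {wins_above_baseline(player, replacement):.1f}")  # vs. replacement
print(f"WAA: {wins_above_baseline(player, average):.1f}")      # vs. average
```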

    So, ultimately, you can't watch someone "get" a WAR. You "can watch a pitcher give up (justly or not) an earned run, or watch a batter steal a base, or hit a double," as you say, and those are all components of WAR. So the cop out is, "every time a player accumulates (roughly) 10 runs, (depending on ballpark, era, etc.,) by either creating or preventing those runs, he accumulates a WAR." Maybe that'll satisfy you, maybe it won't. But personally, it makes sense to me. Because at least with WAR, you have a consistent framework by which to compare players, instead of just subjective judgments that change at the chooser's whims.

  23. Dr. Doom Says:

    That last post was @20. Sorry - it took me a while to compose, and I hadn't seen 21 yet.

    I believe Rbat is a pretty simple linear weights calculation, based on the number of walks, hit-by-pitches, singles, doubles, triples, and homers, with outs made subtracted, and adjusted for ballpark. The weights are probably something like .3 for BB, .5 for 1B, .7 for 2B, 1 for 3B, and 1.4 for HR, with -.25 or so for batting outs. That's just a guess, though. But that's the basic idea. Then you adjust for the ballpark, and that should pretty much do it.
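The guessed weights above can be turned into a small linear-weights calculator. This only illustrates the idea. The weights are the comment's own rough guesses rather than the values Baseball-Reference actually uses. Hit-by-pitches are assumed to count the same as walks, and batting outs are approximated as AB - H. The season line is hypothetical, and the park adjustment is left out.

```python
# Guessed run values per event, from the comment above (not b-ref's actual weights).
WEIGHTS = {
    "BB": 0.3,     # walks; hit-by-pitches are lumped in here as an assumption
    "1B": 0.5,
    "2B": 0.7,
    "3B": 1.0,
    "HR": 1.4,
    "OUT": -0.25,  # batting outs, approximated as AB - H
}


def linear_weights_runs(bb, hbp, ab, h, doubles, triples, hr):
    """Rough batting runs from a linear-weights formula (no park adjustment)."""
    singles = h - doubles - triples - hr
    outs = ab - h
    return (WEIGHTS["BB"] * (bb + hbp)
            + WEIGHTS["1B"] * singles
            + WEIGHTS["2B"] * doubles
            + WEIGHTS["3B"] * triples
            + WEIGHTS["HR"] * hr
            + WEIGHTS["OUT"] * outs)


# HYPOTHETICAL season line: 500 AB, 150 H (30 2B, 5 3B, 15 HR), 55 BB, 5 HBP.
runs = linear_weights_runs(bb=55, hbp=5, ab=500, h=150, doubles=30, triples=5, hr=15)
print(f"batting runs: {runs:+.1f}")
```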

  24. James Kunz Says:

    The reason I don't like WAR is because two different websites calculate it differently. And this website changed how it calculated it. Last year Albert Pujols' 2003 WAR was over 10. Now his 2003 WAR is 8.3. You don't see something untrustworthy about a statistic which changes not only between different websites, but also within them?

  25. Jim Says:

    Reply to #19.

    As I said -- I object to it because it doesn't represent anything meaningful to me. It's just a correlation of a player's stats against some other numbers, which is then correlated to wins. I'm not saying that any other stats are 'perfect' and capture the intrinsic value of one player against another, but WAR seems to try unsuccessfully to answer the very uninteresting question of "how much better off (more wins) is this team with this player vs. an average Joe?" Does anybody believe Josh Hamilton is so far down in the pack with a 3.0 oWAR, and that this number represents him compared to his peers?

    I also believe it has little predictive value because identical numbers in one situation may yield a different WAR in another situation. Not all runs are equal and sometimes not all wins are even equal. When you watch games and see a player routinely fail to get a hit with less than 2 out and a RISP in close games -- but then launch a 3 run HR in the 8th inning of a pile-on 12-1 rout you understand the folly of trying to quantify a players worth with a single number - which is probably why I am not a SABRmetric enthusiast. This proliferation of more 'clever' stats seems to provide us with more data but not more useful information.

    The beauty of baseball is that it's an extremely situation-driven game. As Yogi said, "you can observe a lot just by watching."

  26. David P Stokes Says:

    @ # 17: If you invent "WAN" ("wins above nothing") haven't you at that point just re-invented Win Shares?

    @ # 20: Thomas, Dr Doom already posted a pretty good reply to your question, but to put it more concisely, you can't watch a pitcher compile a 3.45 ERA either. You can watch him give up a certain number of earned runs, but then you have to use math to calculate his ERA from his earned runs allowed and his innings pitched. Just because the math for WAR is more complicated doesn't make it any less valid.

    FWIW, I have some reservations about WAR, too, but the fact that it requires a lot of mathematical calculation isn't part of my concerns.

  27. Corey Wamer Says:

    Okay, but if the “replacement level” is 40 wins, giving every player a higher WAR, then the calculation (replacement wins + sum of players’ WAR) would still come close to the actual team wins. This is because of the use of linear weights, which weighs everything against the league’s actual wins. Wherever the replacement level is set (20 wins, 40 wins, 52 wins, or 81 wins), the team totals, added to the “replacement wins,” will add up to the actual wins. If you include actual wins in the calculation, you’re going to end up with something that resembles actual wins. It is a circular calculation. Of course, if the replacement level is set too low, then the wins above replacement become ridiculously high (10 to 20 wins for an individual player), or if the replacement level is set too high, the WARs become smaller, making the difference between players fractions of games, too small to make any kind of declarative statement about who is better than who. Even at 52 (.320), the spread is still too small (does anyone really think that Denard Span is 0.1 wins better than Albert Pujols? Would you trade Pujols for Span? Just look at their salaries: no one who has to write the paychecks believes it).

    Furthermore, there isn’t even an actual “league replacement” level. It doesn’t exist. Players do not play for a league; they play for teams. Players compete against the opposing team, and they compete against their teammates to keep their jobs. They do not compete against the entire league. For example, to say that Mike Trout has been worth 7 wins above the “replacement player” is nowhere near a true statement. Beyond the fact that the 7 games is determined by the .320 winning percentage seemingly plucked out of thin air, the “replacement player” for Mike Trout would not be a .320 player; the Angels wouldn’t tolerate that. Peter Bourjos, Vernon Wells, or Torii Hunter would be playing centerfield, and they are nowhere near .320 players. In order to accurately determine a player’s value to a team in terms of actual wins, every player should be compared to the next player in line, not against some mythical “replacement player”.

  28. Dr. Doom Says:

    Okay, Corey (@27)... but how is it fair to compare Mike Trout to Torii Hunter, but then to compare Andrew McCutchen to Nate McLouth? The baseline for everyone becomes different, and you lose any value in comparison. So Wally Pipp was a below-replacement 1B because Lou Gehrig's behind him, while Marv Throneberry in 1962 is above replacement because he's getting compared to a 38-year-old Gil Hodges? That just doesn't make sense. You HAVE to use a mathematical concept, because it's the only way to judge players on different teams.

    @25 - Jim, I appreciate that you don't see WAR as meaningful. I really do. But I DO get why Josh Hamilton's oWAR is so bad (actually, it's really quite good; it's just that there are a lot of players higher). Since June 1st, he's hitting .221; he has only 10 HR in 211 ABs - and one every 21 ABs is just not very good for a player of his ability; he has only 39 RBI in 57 games, which, while a respectable 110-RBI/162-game pace, many of his peers average more. So basically, you have a guy who's been a below-average Major League hitter for the majority of the season, while keeping his RBI up by playing for the highest-scoring team in the best offensive park in the American League. Um, no - I don't think he's been nearly as good as Austin Jackson or Robinson Cano or Adam Jones or Edwin Encarnacion, even though they're not as big of "names" as Hamilton is.

    As to your second paragraph, there are a lot of problems with what you've said. For example, "When you watch games and see a player routinely fail to get a hit with less than 2 out and a RISP in close games -- but then launch a 3 run HR in the 8th inning of a pile-on 12-1 rout you understand the folly of trying to quantify a players worth with a single number." The problem is, most players fail most of the time with less than 2 out and a RISP. We know this because players don't hit .600 in that situation. Looking at any study on clutch hitting online, you'll find that it exists to a degree, but that degree is small. And, even though A-Rod is a "bad" clutch hitter (because he's worse than usual in clutch situations) and Derek Jeter is a "good" clutch hitter (because he's better than usual in clutch situations), Alex Rodriguez still hits better in clutch situations than Derek Jeter. Because even though Jeter gets better and A-Rod gets worse, A-Rod is SO much better to begin with that the difference ends up being minor in comparison to their overall abilities. The fact of the matter is, it's physically impossible to watch enough baseball to know who the good clutch hitters are and who the bad ones are just by watching. In fact, it's really impossible to even tell who's good and who isn't just by watching - there are just too many teams playing too many games. So we need stats.

    Besides, your point about the hitter who hits meaningless homers - well, that means that his RBI count and his HR count are BS as well. And if even those "basic" stats are meaningless, then how do you compare players to one another? Tom Tango posted something on his blog the other day that said (basically) that we all create our own WAR. Because we all rank players in our minds. If I ask you, "who's better, Mike Trout or Yuniesky Betancourt?" you'll have an answer. It's an answer you'll have to derive from stats. Which means you use some stats, put them together, and arrive at an ordinal conclusion. That's exactly what WAR does, only it's systematized, so you can't BS in your method and make HRs count more for one player and less for another.

  29. Dr. Doom Says:

    @24

    James, I feel your pain. This is actually something I TOTALLY can empathize with, more than any other point "against" WAR. But there are a couple of ways that I hope I can help.

    First, look here:
    http://www.baseball-reference.com/about/war_explained_comparison.shtml

    Right on this very website, there's a wonderful chart that shows the differences in the WAR calculations. And this page gives you more in-depth descriptions of how this website does its calculation:
    http://www.baseball-reference.com/about/war_explained.shtml

    The important thing to remember about WAR is that it's not really a "stat." It's a framework. You can calculate it differently using different methods. Because there's no sure-fire, 100% predictive, perfectly accurate model for offense, defense, or pitching, we use different models. Now, you can study those different systems, decide which is the best, and use it. You can go find the method in each sub-category you like best and calculate your own WAR. Or you can just trust that each of the Big Three WAR systems (rWAR or bWAR, as this site's is called; fWAR on FanGraphs; and WARP on Baseball Prospectus) does a good job of implementing this framework, and since they're rarely all that different, it doesn't make TOO much of a difference which one you use.

    It's kind of like anything else, though, in that people can read more meaningful distinctions into small numbers than they should. For example, if the league leader (let's call him Jones) leads the second-place guy (let's call him Smith) in HRs, it's tempting to say he's clearly the best HR hitter. But what if that lead was only 39 to 38? Is that really meaningful enough that you can say for certain that Jones is a better HR-hitter than Smith? I would say "no." Likewise, with WAR, anything within (say) half of a win is probably not all that meaningful. So if the leader laps the field and has two more WAR than anyone else, it's pretty obvious he's the best player; if he only leads by .1 WAR, it's more debatable. So, you check the different systems online, and you get a good idea. No, it's not exact. But neither is looking at AVG, RBI, and HR, cobbling them together somehow (and non-mathematically), and "deciding" who's best. So it's not perfect, but anyone who uses WAR responsibly will know what its pitfalls are, and will know what makes a meaningful difference and what does not.

    I hope that helps. You might find that you like WAR if you give it a shot, and especially if you try to learn more about it. It's a great tool, and it gives a really good snapshot of the season in question.

  30. James Kunz Says:

    Thank you for your cogent response Dr. Doom. If everyone were like you, there wouldn't be such a contingent of WAR haters, and we wouldn't always have this big WAR war.

    Unfortunately I think people have a tendency to look for THE ONE statistic which can compare different players' abilities. It once was AVG, then RBI, then when I got into baseball OPS had a huge vogue, and now it's WAR. So people like me react against these small-minded people who reduce every argument down to "Oh, but Jones' WAR was 5.6 while Smith's was 6.4 and thus Jones didn't deserve MVP."

    In short, if everyone realizes WAR is a tool to be used in concert with every other statistic, instead of the end all be all, I'll stop getting so annoyed with it.

  31. Corey Wamer Says:

    You're correct. It doesn't make sense, which was my point. It doesn't make sense because the “replacement level” is different for different positions on different teams. There is no such thing as a “league replacement level”. Just because it doesn’t make any sense, we throw up our hands and say “we have to use this construct”?
    There are various “mathematical constructs” that could be used (Bill James’ Win Shares comes to mind). I’m not against using mathematical constructs, as long as the line drawn between players makes sense, which you’ve admitted that WAR doesn’t.

    WAR makes a declarative statement: that Mike Trout is 7 wins better than the “league replacement player”. The logic used to underpin that conclusion is dubious at best. The 7 games depends on where the line is drawn on the replacement level. It should be called “estimated wins above assumed replacement level” or something of that nature. We can, with a high degree of accuracy, find the number of offensive runs created by every major league player. WAR is nowhere near that accurate (which everyone seems to admit), and to claim it is accurate by reverse calculating wins when team wins (or league wins) is in the calculation (by using linear weights) is indefensible. If you’re saying that Mike Trout is worth 7 wins, and the methodology can’t be defended, what good is it? This is not about “hating” or “liking”. It’s about the methodology, which no one seems to be able to defend.

  32. Dr. Doom Says:

    @31

    Waitwaitwait. I don't get it. You're okay with Win Shares and with Runs Created, but not WAR? WAR is the same, basically, as those two things. It just compares to a different baseline than zero. And it adjusts for position, based on the number of runs expected out of a replacement player of that position. So it DOES adjust for that. I NEVER "admitted" that the lines drawn between players didn't make sense. They make perfect sense, and are based on mathematical models. I'm sorry, but I don't think we're even talking about the same things any more. What's the difference if Mike Trout is 7 wins better, or 5.5, or 6.2, or 8.7? Who cares? The idea that we should call it "estimated wins above assumed replacement level" - why? If you can't figure out that we're not dealing in absolutes, and you actually need that kind of clarity, I feel truly sorry for you. Should "The United States of America" be called "A Strong Confederation of individually governed units, originally designed for near-total autonomy but which have generally shifted toward a more federalist model over time?" No! That's ridiculous, although it's more accurate.

    Team wins is not in the calculation for WAR. Wins for individuals based on individual runs created (actually wRAA) is included. It's an estimate, which is why you can "reverse-engineer" wins. The fact that it comes out very close means that there's probably a lot of validity to WAR. My recommendation is that you read more about it before discounting it so much; it seems to me that although you complain and complain about "methodology," you don't actually understand the methodology of WAR. Perhaps if you read more about it, you wouldn't have so many hang-ups.

  33. Jim Says:

    Dr Doom, yes, all stats have an element of BS to them. But that doesn't mean you can sum up many BS stats and come up with a ranking that has significance.

    AVG, HR, RBI are subject to scrutiny but they are at least honest raw numbers.

    Jeter, BTW, is a better clutch hitter than A-Rod. If you have a book on that that tells you otherwise, rip it up, set it on fire, and scatter the ashes. I'll base my analysis on 17 years of watching them.

  34. Dr. Doom Says:

    Situation: Leading off an inning (debatably a clutch situation)
    A-Rod: .307/.381/.581
    Jeter: .337/.390/.508
    Advantage: A-Rod, by a little.

    Situation: RISP
    A-Rod: .297/.399/.536
    Jeter: .303/.395/.423
    Advantage: A-Rod, by a lot

    Situation: Man on third, less than two out - you mentioned this one yourself
    A-Rod: .354/.397/.608; 713 RBI in 2387 PA (.298 RBI/PA)
    Jeter: .330/.391/.436; 349 RBI in 2085 PA (.167 RBI/PA)
    Advantage: Wow. Landslide for A-Rod.

    Situation: With two outs
    A-Rod: .288/.384/.578
    Jeter: .296/.382/.417
    Advantage: A-Rod... sensing a pattern?

    Here are some others, that b-ref defines as "clutch situations." A-Rod's better in all of them. See for yourself:

    Jeter:
    Split         PA     AB    H     2B   3B  HR   RBI   BB    SO    BA    OBP   SLG   OPS   TB    tOPS+
    2 outs, RISP  1290   1105  339   54   8   23   428   168   230   .307  .406  .433  .839  478   103
    Late & Close  1570   1338  388   55   4   32   185   171   268   .290  .382  .409  .791  547   92
    Tie Game      3673   3283  1040  149  19  91   282   289   526   .317  .379  .457  .836  1500  101
    Within 1 R    6102   5437  1740  250  31  150  553   503   870   .320  .384  .460  .845  2502  103
    Within 2 R    7930   7051  2239  343  41  183  779   669   1140  .318  .383  .456  .839  3213  102
    Within 3 R    9179   8160  2579  399  45  208  930   777   1330  .316  .382  .452  .835  3692  101
    Within 4 R    10027  8909  2819  450  51  224  1048  855   1462  .316  .383  .454  .837  4043  102
    Margin > 4 R  1640   1435  419   64   14  24   183   166   253   .292  .374  .406  .780  583   89
    Ahead         4389   3838  1172  207  26  77   568   435   672   .305  .382  .433  .815  1662  97
    Behind        3605   3223  1026  158  20  80   381   297   517   .318  .384  .454  .838  1464  102

    Provided by Baseball-Reference.com: View Original Table. Generated 8/12/2012.

    A-Rod:
    Split         PA     AB    H     2B   3B  HR   RBI   BB    SO    BA    OBP   SLG   OPS   TB    tOPS+
    2 outs, RISP  1342   1102  296   41   2   57   422   209   254   .269  .399  .465  .864  512   86
    Late & Close  1492   1268  346   67   3   82   269   187   299   .273  .373  .524  .897  665   90
    Tie Game      3115   2650  787   154  7   176  524   393   585   .297  .394  .560  .953  1483  102
    Within 1 R    5539   4719  1422  267  14  317  953   675   1008  .301  .395  .565  .961  2668  103
    Within 2 R    7290   6250  1859  338  17  401  1223  862   1334  .297  .389  .549  .938  3434  99
    Within 3 R    8534   7353  2197  395  21  484  1452  974   1555  .299  .387  .556  .943  4086  99
    Within 4 R    9400   8117  2428  435  25  544  1632  1045  1735  .299  .385  .560  .945  4545  100
    Margin > 4 R  1634   1434  444   74   5   100  305   159   265   .310  .383  .577  .961  828   102
    Ahead         4177   3598  1110  189  18  249  808   465   746   .309  .393  .579  .972  2082  105
    Behind        3742   3303  975   166  5   219  605   346   669   .295  .368  .547  .916  1808  93

    Provided by Baseball-Reference.com: View Original Table. Generated 8/12/2012.

    I hope that table sharing thing worked... I've never tried it before!
    It's pretty clear to me that A-Rod is a better clutch hitter than Jeter, because he's a better hitter. Again, it's true that Jeter raises his game; it's true that A-Rod seems to shrink his. But Jeter improving by 3% and A-Rod getting 3% worse doesn't come CLOSE to making up the difference between the two as hitters. I'm just not willing to believe that you've watched every game the two have played for the last 17 years. So I'm not going to throw out the numbers. You can go on believing whatever you'd like. But I don't think you have any numerical backing for it.
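    The BA, SLG and TB columns in the split tables can be rebuilt from the counting stats, which is a handy check on a table that may have been mangled in copying (OBP cannot be rebuilt, since the tables leave out hit-by-pitches and sacrifice flies). A minimal sketch with a hypothetical helper name, applied to the "2 outs, RISP" rows:

```python
def rate_stats(ab: int, h: int, doubles: int, triples: int, hr: int) -> tuple[int, float, float]:
    """Return (total bases, batting average, slugging percentage) from counting stats."""
    singles = h - doubles - triples - hr
    tb = singles + 2 * doubles + 3 * triples + 4 * hr
    return tb, h / ab, tb / ab


if __name__ == "__main__":
    # "2 outs, RISP" rows: (AB, H, 2B, 3B, HR) from the Baseball-Reference split tables.
    rows = [("Jeter", (1105, 339, 54, 8, 23)), ("A-Rod", (1102, 296, 41, 2, 57))]
    for name, row in rows:
        tb, ba, slg = rate_stats(*row)
        print(f"{name}: TB={tb} BA={ba:.3f} SLG={slg:.3f}")
    # Jeter: TB=478 BA=0.307 SLG=0.433
    # A-Rod: TB=512 BA=0.269 SLG=0.465
```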

  35. Dr. Doom Says:

    Dang it! I messed up the tables. Just ignore them and click the "View Original Table" links. Whoops.

  36. Richard Chester Says:

    Reply to #23: Do you (or anyone else) know just where someone could find the exact linear weight factors for Rbat?

  37. Dr. Doom Says:

    @36

    I believe wRAA (which is used in Rbat) is based on wOBA weights.

    Try the Wikipedia article:
    http://en.wikipedia.org/wiki/WOBA
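    The wRAA formula as FanGraphs documents it converts the gap between a player's wOBA and the league's into runs by dividing by a season-specific "wOBA scale" and multiplying by plate appearances. A minimal sketch; the inputs are made-up illustrative values, not real season constants, and Baseball-Reference's Rbat may be computed somewhat differently:

```python
def wraa(woba: float, league_woba: float, woba_scale: float, pa: int) -> float:
    """Weighted runs above average: ((wOBA - league wOBA) / wOBA scale) * PA."""
    return (woba - league_woba) / woba_scale * pa


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; real league wOBA and
    # wOBA scale values change every season with the run environment.
    print(round(wraa(woba=0.370, league_woba=0.320, woba_scale=1.25, pa=600), 1))  # 24.0
```

    A league-average hitter comes out at exactly zero, which is why a separate replacement-level adjustment is needed before the result can be called "above replacement."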

  38. Jim Says:

    @34 Dr. Doom-

    Thanks for the data, but I think you can look at these numbers and try to support a conclusion but still get it wrong. Using your own referenced data and sticking to what 'they' call clutch stats:

    2 outs, RISP: Jeter holds a .307 vs .269 career lead in AVG, with 429 RBIs vs 422 RBIs in about 50 fewer PA. This is the single minimally positive result in this situation: get a hit and drive in a run. And it shows Jeter ahead. What it doesn't show is whether this was an important run in a close game, which would make it more 'clutch'. With walks, A-Rod's OBP is closer to Jeter's, but I'm not looking for A-Rod to draw a walk in this case. In Jeter's case, maybe a little more receptive to a walk. I don't think SLG% is nearly as significant as you do, because it again depends heavily on the situation, and a meaningless 3-run HR will count as much as 3 game-winning singles.

    Late & Close: at least we are talking about a situation here. Again, Jeter leads .290 vs .273 in AVG.

    For the following situation you copied the numbers incorrectly:
    Situation: Man on third, less than two out - you mentioned this one yourself
    A-Rod: .354/.397/.608; 713 RBI in 2387 PA (.298 RBI/PA)
    Jeter: .330/.391/.436; 349 RBI in 2085 PA (.167 RBI/PA)
    Advantage: Wow. Landslide for A-Rod.

    Should read:
    A-Rod: .354/.397/.608; 496 RBI in 736 PA (.674 RBI/PA)
    Jeter: .356/.401/.464; 379 RBI in 588 PA (.645 RBI/PA)
    Advantage: nearly identical (A-Rod's edge comes from homering in that situation and scoring himself, not from getting the man in from 3rd base more often; back out the RBIs from the HRs and they are both at .631).

    Man on 3rd and 2 outs (the ultimate clutch spot if the run is important):
    A-Rod: .271/.413/.463; 221 RBI in 560 PA (.395 RBI/PA)
    Jeter: .310/.401/.404; 249 RBI in 573 PA (.435 RBI/PA)
    Advantage: Jeter
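    The RBI-per-PA rates quoted here are plain ratios, so they are easy to check. A quick sketch (names hypothetical) using the counts given above; it does not reproduce the .631 figure, which would also need the home-run RBI counts for those situations:

```python
def rbi_per_pa(rbi: int, pa: int) -> float:
    """Runs batted in per plate appearance."""
    return rbi / pa


# (RBI, PA) pairs as quoted in the comment.
SPLITS = {
    "Man on 3rd, <2 out": {"A-Rod": (496, 736), "Jeter": (379, 588)},
    "Man on 3rd, 2 out": {"A-Rod": (221, 560), "Jeter": (249, 573)},
}

for split, players in SPLITS.items():
    for name, (rbi, pa) in players.items():
        print(f"{split:20} {name:6} {rbi_per_pa(rbi, pa):.3f}")
```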

    So, I think there is numerical backup here that quite frankly I never bothered to look up until you showed it to me.

    Just for background, I have watched Jeter play in NY for 17 years. I have watched A-Rod every day since he came over here, and I consider A-Rod the most perfect hitting specimen of all time. Jeter to me was always 'really good, reliable, and solid'. As A-Rod said when he pissed off his ex-friend: 'you never come into Yankee Stadium worrying about Jeter in the line-up'. And he is right. Probably because so many GMs, managers and fans were busy looking at his not-so-gaudy numbers and saying he's not even up there with A-Rod, Nomar, Tejada at SS in the '90s.

    That said, there is no way I would want to swap these 2 in a clutch situation (I don't even dare mention the postseason!). 2 outs, man on 3rd, 1-run game, 9th inning or later, postseason. You want Jeter in there. You do not want A-Rod in there. Now, how do you compare these 2 guys using statistics when they are apples and oranges?

    I disagree with your point that "it's really impossible to even tell who's good and who isn't just by watching". That's the fun part about sports.

    I do agree that "there are just too many teams playing too many games," but only to the extent that one individual can't watch them all. Stats try to fill the void, but because they only capture a fraction of the game, we should only rely on them for a fraction of our understanding of the game. So, if I come full circle back to WAR, I don't think it adds to the understanding of the game very much. Maybe we should start simplifying ABs into a hockey-like +/- score ("good AB", "bad AB"); we already have QS for pitchers.

    Who knows...

  39. Ricardo Says:

    I don't see what is so hard to understand about WAR. The point being that with many thousands of repetitions and many average players having come and gone over years of baseball played, a "baseline average" is the floor that is used, instead of the value of "zero", which quite frankly is impossible to use because baseball will always have some production from big league position players, regardless of how crappy they are.

  40. Corey Wamer Says:

    Yes, I am okay with Win Shares and runs created, but I have a big problem with using linear weights, which are the key component of the WAR calculation (see, despite your aspersions on my knowledge {which in an academic discussion speak to the personal character of the individual making the argument, but have little to do with the validity of the argument}, I do know something about what I am writing about here). Linear weights are not Runs Created, nor are they Win Shares. We can have that discussion, but the bottom line is baseball is not linear, it’s geometric. When one starts with a false premise, it’s no wonder 231 different formulas are needed to get linear weights to “work”.
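    For contrast with an additive linear-weights sum, Bill James's basic Runs Created formula, RC = (H + BB) × TB / (AB + BB), multiplies an on-base factor by an advancement factor. A minimal sketch of that basic version only (James later published more elaborate versions, which this does not implement), applied to the two players' "Tie Game" split lines from earlier in the thread; the function name is hypothetical:

```python
def runs_created(h: int, bb: int, tb: int, ab: int) -> float:
    """Basic Runs Created: on-base factor (H + BB) times advancement factor (TB),
    divided by opportunities (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


if __name__ == "__main__":
    # "Tie Game" split lines from the Baseball-Reference tables above.
    print(round(runs_created(h=1040, bb=289, tb=1500, ab=3283), 1))  # Jeter: 558.1
    print(round(runs_created(h=787, bb=393, tb=1483, ab=2650), 1))   # A-Rod: 575.1
```

    Because the two factors are multiplied, holding AB + BB fixed, a hitter who improves both his on-base total and his total bases gains more than the sum of the two separate improvements; in a linear-weights sum each event's value is fixed for the season regardless of what else the hitter does.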

    The difference is that the title of this blog post was “The Relationship between WAR and Team Wins”, so it DOES matter whether Mike Trout’s WAR is 7 wins or 5.5 or 8.2. The argument was that the sum of a team’s players’ WAR plus the “replacement level” equals the team’s actual wins. So it matters what each individual player’s WAR score is. I never wrote that “the lines drawn between players didn't make sense”. We do that all the time. It’s THIS line, drawn at a “league replacement level”, and the claims of accuracy that rest on how the formula is constructed, which motivated me to post on this blog. The accuracy of WAR’s team totals is due to the use of runs scored and runs allowed in the linear weights, and not to any other reason. The big difference between linear weights and runs created is that Runs Created DOESN’T USE RUNS AS A KNOWN QUANTITY. Win Shares STARTS WITH WINS, divides the wins between offense and defense, then between pitching and fielding on the defensive side, and then distributes the win shares to the players. It DOESN’T use the league average, nor does it “adjust” elements based on the league. (Another problem with linear weights, for example, is that a player’s statistics are “weighted” by what goes on in games he does not participate in. If other teams hit more singles, the value of that player’s singles declines. Does it make sense that a player’s value goes up or down based on what other players do in other games?)

    The “baseline” is a mathematical assumption. No one has explained why .320 is the replacement level. I explained what happens when the replacement level is too high or too low. Just picking a number because it “looks right” was the same criticism I had of Win Shares, a criticism Bill James readily admits. However, neither Win Shares nor Runs Created uses linear weights, so, in my opinion, they are superior. Using a formula that includes linear weights, whether it weights offensive or defensive elements against runs or wins (which are basically interchangeable: runs scored are wins, runs allowed are losses), will invariably produce an answer near the team’s actual wins. It should be no big surprise.
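    A sketch of how the choice of replacement level moves the numbers, assuming a 162-game season: the level fixes the baseline every team starts from, and a team's total WAR is however many actual wins it has above that baseline. The .300 and .340 levels and the 90-win team are arbitrary, hypothetical values for comparison; .320 is the level from the original post:

```python
SEASON_GAMES = 162


def replacement_wins(level: float, games: int = SEASON_GAMES) -> float:
    """Wins a team of replacement-level players would be expected to collect."""
    return level * games


def implied_team_war(actual_wins: float, level: float, games: int = SEASON_GAMES) -> float:
    """Total WAR a team must carry for WAR plus replacement wins to equal its actual wins."""
    return actual_wins - replacement_wins(level, games)


if __name__ == "__main__":
    for level in (0.300, 0.320, 0.340):  # .320 from the post; the others are arbitrary
        print(f"level {level:.3f}: baseline {replacement_wins(level):.1f} wins, "
              f"a 90-win team carries {implied_team_war(90, level):.1f} WAR")
```

    Each .020 change in the level shifts every team's total by about 3.2 wins (.020 × 162), spread across its players.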

    You did admit that calculating Wally Pipp’s value based on Lou Gehrig’s value (or the other player examples you used) “just doesn't make sense”. But it’s the only value that matters to a team. If replacement players were readily available, there would be a much smaller win/loss spread between teams. I’ve looked at hundreds of really bad teams, and they have one thing in common. It’s not that they didn’t have any good players; they had multiple positions that, in aggregate, were simply awful (look at the 2003 Detroit Tigers and the totals for catcher, 3B, SS and RF). The really awful teams CAN’T FIND replacement-level players.

    Since we are dealing with wins, we are dealing in absolutes: we know absolutely how many games a team wins or loses, and the number of runs scored or allowed. Sorry, but these are absolutes. And the “It's an estimate, which is why you can 'reverse-engineer' wins” comment escapes me. I truly don’t understand how its being an “estimate” makes it work. You can’t claim it’s an estimate and then claim it’s accurate. Estimates are just that, estimates: rough calculations or approximations. Wins, losses and runs are not estimates.

    And the “United States” reference also escapes me. Our nation is not a mathematical construct.

    Finally, to repeat, wRAA is not weighted directly against wins; it’s weighted against runs, and every sabermetrician knows runs and wins are basically the same. The more runs scored, the more wins. There are 231 different formulas in linear weights (a different one for every season), so it had better be accurate. If offensive elements are adjusted yearly by the number of runs scored, and defensive and pitching elements by the number of runs allowed, OF COURSE the sum of the formula products will resemble the number of games won and lost.

    We could have an intelligent argument and agree to disagree, but not in today’s world. Anyone who does swallow the bilge that comes down the pipe must be stupid. Life is too short, so I am done with this. I hope that I have shed a little light on the subject and inspired some critical thinking.

  41. admin Says:

    .320 is not made up. It is based on a study by Sean Smith of how minor leaguers and bench players do when playing full time as replacements. The differences between leagues are calculated by looking at players who move between leagues in-season.

    Regarding linear weights vs. RC. We are going to have to agree to disagree. Linear weights are endorsed by nearly all of the major sabermetricians and even Bill James has come around to appreciate and use linear weights.

    As for the 231 different formulas: these weights depend on the run-scoring environment. More runners on base mean hits are more valuable, etc. etc.

  42. DJL44 Says:

    I don't think you necessarily need to have team WAR = team wins but league WAR should equal league wins (30*81=2430). If that isn't close then there is an error in the runs to wins conversion or you're leaving something out that matters.
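    Under the post's .320 replacement level, the league-wide bookkeeping works like this: every game has exactly one winner, so a 30-team, 162-game season has 30 × 81 = 2,430 wins; a league of replacement players would collect 30 × 162 × .320 of them, and the rest (about 875, as the original post says) is the pool that league-wide WAR is meant to account for. A minimal sketch of that arithmetic:

```python
TEAMS = 30
GAMES = 162
REPLACEMENT_WPCT = 0.320  # level from the original post

league_wins = TEAMS * GAMES // 2  # every game has exactly one winner
replacement_baseline = TEAMS * GAMES * REPLACEMENT_WPCT
war_pool = TEAMS * GAMES * (0.500 - REPLACEMENT_WPCT)

print(league_wins)                                # 2430
print(round(replacement_baseline, 1))             # 1555.2
print(round(war_pool, 1))                         # 874.8
print(round(replacement_baseline + war_pool, 1))  # 2430.0
```

    So the check that should hold is league WAR ≈ league wins minus the replacement baseline, rather than league WAR ≈ all 2,430 wins.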

  43. Neil Says:

    WAR is supposed to give you a ballpark estimate (no pun intended) of a player's value to a generic team.

    We attempt to make his stats as context-neutral as possible by adjusting for the park, league run environment, opponents faced, etc. Then we figure out the context-neutral runs he created (via hitting & baserunning) above/below what a league-average hitter would have created in the same circumstances, adjust for the fact that hitting standards are different for different positions, and account for defense by adding in the # of runs he saved/didn't save compared to an average player at the position.

    That gives you runs above/below a league-average player who played the same position(s). But, as has been mentioned in this thread, average players don't grow on trees. Putting aside whether he's a free agent/arb-eligible/etc, the economic reality of the sport is that a player's salary is determined by how much he can offer over what a "freely available" player can give a team. That's why the replacement level matters -- it's the level of production that you can buy with the minimum salary.
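    The steps described above can be sketched as a sum of run components converted to wins. This is a simplified illustration, not Baseball-Reference's implementation: the class and function names and the component values are hypothetical, and the flat 10 runs per win is a common rule of thumb (real conversions vary with the run environment):

```python
from dataclasses import dataclass

RUNS_PER_WIN = 10.0  # rule-of-thumb assumption; the real value depends on the run environment


@dataclass
class RunComponents:
    batting: float      # context-neutral batting runs vs. a league-average hitter
    baserunning: float  # baserunning runs vs. average
    positional: float   # adjustment for different hitting standards by position
    fielding: float     # runs saved vs. an average fielder at the position
    replacement: float  # runs separating an average player from a replacement player over the same playing time


def war(c: RunComponents, runs_per_win: float = RUNS_PER_WIN) -> float:
    """Sum the run components into runs above replacement, then convert runs to wins."""
    runs_above_replacement = (c.batting + c.baserunning + c.positional
                              + c.fielding + c.replacement)
    return runs_above_replacement / runs_per_win


if __name__ == "__main__":
    # Hypothetical season line for illustration only.
    player = RunComponents(batting=30.0, baserunning=5.0, positional=2.0,
                           fielding=8.0, replacement=20.0)
    print(round(war(player), 1))  # 6.5
```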

    Buying production above that level begins to cost a lot of money. So WAR is an important concept for teams because it tells them how much more production Player X/Y/Z will give them than a minimum-salary player would be able to, a number that -- in an ideal world -- would be directly tied to how much more than the minimum they are paid.

    It's not about the actual replacement on each player's specific team. It's about determining a player's context-neutral, generic market value based on how many more wins he can generate than a player making the league's minimum salary. While various sites estimate it in different ways, the basic idea is always the same because, economically, it's the most efficient way of viewing team-building: wins in and dollars out. And WAR happens to be a very good framework for quantifying that.

  44. Alicia Says:

    Is there any statistical real life evidence that shows any validity at all for WAR? I've never seen any.

  45. kzuke Says:

    alicia, if you read the original post and the following replies, you will see that there is a correlation between player WAR and actual team wins. in that sense, it may be more valuable than traditional stats such as average, rbi, HR, ERA, etc. (i.e. the "see-able" stats)

  46. Jim Says:

    Alicia, 'validity' is tricky to define. I would say you need to pose the question you are trying to answer (eg. "WAR is a valid indicator of how many runs a player adds to a team above a AAA replacement player") first. To say that it is correlated with team wins may be true but so might 'salary' 'yrs experience' 'height'.

  47. kk Says:

    @34

    The problem with "clutch" comparisons is that they are little more than observations. Yes, some players do relatively better or worse in a given situation. Frankly, it seems to come up as a panacea for poor play beforehand, i.e., praise the walk-off, ignore the 0-3 and the dropped ball that led to the team being down in the first place.

    Let's say you have a bench guy who is supremely good when the bases are loaded. Great idea in theory. In practice, you don't know how often that will come up, especially if the rest of the team is not good at getting on base. Players universally do not hit as well pinch-hitting (DH similarly has a negative effect on performance), so you have to take that into account. Plus the player he's PHing for. Presumably the other team manager has a pulse and will bring in the reliever who is best at stranding runners.

  48. mosc Says:

    If you want to evaluate WAR, do it in comparison with other stats. Simple runs scored and runs allowed gives a good benchmark. It's because of this that I tend to favor RE24-type stats for players, since they are tied to the only stat in baseball that really matters: runs. If you want to prove that WAR has ANY statistical value, just take a sample of, say, a decade and see if it predicts W-L records more accurately than runs scored vs. runs allowed. Not rocket science.
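    That test boils down to comparing prediction errors. A minimal sketch using only the two teams from the original post (far too small a sample to settle anything; a real test would use many team-seasons): root-mean-square error of the WAR-based estimate versus the Pythagorean estimate, each against actual wins. The `pythagorean_wins` helper shows where such estimates come from, but the post gives only the resulting win totals, not the runs, so it is not applied to these teams; its 1.83 exponent is an assumption (Bill James's original used 2):

```python
import math

# (actual wins, WAR-based estimate, Pythagorean wins) as quoted in the original post.
TEAMS = {
    "Phillies": (50, 51.1, 52),
    "Texas": (64, 63.3, 62),
}


def pythagorean_wins(runs_scored: float, runs_allowed: float, games: int,
                     exponent: float = 1.83) -> float:
    """Expected wins from runs scored and allowed; the exponent is an assumption."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return games * rs / (rs + ra)


def rmse(errors: list[float]) -> float:
    """Root-mean-square error of a list of prediction errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


war_errors = [est - actual for actual, est, _ in TEAMS.values()]
pythag_errors = [pyth - actual for actual, _, pyth in TEAMS.values()]

print(f"WAR estimate RMSE: {rmse(war_errors):.2f}")     # 0.92
print(f"Pythagorean RMSE:  {rmse(pythag_errors):.2f}")  # 2.00
```

    On these two teams the WAR-based estimate happens to land closer, but two data points say nothing about which method is more accurate in general.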

  49. Bob Stanton Says:

    I remain suspicious of WAR. Babe Ruth has the highest WAR rating. I buy that. Barry Bonds is #2 and Willie Mays is #3. Who would you rather have on your team: Mr. Potato Head or his godfather?

  50. kzuke Says:

    bonds had more runs, HR, rbi, sb, bb, a higher obp, ops+, and won five more mvp awards. the guy produced. give me one reason why mays was a more productive player. better fielder?

    i understand your point. bonds is a dick and a "cheater," but WAR doesn't measure that. it's completely objective.