Sports Reference Blog

Archive for the 'Stat Questions' Category

What’s a Home Game on Baseball-Reference.com? HTBF?

6th August 2020

With Major League Baseball making a mad dash to complete the 2020 season, a number of norms and standards have gone by the wayside. Due to postponements, cancellations, and Canada's need for a quarantine of those playing America's Pastime, MLB has been forced to schedule what it considers home games to be played on the road. In these games, the host team bats first, and the teams often go through the charade of the host wearing its road unis while the traveling team wears its home whites. How we handle these games has led to confusion as to what the home and road records and splits represent on Baseball-Reference.com.

Our policy has been and remains that a team playing in its home park is the home team regardless of whether it bats first or second. When that team bats first, we call the game Home Team Batted First, or HTBF. We feel that home and visitor refer to location and not batting order. In a neutral-site game (of which there have been very, very few), the home team would be the team to bat last. Since 2007, there have been 19 games where the home team batted first; those are listed below.
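As a rough illustration of that rule, here is a minimal Python sketch. The `Game` record and its field names are hypothetical and are not the site's actual data model; the only logic taken from the policy above is that location decides the home team, with batting order used only at a neutral site.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Game:
    # Hypothetical record layout, not the site's actual schema.
    park_owner: Optional[str]  # team whose home park hosts the game; None at a neutral site
    bats_first: str
    bats_last: str


def home_team(game: Game) -> str:
    """Location decides the home team; batting order matters only at a neutral site."""
    if game.park_owner is not None:
        return game.park_owner
    return game.bats_last


def is_htbf(game: Game) -> bool:
    """True when the team playing in its own park batted first (HTBF)."""
    return game.park_owner is not None and game.park_owner == game.bats_first
```

With placeholder team codes, `Game(park_owner="AAA", bats_first="AAA", bats_last="BBB")` is an HTBF game in which "AAA" is still the home team.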

Read the rest of this entry

Posted in Academics, Baseball-Reference.com, Ridiculousness, Stat Questions, Statgeekery, Uncategorized | 1 Comment »

Launching Stathead

27th April 2020

If you haven't read it already, please read Mike Lynch's rundown of our new Stathead/Baseball service. I'm going to lay out some of the background for this change and explain some of the changes.

As I laid out in our post from early March, we are making changes to our Ad-Free and Play Index products.

Here is the thrust of what we said in March.

So we are making some changes. The Play Index for each site will be moving to Stathead.com. Stathead.com will become the center for all of our subscription products. We expect these products to include tools and information beyond just a redesigned set of Play Index tools. This won't happen all at once, but we'll start with baseball and then proceed through the remainder of our sports. Also, we will be ending our ad-free product and instead Stathead memberships will have ad-free built-in. There just aren't enough users to justify a separate ad-free product. These changes will begin this month and continue through April on baseball and then continue with the other sites after that.

If you are a subscriber, we will make every effort to make certain you are happy with the options we provide to convert your ad-free or Play Index subscription over to Stathead including the option of a refund on your subscription. You will be hearing more from us about the changes over the next few weeks as we will email users directly.

If you've looked at the cost of Stathead/Baseball vs the Play Index, you'll notice we've gone from $36/year (+ $20/year for ad-free) to $8/month. I realize this is a significant increase. As I said in my original post, we are extraordinarily reliant on ad revenue. Back in early March this seemed problematic. Now with the complete collapse of the advertising market it has the potential to be lethal. If you don't block our ads, you may have noticed that we now have more ads on our pages. This is in response to the downturn in ad revenue. Sports Reference is doing fine right now, but if we want to continue to succeed and also be aligned with the needs of our users, a healthy stream of subscription revenue is vital.

We also feel our products warrant this price. The only comparable products to our Stathead tools come from Elias and STATS LLC and would cost you $10,000+ a year to subscribe to. You could create your own from Retrosheet data, but that would probably take more than $8/month of your time to maintain.

We are using monthly billing for at least the first few quarters, so that we can monitor more directly the success we are having in recruiting and maintaining subscribers. We have discussed adding an annual billing option in the future.

For the time being, we will be maintaining both the legacy Play Index site (which has been free since the start of March) and the new site, but before too long we will take down the old Play Index site, probably late May. We are also working on converting the other Play Index sites. First will be hockey and then probably basketball after that.

We realize there aren't games being played and that you might be facing your own financial challenges at this time. Therefore, we are offering the first month free for all users. And then, until the leagues start playing games, we will be giving users the option of claiming additional free monthly subscriptions. We'll provide more details on the latter plan as we approach the time for subscriptions to be renewed.

If you are a current subscriber, we will be emailing you with information about how we will be converting your subscription to the new system and of course, we will provide your money back if you are unhappy with the conversion to Stathead that we are offering you. Our goal is to give you a more than fair deal and see you join us on stathead.com.

Please feel free to reach out to us if you have questions or concerns.

--sean forman

Posted in Advanced Stats, Announcement, Baseball-Reference.com, Stat Questions, Statgeekery, Stathead | 14 Comments »

Introducing the WNBA Player Season Finder

10th August 2018

Regular Basketball-Reference users are well acquainted with the Play Index, which allows us to compare players across eras and slice and dice season-level data by many criteria. Today we are introducing the WNBA Player Season Finder, which will be accessible from both the Play Index page and our WNBA home page. We have WNBA stats back to the league's inaugural 1997 season, which means you can now search all of WNBA history with this tool.

Just like our NBA Player Season Finder, the new WNBA tool lets you do single-season, combined-season and total-season searches. For example, with the combined-season search, you can now create franchise career leaderboards, maybe to see how far ahead in first place Tamika Catchings is among point scorers in Indiana Fever history. Or with the total-seasons search, you can now execute a search like players with the most qualified seasons of 2 blocks per game; Margo Dydek and Lisa Leslie lead with nine such seasons each in their careers.

Of course, current season stats are also searchable with the Player Season Finder, so you can give them some perspective with past stats. A'ja Wilson is burning up the league in her first WNBA season, currently averaging over 20 points per game. Here's a look at the others in WNBA history who finished with 20 points per game in their rookie season.

Query Results Table
Column groups: Totals (G, GS); Per Game (MP through FTA); Shooting (FG% through eFG%).

| Rk | Player | Season | Tm | Lg | PTS | G | GS | MP | FG | FGA | 2P | 2PA | 3P | 3PA | FT | FTA | FG% | FT% | 2P% | 3P% | eFG% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Cynthia Cooper | 1997 | HOU | WNBA | 22.2 | 28 | 28 | 35.1 | 6.8 | 14.5 | 4.4 | 8.7 | 2.4 | 5.8 | 6.1 | 7.1 | .470 | .864 | .508 | .414 | .553 |
| 2 | Seimone Augustus | 2006 | MIN | WNBA | 21.9 | 34 | 34 | 33.1 | 8.3 | 18.2 | 7.4 | 15.7 | 0.9 | 2.5 | 4.4 | 4.9 | .456 | .897 | .473 | .353 | .481 |
| 3 | A'ja Wilson | 2018 | LVA | WNBA | 20.3 | 29 | 29 | 30.8 | 7.1 | 16.0 | 7.1 | 16.0 | 0.0 | 0.0 | 6.0 | 7.7 | .446 | .785 | .446 | | .446 |

Provided by Basketball-Reference.com. Generated 8/10/2018.

Stay tuned for more additions to the WNBA section of our site here on the Sports-Reference Blog. If you have any questions or suggestions, feel free to contact us through our feedback form.

Posted in Advanced Stats, Announcement, Basketball-Reference.com, Features, History, Leaders, Play Index, Stat Questions, Statgeekery | 4 Comments »

A Discussion of WAR Wherein I Ardently Attempt to Avoid any WAR-Related Puns

21st November 2017

This article assumes a lot of prior knowledge about the discussion of Wins Above Replacement; you can catch up here.

First off, none of us are here without Bill James. We are all at our very best merely Chaucer or Joyce to his Shakespeare. All sabermetrics predating him flowed into his work and all sabermetrics after him carries echoes of his work.

To the discussion at hand.
Read the rest of this entry

Posted in Academics, Advanced Stats, Baseball-Reference.com, Stat Questions, Statgeekery, Trivia, WAR | 9 Comments »

Baseball-Reference Minor League Data Updates and Corrections

1st November 2016

Our historical performance data for professional leagues (affiliated minor leagues, independent minor leagues, fall/winter leagues, and other international leagues) is provided by and licensed from 24-7 Baseball and Chadwick Baseball Bureau. It incorporates the work of many stalwart baseball researchers, including Cliff Blau, Art Cantu, Frank Hamilton, Reed Howard, Kevin Johnson, Bob McConnell, Jack Morris, and Ray Nemec, as well as members of the Minor Leagues Committee of the Society for American Baseball Research. Perhaps most importantly, it builds upon the seminal work of Ed Washuta, who magnanimously provided the framework to make the whole thing possible.

As licensees of this data, we at Sports Reference LLC are not in a position to update and make corrections to the dataset ourselves. All proposed corrections are passed along to the Chadwick Baseball Bureau. Their focus is on making corrections for post-1960 data first and then pre-1960 data as time permits. This is largely due to the sheer scope of the project, but also for economic reasons. The economic value to us of information such as the 1929 California State League is minimal, and likewise the market for licensing such data is effectively zero. We wish to have the most accurate datasets we can, but must operate within the economic constraints of what the market will bear.

We realize this means that certain issues and errors may linger on the site for months or even years, and we apologize for all such errors.

You can use the Chadwick Bureau website to send corrections or changes to the data.

Posted in Announcement, Baseball-Reference.com, Data, General, Stat Questions | 9 Comments »

What the Heck is Corsi? A Primer on Advanced Hockey Statistics

13th October 2016

Good news for fans of zambonis, fighting, and the greatest video game of the 1990s: the NHL has finally returned! After a wild season last year, there are all kinds of juicy storylines to follow this year. Can the Pittsburgh Penguins become the first back-to-back Stanley Cup winners since the Detroit Red Wings of the 1990s? How will the San Jose Sharks bounce back from coming so close and falling short? Will Alex Ovechkin reach 1,000 points? Can Connor McDavid build upon a promising rookie year and live up to the hype? What round of the Eastern Conference Playoffs will the Washington Capitals be eliminated in this year (I kid, I kid)?

This blog post will seek to answer none of those. Instead, this week, I wanted to dig into one of the major trends that's been sweeping across the NHL the last few years, among fans and front offices alike. I'm talking, of course, about the rise of advanced statistics.

If you're a sports fan, you're probably at least vaguely familiar with Moneyball and the advanced stat wars in baseball. And you may have read articles about how thinkers in other sports, like basketball, have used similar principles to deepen their understanding of the game. This movement has reached hockey in recent years, as researchers have uncovered several new ways of understanding the game beyond traditional stats like goals, assists, and plus/minus. These new analytics can help us understand why a team is over- or under-performing, and whether that performance is sustainable. They can also help us appreciate unsung players who do more for their team than we may realize, because they don't put up flashy traditional numbers.

So, with that in mind, here are some of the basics to get you started in the world of advanced hockey stats. Read the rest of this entry

Posted in Advanced Stats, Announcement, Hockey-Reference.com, Stat Questions, Statgeekery | 2 Comments »

Explaining our Handling of “Holds”

24th February 2016

UPDATE (Feb. 25, 2016): MLB has informed us that they will be updating Brach's 2015 holds total to 15 (matching us). MLB's Cory Schwartz commented: "We do credit Holds whenever the pitcher enters in a Save situation and leaves with the lead intact, so this was an oversight on our part."

It recently came to our attention that for the 2015 season, we credited Brad Brach with 15 holds. MLB, meanwhile, credited Brach with just 14 holds (NOTE: After reading this post, MLB has agreed that 15 is the correct number of holds for Brach in 2015). It was discovered that the difference was in the handling of the Orioles' 5-4 win over the Mariners on May 21. Before we jump into the details, let's examine MLB's definition of a hold (bolding is ours, for emphasis):

"The hold is not an official statistic, but it was created as a way to credit middle relief pitchers for a job well done. Starting pitchers get wins, and closers -- the relief pitchers who come in at the end of the game -- get saves, but the guys who pitch in between the two rarely get either statistic. So what's the most important thing one of these middle relievers can do? "Hold" a lead. If a reliever comes into a game to protect a lead, gets at least one out and leaves without giving up that lead, he gets a hold. But you can't get a save and a hold at the same time."

UPDATE (Feb. 26, 2016): Please see MLB's updated Holds definition here

As you can see, this isn't really much of a definition at all. There's little in the way of criteria here, and it's also pointed out that the statistic isn't even official, anyways. In fact, there's enough confusion that MLB.com credits Cory Rasmus with 2 holds in 2015, but Elias (MLB's official statistician) credits him with 1 hold in 2015. We credit him with 2, for what it's worth. This "definition" provides enough room for interpretation that variance in recorded totals is not uncommon.

Since the statistic is unofficial, explaining all of this might be a pointless exercise, but in an effort to be transparent, we at least want to point out what standard we are using to assign holds.

Our standard is to give a pitcher a hold any time they protect a lead in a save situation (meaning they could have been eligible for a save if they finished the game). Brach presents an interesting study in that May 21 game. Starter Chris Tillman pitched 3 innings and left with a 4-1 lead. Obviously, he was not eligible for the win due to Rule 10.17(b), as he did not complete 5 innings. Tillman was relieved by Brian Matusz, who allowed 2 runs in the 4th, but completed the inning of work and left the game leading 4-3, when Brach took the mound for the 5th inning. Brach completed 2 scoreless innings, but the Mariners tied it up in the 7th after Brach left the game. The Orioles eventually won the game.

With the benefit of hindsight, you could say that Brach would have been in line for the win (not the save) if he had finished the game, since he ended up being more "effective" than Matusz, which would make it nearly a lock that the official scorer would have given him the win. But, hypothetically, Brach could have given up 20 runs in relief, but maintained the lead, and earned the save (with Matusz getting the win). As unlikely as that scenario is, the point here is that we're not using hindsight in assigning holds. In our opinion, the opportunity for a hold is defined when you enter the game and is only removed retroactively if you are given the win.

To be as clear as possible: our policy is to credit a hold when a pitcher enters the game in a save situation and leaves with the lead (and is not later given the win by the official scorer).

As we bolded in MLB's definition of a hold, "If a reliever comes into a game to protect a lead, gets at least one out and leaves without giving up that lead, he gets a hold." It would sure seem to us that Brach's May 21st appearance fits those criteria.

2019-07-09 Update on Long Holds

We were silent on this issue earlier, but we do not give a hold in the situation where a reliever is only in line for a "long save". A long save is the type where a reliever pitches three innings with the lead to end the game. For example, on June 29th in London, the Yankees' Nestor Cortes entered a 14-6 game in the bottom of the 4th with two outs and a runner on first. This is not a standard save situation. He then pitched three innings in relief and did not relinquish the lead. If he'd gone on to complete the game he would have received a save, but his appearance did not begin as a save situation, as the save was dependent on him pitching three innings. We do not include these situations as save situations and do not credit holds in these cases. This is true of most record keepers, but we are aware that MLB Gameday did give Cortes a hold in this situation.
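The policy described in this post can be sketched as a small decision function. This is only an illustration under stated assumptions: the `ReliefAppearance` record and its field names are hypothetical, the judgment of whether an entry is a "standard" save situation (one that does not rely on pitching three innings) is taken as an input rather than computed, and the at-least-one-out condition in MLB's quoted wording is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ReliefAppearance:
    # Hypothetical fields for illustration; not the site's actual schema.
    standard_save_situation: bool  # save situation at entry that does NOT depend on the three-inning rule
    left_with_lead: bool           # lead was still intact when the pitcher departed
    finished_game: bool            # pitcher recorded the final out
    awarded_win: bool              # official scorer later gave this pitcher the win


def earns_hold(app: ReliefAppearance) -> bool:
    """Apply the hold policy described above, without any hindsight about later innings."""
    if not app.standard_save_situation:
        return False  # includes "long save" entries, which are not credited
    if not app.left_with_lead:
        return False
    if app.finished_game:
        return False  # finishing the game is a save opportunity, not a hold
    if app.awarded_win:
        return False  # a win removes the hold retroactively
    return True
```

Under these assumptions, Brach's May 21 outing (entered with a 4-3 lead, left with it, did not finish, did not get the win) returns `True`, while an entry like Cortes's, which only qualifies through the three-inning route, returns `False`.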

This wonderful Hardball Times article spells out the many differences in how holds and blown saves are calculated. It turns out no two sources agree on any of the league totals for holds or blown saves.

Posted in Announcement, Baseball-Reference.com, FAQ, Ridiculousness, Stat Questions, Statgeekery | 10 Comments »

4 Matchups That Will Decide the 2015 NFL

10th September 2015

There will be 256 regular season games in the NFL this year and within each of those games, there are countless matchups. However, some matchups matter more than others. It's not likely that the AFC champion will be decided by how Tennessee's secondary contains Jacksonville's WR, for example. With that in mind, here are 4 matchups that could end up deciding this year's Super Bowl champions. Read the rest of this entry

Posted in Pro-Football-Reference.com, Stat Questions, Super Bowl | Comments Off on 4 Matchups That Will Decide the 2015 NFL

SRS Calculation Details

3rd March 2015

One of the more common subjects for queries we receive at Sports-Reference is our SRS (Simple Rating System) figures. For some background, the first of our sites to add SRS was Pro-Football-Reference, when Doug Drinen added it to the site in 2006 and provided this excellent primer. The important thing to know is that SRS is a rating that takes into account average point differential and strength of schedule. For instance, the 2006-07 Spurs won games by an average of 8.43 points per game and played a schedule with opponents that were 0.08 points worse than average, giving them an SRS of 8.35. This means they were 8.35 points better than an average team. An average team would have an SRS of 0.0. The calculation can be complicated, but the premise is simple and it produces easily interpreted results.
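To make the "point differential plus strength of schedule" idea concrete, here is a minimal Python sketch. It is not necessarily how any of our sites solve the system; it assumes a simple fixed-point iteration in which each team's rating is its average margin plus the average rating of its opponents, re-centered so the average team sits at 0.0. The function name, the input format and the three-team schedule in the usage note are all hypothetical.

```python
from collections import defaultdict


def simple_ratings(games, max_iterations=1000, tol=1e-9):
    """games: list of (team_a, team_b, margin_for_team_a), one tuple per game played."""
    margins = defaultdict(list)
    opponents = defaultdict(list)
    for team_a, team_b, margin in games:
        margins[team_a].append(margin)
        margins[team_b].append(-margin)
        opponents[team_a].append(team_b)
        opponents[team_b].append(team_a)

    # Average point differential per team.
    mov = {team: sum(m) / len(m) for team, m in margins.items()}

    ratings = dict(mov)  # start from raw point differential
    for _ in range(max_iterations):
        # rating = average margin + average rating of opponents faced
        updated = {
            team: mov[team] + sum(ratings[o] for o in opponents[team]) / len(opponents[team])
            for team in mov
        }
        # Re-center so that an average team has a rating of 0.0.
        mean = sum(updated.values()) / len(updated)
        updated = {team: r - mean for team, r in updated.items()}
        change = max(abs(updated[team] - ratings[team]) for team in mov)
        ratings = updated
        if change < tol:
            break
    return ratings
```

For a made-up round robin such as `[("A", "B", 10), ("B", "C", 4), ("A", "C", 6)]`, team A has an average margin of 8 but faced below-average opponents, so its rating settles below 8, the same kind of adjustment as the Spurs' 8.43 becoming 8.35. Simple iteration like this may not converge for every schedule (two teams that only play each other, for instance), so treat it as a sketch rather than a robust solver.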

However, there are some variations in the way we calculate SRS across our various sites. We'll break down these differences below.

Pro-Football-Reference.com SRS: PFR's SRS is unique in that a home-field advantage is included as a part of the calculation because of the short schedule compared to the other sports (we don't want a team to look relatively weak at the halfway point because they've only played 3 of their first 8 at home, for instance). This HFA fluctuates yearly based on game results, but it is generally somewhere between 2 and 3 points (2006 being an outlier, as you'll see). Below is a look at the HFA numbers we have used since 2001. If you'd like to calculate these HFAs yourself, just sum up every team's home point differential and then divide by the total number of games played across the league that season. This data can easily be found in the Play Index for each season:

  • 2001: 2.0081
  • 2002: 2.2461
  • 2003: 3.5547
  • 2004: 2.5078
  • 2005: 3.6484
  • 2006: 0.8477
  • 2007: 2.8672
  • 2008: 2.5586
  • 2009: 2.2070
  • 2010: 1.8945
  • 2011: 3.2656
  • 2012: 2.4336
  • 2013: 3.1055
  • 2014: 2.4883
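The HFA recipe described above (sum every team's home point differential, then divide by the number of games played) can be sketched in a few lines of Python. The function name and the input format are assumptions made for illustration, and the scores in the usage note are made up.

```python
def home_field_advantage(games):
    """games: iterable of (home_points, away_points), one pair per game in the season."""
    games = list(games)
    if not games:
        raise ValueError("need at least one game")
    # Summing the home margin of every game equals summing each team's home point differential.
    total_home_margin = sum(home - away for home, away in games)
    return total_home_margin / len(games)
```

For example, `home_field_advantage([(24, 17), (20, 10), (28, 31), (14, 13)])` adds up home margins of 7, 10, -3 and 1 and divides by 4 games.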

College Football SRS: Our CFB SRS does not contain a home-field advantage element, but it does have some other quirks. Most importantly, we have capped the margin of victory considered for the formula. Due to the number of mismatches seen in college football, the maximum point differential a team can be credited with in a game is 24. We also credit all wins as a minimum of plus-7 margin of victory (so if you win by 1 point, it's treated the same as a 7-point win). The same logic is applied to losses, as well. One other wrinkle for CFB is that all non-major opponents are included as one team for the sake of the ratings.
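The margin adjustment described above can be sketched as follows. The function and parameter names are hypothetical, and treating a tie as a margin of 0 is an assumption, since the text only covers wins and losses.

```python
def adjusted_margin(margin, floor=7, cap=24):
    """Clamp a game's point margin to between 7 and 24 in magnitude, keeping its sign."""
    if margin == 0:
        return 0  # assumption: ties are left at zero
    sign = 1 if margin > 0 else -1
    return sign * min(max(abs(margin), floor), cap)
```

So a 1-point win counts as +7, a 45-point win counts as +24, and a 3-point loss counts as -7, while a 10-point win is left alone.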

College Basketball SRS: SRS for college hoops is straightforward (no HFA & no adjusted MOV), but one item to note is that games against non-major opponents are not counted in our calculations.

MLB, NBA & NHL: All of these SRS calculations are straightforward, with no adjustments for HFA and no capping of MOV. It should be noted, however, that no special consideration is given for extra innings, overtimes or shootouts, either.

We'll close with a quick rundown of the various merits and weaknesses of SRS, from Drinen's original 2006 post. These bullet points were created to describe the system used for NFL SRS, but many of the strengths and weaknesses can be applied to the other sports, as well:

  • The numbers it spits out are easy to interpret - if Team A's rating is 3 bigger than Team B's, this means that the system thinks Team A is 3 points better than Team B. With most ranking algorithms, the numbers that come out have no real meaning that can be translated into an English sentence. With this system, the units are easy to understand.
  • It is a predictive system rather than a retrodictive system - this is a very important distinction. You can use these ratings to answer the question: which team is stronger? I.e. which team is more likely to win a game tomorrow? Or you can use them to answer the question: which of these teams accomplished more in the past? Some systems answer the first question more accurately; they are called predictive systems. Others answer the latter question more accurately; they are called retrodictive systems. As it turns out, this is a pretty good predictive system. For the reasons described below, it is not a good retrodictive system.
  • It weights all games equally - every football fan knows that the Colts' week 17 game against Arizona was a meaningless exhibition, but the algorithm gives it the same weight as all the rest of the games.
  • It weights all points equally, and therefore ignores wins and losses - take a look at the Colts season. If you take away 10 points in week 3 and give them back 10 points in week 4, you've just changed their record, but you haven't changed their rating at all. If you take away 10 points in week 3 and give back 20 points in week 4, you have made their record worse but their rating better. Most football fans put a high premium on the few points that move you from a 3-point loss to a 3-point win and almost no weight on the many points that move you from a 20-point win to a 50-point win.
  • It is easily impressed by blowout victories - this system thinks a 50-point win and a 10-point loss is preferable to two 14-point wins. Most fans would disagree with that assessment.
  • It is slightly biased toward offensive-minded teams - because it considers point margins instead of point ratios, it treats a 50-30 win as more impressive than a 17-0 win. Again, this is an assessment that most fans would disagree with.
  • This should go without saying, but - I'll say it anyway. The system does not take into account injuries, weather conditions, yardage gained, the importance of the game, whether it was a Monday Night game or not, whether the quarterback's grandmother was sick, or anything else besides points scored and points allowed.
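The arithmetic behind the points-versus-wins bullets above can be checked with a short Python sketch. It looks only at average point margin, which is the input SRS starts from, and assumes schedule strength is held equal. The two-game slates are hypothetical; the 50/-10, 14/14, 50-30 and 17-0 figures come from the bullets.

```python
def average_margin(margins):
    """Mean point margin over a set of games."""
    return sum(margins) / len(margins)


def record(margins):
    """(wins, losses) implied by a list of point margins."""
    wins = sum(1 for m in margins if m > 0)
    losses = sum(1 for m in margins if m < 0)
    return wins, losses


# A 50-point win plus a 10-point loss versus two 14-point wins.
blowout_then_loss = [50, -10]
two_solid_wins = [14, 14]

# Hypothetical weeks 3 and 4: two 3-point wins, then move 10 points from week 3 to week 4.
before_shift = [3, 3]
after_shift = [3 - 10, 3 + 10]

# Margins, not ratios: a 50-30 win versus a 17-0 win.
high_scoring_win = 50 - 30
shutout_win = 17 - 0
```

Here `average_margin(blowout_then_loss)` is 20.0 against 14.0 for the two solid wins, even though the second team went 2-0. Shifting the 10 points changes the record from 2-0 to 1-1 while leaving the average margin at 3.0, and the 50-30 win counts for 20 points against 17 for the shutout.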

 

Posted in Announcement, Baseball-Reference.com, Basketball-Reference.com, CBB at Sports Reference, CFB at Sports Reference, Data, FAQ, Features, Hockey-Reference.com, Pro-Football-Reference.com, SRS, Stat Questions, Statgeekery, Uncategorized | 2 Comments »

Fielding Independent Pitching (FIP) added to Baseball-Reference.com

17th April 2014

Last night, I added FIP (short for Fielding Independent Pitching) to the site. This is a sabermetric stat for pitchers that approximates ERA without the effect of their team's fielding ability. FIP actually correlates to future ERA better than ERA itself, making it a superior indicator of future performance.

The idea is that the pitcher most directly controls the number of walks, home runs and strikeouts that occur, while the batters and fielders have a bigger say in whether balls in play are turned into outs. In addition, most pitchers' Batting Average on Balls in Play (BAbip) reverts to the league average from one year to the next.

FIP is (13*HR + 3*(BB+HBP) - 2*SO)/IP + Constant(year). The constant is set so the yearly avg FIP equals the yearly avg ERA.
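As a worked illustration of the formula, here is a minimal Python sketch. The function names are hypothetical, innings pitched are assumed to be true decimals (so a third of an inning is 1/3, not the ".1" box-score notation), and the stat lines in the usage note are made up rather than real league or player totals.

```python
def fip_core(hr, bb, hbp, so, ip):
    """The part of FIP before the yearly constant is added."""
    return (13 * hr + 3 * (bb + hbp) - 2 * so) / ip


def fip_constant(lg_era, lg_hr, lg_bb, lg_hbp, lg_so, lg_ip):
    """Constant chosen so that league-wide FIP equals league-wide ERA for the year."""
    return lg_era - fip_core(lg_hr, lg_bb, lg_hbp, lg_so, lg_ip)


def fip(hr, bb, hbp, so, ip, constant):
    """FIP = (13*HR + 3*(BB+HBP) - 2*SO)/IP + Constant(year)."""
    return fip_core(hr, bb, hbp, so, ip) + constant
```

With a made-up league line of a 4.00 ERA, 100 HR, 300 BB, 20 HBP and 700 SO over 1,000 IP, the core value is 0.86, so the constant works out to about 3.14. A made-up pitcher with 20 HR, 50 BB, 5 HBP and 200 SO in 200 IP then has a core value of 0.125 and a FIP of roughly 3.27.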

FIP can be read exactly like ERA and is scaled to exactly the same league average as ERA, but its range will be slightly smaller.

Often a player with a low FIP and high ERA will improve, while a low ERA and high FIP indicates a likely regression as more hits start falling. I've placed FIP next to ERA to make this comparison more obvious, but if it begins making the ERA lookup too hard, I may move it further right on the pitching tables.

I've also added FIP and K% to the Play Index Season Finder. Right now, I don't believe we will add xFIP given the inconsistency in batted ball data, but that could change.

Player with big gap in FIP and ERA: Ricky Nolasco.

Posted in Advanced Stats, Announcement, Baseball-Reference.com, Most Wanted, Stat Questions | 10 Comments »