Sports Reference Blog

Who is the Best March Madness Coach in the NCAA?

Posted by Jonah Gardner on March 24, 2016

The Michigan State Spartans' defeat at the hands of noted powerhouse Middle Tennessee was a shocker for a number of reasons. MSU wasn't just the 8th 2-seed to lose to a 15-seed; they also had, according to our pre-tournament March Madness forecast, the best odds of winning the title. Even after the loss, Simple Rating System ranks the Spartans as the 3rd best team in the nation. And that's not to mention their star player, Denzel Valentine, who is the only person since 1994-95 to average 19 points, 7 rebounds, and 7 assists per game. Read the rest of this entry

Posted in Uncategorized | 2 Comments »

The 10 Biggest March Madness Upsets

Posted by Jonah Gardner on March 16, 2016

You've done the research, read the tea leaves, reviewed the charts in our paradigm-shifting blog post on how seeding affects tournament performance, and locked in your March Madness bracket. Now all that's left to do is sit back, invent a creative excuse to give when your boss asks why you need Thursday and Friday off work, and watch as a college that sounds like the name of a fictional school from a novel about MFA students eliminates your top seed of choice.

Upsets are a crucial part of the fabric of the NCAA Tournament. In a differently structured postseason, like one where teams played multiple best-of-7 series over the course of a couple months, the team that's favored on paper would win a lot more. But what fun would that be?

So, as we head into the weekend most primed for upsets, let's take a look back. Here are the 10 biggest March Madness upset wins in history. And perhaps we can find some lessons to identify who might be this year's March Madness Cinderella team. Read the rest of this entry

Posted in Announcement, CBB at Sports Reference | Comments Off on The 10 Biggest March Madness Upsets

Is It Better To Be a 12-Seed Than a 9 in the NCAA Tournament?

Posted by Jonah Gardner on March 9, 2016

Have you tested your office computer's streaming capabilities? Made sure your Venmo account is all set? Refamiliarized yourself with terms like Net Rating, Simple Rating System, and RPI? If not, you'd better get on it, because March Madness is almost here!

This Sunday, the Selection Committee will announce the 68 teams that comprise this year's race to the Final Four. And, immediately following that, millions of people will begin filling out their brackets. Every year, you carefully research the teams, read up on the players, scout the tactics, and then inevitably see your bracket obliterated in the first weekend when FREAKING DUKE LOSES IN THE ROUND OF 64 AGAIN. Read the rest of this entry

Posted in CBB at Sports Reference | 1 Comment »

Search for Largest College Football Margins of Victory

Posted by Mike Lynch on March 4, 2016

We recently added a point differential search option to the School Game Finder in the Play Index at College Football Reference. So you can now search for things like college football's biggest blowouts since 2000. You can also see that 2013 Florida State has the best cumulative season point differential since 2000. And that 2008 Florida has the most double-digit wins in a season since 2000. We hope you enjoy these new search capabilities.

Posted in Announcement, CFB at Sports Reference, Play Index | Comments Off on Search for Largest College Football Margins of Victory

Looking at the 2016 NBA MVP Projections

Posted by Jonah Gardner on March 2, 2016

Once a day, the Basketball Reference Twitter account sends out a message like this one:

If you click on the link, it takes you to the Basketball Reference MVP projections. Using a model based on previous voting results, the MVP Tracker projects the odds that every player has of winning the 2016 NBA MVP race, if voting were held today. Here's how it shapes up today:

Rk  Player             Tm   Prob%
1   Stephen Curry      GSW  76.3%
2   Russell Westbrook  OKC  7.8%
3   Kevin Durant       OKC  5.0%
4   Kawhi Leonard      SAS  2.9%
5   Draymond Green     GSW  2.8%
6   LeBron James       CLE  2.2%
7   Chris Paul         LAC  1.3%
8   Kyle Lowry         TOR  0.8%
9   LaMarcus Aldridge  SAS  0.4%
10  James Harden       HOU  0.4%
Provided by Basketball-Reference.com: View Original Table
Generated 3/2/2016.

Invariably, that Tweet gets replies like this:

Or, when it pulls from the bottom half of the Top 10, ones like this:

At the risk of making things awkward during Sports Reference lunch breaks, I largely agree with the replies. Stephen Curry could probably sit out the rest of the season, declare that he believes Seth Curry would beat Oscar Robertson one-on-one, and announce that his favorite Star Wars movie is The Phantom Menace and still win the NBA MVP Award.

At the same time, I think these projections do tell us a lot about which players voters have targeted in the past and why certain candidates might or might not be catching on.

So, with that said, let's go through some of the major candidates, what their case for MVP is, and what our projection system thinks of them. I'll try to apply my own, imperfect human brain to the matter and see if I can bridge the gap between man and machine.

Chef Curry

So let's start with Stephen Curry. The model gives Steph a 76.2% chance of winning MVP. His case is simple: he's the best player, on the best team, having one of the best seasons of all-time.

Curry's eFG% of .643 is not only the best ever by a 30 PPG scorer, it's also the first time a 30 PPG scorer has even broken .600. His Player Efficiency Rating is the best ever, while his WS/48 are merely the 2nd best ever. Oh, and his team has already clinched a playoff spot, 2 months out. So, what the heck, computer?

For starters, the best player doesn't always win. MVP voters are very smart, but they aren't necessarily looking at stats like PER. Steph would hardly be the first 30 PPG scorer to lose MVP and, even if you factor in his bonkers efficiency, the 1990 MVP race featured 2 players (MJ and Malone) scoring 30+ PPG and shooting over 51% from the field and neither won.

Of course, neither of those players was on the best team in his conference that year, let alone one chasing the best record in history. The MVP projection accounts for the Warriors' record, but not for its historic implications.

This gets at, perhaps, the biggest difference between the projections and the perceptions: there's no way a model can adequately account for narrative. On paper, the Warriors are just 4 games up in their conference, yet that dramatically undersells their once-in-a-generation dominance.

They have the best record in NBA history through 59 games, they haven't lost to a title contender all season, and they blew out the Spurs the one time they played. Because none of that is going into the model, the Warriors' lead seems larger to us than the projection can recognize.

It also can't account for the tactical advantage that Steph's off-the-dribble shooting gives the Warriors. When someone can do this, it bends defenses past their breaking point, creating easy looks for teammates. Steph's jump shooting is the flux capacitor that powers the entire Warriors Machine.

At the same time, I don't want to undersell the model. Weird things happen, voters do get MVP wrong sometimes (just ask any Kobe or LeBron fan about the 2006 MVP race), and the model wants to account for that. 76.2% seems low to me, but it's still very, very high for a (rather conservative) projection system.

A Sound of Thunder

2nd and 3rd in the projections are Russell Westbrook and Kevin Durant, which is in itself another reason why Steph's chances are higher than they may appear. If there were a convincing 2nd option (say, KD putting up his numbers on a team with the Spurs' record), we might be more willing to accept the idea of Curry's odds sitting at 75%.

However, KD and Russ are running so close that neither one has a huge advantage to us as observers. The projections don't account for the fact that they're teammates, but that will likely lead to vote-splitting, further solidifying Curry's lead. Seeing which one you prefer, however, presents a fascinating Rorschach Test.
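To put rough numbers on the teammate point, here is a minimal sketch that sums the projection table above by team. The figures are the Prob% values from that table; the names `PROJECTIONS` and `probability_by_team` are illustrative only, and nothing here reflects how the MVP Tracker itself is built.

```python
from collections import defaultdict

# (player, team, projected chance of winning MVP in percent),
# copied from the projection table earlier in this post.
PROJECTIONS = [
    ("Stephen Curry", "GSW", 76.3),
    ("Russell Westbrook", "OKC", 7.8),
    ("Kevin Durant", "OKC", 5.0),
    ("Kawhi Leonard", "SAS", 2.9),
    ("Draymond Green", "GSW", 2.8),
    ("LeBron James", "CLE", 2.2),
    ("Chris Paul", "LAC", 1.3),
    ("Kyle Lowry", "TOR", 0.8),
    ("LaMarcus Aldridge", "SAS", 0.4),
    ("James Harden", "HOU", 0.4),
]


def probability_by_team(rows):
    """Sum the projected percentages for each team, rounded to one decimal."""
    totals = defaultdict(float)
    for _player, team, prob in rows:
        totals[team] += prob
    return {team: round(total, 1) for team, total in totals.items()}


if __name__ == "__main__":
    by_team = probability_by_team(PROJECTIONS)
    # Print teams from highest combined probability to lowest.
    for team, total in sorted(by_team.items(), key=lambda kv: -kv[1]):
        print(f"{team}: {total:.1f}%")
```

Even pooled, the two Thunder candidates come to 12.8%, far behind Curry's 76.3% on his own.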

Westbrook's famously reckless game is among the NBA's great aesthetic joys. However, after Durant's injury, Westbrook found a way to amp up the production without sanding down his edges. The result is that Westbrook has risen from supporting actor to lead.

He has 9 triple-doubles this year, trailing only Draymond (and he's the first player since Jason Kidd in 06-07 and 07-08 to have 2 seasons in a row with at least 9). He's 2nd behind Rajon Rondo in Assists Per Game, while also scoring twice as many points as The Yoga Instructor and out-rebounding him. Barring a slowdown, Russell Westbrook will be the 1st player not named Oscar Robertson to average 24 PPG, 10 APG, and 7 RPG in a season.

Durant's game is less like a Swiss Army Machete and more like a lightsaber. He's scoring 27.9 PPG with an eFG% of .575 and a True Shooting Percentage of .635, both of which are historic achievements we'd be able to more fully appreciate if Steph weren't regularly lighting basketball courts on fire.

KD's precision and efficiency mean that some advanced stats, like Offensive Rating and WS/48, prefer him to Westbrook. Of course, Westbrook leads in PER and the Thunder's Net Rating is 13.7 points per 100 possessions better when Russ is on the floor, versus 11.2 for KD.

Durant's getting blocks while Russ is getting steals; Westbrook crashes the boards on offense while KD cleans the defensive glass. The question, as it is, seems to come down to quantity vs quality. You could say that Westbrook does (ever so slightly) more, while KD does less, but does it all (ever so slightly) better.

In that case, the projection backs Westbrook because, traditionally, the voters are looking for that quantity. Only 4 players have won MVP while putting up numbers at or below KD's mark of 8.1 rebounds per game and 4.6 assists. 3 of them outscored Durant in points per game, 3 played all 82 games, and 3 played more minutes per game than KD is averaging.

Westbrook and Durant are as close to a true elite partnership, without a clear alpha dog, as we've seen. Unfortunately, that fact will probably cost both of them any kind of shot at MVP.

How Much Should Defense Matter?

Despite being ranked 4th, Kawhi Leonard may have a clearer path to 2nd in the MVP voting than either Westbrook or Durant, thanks to context. In his favor are two things that our model can't really account for.

First, Leonard is probably the NBA's best defender. However, he does it in ways that are largely absent from box scores. His 1.8 steals and 0.9 blocks are impressive, but Paul Millsap actually has better box score numbers.

What Millsap doesn't do is hold opposing teams to 96.1 points per 100 possessions when he's on the floor. That's Kawhi's mark, and the only player with a better one this year is Tim Duncan, who's spent roughly 2/3 as much time on the floor as Leonard has. And that's just what's quantifiable by advanced stats. Watching the games, it's clear that Leonard's hard work and shutdown D are the linchpin of the best defense of the last 10 years.

What's important to note, however, is that ignoring defense is probably the correct approach for a projection system to take. For instance, here are three candidates from a recent MVP race:

Totals: G; Per Game: MP through PTS; Shooting: eFG%
Rk  G   MP    FG   FGA   3P   3PA  FT   FTA   TRB   AST  STL  BLK  PTS   eFG%
1   79  38.8  9.6  18.8  1.2  3.5  6.4  8.4   7.5   7.0  1.6  0.6  26.7  .541
2   78  37.6  7.9  13.4  0.0  0.1  7.0  11.7  14.1  1.4  1.4  2.4  22.9  .593
3   81  37.4  8.8  19.7  1.6  4.8  5.9  6.9   4.1   7.7  1.0  0.6  25.0  .485
Provided by Basketball-Reference.com: View Original Table
Generated 3/1/2016.

Player 1 is LeBron James, just months removed from The Decision and the "Not 4, Not 5, Not 6" pep rally. Player 3 is 2010-11 MVP Derrick Rose. And Player 2 is Dwight Howard. Given how close the race is on the numbers, Dwight's earth-shattering defense (he had a 94 Defensive Rating that year) should have put him over the top, but voters largely ignored it, in favor of Rose's superior offensive burden.

In general, voters emphasize offensive production over defense. You can argue this is changing, or that Kawhi's once-in-a-lifetime talent could transcend this pattern, but the projection system can't hear you.

The second factor is a more tactical one. The race between KD and Westbrook is so close that, even if a voter were likely to pick one, it's not clear which one they'd pick. It's not hard to see them splitting the vote, or even that lack of a clear choice pushing voters to Kawhi.

It's also unlikely that a voter would pick one member of the Thunder and then turn around and vote for the other in 3rd place. Since the MVP award was created in 1956, only one pair of teammates -- Jerry West and Wilt Chamberlain in the 1972 MVP race -- have both finished in the top 3.

Our model is trying to predict who will win, not necessarily predicting the Top 5 in order, which means it isn't interested in accounting for the fact that MVP voters tend to spread the love among multiple teams as they work their way down the ballot.

Kawhi's MVP case is even more interesting when set against someone who isn't getting very much buzz at all: LeBron James.

LeBron is currently in 6th in the MVP projections, with just a 2.2% chance of winning. Yet he looks a lot more like a traditional MVP than Kawhi. Here they are, side by side:

Rk  Player         Tm   G   MP    FGA   FG%   3PA  3P%   eFG%  TRB  AST  STL  BLK  PTS
4   Kawhi Leonard  SAS  54  32.5  14.4  .511  3.8  .488  .575  6.7  2.4  1.8  0.9  20.5
6   LeBron James   CLE  57  35.9  18.8  .505  3.8  .284  .534  7.2  6.6  1.4  0.6  24.9
Provided by Basketball-Reference.com: View Original Table
Generated 3/1/2016.

You can see Kawhi's only real advantage on box score numbers is his efficiency. LeBron is scoring more, rebounding more, and assisting more. He's running fairly close on steals and blocks. What about the narrative that LeBron is somewhat coasting through the regular season to save his production for the playoffs? That could be true, but he's playing more minutes per game, and more games, than Kawhi.
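The efficiency gap can be checked from the table itself. This is a minimal sketch, assuming the standard definition of effective field goal percentage, eFG% = (FG + 0.5 × 3P) / FGA, and rebuilding made shots from the per-game attempts and percentages shown above. The function name is hypothetical, and because the inputs are already rounded, results computed this way can differ slightly from published figures.

```python
def effective_fg_pct(fga, fg_pct, three_pa, three_pct):
    """eFG% = (FG + 0.5 * 3P) / FGA, with makes rebuilt from attempts and percentages."""
    field_goals_made = fga * fg_pct
    threes_made = three_pa * three_pct
    return (field_goals_made + 0.5 * threes_made) / fga


if __name__ == "__main__":
    # Per-game attempts and percentages taken from the side-by-side table above.
    kawhi = effective_fg_pct(fga=14.4, fg_pct=0.511, three_pa=3.8, three_pct=0.488)
    lebron = effective_fg_pct(fga=18.8, fg_pct=0.505, three_pa=3.8, three_pct=0.284)
    print(f"Kawhi Leonard eFG%: {kawhi:.3f}")
    print(f"LeBron James eFG%:  {lebron:.3f}")
```

Both players attempt 3.8 threes per game, so the gap in eFG% comes almost entirely from how many of those threes go in.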

What we have is basically the KD-Westbrook debate with 2 new variables, one that the model picks up and one it doesn't. On the one hand, Kawhi has a massive advantage in defense, which historically hasn't factored in voting that much. On the other, there's the fact that Kawhi's team is 8 games ahead of LeBron's, something which has historically mattered a great deal to voters. As a result, the model has Kawhi ahead, matching the conventional wisdom.

But wait, there's one more multi-positional defensive wizard who needs to have a say in this discussion. The model gives Draymond Green a 2.8% shot at the trophy, which paradoxically feels both high and low to me.

It's high for the simple reason that Steph is going to beat him in MVP voting. With that out of the way, however, I'd like to argue that Draymond is more deserving of consideration than he's getting. It's Green's freakish positional versatility that fuels Golden State's Death Star lineup. Green leads the league in triple-doubles, is the first player in nearly 20 years to average 13/9/7, and has played more minutes this year than Curry.

The model doesn't consider the fact that Green and Curry are teammates (other than the fact that it won't include more than 2 players from any team), but it's helpful to look at the Bulls. In years where both Michael Jordan and Scottie Pippen were eligible, and both received votes, Scottie finished 9th, 5th, 11th, and 10th in the MVP race. The only year he cracked the Top 8 was 1995-96, a year when the Bulls finished with a record you might be familiar with.

Like Pippen's, Green's contributions are largely off the box score and secondary to those of the transcendent scorer he plays with. Kawhi gives us a look at what it would look like if a Pippen/Draymond type player were the best player on a transcendent team. While the model is probably underrating his ranking relative to the competition, it is also correctly assessing that his defensive contributions aren't typically valued highly enough by voters to make him a real threat to Steph's coronation.

Some years you get a close MVP race, and some years Stephen Curry does things no one has ever seen on a basketball court before. But there's value in projecting out an MVP field that's 10 candidates deep, just like there's value in voting for a Top 5 for MVP, instead of just a winner.

Durant, Westbrook, and Leonard are all having historic seasons in their way. LeBron is adding another amazing year to an astonishing career. Draymond is the glue that holds the Warriors' beautiful art project in place. And that's not to mention what guys like Kyle Lowry, Damian Lillard, and Chris Paul are doing.

Posted in Announcement, Awards, Basketball-Reference.com | 2 Comments »

New Box Scores and Play-By-Plays Added to Baseball Reference

Posted by Mike Lynch on March 1, 2016

Thanks to the efforts of our friends at Retrosheet, we have added box scores for the 1913 MLB season to Baseball Reference. Additionally, we have added play-by-play for games as far back as 1930. Before this update, our oldest play-by-plays went back to 1938. In addition to the boxes and PBPs themselves, this update makes a variety of new information searchable in the Play Index and adds new rows of information to team/player/league statistics tables.

Here's a quick breakdown of the data coverage for the Play-By-Plays we've added from 1930 to 1937:

  • 1930 - 77% of games
  • 1931 - 82% of games
  • 1932 - 75% of games
  • 1933 - 81% of games
  • 1934 - 72% of games
  • 1935 - 71% of games
  • 1936 - 65% of games
  • 1937 - 82% of games

And here are some examples of some of the new information/searches available on the site:

We're very excited about these new additions and hope you enjoy them, as well. Please let us know if you have any comments, questions or concerns.

And thanks again to Retrosheet!

Posted in Announcement, Baseball-Reference.com, Data, History, Play Index, Uncategorized | 4 Comments »

NFL Combine Results Added to Pro Football Reference

Posted by Mike Lynch on February 26, 2016

We're happy to announce that we've added NFL Combine results since 2000 to Pro Football Reference. Tables showing results for each season are available in our NFL Draft section, but we're most proud of the new Play Index tool we've created with this data. Our new NFL Combine Results Finder allows users to run customized queries through combine results since 2000. For instance, you can run searches like:

The above is a small sampling of what is possible. You can also slice, dice and sort by spans of years, heights, weights, positions, team drafted by, college and whether or not the player appeared in the NFL.

2016 NFL Combine Results will be added once the combine is over.

If you have any questions or comments, please let us know.

Posted in Announcement, Data, Draft, Features, History, Play Index, Pro-Football-Reference.com | Comments Off on NFL Combine Results Added to Pro Football Reference

The Sports Oscars

Posted by Jonah Gardner on February 25, 2016

Sunday night, I will be taking a brief break from sports to watch the Academy Awards. And while my favorite film of 2015, It Follows, won't be competing, there are plenty of excellent films to root for (and some very bad ones to root against).

As someone who loves both the Oscars and sports, I've been thinking about ways in which we can combine the two. Sure, there are the ESPYs, but those categories aren't exactly equivalent to the Academy Awards. So, I decided to make my own!

Using stats that you can find across all of the Sports-Reference sites, I've created the Sports Oscars, adapting (and, admittedly in some cases, stretching) categories that we'll see on Sunday to fit with sports teams. For our purposes, we'll be using the stats from the 2015 MLB season and 2015 NFL season, which are complete, as well as the 2015-16 NBA season and 2015-16 NHL season, still in progress. Unlike the actual Oscars, these awards are determined by stats, although I'm sure you'll still find plenty to argue with.

Don't worry, I won't subject you to my terrible singing voice in an elaborate musical number. Instead, let's get to the awards! Read the rest of this entry

Posted in Announcement, Ridiculousness | 2 Comments »

Explaining our Handling of “Holds”

Posted by Mike Lynch on February 24, 2016

UPDATE (Feb. 25, 2016): MLB has informed us that they will be updating Brach's 2015 holds total to 15 (matching us). MLB's Cory Schwartz commented: "We do credit Holds whenever the pitcher enters in a Save situation and leaves with the lead intact, so this was an oversight on our part."

It recently came to our attention that for the 2015 season, we credited Brad Brach with 15 holds. MLB, meanwhile, credited Brach with just 14 holds (NOTE: After reading this post, MLB has agreed that 15 is the correct number of holds for Brach in 2015). It was discovered that the difference was in the handling of the Orioles' 5-4 win over the Mariners on May 21. Before we jump into the details, let's examine MLB's definition of a hold (bolding is ours, for emphasis):

"The hold is not an official statistic, but it was created as a way to credit middle relief pitchers for a job well done. Starting pitchers get wins, and closers -- the relief pitchers who come in at the end of the game -- get saves, but the guys who pitch in between the two rarely get either statistic. So what's the most important thing one of these middle relievers can do? "Hold" a lead. If a reliever comes into a game to protect a lead, gets at least one out and leaves without giving up that lead, he gets a hold. But you can't get a save and a hold at the same time."

UPDATE (Feb. 26, 2016): Please see MLB's updated Holds definition here

As you can see, this isn't really much of a definition at all. There's little in the way of criteria here, and it's also pointed out that the statistic isn't even official, anyway. In fact, there's enough confusion that MLB.com credits Cory Rasmus with 2 holds in 2015, but Elias (MLB's official statistician) credits him with 1 hold in 2015. We credit him with 2, for what it's worth. This "definition" provides enough room for interpretation that variance in recorded totals is not uncommon.

Since the statistic is unofficial, explaining all of this might be a pointless exercise, but in an effort to be transparent, we at least want to point out what standard we are using to assign holds.

Our standard is to give a pitcher a hold any time they protect a lead in a save situation (meaning they could have been eligible for a save if they finished the game). Brach presents an interesting study in that May 21 game. Starter Chris Tillman pitched 3 innings and left with a 4-1 lead. Obviously, he was not eligible for the win due to Rule 10.17(b), as he did not complete 5 innings. Tillman was relieved by Brian Matusz, who allowed 2 runs in the 4th, but completed the inning of work and left the game leading 4-3, when Brach took the mound for the 5th inning. Brach completed 2 scoreless innings, but the Mariners tied it up in the 7th after Brach left the game. The Orioles eventually won the game.

With the benefit of hindsight, you could say that Brach would have been in line for the win (not the save) if he had finished the game, since he ended up being more "effective" than Matusz, which would make it nearly a lock that the official scorer would have given him the win. But, hypothetically, Brach could have given up 20 runs in relief, but maintained the lead, and earned the save (with Matusz getting the win). As unlikely as that scenario is, the point here is that we're not using hindsight in assigning holds. In our opinion, the opportunity for a hold is defined when you enter the game and is only removed retroactively if you are given the win.

To be as clear as possible: our policy is to credit a hold when a pitcher enters the game in a save situation and leaves with the lead (and is not later given the win by the official scorer).

As we bolded in MLB's definition of a hold, "If a reliever comes into a game to protect a lead, gets at least one out and leaves without giving up that lead, he gets a hold." It would sure seem to us that Brach's May 21st appearance fits those criteria.
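The policy stated above can be sketched as a single check. This is a minimal illustration, not our production code: the function name and its three boolean inputs are hypothetical, and whether the pitcher entered in a save situation is taken as given rather than derived from the game state.

```python
def credits_hold(entered_in_save_situation, left_with_lead, awarded_win):
    """Apply the stated policy: a hold requires entering in a save situation,
    leaving the game with the lead, and not later being given the win."""
    return entered_in_save_situation and left_with_lead and not awarded_win


if __name__ == "__main__":
    # Brach on May 21, 2015, as described in this post: entered for the 5th
    # with a 4-3 lead, left after two scoreless innings, was not given the win.
    print(credits_hold(entered_in_save_situation=True,
                       left_with_lead=True,
                       awarded_win=False))  # True

    # Same appearance, but suppose the official scorer had awarded him the win:
    # the hold is removed retroactively.
    print(credits_hold(entered_in_save_situation=True,
                       left_with_lead=True,
                       awarded_win=True))   # False
```

Note that nothing after the pitcher leaves, other than the win being awarded, feeds into the check. That is the "no hindsight" point: the Mariners tying the game in the 7th does not change Brach's result.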

2019-07-09 Update on Long Holds

We were silent on this issue earlier, but we do not give a hold in the situation where a reliever is only in line for a "long save". A long save is the type where a reliever pitches three innings with the lead to end the game. For example, on June 29th in London, Yankee Nestor Cortes entered a 14-6 game in the bottom of the 4th with two outs and a runner on first. This is not a standard save situation. He then pitched three innings in relief and did not relinquish the lead. If he'd gone on to complete the game he would have received a save, but his appearance did not begin as a save situation, as the save is dependent on him pitching three innings. We do not include these situations as save situations and do not credit holds in these cases. This is true of most record keepers, but we are aware that MLB Gameday did give Cortes a hold in this situation.
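Here is a minimal sketch of that entry-time check. It assumes the usual save criteria from the Official Baseball Rules (a lead of three runs or fewer, or the potential tying run on base, at bat, or on deck) and deliberately leaves out the three-inning "long save" criterion. The function and parameter names are hypothetical, and this is an illustration rather than the code we actually run.

```python
def is_standard_save_situation(lead, runners_on_base):
    """Return True if a reliever entering with this lead and this many runners
    on base is in a save situation, ignoring the three-inning criterion."""
    if lead <= 0:
        return False  # no lead to protect
    if lead <= 3:
        return True   # lead of three runs or fewer
    # The potential tying run is on base, at bat, or on deck when the lead is
    # no larger than the runners on base plus the batter plus the on-deck hitter.
    return lead <= runners_on_base + 2


if __name__ == "__main__":
    # Cortes in London: 14-6 lead (8 runs) with one runner on base.
    print(is_standard_save_situation(lead=8, runners_on_base=1))  # False

    # A one-run lead with the bases empty, for comparison.
    print(is_standard_save_situation(lead=1, runners_on_base=0))  # True
```

Under this check the Cortes appearance never starts as a save situation, so no hold is credited regardless of how long he pitches.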

This wonderful Hardball Times article spells out the many differences in how holds and blown saves are calculated. It turns out no two sources agree on any of the league totals for holds or blown saves.

Posted in Announcement, Baseball-Reference.com, FAQ, Ridiculousness, Stat Questions, Statgeekery | 10 Comments »

NFL Cap Hits Added to Pro Football Reference

Posted by Mike Lynch on February 22, 2016

One of the most common terms searched for on Pro Football Reference is "salary." Since we believe in giving the people what they want, we've gone ahead and added some of this data. We've created a page showing 2015 cap hits, and have also added cap hits to the 2015 team roster pages. We do not quite have full coverage. 1,979 players played in an NFL game in 2015. We have cap hits for 1,777, which is just shy of 90% coverage. We will update with 2016 cap figures as they become available.
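For anyone who wants to check the coverage figure, here is the arithmetic as a short sketch. The two counts come straight from the paragraph above; the function name is just for illustration.

```python
def coverage_pct(players_with_data, total_players):
    """Share of players with a recorded cap hit, as a percentage."""
    return 100 * players_with_data / total_players


if __name__ == "__main__":
    total_players = 1979   # players who appeared in an NFL game in 2015
    with_cap_hits = 1777   # players we have a cap hit for
    print(f"Coverage: {coverage_pct(with_cap_hits, total_players):.1f}%")
    print(f"Players still missing: {total_players - with_cap_hits}")
```

That works out to about 89.8%, with 202 players still missing a cap figure.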

Posted in Announcement, Data, Features, Pro-Football-Reference.com | Comments Off on NFL Cap Hits Added to Pro Football Reference