Sports Reference Blog

Archive for the 'Baseball-Reference.com' Category

Baseball-Reference Simulating 2020 Season with Out of the Park Baseball 21

26th March 2020

Today would have been Opening Day across Major League Baseball, with all 30 teams scheduled to start their seasons.  With the season on hold, we don’t get to find out how those games would have turned out. Or do we?

To help our users get their fix of new baseball, we are simulating the 2020 season using Out of the Park Baseball 21 and posting the results on Baseball-Reference.com.  Starting today, we’ll update the site each day around noon ET with the results of that day’s games. Check out player pages across the site to see their simulated stats update as the season progresses.  Additionally, we’ve set up a daily digest page where you can see each day’s scores and the current standings, as well as team and league pages with all the simulated statistics pulled together into one place.

OOTP is a full-featured simulation with a lot of settings, and we asked for your help on Twitter with a few decisions we needed to make. The feedback was overwhelming: you wanted to see free agents like Yasiel Puig sign with teams, and you wanted the game’s AI to make trades throughout the season. We have disabled injuries in the simulation, though, since no one wants to risk having a superstar player suffer a major injury and end up missing from the simulation.

 

Posted in Announcement, Baseball-Reference.com, Fantasy, Ridiculousness | No Comments »

2020 WAR Update

16th March 2020

As we approach the beginning of the 2020 season, we have made some updates to our Wins Above Replacement calculations.  You may notice some small changes to figures as you browse the site. As always, you can find full details on how we calculate WAR here.

Defensive Runs Saved Changes

Last week, we updated Defensive Runs Saved (DRS) totals across the site with new figures from Baseball Info Solutions. The new methodology breaks down infielder defense using the PART system, which assigns run values to Positioning, Air Balls, Range, and Throwing. Under the new system, an infielder’s total DRS is the sum of his Air Balls, Range, and Throwing runs saved, while Positioning runs saved are credited to the team as a whole. You can read more about the updates in the Sports Info Solutions blog. The PART system applies to all infielders since 2013.

Folding these numbers into WAR, we see some significant changes for individual player seasons.  The 2019 Oakland A’s get even more recognition for defense on the left side of their infield, with shortstop Marcus Semien gaining 0.7 WAR and third baseman Matt Chapman gaining 1.6 WAR from the new DRS numbers, lifting both players above Mike Trout and into second and third place respectively on the 2019 AL WAR leaderboard.  Chapman’s 1.6 additional WAR represents the largest single-season change in this update.

On the other end of the spectrum, we see Adrian Beltre with the most significant drop in this update, losing 1.5 WAR in 2015.

Since we use DRS to measure the quality of a team’s defense, these new values also impact pitcher WAR values.  Team total DRS changed by as much as 46 runs for a given team and season - the 2019 Dodgers defense improved from 75 DRS to 121 DRS by non-pitchers under the new system.  Once applied to a specific pitcher, however, the changes to WAR are much smaller in magnitude than the changes to individual fielders. The most extreme example is Hyun-Jin Ryu, who pitched 182.2 innings in front of the 2019 Dodgers defense.  Considering the Dodgers defense to be 46 runs better across the entire season, and considering that Ryu was the pitcher for 13.52% of the Dodgers’ balls in play in 2019, we adjust our expected runs allowed for Ryu by 6.2 runs for the season. After following the rest of the steps in our pitching WAR calculation, the end result is a drop of 0.3 WAR for the season.  All other changes to pitching WAR from this change to team defense are smaller than Ryu’s 0.3 WAR drop in 2019.
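The Ryu adjustment above is a single multiplication, sketched here in Python. This is an illustration of the arithmetic only, not the site's actual WAR code; the function name is hypothetical.

```python
def pitcher_defense_adjustment(team_drs_change, share_of_balls_in_play):
    """Portion of a team-level DRS change attributed to one pitcher.

    team_drs_change: change in team DRS over the season, in runs
    share_of_balls_in_play: fraction of the team's balls in play
        that occurred while this pitcher was on the mound
    """
    return team_drs_change * share_of_balls_in_play


# Figures from the text: the Dodgers defense is 46 runs better,
# and Ryu was on the mound for 13.52% of the team's balls in play.
ryu_adjustment = pitcher_defense_adjustment(46, 0.1352)
print(round(ryu_adjustment, 1))  # 6.2
```

The remaining steps of the pitching WAR calculation, which turn those 6.2 runs into the 0.3 WAR change, are not reproduced here.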

Park Factors

Park factors for 2018 have been re-computed to include the 2019 season, since WAR uses a three-year average for park factors when computing pitching WAR.  The most significant change here is the Miami Marlins, whose pitching park factor rose from 90 to 95 (where <100 represents a pitcher’s park and >100 represents a hitter’s park).  José Ureña sees the biggest benefit from this, with his 2018 WAR rising by 0.7 wins. All other changes to pitching WAR from updated park factors are smaller than Ureña’s 0.7 WAR gain in 2018.

New Game Logs from Retrosheet (1904-1907)

Last month, we updated the site with new data from Retrosheet, including new game logs for players from 1904 to 1907.  Having game-level data allows us to be more precise in our WAR calculations, since we can consider the specific ballparks a pitcher played in and the opponents he faced.

Take Christy Mathewson in 1907 as an example.  Prior to this change, we used the league average (excluding his team) of 3.36 runs per nine innings as the expected quality of his opposition.  However, with game-level data, we can see that Mathewson’s actual opponents averaged 3.55 runs per nine innings, showing that Mathewson was probably used strategically and started more games against better opponents.  Indeed, Mathewson pitched in 10 of the Giants’ 22 games against the league’s best offense, the Pirates, as well as 7 of the Giants’ 22 games against the Cubs, the NL’s second-best offense. Against the Dodgers and Cardinals, who each struggled offensively and scored fewer than 3 runs per game, Mathewson pitched in just 8 games total.

Knowing this about his usage, we can set more accurate expectations for how many runs an average pitcher would have allowed under Mathewson’s circumstances. By adjusting the quality of his opposition, we expect an average pitcher to have allowed about 7 more runs over the course of the season, resulting in a bump of 0.9 WAR in 1907. All other changes to pitching WAR from new game log data are smaller than Mathewson’s 0.9 WAR gain in 1907.
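As a rough check on the "about 7 more runs" figure, the change in opponent quality can be scaled by innings pitched. The sketch below is illustrative only: the function name is hypothetical, and the innings total (315) is an assumption brought in for the example, since it is not stated above.

```python
def extra_expected_runs(old_opp_r9, new_opp_r9, innings):
    """Extra runs an average pitcher would be expected to allow when
    the opponent-quality baseline moves from old_opp_r9 to new_opp_r9
    (both in runs per nine innings) over the given innings."""
    return (new_opp_r9 - old_opp_r9) / 9.0 * innings


# 3.36 and 3.55 come from the text; 315 innings is an assumed workload.
mathewson_extra = extra_expected_runs(3.36, 3.55, 315)
print(round(mathewson_extra, 1))
```

Under that assumed workload, the result lands a little under 7 runs, consistent with the figure quoted above.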

Baserunning and Double Plays from Play-by-Play Data (1931-1947)

When calculating runs from baserunning and double plays, we use play-by-play data from seasons where it is complete enough to credit players for things like scoring from first on a double, advancing from first to third on a single, and hitting into fewer double plays than expected.

In the past, we have taken play-by-play data into account back to 1948 for baserunning and double plays, because the data further back than that has been incomplete and could give players an advantage in their WAR simply by having more complete play-by-play records than their peers.  As this data has become more complete over time, we have moved this cutoff back to 1931. The data is still somewhat sparse for games that took place during World War II (1943-45), but we felt it was worth including those years as well.

Pete Reiser of the Brooklyn Dodgers was skilled at taking extra bases, and it showed in the play-by-play accounts.  In 1942, he took extra bases at a rate of 55%, compared to the league average of 45%. Additionally, the Dodgers were tied with the Cardinals as the league’s top scoring offense, so Reiser had many opportunities to put his speed to use.  He scored from first on doubles a league-leading ten times in just 15 opportunities, and also scored from second on a single 24 times, good for 5th in the NL that year, in just 29 opportunities. Using this play-by-play data while computing WAR gives Reiser an additional 1.2 WAR in 1942.  All other changes to batting WAR from this change are smaller than Reiser’s 1.2 WAR gain in 1942.
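The opportunity counts quoted for Reiser translate into rates with a simple division. This is a minimal sketch for illustration; the function name is hypothetical, and it is not the site's baserunning-runs calculation.

```python
def advance_rate(successes, opportunities):
    """Fraction of opportunities on which the runner took the extra base(s)."""
    if opportunities == 0:
        return 0.0
    return successes / opportunities


# Counts from the text for Pete Reiser in 1942.
first_to_home_on_double = advance_rate(10, 15)
second_to_home_on_single = advance_rate(24, 29)

print(f"Scored from first on a double: {first_to_home_on_double:.1%}")
print(f"Scored from second on a single: {second_to_home_on_single:.1%}")
```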

Caught Stealing Totals from Game Logs (1926-1940)

When crediting runners for how many runs they contributed with their baserunning, we take into account their stolen base and caught stealing totals.  Caught stealing totals are missing for many players between 1926 and 1940, but we have complete game logs for players in that span.

In the past, when we didn’t have a caught stealing total for a player, we would estimate how many times they were likely to have been caught stealing based on the league’s stolen base success rate and the ways the player reached base during the season.

We are now using actual caught stealing totals from the players’ game logs, so there are some changes for players who did considerably better or worse than we had been estimating.

Take, for example, Freddie Lindstrom.  In 1928, the Giants third baseman stole 15 bases, but his official season stat line does not have caught stealing available.  Previously, we had estimated that he was caught stealing 11.57 times, based on everything else we knew about his performance and the league he played in.  However, game logs indicate that Lindstrom was caught 21 times, nearly twice as often as we had estimated. This difference gets folded into our baserunning runs calculation and results in a drop of 0.4 WAR.  All other changes to batting WAR from this change are smaller than Lindstrom’s 0.4 WAR drop in 1928.
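To see how far the old estimate was from the game log total, compare Lindstrom's implied stolen base success rate under each caught stealing figure. This is only an illustrative sketch with a hypothetical function name; the conversion from caught stealing to baserunning runs and WAR is not shown.

```python
def stolen_base_success_rate(stolen_bases, caught_stealing):
    """Share of stolen base attempts that succeeded."""
    attempts = stolen_bases + caught_stealing
    if attempts == 0:
        return 0.0
    return stolen_bases / attempts


# Lindstrom, 1928: 15 steals, with the old estimate (11.57 CS)
# versus the game log total (21 CS), both from the text.
estimated_rate = stolen_base_success_rate(15, 11.57)
actual_rate = stolen_base_success_rate(15, 21)

print(f"Estimated: {estimated_rate:.1%}, actual: {actual_rate:.1%}")
```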

Biggest Career Movers

Hall of Famer Ernie Lombardi sees the biggest change to his career WAR with this update, sinking from 46.8 WAR to 39.5 WAR, a drop of 7.3 wins. The largest gain goes to infielder Lonny Frey, who picks up 5.2 wins. Both players were active in the 1930s and 1940s and saw big changes because of their baserunning. Lombardi is known for being one of the slowest runners in baseball history, and this update shows that the numbers back that reputation. Frey was a fast runner in an era when stolen bases were rare, so he has been underrated to this point when it comes to his baserunning contributions.

On the mound, previously cited Hall of Famer Christy Mathewson is the big winner. As discussed above, his WAR now recognizes how his manager would use him against tougher opponents, and he sees his career WAR jump by 2.2 wins. Barney Pelty experiences the biggest drop, at 1.9 wins.

We’ve highlighted some of the more extreme changes here, but to see full lists of the largest changes to season and career WAR totals, please see the spreadsheet here.

We're very excited about these new additions and hope you enjoy them as well. Thanks to Baseball Info Solutions for their contributions. Please let us know if you have any comments, questions or concerns.

Posted in Advanced Stats, Announcement, Baseball-Reference.com, Data, Features, History, Leaders, Play Index, Statgeekery, WAR | 5 Comments »

Ad-Free and Play Index Changes Coming to Baseball-Reference.com

4th March 2020

The Play Index launched on Baseball-Reference.com over thirteen years ago and has been one of the most used research tools for baseball ever since. We've made a few additions over the years, but the tools have largely stayed the same and the price has only gone from $29/year to $36/year during those thirteen years.

The Sports Reference sites have continued to grow in traffic and advertising revenue over that time to the extent that the Play Index and our ad-free options are a very, very small portion of our revenue. Most of that is on us, as we have not done a great job of promoting and marketing tools that are highly valued by a dedicated group of users. The Baseball Play Index represents less than 4% of our revenue and ad-free memberships are less than 1%. In addition, the Play Index tools are complicated to maintain and manage, and quite frankly are a money-loser for us at this time. It's well past time to re-think how these tools are set up within our constellation of sites.

While Sports Reference is doing quite well overall, I'm not comfortable with having so much of our revenue dependent on advertising. We are very beholden to search engines continuing to send us traffic, and likewise the ad market can be fickle and difficult for a small to medium size operator to navigate.

Also, advertising on the sites does not make it easier for you to answer the questions you have. This is our primary mission. We maintain a relatively low level of advertising on the sites (at least compared to your regional newspaper), and we are loath to add additional advertising units or more intrusive units. Some of you may use an ad blocker, in which case we are making no money from your use of the site at all, and the audience for our ad-free product has proven to be very small as well.

A subscription model aligns our interests much better with our users' interests as well. I realize that users are being asked to sign up for lots of subscriptions these days, but we feel the tools within the Play Index are so specialized and useful that they warrant a paywall.

So we are making some changes. The Play Index for each site will be moving to Stathead.com. Stathead.com will become the center for all of our subscription products. We expect these products to include tools and information beyond just a redesigned set of Play Index tools. This won't happen all at once, but we'll start with baseball and then proceed through the remainder of our sports. Also, we will be ending our ad-free product and instead Stathead memberships will have ad-free built-in. There just aren't enough users to justify a separate ad-free product. These changes will begin this month and continue through April on baseball and then continue with the other sites after that.

If you are a subscriber, we will make every effort to make certain you are happy with the options we provide to convert your ad-free or Play Index subscription over to Stathead, including the option of a refund on your subscription. You will be hearing more from us about the changes over the next few weeks, as we will email users directly.

During the deployment of these changes, the Play Index on Baseball-Reference.com (and the soon-to-be-launched Stathead.com Baseball) will be free. They will continue to be free through at least April 30th. If you are a current subscriber to either of our products, we have already extended your subscription by an additional two months during this free period.

--sean forman

Posted in Announcement, Baseball-Reference.com, Play Index, Redesign, Statgeekery | 14 Comments »

Box Scores Since 1904 & Play-by-Play Since 1918 Now on Baseball Reference

20th February 2020

Thanks to the efforts of our friends at Retrosheet, we have added box scores back to the 1904 season to Baseball Reference. Previously, our game log coverage went back to 1908. Additionally, we have added partial play-by-play coverage for games as far back as 1918. Previously, our oldest play-by-plays were from 1925. Since our last major Retrosheet update, the final two missing full play-by-plays of 1973 were added, which means we now have complete PBP data back to that season. In addition to the boxes and PBPs themselves, this update makes a variety of new information searchable in the Play Index and adds new rows of information to team/player/league statistics tables.

Here are some examples of the new information/searches available on the site.

If you have any questions about our data coverage, you can always see it here.

We're very excited about these new additions and hope you enjoy them, as well. Please let us know if you have any comments, questions or concerns.

And thanks again to Retrosheet!

Posted in Announcement, Baseball-Reference.com, Data, Features, General, History, Play Index | 7 Comments »

Reliever of the Month Award Added For Baseball-Reference

22nd January 2020

For April 2017, MLB awarded Reliever of the Month honors to Greg Holland (NL) and Tommy Kahnle (AL) and has kept up this award through the 2019 season. With 3 years of the award in the books, we have added it to Baseball-Reference's repertoire of awards history. This will appear in the Leaderboards, Awards and Honors section of player pages as Monthly Awards. Here's a link to Edwin Diaz, who currently leads with 5 Reliever of the Month honors.

Check out the full list of Reliever of the Month recipients at Baseball-Reference.com. If you have any questions or suggestions, feel free to contact us through our feedback form or Baseball Reference's official Twitter account. Thanks for following us!

Posted in Announcement, Awards, Baseball-Reference.com, Data, Trivia | 2 Comments »

Row Isolation Added to Sports-Reference Sites

16th October 2019

Sports-Reference has added a feature to tables that will make it a lot easier to compare teams and players in an easily scannable fashion. Now, when you select a row on a table, a popup will appear with a button: "Show Only Selected Rows". Highlight the rows you want to isolate, and once you're ready, click the button. The site will then fade out the unselected rows so the only rows displayed are the ones you highlighted. If you want to return to viewing the full table, just click on the "Show All Rows" button.

This feature applies to any tables that don't have row summing capabilities. This will work on both desktop and mobile. You can see a video of the feature in action at Baseball-Reference's Twitter account. You can contact us through our feedback form if you have any questions or suggestions.

Posted in Announcement, Baseball-Reference.com, Basketball-Reference.com, CBB at Sports Reference, CFB at Sports Reference, FBref, Features, General, Hockey-Reference.com, Pro-Football-Reference.com, Tips and Tricks | 2 Comments »

College Baseball Stats Added to Baseball Reference

6th June 2019

Baseball Reference has added college baseball stats to the Register section of the site.

The statistics cover NCAA Division I back to the 2013 season and also four collegiate summer leagues back to 2015 (Cape Cod League, New England Collegiate Baseball League, Northwoods League and Perfect Game Collegiate Baseball League).

So you can now see, for instance, Aaron Nola's numbers at LSU in 2013 and 2014, where he teamed with Alex Bregman, or Kris Bryant's 31-home-run 2013 season at San Diego.

Additionally, you can see how 2019 #1 overall pick Adley Rutschman performed both at Oregon State and in the Cape Cod League.

Posted in Announcement, Baseball-Reference.com, Data | Comments Off on College Baseball Stats Added to Baseball Reference

Find Best Performances vs *Any* Opponent With Split Finders

31st May 2019

In honor of Gleyber Torres's ridiculous performance against the Orioles this season, we've made a subtle, but very useful, tweak to the Split Finder tools in the Baseball Reference Play Index.

Previously, when you selected split by opponent and selected "match any listed", you were served results that included all opponents .500 or better, all opponents under .500, and interleague opponents lumped together. While those are technically opponent split options, they didn't really match the spirit of the search, which was users searching for outstanding (or awful) performances against any single opponent. So we've made a tweak to "match any listed" opponent split searches and now show only splits vs individual teams in those searches.

Here are a few examples of searches you can now run easily, without having to sift through records against teams above and below .500:

We hope you enjoy this addition. Please let us know if you have any questions, comments or concerns.

Posted in Announcement, Baseball-Reference.com, Play Index | Comments Off on Find Best Performances vs *Any* Opponent With Split Finders

Baseball-Reference Adds Playoff Odds

14th May 2019

Starting today, while you browse Baseball-Reference, you can find probabilities of each team to reach the postseason, win the division, and advance to each playoff round including winning the World Series.

To compute these odds, we simulate the rest of the season and the postseason 1,000 times each day. The methodology relies on Baseball-Reference’s Simple Rating System (SRS), which provides a strength-of-schedule-adjusted rating of each team, expressed in runs per game better or worse than an average team.
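For readers who want a feel for how this kind of simulation works, here is a toy Monte Carlo sketch in Python. It is not our production system: the mapping from SRS to single-game win probability (`win_prob` and its `scale` parameter), the three-team division, and the schedule are all hypothetical, chosen only to make the example self-contained.

```python
import math
import random


def win_prob(srs_a, srs_b, scale=0.3):
    """Hypothetical mapping from an SRS gap (runs per game) to the
    probability that team A beats team B. The logistic form and the
    scale value are assumptions made for this illustration."""
    return 1.0 / (1.0 + math.exp(-scale * (srs_a - srs_b)))


def simulate_division(wins, srs, schedule, n_sims=1000, seed=0):
    """Estimate each team's chance of finishing with the most wins.

    wins: current win totals keyed by team
    srs: team ratings keyed by team
    schedule: remaining games as (team_a, team_b) pairs
    """
    rng = random.Random(seed)
    titles = {team: 0 for team in wins}
    for _ in range(n_sims):
        final = dict(wins)
        for a, b in schedule:
            if rng.random() < win_prob(srs[a], srs[b]):
                final[a] += 1
            else:
                final[b] += 1
        best = max(final.values())
        leaders = [team for team, w in final.items() if w == best]
        titles[rng.choice(leaders)] += 1  # break ties at random
    return {team: count / n_sims for team, count in titles.items()}


# Hypothetical three-team division with 30 games left to play.
wins = {"A": 20, "B": 19, "C": 15}
srs = {"A": 0.5, "B": 0.0, "C": -0.5}
schedule = [("A", "B"), ("B", "C"), ("A", "C")] * 10
odds = simulate_division(wins, srs, schedule)
print(odds)
```

Each pass through the loop plays out the remaining schedule once; the share of passes in which a team finishes on top is its estimated probability.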

Before going into the details, we should tell you what our goals were for the system. Systems can vary in what they focus on, so having a clear idea of the questions we are trying to answer can add some insight and guide you in how you might use the system. We wanted a relatively simple system that would most accurately estimate each team's end-of-year win total. This system could answer questions such as: Should a team go for it at the trade deadline? Is a team in second place at the All-Star Break likely to fall off or contend for the division? Is it too early to be certain a hot start will continue? This system is not designed to predict World Series win odds as well as possible, since it's tuned with regular season data only. We are assuming that teams are as likely to win in the postseason as they are in the regular season, and this is probably a poor assumption given the increased importance of bullpens and superstar starting pitchers.

Additionally, since we wanted a simple system, we are not considering player movement at the trade deadline or individual pitcher matchups, which could become relevant during the final games of the season. If you want a more complicated system that considers roster composition, we would point you to the fine systems at Baseball Prospectus or FanGraphs.

Typically, SRS is calculated and displayed (for example, on the standings page) based on the season to date. For the purposes of the playoff odds simulation, though, we are calculating a value of SRS using each team’s previous 100 games, adding in 50 games of .500 ball for regression to the mean. After a lot of backtesting, these are the numbers that provided the most predictive value. Running the simulation as of July 15 and August 15 of each year from 2009 to 2018, the simulation produced a root-mean-square error of 4.63 wins when compared to teams’ actual end-of-season win totals. For example, last season, both the July and August simulations predicted the Atlanta Braves within 1 win of their eventual season total of 90. This error was the lowest of any of the 50 potential inputs we considered. It was lower than a system that used just the current season SRS, any system with no regression to the mean, and, as a sanity check, a system that just flipped a coin for each game.
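The two ideas in this paragraph, regression to the mean and root-mean-square error, can be sketched in Python. This is one plausible reading rather than the exact implementation: it assumes the 50 games of .500 ball count as games with an SRS of zero, and the function names and sample win totals are hypothetical.

```python
import math


def regressed_srs(srs_last_100, observed_games=100, regression_games=50):
    """Blend an observed SRS with a block of league-average games.

    Assumes an average (.500) team has an SRS of 0, so the regression
    games add to the denominator but not to the numerator.
    """
    total_games = observed_games + regression_games
    return srs_last_100 * observed_games / total_games


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# A team 0.9 runs per game better than average over its last 100 games
# is treated as roughly 0.6 runs better once regressed.
print(round(regressed_srs(0.9), 2))

# Hypothetical predicted vs. actual win totals for four teams.
print(rmse([89, 91, 75, 100], [90, 90, 78, 97]))
```

Backtesting amounts to repeating the second step for each candidate combination of window size and regression amount and keeping the combination with the lowest error.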

Of course, using past performance to predict future performance has its quirks, especially early in the season. For instance, look at the Philadelphia Phillies, who experienced significant roster turnover this past winter. The Phillies added Jean Segura and J.T. Realmuto via trade, as well as David Robertson and Andrew McCutchen via free agency (I think that’s everybody). Looking back over their final 100 games of 2018, Philadelphia’s SRS comes in at -0.7. In other words, they were 0.7 runs per game worse than a league average team.

As we get further into the season, the numbers start to shift, as 2019 performance makes up a larger portion of that 100-game population. Through the games of May 12, Philadelphia’s SRS value over the past 100 games is -0.6, boosted by their 0.4 value in the current season.

While teams like Philadelphia have obvious additional context to keep in mind, using a system that takes into account last season’s performance as well as this season’s prevents the simulation from being fooled too early on by a team that’s simply off to a hot start. The result is a more skeptical simulation that needs to be convinced over time that a club’s new success is legitimate.

Check out this season’s current playoff odds for all teams here, and be sure to check out team pages to see how a team’s odds have changed over time.

If you have any questions or suggestions, feel free to contact us through our feedback form.

 

Posted in Announcement, Baseball-Reference.com, Features, SRS | Comments Off on Baseball-Reference Adds Playoff Odds

2018 and 2019 KBO Stats on Baseball-Reference

8th May 2019

In addition to MLB and U.S. minor leagues, Baseball-Reference also tracks statistics from leagues in other countries, such as the Mexican League, Japan's Central and Pacific Leagues, and the Australian Baseball League.

We also cover the Korea Baseball Organization, and we recently added 2018 full season statistics for the league. We are also tracking the ongoing 2019 season and will be updating KBO stats daily.

Merrill Kelly made his MLB debut with the Arizona Diamondbacks this season, coming over after spending 4 years in the KBO. In 2018, Kelly had a 12-7 record for the SK Wyverns and finished 2nd in the league in SO/9 among pitchers with at least 100 innings pitched.

With 2019 statistical coverage, you can also keep track of former MLB players who are playing for the first time in the KBO this year. José Miguel Fernández made his MLB debut with the Los Angeles Angels in 2018, but now he's in the top 5 in both batting average and home runs in the KBO. Tommy Joseph spent two seasons in the majors with the Philadelphia Phillies and is playing for the LG Twins this season. On the pitching side, William Cuevas, Deck McGuire and Jake Thompson all find themselves in the top 5 in strikeouts in their first KBO seasons.

If you have any questions or suggestions, feel free to contact us through our feedback form.

Posted in Announcement, Baseball-Reference.com, Data, Features | Comments Off on 2018 and 2019 KBO Stats on Baseball-Reference