Sports Reference Blog

Updated Play-By-Play Data on Baseball Reference

Posted by Mike Lynch on March 30, 2023

We have updated the Retrosheet data on Baseball Reference. Our friends at Retrosheet, who provide box scores and play-by-play covering most of AL/NL history, have significantly expanded play-by-play availability for the 1921, 1922 and 1923 seasons and, for the first time, added play-by-play for the 1914 season. Thanks to their efforts, we now have play-by-play for 65.41% of 1914 games (53% in the AL and 79% in the NL), 97.72% of 1921 games, 97.35% of 1922 games and 93.35% of 1923 games. The 1921-23 seasons previously had ~80% coverage and are now very nearly complete. Additionally, 38 other seasons between 1917 and 1971 have seen smaller improvements in play-by-play coverage. Because 1914 is new, many of our statistical tables that require play-by-play to calculate (win probability, situational batting/pitching, base-running/miscellaneous, batting against, etc.) now include rows for 1914.

Check out our full play-by-play data coverage map.

The additional play-by-play is also available in Stathead. For example, we now have 9 walk-off home runs from 1914. Two of those were hit by Gavvy Cravath of the Philadelphia Phillies. On Cravath’s Home Run Log, you can see we now have play-by-play data (and therefore win probability added) for 18 of his league-leading 19 home runs. The most important of these 18 home runs (by WPA) was his walk-off home run on June 11 against the Reds. In the bottom of the ninth with two outs and the bases empty, he ended the game with a deep blast to center off Earl Yingling. It had a WPA of .468.
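For readers curious how a single play ends up with a WPA figure, here is a minimal sketch. It assumes the usual definition of WPA as the change in the batting team's win expectancy from before the play to after it. The function names are hypothetical, and the "before" win expectancy is simply the value implied by the .468 reported above (a walk-off leaves the home team at 1.000), not a number taken from our win expectancy tables.

```python
def play_wpa(we_before: float, we_after: float) -> float:
    """Win probability added by one play: the change in the
    batting team's win expectancy across that play."""
    return we_after - we_before


def walk_off_wpa(we_before: float) -> float:
    """A walk-off ends the game with a home win, so the home team's
    win expectancy after the play is 1.0 by definition."""
    return play_wpa(we_before, 1.0)


# Assumption: implied win expectancy before Cravath's June 11 home run,
# back-solved from the reported WPA of .468.
implied_we_before = 1.0 - 0.468

print(round(walk_off_wpa(implied_we_before), 3))
```

The same subtraction applies to any play; walk-offs are just the case where the "after" value is known without looking anything up.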

The one Cravath home run from 1914 that we’re missing (August 4) was also a walk-off home run. We know this thanks to the Tattersall/McConnell Home Run Log, a database of all homers hit in the major leagues since 1876. It would be Cravath’s third walk-off home run of 1914, but it does not appear in the Event Finder because we do not have any other events for that game.

Cravath’s season WPA is 4.3, tops in the NL and second in the majors to Eddie Collins’ 4.7. On the pitching side, Jeff Pfeffer of the Brooklyn Robins leads the way at 6.4. You can check out the complete WPA stats for the AL and NL. These are subject to change as play-by-play for more games is added.

This new data, along with our recent Seamheads update, will also allow us to re-run our WAR numbers, so keep an eye out for that update this week.

One Response to “Updated Play-By-Play Data on Baseball Reference”

  1. John Steele Says:

    Great news on adding more PBP data!