Sports Reference Blog

New Bot Filtering and Content Delivery Network for Sports Reference Sites

Posted by sean on October 7, 2022

Our devops department finished our move from AWS's CloudFront Content Delivery Network (CDN) to Cloudflare's CDN this week. This will be undecipherable technical jargon to most of you, so I'll explain what it means.

When you request a page or file from the site, the request first goes to a CDN server located geographically close to you. If a neighbor of yours (within several hundred miles) has requested the page earlier in the day, you'll get a cached copy of that page much faster than you would otherwise. If your neighbors are lame and don't use our site, or just haven't visited that page, then the request is passed on to our servers and we send you the page.

This takes load off our servers, and the more the sites are used, the faster things get in general for our users.
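For the technically curious, the hit-or-miss behavior described above can be sketched in a few lines of Python. This is a toy model only, not how Cloudflare actually implements its cache: the `EdgeCache` class, the `fetch_from_origin` callback, and the one-hour `ttl_seconds` default are all made up for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy model of one CDN edge location (hypothetical, for illustration)."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], str],
        ttl_seconds: float = 3600.0,  # assumed lifetime of a cached copy
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        # path -> (time the copy was stored, page body)
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, path: str) -> Tuple[str, str]:
        """Return (body, status), where status is "HIT" or "MISS"."""
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and now - entry[0] < self._ttl:
            # A neighbor asked for this page recently: serve the cached copy.
            return entry[1], "HIT"
        # Nobody nearby has asked recently: go to the origin servers and
        # keep a copy for the next visitor.
        body = self._fetch(path)
        self._store[path] = (now, body)
        return body, "MISS"


if __name__ == "__main__":
    def origin(path: str) -> str:
        return f"<html>page for {path}</html>"

    edge = EdgeCache(origin)
    print(edge.get("/some/page")[1])  # first visitor reaches the origin
    print(edge.get("/some/page")[1])  # second visitor gets the cached copy
```

Only the first request for a page in a given region reaches our servers; everyone after that, until the copy expires, is served from the edge.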

Cloudflare also includes bot filtering with this offering. It actively searches for badly behaved bots and proactively blocks them from our site. We are now using this feature because we get scraped. A LOT. And if your bot is badly behaved, we don't want it impacting the performance of the site for our other users. We've gotten a number of emails about this over the past week, as people who have been scraping us have found themselves blocked.

As of today, this filtering will remain in place for our:

  • Soccer/Football
  • Basketball
  • Hockey
  • College sites

We are turning off the active bot filtering on our baseball and pro football sites at this time. We will re-enable bot filtering the week after the end of their respective seasons.

What is a well-behaved bot? Please see our Data Use page and our Bot Traffic page for guidance. Note that if you are blocked by our servers, the blocks reset after 24 hours, so with some more polite settings on your bot you may be able to try again the next day.
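As a rough illustration of what "more polite settings" can look like, here is a minimal sketch of a throttle that enforces a minimum gap between requests. The `PoliteThrottle` class and the interval you pass to it are hypothetical; the actual limits are whatever the Data Use and Bot Traffic pages say, and nothing here guarantees a bot won't be blocked.

```python
import time
from typing import Callable, Optional


class PoliteThrottle:
    """Enforce a minimum gap between requests (hypothetical helper)."""

    def __init__(
        self,
        min_interval_seconds: float,  # assumption: pick a value per the site's posted guidance
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if min_interval_seconds < 0:
            raise ValueError("min_interval_seconds must be non-negative")
        self._min_interval = min_interval_seconds
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> float:
        """Pause until the minimum gap has elapsed; return seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self._min_interval - elapsed)
        if delay > 0:
            self._sleep(delay)
        # Record when this request is allowed to go out.
        self._last = self._clock()
        return delay
```

Call `wait()` immediately before each request. The first call returns right away, and every later call sleeps just long enough to keep requests at least `min_interval_seconds` apart.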

We cannot provide an API (per our data licensing agreements), and you should not view us as a data provider (on par with SportRadar or Genius Sports). This is a role that we do not and cannot support with company resources.
