OpenAI’s GPTBot has been blocked by dozens of large brands.
According to a recent analysis, at least 69 of the top 1,000 websites have blocked GPTBot, the new web crawler OpenAI launched on Aug. 7.
According to Originality.ai, an AI content and plagiarism detection service, the number of sites blocking GPTBot is growing by around 5% each week.
Why we care. To block or not to block ChatGPT? Many SEOs have been asking themselves this question. Several popular websites have blocked GPTBot, most likely because they do not want OpenAI to scrape their content without compensation. ChatGPT also does not link to or cite its sources.
According to Originality.ai’s analysis, the 15 most popular websites blocking GPTBot are:
- amazon.com
- quora.com
- nytimes.com
- shutterstock.com
- wikihow.com
- cnn.com
- foursquare.com
- healthline.com
- scribd.com
- businessinsider.com
- reuters.com
- medicalnewstoday.com
- goodhousekeeping.com
- amazon.co.uk
- tumblr.com
But. Although many sites block GPTBot, they do not block CCBot, Common Crawl’s web crawler. Common Crawl provides some of the data that OpenAI, Google, and other companies use to train their algorithms.
The New York Times is one notable exception, as it doesn’t want its content used to train AI systems. Other popular websites blocking CCBot include shutterstock.com, reuters.com, and goodhousekeeping.com.
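Sites that block GPTBot but not CCBot do so through separate user-agent groups in robots.txt, and each crawler is evaluated against its own group. A minimal sketch, assuming a hypothetical robots.txt (not taken from any site named above) and using Python’s standard `urllib.robotparser`, shows how the two user-agent tokens can get different answers for the same URL:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: disallows GPTBot site-wide, but has no
# group for CCBot, so CCBot falls back to the catch-all "*" rules.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

def crawler_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Return whether GPTBot and CCBot may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in ("GPTBot", "CCBot")}

print(crawler_access(ROBOTS_TXT, "https://example.com/article"))
# {'GPTBot': False, 'CCBot': True}
```

Because CCBot has no group of its own here, only the catch-all rules apply to it; a site wanting to block both crawlers would need a `Disallow: /` group for each.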
Limitations. During this analysis, 241 robots.txt files could not be identified or inspected. This is why I used “at least” in the first sentence.
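The “at least” wording follows from counting uninspected files as unknown rather than as not blocking. A minimal sketch of such a tally, assuming a hypothetical mapping from site to robots.txt text (with `None` for a file that could not be retrieved) and treating a disallowed site root as a block, returns a confirmed lower bound alongside the number of unknowns. This is an illustration, not Originality.ai’s actual method:

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

def blocks_agent(robots_txt: str, agent: str) -> bool:
    """True if the rules disallow `agent` from fetching the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "/")

def tally(robots_by_site: dict[str, Optional[str]], agent: str = "GPTBot") -> tuple[int, int]:
    """Return (confirmed_blocking, not_inspected); the true count is at least the first."""
    blocking = unknown = 0
    for robots_txt in robots_by_site.values():
        if robots_txt is None:  # file could not be retrieved or inspected
            unknown += 1
        elif blocks_agent(robots_txt, agent):
            blocking += 1
    return blocking, unknown

# Hypothetical sample data for illustration only.
sites = {
    "a.example": "User-agent: GPTBot\nDisallow: /\n",
    "b.example": "User-agent: *\nDisallow: /tmp/\n",
    "c.example": None,
}
print(tally(sites))  # (1, 1)
```

Any of the uninspected sites might also block GPTBot, so the confirmed count can only understate the real total.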
Originality.ai’s analysis. Websites that have blocked OpenAI’s GPTBot.
Dig deeper. Should you prevent ChatGPT from accessing your site?
The post Dozens of brands have blocked GPTBot, OpenAI’s web crawler first appeared on Search Engine Land.