New York Times: Do not use our content to train AI systems
The New York Times wants to opt out of Google’s AI training.
The Times has made a number of changes to its Terms of Service to prevent AI companies from using content created by the media organization to train their own systems.
Why do we care? Many language models, including the largest ones, are trained on content from websites (see: Browse the 15.7 million websites in Google’s C4 dataset). Google is looking at alternative and supplemental methods of controlling crawling, indexing, and search beyond robots.txt. However, some brands, such as Reddit, are stating that they do not want their content used to improve products or increase profits for Google, Microsoft, or OpenAI. Consider adding similar AI-related language to your terms page.
What’s changed. The New York Times updated its Terms of Service page on Aug. 3.
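For readers unfamiliar with the robots.txt mechanism mentioned above, here is a minimal sketch of how a crawler rule is evaluated, using Python's standard `urllib.robotparser`. The rules, the `example.com` URL, and the helper name `is_allowed` are hypothetical and chosen only for illustration. `GPTBot` is the user-agent token OpenAI documents for its crawler. This is not the Times' actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules (an assumption for illustration):
# block one AI crawler, allow every other user agent.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # hypothetical URL
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", page))
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", page))
```

Note that robots.txt is a request that well-behaved crawlers choose to honor. It is not a legal restriction, which is one reason publishers are also adding language to their terms of service.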
The “Prohibited Use of Services” section contains the following:
- Use the Content to develop any software program, including, but not limited to, training a machine learning or artificial intelligence system.
Will AI compensate publishers? OpenAI signed a contract with the Associated Press last month. OpenAI licensed AP’s archive of news articles dating back to 1985.
Google and The New York Times Co. have an existing, lucrative “commercial agreement,” but that deal is about collaborating on “tools and subscriptions for content distribution.”
Microsoft also promises publishers some form of revenue sharing. The majority of benefits are expected to go to its Start program members.
The article “New York Times – Don’t train AI systems with our content” first appeared on Search Engine Land.