How news websites block makers of AI tools

Close to half (48%) of the most widely used news websites across ten countries were blocking OpenAI’s crawlers at the start of the year, and 24% were blocking Google’s AI crawler. Of those that decided to block Google’s AI crawler, 97% were also blocking OpenAI’s crawlers, a survey by the Reuters Institute shows.

The global focus on generative artificial intelligence has highlighted the need for vast amounts of data to train the software. Publishers’ archives are an attractive source for big tech companies developing such software, but publishers are stepping up their demands to be paid for this new use of their data.

Social media platform Reddit has struck a deal to make its content available for training Google’s artificial intelligence models, news agency Reuters reports, citing three people familiar with the matter. The contract is worth about $60 million per year, according to one of the news agency’s sources.

The proportion of news websites that blocked OpenAI varied considerably by country, ranging from 79% in the US to just 20% in Mexico and Poland.

For Google, the figures ranged from 60% in Germany to 7% in Poland and Spain.

“During 2023, none of the websites we examined had reversed their decision after deciding to block”, the institute reports.

“News outlets with a relatively large online news reach were slightly more likely to be blocking AI crawlers than those with a relatively small reach.”

“All types of news outlets were blocking, but the websites of legacy print publications were more likely to be blocking than those of either broadcasters or digital-born outlets.”

The institute says that comparing its findings with other work suggests that news publishers are more likely to block AI crawlers than popular websites more generally.

The countries in the survey are: Brazil, Denmark, Germany, India, Mexico, Norway, Poland, Spain, the UK, and the US.
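
The figures above do not spell out how a website blocks a crawler. One common mechanism, assumed here rather than taken from the survey, is a robots.txt file that disallows specific crawler user agents. The following is a minimal sketch using Python's standard `urllib.robotparser`. The sample robots.txt, the helper name `blocked_agents` and the placeholder URL are hypothetical, and `GPTBot` and `Google-Extended` are used as illustrative user-agent tokens for OpenAI's and Google's AI crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; not taken from any surveyed site.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def blocked_agents(robots_txt: str, agents: list[str],
                   url: str = "https://example.com/") -> list[str]:
    """Return the user agents that the given robots.txt text bars from `url`."""
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    # can_fetch() is False when a matching rule disallows the URL for that agent.
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    agents = ["GPTBot", "Google-Extended", "SomeOtherBot"]
    print(blocked_agents(SAMPLE_ROBOTS_TXT, agents))
    # prints ['GPTBot', 'Google-Extended']
```

A check like this only shows what a site asks crawlers to do. Whether a crawler honours the request is up to its operator.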

With the GenAI boom, many publishers have announced that they are now blocking generative AI tool makers from using their content to power artificial intelligence.

Others have instead decided to join the GenAI boom, among them publisher Axel Springer, which recently announced an agreement with OpenAI.

OpenAI’s ChatGPT will produce news summaries based on content from Axel Springer’s media brands, including Politico.

The collaboration also involves the use of content from Axel Springer media brands for training of OpenAI’s large language models. 

US-based news agency Associated Press (AP) and OpenAI earlier announced an agreement to share access to news content for generative AI in news products and services.

“The arrangement sees OpenAI licensing part of AP’s text archive, while AP will leverage OpenAI’s technology and product expertise. Both organisations will benefit from each other’s established expertise in their respective industries, and believe in the responsible creation and use of these AI systems”, AP said.
