Machine learning and access to data

The race to find data for machine learning and generative AI

The global focus on generative artificial intelligence has intensified the demand for data. It started late last year with the launch of OpenAI’s ChatGPT and has grown as tech companies compete for a leading position in offering customers the benefits of generative AI. Machine learning requires enormous amounts of data and processing capacity to train software that can create text, pictures and music the way humans do.

This hunger for data is now accompanied by discussions about copyright, with content producers protecting their assets and demanding additional payment for new uses of what they have created.

During the normally calm holiday season, companies have been actively taking positions in the data race.

US-based news agency Associated Press (AP) and OpenAI, for example, have announced an agreement to share access to news content for the use of generative AI in news products and services.

“The arrangement sees OpenAI licensing part of AP’s text archive, while AP will leverage OpenAI’s technology and product expertise. Both organizations will benefit from each other’s established expertise in their respective industries, and believe in the responsible creation and use of these AI systems”, AP said.

But, referring to doubts about generative AI and its trustworthiness, the news agency stressed that “AP continues to look closely at standards around generative AI and does not use it in its news stories.”

“Generative AI is a fast-moving space with tremendous implications for the news industry. We are pleased that OpenAI recognizes that fact-based, nonpartisan news content is essential to this evolving technology, and that they respect the value of our intellectual property,” said Kristin Heitmann, AP senior vice president and chief revenue officer. 


“AP firmly supports a framework that will ensure intellectual property is protected and content creators are fairly compensated for their work. News organizations must have a seat at the table to ensure this happens, so that newsrooms large and small can leverage this technology to benefit journalism.” 

“OpenAI is committed to supporting the vital work of journalism, and we’re eager to learn from The Associated Press as they delve into how our AI models can have a positive impact on the news industry,” said Brad Lightcap, chief operating officer at OpenAI. 

The Associated Press has used AI technology for nearly a decade to automate some rote tasks “and free up journalists to do more meaningful reporting”.  AP began automating corporate earnings reports in 2014 and subsequently added automated stories previewing and recapping some sporting events. 

Additionally, AP uses AI technology to aid in the transcription of audio and video from live events like press conferences.

But the focus on data is not limited to text; it also extends to pictures and music. During the holiday season, it was reported that Google is in discussions with Universal Music about possible songwriting using artificial intelligence, and that stock photo service Shutterstock has signed agreements on using AI to create pictures.

“Shutterstock is revolutionizing the way visuals are created for campaigns, projects, and brands by making generative AI accessible to all. We’re the first to support a responsible AI-generation model that pays artists for their contributions, making us your trusted partner for generating and licensing the visuals you need to uplevel your brand”, Shutterstock said.


“AI-generated content represents new content that is created using AI technology trained on millions of real content assets, descriptions, and keywords. AI content generators require some human input like a description, prompt, or parameters.”

Shutterstock says users can generate, license and download new images using its AI-generated content capabilities, and that contributors will be compensated through its Contributor Fund.

However, Shutterstock does not accept direct uploads of AI-generated content to its library “because we want to ensure the proper handling of IP rights and artist compensation.”

“Because AI content generation models leverage the IP of many artists and their content, AI-generated content ownership cannot be assigned to an individual and must instead compensate the many artists who were involved in the creation of each new piece of content”, Shutterstock says.

Music streaming service Spotify says it is exploring new ways to use AI for its platform, “after the success of DJ, a generative AI-powered feature that creates personalised playlists based on listening habits.”

But Spotify says it has deleted songs after discovering that AI-generated tracks were being used to collect royalties on behalf of fraudulent accounts.

The fast-growing interest in generative AI is thus also leading to copyright discussions, with several rightsholders having already announced that they will go to court if they are not properly compensated for this additional use of the content they have created.

But legal experts writing in Harvard Business Review say that “the legal implications of using generative AI are still unclear, particularly in relation to copyright infringement, ownership of AI-generated works, and unlicensed content in training data.”


“Courts are currently trying to establish how intellectual property laws should be applied to generative AI, and several cases have already been filed”, write Gil Appel, Assistant Professor of Marketing at the GW School of Business, Juliana Neelbauer, partner at Fox Rothschild LLP, and David A. Schweidel, Professor of Marketing at Emory University’s Goizueta Business School.

“To protect themselves from these risks, companies that use generative AI need to ensure that they are in compliance with the law and take steps to mitigate potential risks, such as ensuring they use training data free from unlicensed content and developing ways to show provenance of generated content.”

“Generative AI will change the nature of content creation, enabling many to do what, until now, only a few had the skills or advanced technology to accomplish at high speed. As this burgeoning technology develops, users must respect the rights of those who have enabled its creation – those very content creators who may be displaced by it.” 

“And while we understand the real threat of generative AI to part of the livelihood of members of the creative class, it also poses a risk to brands that have used visuals to meticulously craft their identity.”  

“At the same time both creatives and corporate interests have a dramatic opportunity to build portfolios of their works and branded materials, meta-tag them, and train their own generative-AI platforms that can produce authorized, proprietary, (paid-up or royalty-bearing) goods as sources of instant revenue streams”, the three legal specialists write.


Moonshot News is an independent European news website for all IT, Media and Advertising professionals, powered by women and with a focus on driving the narrative for diversity, inclusion and gender equality in the industry.

Our mission is to provide high-quality, unbiased information for all professionals and to make sure that women get their fair share of voice in the news and in the spotlight!

We produce original content, news articles, a curated calendar of industry events and a database of women IT, Media and Advertising associations.
