The first wave of leading generative AI tools was largely trained on "publicly available" data, basically anything that could be retrieved from the internet. Today, training data sources are increasingly restricting access and pushing for licensing agreements. As the search for additional data sources intensifies, new licensing startups have emerged to keep the raw material flowing.
The Dataset Providers Alliance, a trade group formed this summer, wants to make the AI industry more standardized and fair. To that end, it has just published a position paper outlining its stances on key AI-related issues. The alliance is composed of seven AI licensing companies, including the music copyright management company Rightsify, the Japanese stock photo marketplace Pixta, and Calliope Networks, a startup specializing in copyright licensing for generative AI. (At least five new members will be announced in the fall.)
The DPA advocates an opt-in system, meaning that data can be used only after creators and rights holders have given their explicit consent. This represents a significant departure from how most large AI companies operate. Some have developed their own opt-out systems, which place the burden on data owners to remove their work on a case-by-case basis. Others offer no option for removal at all.
The DPA, which expects its members to adhere to its opt-in rule, sees this as the most ethical path. "Artists and creators should be on board," says Alex Bestall, CEO of Rightsify and of the music data licensing company Global Copyright Exchange, who spearheaded the initiative. Bestall sees opt-in as both a pragmatic and a moral approach: "Selling publicly available datasets is a way to get sued and have no credibility."
Ed Newton-Rex, a former AI executive who now runs the AI ethics nonprofit Fairly Trained, calls opt-outs "fundamentally unfair to creators," adding that some may not even know when opt-outs are offered. "It's particularly positive to see the DPA requiring opt-ins," he says.
Shayne Longpre, who leads the Data Provenance Initiative, a collective of volunteers who vet AI datasets, sees the DPA's efforts to source data ethically as admirable, though he worries that the opt-in standard could be a tough sell, given the sheer volume of data most modern AI models require. "In this regime, you're either going to run out of data or you're going to have to pay a lot," he says. "It could be that only a few players, big tech companies, can afford to license all of this data."
In its position paper, the DPA argues against government-mandated licensing and instead advocates a "free market" approach in which data creators and AI companies negotiate directly. Other guidelines are more detailed. For example, the alliance suggests five potential compensation structures to ensure that creators and rights holders are paid appropriately for their data. These include a subscription-based model, a "usage-based license" (in which fees are paid per use), and a "results-based license," in which royalties are tied to profits. "These models could work for everything from music to images to film, television, or books," Bestall says.