Seven licensors of datasets of music, images, text, video and other content used to train AI systems have joined together to form the Dataset Providers Alliance (DPA) to foster ethical data sourcing and use practices.
The group’s goals include promoting transparency and standardization in the licensing of intellectual property (IP) content for AI and ML datasets, while also protecting the rights of content owners.
Founding members include music licensing company Rightsify, image licensing service vAIsual, Japanese stock photo provider Pixta, AI music generation company Global Copyright Exchange and data marketplace Datarade.
Alex Bestall, CEO of Rightsify, said the DPA “will serve as a powerful voice for dataset providers, ensuring that the rights of content creators are protected while AI developers have access to vast amounts of high-quality AI training data”.
The DPA’s first initiative will be a white paper outlining licensing standards for datasets.
One of the concerns about generative AI chatbots is the lack of clarity about the IP, source code and openly available data used to develop the platforms.
Generative AI companies such as OpenAI have been accused of mining the internet for free data to train their large language models, leading to lawsuits alleging copyright infringement.
Google and Microsoft offer customers protection against copyright infringement claims covering the output of their AI products or the use of their AI training data.