Dataset Providers Alliance Releases Extensive Position Paper on AI Data Licensing

The Dataset Providers Alliance (DPA), a coalition of leading companies in AI data licensing, has released a comprehensive position paper on the ethical use of AI datasets.

This document establishes the DPA’s views on key issues in the field and provides a framework for promoting sustainable and responsible practices.

Early GenAI systems were built primarily on data scraped from publicly available internet sources. However, with increased awareness of privacy and intellectual property (IP) issues, data providers are now tightening restrictions and requiring formal licensing agreements.

In response to this shift, new startups are emerging to meet the growing demand for properly licensed data. While the startups are focused on solving the immediate need for licensed data, the DPA is working on a broader framework to ensure long-term fairness, transparency, and ethical practices in the AI industry.

The DPA was established in June 2024, uniting dataset providers and IP owners across a range of content types, including music, voice, text, video, and images. The founding members of the DPA include Calliope Networks, Global Copyright Exchange (GCX), Rightsify, vAIsual, Pixta AI, and Datarade.

“This position paper marks a significant step in articulating a unified vision for an ethical and pro-innovation approach to AI data licensing,” said Alex Bestall, CEO of Rightsify and GCX. “We’re outlining a clear and viable path forward that balances the needs of rights holders, dataset providers, and AI developers.”

The position paper from DPA addresses key issues including content-based licensing, opt-ins, protection of likeness rights, direct licensing agreements for AI data, and ethical use of synthetic data.

In the paper, the DPA argues against government-mandated licensing and instead favors a “free market” approach in which AI companies negotiate directly with data originators. According to the paper, government-mandated collective licensing could act as an “AI Tax” and stifle innovation.

The DPA proposes five licensing models: usage-based licensing, outcome-based licensing, a subscription model, hybrid licensing, and domain-specific licensing tailored to different industries.

Dave Davis, CEO of Calliope Networks, added, “The DPA is leading an important conversation to establish a legal and moral framework to ensure that content owners are compensated when their works are used for AI model training while supporting the development of artificial intelligence generally.”

The paper also highlights the importance of informed consent in data collection for AI. It proposes a robust and transparent opt-in mechanism that makes it clear how data will be used in AI training.

Additionally, opt-ins should enable individuals to choose specific uses for their data, and individuals should have access to resources that help them understand the potential implications of their choices.

To help balance privacy and innovation on issues such as likeness rights, the DPA recommends anonymization techniques, limited-use agreements, and the use of AI to create realistic but consensual artificial likenesses for training purposes. The DPA also advocates clear labeling of AI-generated likenesses and using only the minimum likeness data necessary for AI training.

The DPA views licensed synthetic data as a key solution to the anticipated shortage of real data for AI training, often referred to as the “data wall.” However, the alliance emphasizes that the original data used to create synthetic data must itself be properly licensed. It also proposes regular evaluations to address biases and ethical concerns.

The US introduced the NO FAKES Act last year and the Generative AI Copyright Disclosure Act this year to address the growing concerns about authenticity and intellectual property in the digital age. Trade associations such as DPA can play a key role in supporting legislative efforts by advocating for effective policy implementation.

The DPA faces the challenge of encouraging major industry players to adopt emerging ethical data licensing standards. However, its formation and proposals provide a framework for promoting greater transparency and responsible AI practices.

The post Dataset Providers Alliance Releases Extensive Position Paper on AI Data Licensing appeared first on Datanami.