Disclosure: The views and opinions expressed herein belong solely to the author and do not represent the views and opinions of crypto.news editorial.
Elon Musk sued OpenAI for allegedly betraying its mission to develop AGI "for the good of humanity." Carlos E. Pérez suspects the lawsuit could turn the current market leader in generative AI into the next WeWork.
OpenAI's transformation into a for-profit company is at the center of this legal battle. However, the excessive focus on profit serves corporate interests. It also distracts from concerns far more critical to end users, namely ethical AI training and data management.
Grok, Elon Musk's brainchild and a ChatGPT competitor, can access "real-time information" from tweets. OpenAI is already infamous for grabbing copyrighted data left, right, and center. Now Google has struck a $60 million deal to access Reddit user data to train Gemini and Cloud AI.
Simply promoting open source does not serve users' interests in this environment. Users need ways to ensure meaningful consent and fair compensation for helping train LLMs. Emerging platforms building tools to crowdsource AI training data are key in this regard. More on this later.
For users, it's mostly non-profit
Over 5.3 billion people use the internet worldwide, and about 93% of them use centralized social media. It is therefore likely that most of the 147 billion terabytes of data produced online in 2023 were user-generated. The volume is expected to exceed 180 billion terabytes by 2025.
Although this massive body of "publicly available information" powers the training and evolution of AI, users mostly do not reap the benefits. They have neither control nor real ownership. The "I agree" model of consent is also meaningless: it is deception at best and coercion at worst.
Data is the new oil. It is not in Big Tech's interest to give end users more control over their data. For one, paying users for data would significantly increase LLM training costs, which already run upwards of $100 million. Yet, as Chris Dixon argues in "Read Write Own," a handful of large companies controlling everything, and potentially "ruining everything," is the fast track to dystopia.
However, given the evolution of blockchains as a distributed data layer and source of truth, the best era for users is only just beginning. More importantly, unlike big companies, new-age AI companies are adopting such alternatives for better performance, profitability, and ultimately, the betterment of humanity.
Crowdsourced data for ethical AI training
Web2's read-write-trust model relies on entities and stakeholders not being evil. But human greed knows no bounds: we are all self-interested "knaves," as the 18th-century philosopher David Hume put it.
Web3's read-write-own model therefore uses blockchains, cryptography, and the like so that participants in the distributed network can't be evil. Chris Dixon explores this idea in depth in his book.
The Web3 technology stack is fundamentally community-oriented and user-driven. Providing the toolkit for users to regain control of their data (financial, social, creative and others) is a fundamental principle in this field. Blockchains, for example, serve as distributed, verifiable data layers to settle transactions and immutably establish provenance.
Additionally, viable privacy and security mechanisms, such as zero-knowledge proofs (zkProofs) and multi-party computation (MPC), have matured in recent years. They open new avenues for validating, sharing, and managing data by allowing counterparties to establish truths without revealing the underlying content.
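A real zkProof or MPC protocol is far more involved, but the core idea of binding yourself to data without revealing it can be sketched with a simple hash commitment. This is a hypothetical Python illustration, not a production scheme; the function names and the example record are invented for the sketch:

```python
import hashlib
import secrets

def commit(data: bytes) -> tuple[bytes, bytes]:
    """Commit to data without revealing it: only the digest is published."""
    nonce = secrets.token_bytes(16)  # random salt prevents brute-force guessing
    digest = hashlib.sha256(nonce + data).digest()
    return digest, nonce

def verify(digest: bytes, nonce: bytes, data: bytes) -> bool:
    """Later, the committer reveals (nonce, data); anyone can check the commitment."""
    return hashlib.sha256(nonce + data).digest() == digest

# A data contributor commits to a dataset fingerprint up front...
record = b"labeled-image-batch-001"  # hypothetical dataset identifier
digest, nonce = commit(record)

# ...and can later prove it was fixed all along, with no trusted middleman.
assert verify(digest, nonce, record)
assert not verify(digest, nonce, b"tampered-batch")
```

Unlike a true zero-knowledge proof, a commitment only hides the data until it is revealed, but it captures the same spirit: trust is established by cryptography rather than by a central validator.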
These capabilities are highly relevant from an AI-training point of view. It is now possible to obtain reliable data without relying on centralized providers or validators. More importantly, the decentralized, disintermediated nature of Web3 directly connects those who produce data (i.e., users) with the projects that need it to train AI models.
Removing "trusted intermediaries" and gatekeepers significantly reduces costs. It also aligns incentives so projects can reward users for their efforts and contributions. For example, users can earn cryptocurrencies by completing microtasks such as recording scripts in their native dialect, recognizing and labeling objects, sorting and categorizing images, structuring unstructured data, and so on.
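The pay-per-microtask mechanism can be sketched in a few lines. This is a purely illustrative Python toy with an assumed token rate and invented names; real platforms would validate contributions and settle balances on-chain:

```python
from dataclasses import dataclass, field

@dataclass
class MicrotaskLedger:
    """Toy ledger: credit contributors per validated microtask."""
    rate_per_task: float = 0.5  # assumed reward, in tokens, per validated task
    balances: dict[str, float] = field(default_factory=dict)

    def record_validated_tasks(self, contributor: str, count: int = 1) -> None:
        # In practice, validation (human-in-the-loop review) happens before this call.
        earned = count * self.rate_per_task
        self.balances[contributor] = self.balances.get(contributor, 0.0) + earned

ledger = MicrotaskLedger()
ledger.record_validated_tasks("alice", count=4)  # e.g., four labeled images
ledger.record_validated_tasks("bob")             # e.g., one dialect recording
assert ledger.balances["alice"] == 2.0
assert ledger.balances["bob"] == 0.5
```

The point is the incentive structure, not the bookkeeping: every validated contribution maps to a payout, so data producers are compensated directly instead of through ad-driven intermediaries.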
Businesses, on the other hand, can create more accurate models using high-quality data validated by humans in the loop and at a fair price. It’s a win-win.
Bottom-up advancements, not just open source
Traditional frameworks are so hostile to individuals and user communities that merely being open source means little by itself. Radical changes to existing business models and training frameworks are needed to ensure ethical AI training.
Replacing top-down systems with a grassroots, bottom-up approach is the way forward. It is also about establishing a meritocratic order that values ownership, autonomy, and collaboration. In this world, equitable distribution, not extractive maximization, is the winning strategy.
Interestingly, these systems will benefit large enterprises as well as small businesses and individual users, because high-quality data, fair prices, and accurate AI models are things everyone needs.
Now, with incentives aligned, it is in the industry's common interest to embrace these new-age models. Clinging to narrow, short-sighted gains won't help in the long run. The future has different demands than the past.