Google is making SynthID Text, its technology that lets developers watermark and detect text written by generative AI models, widely available.
SynthID Text can be downloaded from the AI dev platform Hugging Face and Google's updated Responsible GenAI Toolkit.
“We're open-sourcing our SynthID Text watermarking tool,” the company wrote in a post on X. “Available freely to developers and businesses, it will help them identify their AI-generated content.”
So how exactly does SynthID Text work?
Given a prompt like “What is your favorite fruit?”, text-generating models predict which “token” is likely to follow another, one token at a time. Tokens, which can be a single character or a word, are the building blocks a generative model uses to process information. A model assigns each possible token a score: the percentage chance that it will be included in the output text. SynthID Text inserts additional information into this token distribution by “modulating the probability of token generation,” Google explains.
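To make the idea concrete, here is a deliberately tiny, self-contained sketch of what “modulating the probability of token generation” can look like in principle. It is not Google's actual SynthID algorithm; the vocabulary, secret key, hashing scheme, and bias strength below are all hypothetical stand-ins.

```python
# Toy illustration of watermarking-by-probability-modulation.
# NOT Google's SynthID algorithm: key, hash scheme, and bias are placeholders.
import hashlib
import math
import random

VOCAB = ["apple", "banana", "mango", "kiwi", "pear", "plum", "fig", "date"]
SECRET_KEY = "demo-watermark-key"  # known only to whoever embeds/checks the mark
BIAS = 2.0                         # how strongly "favored" tokens are boosted


def favored_tokens(context: str) -> set[str]:
    """Pseudorandomly mark half the vocabulary as 'favored', seeded by the
    secret key plus the text generated so far."""
    seed = int(hashlib.sha256((SECRET_KEY + context).encode()).hexdigest(), 16)
    return set(random.Random(seed).sample(VOCAB, k=len(VOCAB) // 2))


def watermarked_next_token(context: str, base_scores: dict[str, float]) -> str:
    """Nudge the model's raw token scores toward the favored set, then sample."""
    favored = favored_tokens(context)
    adjusted = {t: s + (BIAS if t in favored else 0.0) for t, s in base_scores.items()}
    total = sum(math.exp(s) for s in adjusted.values())  # softmax normalization
    probs = [math.exp(s) / total for s in adjusted.values()]
    return random.choices(list(adjusted.keys()), weights=probs)[0]


# Example: pretend these are the model's scores for the next token.
fake_scores = {t: random.uniform(0.0, 1.0) for t in VOCAB}
print(watermarked_next_token("What is your favorite fruit? I love", fake_scores))
```

Because the nudges are small and keyed, the output still reads naturally, but over many tokens the choices skew toward the favored set in a way only the key holder can check.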
“The final pattern of scores for the model’s word choices combined with the adjusted probability scores is considered the watermark,” the company wrote in a blog post. “This pattern of scores is compared to the expected pattern of scores for watermarked and non-watermarked text, helping SynthID detect whether an AI tool generated the text or whether it may have come from other sources.”
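Continuing the same toy sketch (again, not the actual SynthID detector), detection then amounts to comparing the pattern of favored-token hits in a piece of text against what chance alone would produce: roughly half here for unwatermarked text, noticeably more for watermarked text. The threshold below is an arbitrary placeholder, and `favored_tokens` is the helper from the sketch above.

```python
def watermark_score(prompt: str, tokens: list[str]) -> float:
    """Fraction of tokens that land in the favored set for their context;
    about 0.5 is expected by chance in this toy setup."""
    hits, context = 0, prompt
    for tok in tokens:
        if tok in favored_tokens(context):
            hits += 1
        context += " " + tok
    return hits / max(len(tokens), 1)


def looks_watermarked(prompt: str, tokens: list[str], threshold: float = 0.7) -> bool:
    """Crude decision rule; the real system compares the observed score
    pattern against expected patterns for watermarked and non-watermarked text."""
    return watermark_score(prompt, tokens) >= threshold
```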
Google claims that SynthID Text, which has been integrated into its Gemini models since this spring, does not compromise the quality, accuracy or speed of text generation and even works on text that has been cropped, paraphrased or edited.
But the company also admits that its watermarking approach has limitations.
For example, SynthID Text does not work as well on short text, on text that has been rewritten or translated from another language, or on answers to factual questions. “When it comes to responding to factual prompts, there is less opportunity to adjust the token distribution without affecting factual accuracy,” the company explains. “This includes prompts such as ‘What is the capital of France?’ or requests where little or no variation is expected, such as ‘recite a poem by William Wordsworth.’”
Google isn’t the only company working on AI text watermarking technology. OpenAI has researched text watermarking methods for years, but has delayed their release, citing technical and commercial considerations.
Watermarking techniques for text, if widely adopted, could help counter the rise of inaccurate but increasingly popular “AI detectors” that falsely flag essays and articles written in a more generic voice. The question is whether they will be widely adopted, and whether one organization’s proposed standard or technology will prevail over the others.
There may soon be legal mechanisms that force developers’ hands. The Chinese government has introduced mandatory watermarking of AI-generated content, and the state of California is trying to do the same.
The situation is urgent. According to a report from the European Union's law enforcement agency, 90% of online content could be synthetically generated by 2026, creating new law enforcement challenges around disinformation, propaganda, fraud, and deception. Already, an AWS study found that nearly 60% of all sentences on the web could be AI-generated, thanks to the widespread use of AI translation tools.