Scalable watermarking for identifying large language model outputs

Computer Science

S. Dathathri, A. See, et al.

Sumanth Dathathri and colleagues at Google DeepMind present SynthID-Text, a text watermarking scheme for identifying AI-generated content. The scheme is designed to preserve the quality of generated text while keeping detection accurate and fast.

Abstract
Large language models (LLMs) generate high-quality synthetic text, posing challenges for identifying AI-generated content. This paper introduces SynthID-Text, a production-ready text watermarking scheme that preserves text quality and enables high detection accuracy with minimal latency. SynthID-Text modifies the LLM's sampling procedure, integrating watermarking with speculative sampling for scalability. Evaluations across multiple LLMs demonstrate improved detectability over existing methods without impacting LLM capabilities. A live experiment with nearly 20 million Gemini responses confirms text quality preservation.
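
The abstract says the watermark is applied by modifying the LLM's sampling procedure. The toy sketch below illustrates that general idea only; it is not the SynthID-Text implementation. It assumes a uniform stand-in for the model's next-token distribution, a keyed SHA-256 hash as the scoring function, and a simple "draw several candidates, keep the highest-scoring one" rule. All names (`g_score`, `watermarked_sample`, `generate`, `mean_score`) and parameters are hypothetical.

```python
import hashlib
import random


def g_score(key: str, context: tuple, token: str) -> float:
    """Keyed pseudo-random score in [0, 1) for a token given its recent context."""
    payload = "|".join((key, *context, token)).encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def watermarked_sample(probs: dict, context: tuple, key: str,
                       rng: random.Random, num_candidates: int = 4) -> str:
    """Draw candidates from the model distribution, keep the highest-scoring one."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    candidates = rng.choices(tokens, weights=weights, k=num_candidates)
    return max(candidates, key=lambda t: g_score(key, context, t))


def generate(key, length: int, rng: random.Random,
             window: int = 2, vocab_size: int = 50) -> list:
    """Generate tokens; key=None disables the watermark (plain sampling)."""
    vocab = [f"tok{i}" for i in range(vocab_size)]
    # Assumption: a uniform distribution stands in for a real LLM's output.
    probs = {t: 1.0 / vocab_size for t in vocab}
    out = []
    for _ in range(length):
        context = tuple(out[-window:])
        if key is None:
            out.append(rng.choices(vocab, weights=list(probs.values()), k=1)[0])
        else:
            out.append(watermarked_sample(probs, context, key, rng))
    return out


def mean_score(tokens: list, key: str, window: int = 2) -> float:
    """Detection statistic: average keyed score over the text."""
    scores = [
        g_score(key, tuple(tokens[max(0, i - window):i]), tok)
        for i, tok in enumerate(tokens)
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    rng = random.Random(0)
    marked = generate("secret-key", 400, rng)
    plain = generate(None, 400, rng)
    print("watermarked:", round(mean_score(marked, "secret-key"), 3))
    print("unwatermarked:", round(mean_score(plain, "secret-key"), 3))
```

In this sketch, detection needs only the key and the text, not the model: unwatermarked text should average a score near 0.5, while text sampled with the key should score noticeably higher. A real scheme would also have to handle repeated contexts, short texts, and edits to the output, which this toy ignores.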
Publisher
Nature
Published On
Oct 23, 2024
Authors
Sumanth Dathathri, Abigail See, Sumedh Ghaisas, Po-Sen Huang, Rob McAdam, Johannes Welbl, Vandana Bachani, Alex Kaskasoli, Robert Stanforth, Tatiana Matejovicova, Jamie Hayes, Nidhi Vyas, Majd Al Merey, Jonah Brown-Cohen, Rudy Bunel, Borja Balle, Taylan Cemgil, Zahra Ahmed, Kitty Stacpoole, Ilia Shumailov, Ciprian Baetu, Sven Gowal, Demis Hassabis, Pushmeet Kohli
Tags
large language models
text watermarking
AI-generated content
synthetic text
detection accuracy
sampling procedure