A deepfake is a synthetic image, video, or audio clip created using artificial intelligence to replace a person's likeness with someone else's, often making it difficult to distinguish from authentic content. Detecting deepfakes has become one of the most urgent challenges in digital security, and the numbers tell a stark story. Financial losses from deepfake fraud have reached $1.56 billion, with over $1 billion of that occurring in 2025 alone[s]. The technology industry’s primary solution, invisible watermarks that tag AI-generated content, has a fundamental flaw: researchers have demonstrated they can remove these watermarks without even knowing they exist.
The Promise of Watermarks
The basic idea behind watermarking sounds reasonable. AI companies like Google, Meta, and OpenAI embed invisible digital signatures into content their systems generate. These signatures are supposed to be imperceptible to humans but readable by detection tools. The Coalition for Content Provenance and Authenticity (C2PA), a consortium of over 6,000 members including Adobe, Microsoft, and Intel, has created a standard for tracking where digital content came from[s].
Governments have embraced this approach. The EU AI Act’s transparency provisions, which require synthetic media to carry machine-readable labels, take effect in August 2026[s]. In the US, legislation requiring watermarks on AI-generated content is moving through Congress. The assumption is that if we can tag fake content at the source, we can identify it downstream.
Why Watermarks Fail
In July 2025, researchers at the University of Waterloo published a tool called UnMarker that exposes the core weakness of this entire approach. The tool can remove any AI image watermark without needing to know how the watermark was encoded, or even whether the image is watermarked at all[s].
UnMarker works by analyzing an image for statistically unusual pixel-frequency patterns, the signature left behind by watermarking systems. It then distorts those frequencies slightly, rendering the watermark unreadable to detectors while the image appears identical to human eyes. In tests, it succeeded more than 50% of the time against major systems including Google’s SynthID and Meta’s Stable Signature[s].
The C2PA standard has its own problems. Its provenance data is stored as metadata attached to files rather than embedded in the content itself. Images frequently lose their metadata when shared across platforms[s]. Converting a file from one format to another, or simply taking a screenshot, strips all provenance information entirely[s].
A Fragmented System
Even if watermarks were robust, watermark-based deepfake detection only works if everyone uses the same system, and they do not. Google’s SynthID only detects content made with Google’s AI services. Meta has its own system. OpenAI has another[s]. Someone can generate a deepfake using an open-source model or a lesser-known tool, and none of these detection systems will flag it.
Commercial services already exist that will remove watermarks for a fee[s]. The University of Maryland found that watermarks can not only be removed but also added to real images to falsely flag them as AI-generated[s]. This means watermarks could be weaponized to discredit legitimate content.
Real-World Consequences
These technical failures translate directly into real harm. In February 2024, a finance worker at Arup, the engineering firm behind the Sydney Opera House and Beijing’s Bird’s Nest stadium, received a video call invitation from someone claiming to be the company’s chief financial officer. On the call, every participant, the CFO and several colleagues alike, appeared and sounded exactly as the employee expected. All of them were deepfakes. The employee authorized 15 wire transfers totaling $25 million[s].
The cost of creating such deepfakes has collapsed. Voice cloning now costs as little as $0.01 per minute, and only three seconds of recorded audio is needed to clone someone’s voice[s].
What Actually Works
Deepfake detection methods that analyze the content itself, rather than looking for watermarks, show more promise. Intel’s FakeCatcher examines subtle color changes in facial pixels caused by blood flowing through veins, a signal called photoplethysmography. Real human faces show microscopic color fluctuations as the heart pumps blood; deepfakes do not replicate this pattern[s]. In testing, FakeCatcher achieved 91% accuracy[s].
A key advantage of this approach is that it cannot easily be reverse-engineered. Attackers who want to train AI systems to evade detection need a detector whose behavior they can optimize against. FakeCatcher’s method is mathematically non-differentiable, meaning attackers cannot simply train their deepfake generators to defeat it[s].
The deepfake detection market is projected to grow from $5.5 billion in 2023 to $15.7 billion in 2026[s]. That growth reflects a hard truth: watermarking was always a compliance measure, not a security measure. Protecting against sophisticated fraud requires detection systems that work regardless of whether the attacker cooperates.
The forensic science of deepfake detection faces a fundamental asymmetry. Defenders rely primarily on watermarking schemes that assume adversary cooperation, while attackers need only one successful evasion method. Financial losses from deepfake-enabled fraud have reached $1.56 billion, with over $1 billion occurring in 2025 alone[s], a trajectory that exposes the structural inadequacy of current authentication standards.
The Watermarking Architecture
The Coalition for Content Provenance and Authenticity (C2PA) specification uses X.509 digital certificates and cryptographic hashing to sign provenance manifests. These manifests record creation tools, declared authors, and edit histories. The architecture has three components: assertions about provenance, cryptographic signatures binding those assertions to identities, and content hashes linking manifests to specific files[s].
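The three-component design can be sketched in miniature. The following is a toy Python illustration, not the C2PA specification itself: an HMAC stands in for the X.509 certificate signing the real standard uses, and all field names are invented for the sketch.

```python
import hashlib
import hmac
import json

def build_manifest(content: bytes, tool: str, author: str, signing_key: bytes) -> dict:
    """Build a simplified provenance manifest: assertions, a content hash
    binding the manifest to the file, and a signature binding it to an
    identity. (HMAC stands in for the X.509 signing the real spec uses.)"""
    assertions = {"creation_tool": tool, "declared_author": author, "edit_history": []}
    content_hash = hashlib.sha256(content).hexdigest()
    payload = json.dumps({"assertions": assertions, "content_hash": content_hash},
                         sort_keys=True).encode()
    signature = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    return {"assertions": assertions, "content_hash": content_hash, "signature": signature}

def verify_manifest(content: bytes, manifest: dict, signing_key: bytes) -> bool:
    """Check both bindings: the hash must match the file,
    and the signature must match the signed payload."""
    if hashlib.sha256(content).hexdigest() != manifest["content_hash"]:
        return False  # content altered, or manifest belongs to another file
    payload = json.dumps({"assertions": manifest["assertions"],
                          "content_hash": manifest["content_hash"]},
                         sort_keys=True).encode()
    expected = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, manifest["signature"])

image = b"\x89PNG...raw bytes..."
m = build_manifest(image, "imagegen-1.0", "studio@example.com", b"secret-key")
print(verify_manifest(image, m, b"secret-key"))         # True
print(verify_manifest(image + b"x", m, b"secret-key"))  # False: hash binding broken
```

Note that both checks depend on the manifest traveling with the file, which is exactly the assumption the vulnerabilities below undermine.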
Google’s SynthID works differently depending on content type. For text, it adjusts token probability distributions during generation, creating statistical patterns invisible to readers but detectable algorithmically. For images and video, it embeds invisible watermarks designed to survive cropping, filtering, and lossy compression. For audio, it embeds inaudible signatures that persist through noise addition and format conversion[s].
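The text technique can be illustrated with a toy version of keyed token biasing, in the spirit of published "green list" text-watermarking schemes rather than Google's actual SynthID algorithm; the vocabulary, bias value, and function names here are invented for the sketch.

```python
import hashlib
import random

VOCAB = list(range(1000))   # toy vocabulary of token ids
GREEN_FRACTION = 0.5
BIAS = 4.0                  # sampling-weight boost for "green" tokens

def green_list(prev_token: int) -> set:
    # Key the vocabulary partition on the previous token, so the pattern
    # is reproducible by the detector but invisible in any single token.
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16)
    return set(random.Random(seed).sample(VOCAB, int(len(VOCAB) * GREEN_FRACTION)))

def generate(n: int, seed_token: int = 0) -> list:
    """Toy generator with uniform base probabilities; the watermark
    bias nudges sampling toward the keyed green list."""
    rng = random.Random(42)
    out, prev = [], seed_token
    for _ in range(n):
        greens = green_list(prev)
        weights = [BIAS if t in greens else 1.0 for t in VOCAB]
        prev = rng.choices(VOCAB, weights=weights, k=1)[0]
        out.append(prev)
    return out

def green_rate(tokens: list, seed_token: int = 0) -> float:
    """Detector: fraction of tokens falling in their keyed green list.
    Unwatermarked text sits near GREEN_FRACTION; watermarked text runs higher."""
    prev, hits = seed_token, 0
    for tok in tokens:
        hits += tok in green_list(prev)
        prev = tok
    return hits / len(tokens)

print(green_rate(generate(300)))  # well above the 0.5 baseline
```

With these toy numbers the biased green probability is 0.8, so a few hundred tokens suffice for a confident statistical verdict; this also shows why the pattern is invisible to readers, since any individual token choice looks ordinary.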
Deepfake Detection via Watermarks: The Attack Surface
UnMarker, published in the proceedings of the 46th IEEE Symposium on Security and Privacy, demonstrates a universal attack on defensive watermarking. The tool requires no knowledge of the watermarking algorithm, no access to internal parameters, and no interaction with detectors[s].
The attack exploits a constraint inherent to all watermarking schemes. To preserve image quality, watermarks must be invisible to humans. To resist manipulation, they must be robust against common transformations. These requirements force watermarks to operate in the spectral domain, subtly manipulating how pixel intensities vary across the image[s]. UnMarker identifies these spectral anomalies statistically, then applies targeted frequency distortion that destroys the watermark while remaining imperceptible to human vision.
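A minimal sketch of this style of attack, assuming NumPy and a grayscale image array. This is illustrative only, not the published UnMarker code: it flags spectral bins whose magnitudes are statistical outliers and applies a small phase jitter to them.

```python
import numpy as np

def spectral_perturb(img: np.ndarray, strength: float = 0.05,
                     outlier_sigma: float = 3.0, seed: int = 0) -> np.ndarray:
    """Illustrative frequency-domain attack: find spectral bins whose
    magnitude is statistically unusual, then jitter their phase slightly.
    A watermark hiding in those bins is disrupted while pixel values
    barely change. (Toy sketch, not the published UnMarker tool.)"""
    rng = np.random.default_rng(seed)
    F = np.fft.fft2(img.astype(float))
    logm = np.log1p(np.abs(F))
    # Bins far above the typical log-magnitude are candidate carriers.
    outliers = logm > logm.mean() + outlier_sigma * logm.std()
    phase_jitter = rng.uniform(-strength, strength, F.shape)
    F_new = np.where(outliers, F * np.exp(1j * phase_jitter), F)
    out = np.real(np.fft.ifft2(F_new))
    return np.clip(out, 0, 255)
```

Because the jitter is tiny relative to the carrier magnitudes, the per-pixel change stays far below perceptual thresholds, which is the asymmetry the attack exploits: detectors need the exact spectral values, humans do not.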
In empirical tests, UnMarker achieved greater than 50% success rates against Google’s SynthID and Meta’s Stable Signature without prior knowledge of watermarking methods or image origins[s].
C2PA Metadata Vulnerabilities
The C2PA standard stores manifests as metadata attached to files in JUMBF format for JPEG, or dedicated boxes for PNG and MP4. This metadata-based approach has several failure modes:
- Platform stripping: Images commonly lose C2PA metadata when shared across social platforms[s]
- Format conversion: Converting from WebP to PNG, or any similar transformation, breaks the provenance chain entirely[s]
- Screenshot bypass: Screen capture creates a new file with no reference to the original manifest[s]
- Trust model weakness: The specification permits self-signed certificates and certificates from non-trusted CAs, allowing anyone to sign content with manifests that appear technically valid[s]
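A toy model makes the contrast concrete: metadata rides in the file container and vanishes when pixels are re-rendered, while a signal embedded in the pixel values survives a straight copy. This is a pure-Python sketch with invented field names; a real screenshot's re-encoding could of course still damage fragile pixel-level marks.

```python
def embed_lsb_bit(pixels, bit):
    """Content-embedded signal: carry one watermark bit in every pixel's
    least-significant bit. It lives in the pixels, not the container."""
    return [(p & ~1) | bit for p in pixels]

def screenshot(file):
    """A screenshot re-renders pixels into a brand-new file: the manifest
    metadata is gone, but (in this toy model) pixel values are copied."""
    return {"pixels": list(file["pixels"]), "metadata": {}}

original = {
    "pixels": embed_lsb_bit([10, 255, 128, 77], bit=1),
    "metadata": {"c2pa_manifest": "signed-provenance-record"},
}
shot = screenshot(original)
print("c2pa_manifest" in shot["metadata"])      # False: provenance chain broken
print(all(p & 1 == 1 for p in shot["pixels"]))  # True: embedded bit survives
```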
Research at the University of Maryland demonstrated that watermarks can be added to human-generated images, triggering false positives that could be weaponized to discredit authentic content[s].
Ecosystem Fragmentation
SynthID only detects content generated by Google’s AI services: Gemini for text, Veo for video, Imagen for images, Lyria for audio. Content from ChatGPT, open-source models like Stable Diffusion, or custom pipelines produces no SynthID signal[s]. Each major AI provider has developed proprietary watermarking, creating a fragmented landscape where verification requires multiple tools that may produce conflicting results.
Case Study: Multi-Participant Deepfake Fraud
In February 2024, an Arup employee in Hong Kong received what appeared to be a video conference with the company’s CFO and colleagues. All participants were deepfake recreations generated from publicly available video and audio. The employee authorized 15 wire transfers totaling $25 million before the fraud was discovered[s].
Arup’s global CIO noted that “the number and sophistication of these attacks has been rising sharply in recent months”[s]. The economics favor attackers: voice cloning costs $0.01-$0.20 per minute, and three seconds of recorded audio suffices to clone a voice[s].
Content-Based Deepfake Detection
Detection methods that analyze content itself, rather than metadata or watermarks, show structural advantages. Intel’s FakeCatcher uses remote photoplethysmography (PPG), a technique for measuring blood flow from subtle color changes in skin, to detect blood-flow signals in facial video. PPG signals appear across all skin regions, not just specific facial features, and cannot be eliminated by changing illumination[s].
Critically, generative operations destroy the spatial, spectral, and temporal correlations that characterize genuine PPG signals. Any synthetic manipulation introduces noise patterns that disrupt these correlations. FakeCatcher achieved 91% accuracy in testing, nearly nine percentage points above the next-best system[s].
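The intuition can be sketched as a periodicity test on a per-frame skin-color signal. This is a toy illustration, not Intel's FakeCatcher pipeline: it measures how much of the spectral energy in the human heart-rate band concentrates in a single dominant peak.

```python
import numpy as np

def ppg_periodicity_score(green_means: np.ndarray, fps: float = 30.0) -> float:
    """Toy rPPG check: fraction of heart-rate-band (0.7-3 Hz) spectral
    energy held by the single strongest bin. Genuine skin shows a sharp
    cardiac peak; synthetic faces tend toward flat noise.
    (Illustrative sketch, not Intel's FakeCatcher pipeline.)"""
    x = green_means - green_means.mean()
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 3.0)
    band_power = spectrum[band]
    if band_power.sum() == 0:
        return 0.0
    return float(band_power.max() / band_power.sum())  # 1.0 = pure tone

# Simulated 10 s clips at 30 fps: mean green-channel value per frame
t = np.arange(300) / 30.0
rng = np.random.default_rng(1)
real_face = 0.5 * np.sin(2 * np.pi * 1.2 * t) + 0.1 * rng.normal(size=300)  # ~72 bpm pulse
fake_face = 0.5 * rng.normal(size=300)                                      # no cardiac rhythm

print(ppg_periodicity_score(real_face) > ppg_periodicity_score(fake_face))  # True
```

The score is computed from the content itself, so there is nothing for an attacker to strip; defeating it requires synthesizing a physiologically plausible pulse across all skin regions at once.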
The method has an additional security property: it is non-differentiable, meaning adversarial training cannot be easily applied. Attackers using gradient-based optimization to evade detection require a differentiable detection function. FakeCatcher’s PPG analysis pipeline resists this attack vector[s].
Detection Arms Race
Current deepfake detection tools claim accuracy rates above 90%, but these benchmarks face a moving target. Open-source generative models allow attackers to iterate rapidly, and automated content generation can overwhelm detection pipelines that require human review for edge cases[s].
The deepfake detection market is projected to grow 42% annually, from $5.5 billion in 2023 to $15.7 billion in 2026[s]. This growth reflects institutional recognition that watermarking, while useful for provenance tracking in cooperative scenarios, cannot serve as a primary defense against adversarial deepfakes. Robust detection requires analyzing biological and physical signals that current generative models cannot faithfully reproduce.
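The cited figures are internally consistent; a quick arithmetic check, assuming annual compounding:

```python
# Sanity-check the projection: does 42% compound annual growth
# take $5.5B (2023) to roughly $15.7B (2026)?
value = 5.5                 # market size in $B, 2023
for _ in range(3):          # 2023 -> 2026 is three compounding years
    value *= 1.42
print(round(value, 1))      # 15.7
```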