
How Deepfakes Are Made and Why Deepfake Detection Is Structurally Failing


Deepfake detection is failing. Not because the detectors are poorly built, but because the problem they are trying to solve is getting structurally harder with every generation of AI model. The volume of deepfakes has grown from roughly 500,000 in 2023 to an estimated 8 million in 2025, with annual growth nearing 900%. And each new fake is harder to catch than the last.

This is not a temporary gap that better software will close. It is a fundamental asymmetry: creating a convincing fake is getting cheaper and easier, while detecting one is getting more expensive and less reliable. Understanding why requires looking at how deepfakes are actually made, and what detection systems are up against.

How deepfake detection targets are built

At its core, a deepfake replaces one person’s face, voice, or body in media with someone else’s. The most common approach uses a type of AI called an autoencoder: a neural network that compresses data into a compact representation, then reconstructs it. Think of it as a two-part system: an encoder that compresses a face into a kind of abstract sketch, and a decoder that rebuilds a face from that sketch.

The trick is training a single encoder on two different faces, while giving each face its own decoder. Once trained, you feed Face A’s sketch into Face B’s decoder, and out comes Face B’s features mapped onto Face A’s expressions. The result: a video where one person appears to be someone else entirely.

A more powerful approach uses Generative Adversarial Networks, or GANs. Here, two AIs compete: a “generator” creates fakes and a “discriminator” tries to spot them. The two train against each other relentlessly until the fakes become indistinguishable from real media, even to the discriminator itself.

The newest generation uses diffusion models, the same technology behind image generators like Stable Diffusion. These models learn to gradually add and then remove noise from images, and they produce results with unprecedented detail and consistency.

Why deepfakes got so good so fast

Three shifts converged to accelerate the problem dramatically.

First, modern video models learned to separate identity from motion. Earlier deepfakes mapped one face onto another frame by frame, producing telltale flicker, warping, and distortions around the eyes and jawline. Current models understand a person’s identity as an abstract concept separate from how they move, so the same identity can be animated with entirely different motions. The result: stable, coherent faces without the structural distortions that once served as reliable forensic evidence.

Second, voice cloning crossed what researchers call the “indistinguishable threshold.” A few seconds of audio now produce a convincing clone, complete with natural intonation, rhythm, pauses, and breathing noise. Three seconds of audio can produce an 85% voice match to the original speaker.

Third, consumer tools pushed the technical barrier almost to zero. Tools like OpenAI’s Sora 2 and Google’s Veo 3 mean anyone can describe an idea, have a large language model draft a script, and generate polished video in minutes.

Deepfake detection: why it keeps falling behind

Early deepfake detection worked by looking for artifacts: unnatural blinking, mismatched lighting, blurry edges around the face. As the fakes got better, those artifacts disappeared. Detectors adapted by looking for subtler statistical signatures invisible to the human eye. But this approach has a fundamental problem.

Detection models are trained on known deepfake datasets. When tested on fakes from a different source or a newer model, their accuracy collapses. This is the cross-dataset generalization problem, and it is the Achilles heel of the entire detection paradigm. A detector that scores above 90% accuracy on the data it was trained on can drop significantly when tested on a different dataset, a well-documented challenge in deepfake forensics research.

In real-world conditions, the picture is even worse. AI detection tools lose 45 to 50% of their effectiveness when deployed against deepfakes outside controlled lab settings. Compression, resizing, social media re-encoding, and screen recordings all strip away the subtle signals that detectors rely on.

Meanwhile, humans fare no better. A 2025 iProov study tested 2,000 consumers and found that only 0.1% could accurately identify all deepfakes and real media across images and video. Human detection rates for high-quality video deepfakes sit at just 24.5%, worse than a coin flip. And despite their poor performance, people remain overconfident in their detection abilities, rating their confidence above 60% regardless of whether their answers were correct.

The real-world damage is already here

In February 2024, an employee at British engineering firm Arup was tricked into transferring $25 million after a video call where the CFO and other colleagues were all deepfakes. The employee had initially been suspicious of a phishing email, but the video call overrode his doubts because the participants looked and sounded exactly like people he knew.

That incident is part of a broader trend. Fraud attempts using deepfakes have increased by 2,137% over the last three years. Businesses lost an average of nearly $500,000 per deepfake incident in 2024, and U.S. fraud losses from generative AI are projected to climb from $12.3 billion in 2023 to $40 billion by 2027.

The NSA, FBI, and CISA have jointly warned that threats from synthetic media have “exponentially increased,” presenting a growing challenge to national security systems and critical infrastructure.

What comes next: provenance over deepfake detection

If detecting fakes after the fact is a losing game, the alternative is proving authenticity at the source. This is the idea behind the Coalition for Content Provenance and Authenticity (C2PA), an open standard that attaches cryptographic provenance data to media at the moment of creation. Think of it as a tamper-evident seal: not checking whether something is fake, but proving that something is real.

C2PA embeds a signed manifest into images, video, and audio, recording where the media was created, what tools were used, and whether AI was involved. If any part of the content or its provenance data is tampered with, the signature breaks. Major technology and media companies have begun adopting the standard.

But the C2PA specification is explicit about its own limitations: it is “not a cure-all for misinformation” and “complements media literacy, fact-checking, and digital forensics approaches.” It only works when the entire chain, from camera to platform, supports it. Media without provenance data is not automatically fake; it is simply unverified.

As deepfake researcher Siwei Lyu puts it: “Simply looking harder at pixels will no longer be adequate.” The defense has to move from analyzing content to authenticating it.

Deepfake detection is structurally losing ground to deepfake generation. This is not a resource or talent problem. It is an asymmetry built into the mathematics of the task itself: generative models optimize for perceptual indistinguishability, while detection models must generalize across an unbounded space of generation techniques. The volume of deepfakes has grown from roughly 500,000 in 2023 to an estimated 8 million in 2025, with annual growth nearing 900%, and each generation of model closes the gap between synthetic and authentic media further.

Generation architectures: autoencoders, GANs, and diffusion models

The original deepfake pipeline used paired autoencoders. A shared encoder maps face images to a latent space, while separate decoders reconstruct specific identities from that shared representation. Face-swapping works by routing a source identity’s latent code through a target identity’s decoder. The shared encoder forces both decoders to agree on a common latent structure for facial attributes like pose, expression, and lighting, which means the swap preserves the source’s expressions while rendering the target’s identity.
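
The routing trick can be sketched with toy linear layers. This is a structural illustration only: the weights below are random stand-ins for trained parameters, the dimensions are arbitrary, and a real pipeline would use deep convolutional networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 64-value "face" vectors, 16-dim shared latent space.
FACE_DIM, LATENT_DIM = 64, 16

# One shared encoder, one decoder per identity. Random weights stand in
# for trained parameters; this sketches the architecture, not a model.
W_enc = rng.normal(size=(LATENT_DIM, FACE_DIM))
W_dec_a = rng.normal(size=(FACE_DIM, LATENT_DIM))
W_dec_b = rng.normal(size=(FACE_DIM, LATENT_DIM))

def encode(face):
    return np.tanh(W_enc @ face)        # shared latent code

def decode(latent, W_dec):
    return np.tanh(W_dec @ latent)      # identity-specific reconstruction

face_a = rng.normal(size=FACE_DIM)      # a frame of identity A

# Normal reconstruction: A's latent code through A's decoder.
recon_a = decode(encode(face_a), W_dec_a)

# The swap: A's latent code (pose, expression) through B's decoder
# (identity), yielding B's face with A's expression.
swapped = decode(encode(face_a), W_dec_b)

print(recon_a.shape, swapped.shape)     # both (64,)
```

Because the encoder is shared, both decoders learn to read the same latent layout, which is what makes the cross-routing meaningful rather than garbage.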

GANs improved on this by adding adversarial training. A generator produces synthetic faces while a discriminator learns to distinguish them from real images. The two networks are trained jointly in a minimax game: the generator minimizes the discriminator’s accuracy while the discriminator maximizes it. At convergence, the generator’s output distribution should theoretically match the real data distribution. Architectures like StyleGAN introduced style-based synthesis, allowing fine-grained control over identity, pose, and texture at different resolutions through adaptive instance normalization.
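
The minimax game reduces to two binary cross-entropy losses. The sketch below computes both for one illustrative batch; the discriminator outputs are made-up probabilities, not produced by a real model, and the generator loss shown is the non-saturating variant commonly used in practice.

```python
import numpy as np

def bce(pred, target):
    # Binary cross-entropy over probabilities in (0, 1).
    return float(-(target * np.log(pred)
                   + (1 - target) * np.log(1 - pred)).mean())

# Illustrative discriminator outputs: probability that input is real.
d_on_real = np.array([0.9, 0.8, 0.95])  # D(x) for real samples
d_on_fake = np.array([0.2, 0.3, 0.1])   # D(G(z)) for generated samples

# Discriminator objective: label real as 1 and fake as 0.
d_loss = bce(d_on_real, 1.0) + bce(d_on_fake, 0.0)

# Generator objective (non-saturating form): make D call fakes "real".
g_loss = bce(d_on_fake, 1.0)

print(round(d_loss, 3), round(g_loss, 3))
```

At the theoretical equilibrium both distributions match, D outputs 0.5 everywhere, and no statistical test on the samples alone can separate real from fake, which is exactly the detector's problem.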

Diffusion models represent the current state of the art. These models learn the reverse of a fixed Markov chain that progressively adds Gaussian noise to data. During generation, the model iteratively denoises a random noise vector, conditioned on text prompts or reference images, to produce the output. The denoising process operates in a learned latent space (in latent diffusion models like Stable Diffusion) rather than pixel space, making generation both faster and more controllable. Diffusion models have demonstrated superior mode coverage compared to GANs, reducing artifacts like mode collapse while achieving higher fidelity.
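
The reverse process can be sketched as a loop over a fixed noise schedule. Everything here is a toy stand-in: `predict_noise` replaces the trained denoising network, an 8-dimensional vector replaces an image, and the update follows the DDPM-style posterior mean formula.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50                                  # number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)      # fixed forward noise schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def predict_noise(x_t, t):
    # Placeholder for the learned denoiser; in a real system this is a
    # large network conditioned on the timestep and a text prompt.
    return 0.1 * x_t

# Reverse process: start from pure noise and iteratively denoise.
x = rng.normal(size=8)                  # toy 8-dim "image"
for t in reversed(range(T)):
    eps = predict_noise(x, t)
    # DDPM-style posterior mean update.
    x = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
    if t > 0:                           # add noise except at the last step
        x = x + np.sqrt(betas[t]) * rng.normal(size=8)

print(x.shape)                          # (8,)
```

Latent diffusion models run this same loop, but over a compressed latent representation rather than raw pixels, which is where the speed and controllability gains come from.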

Why temporal coherence changed everything

Early deepfake video suffered from frame-level inconsistencies: flicker, warping, and structural distortions around high-frequency regions like the eyes and jawline. These artifacts were reliable forensic signals. Modern video generation models have eliminated these tells by disentangling identity representation from motion.

The key architectural innovation is separating the latent space into identity and motion subspaces. The identity encoder captures appearance-related features that remain constant across frames, while the motion encoder captures pose, expression, and dynamics. This disentanglement means the same motion sequence can be mapped to different identities, or a single identity can be animated with arbitrary motions, producing stable, coherent faces with temporally consistent lighting, skin texture, and micro-expressions.
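
The disentanglement can be sketched as two latent codes feeding one renderer. The `render` function and all dimensions below are hypothetical stand-ins; the point is only that swapping the identity code while reusing the motion sequence leaves the animation itself unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
ID_DIM, MOTION_DIM, FRAME_DIM = 8, 4, 32

# Stand-in "renderer" mapping (identity, motion) to a frame; random
# weights substitute for a trained decoder.
W = rng.normal(size=(FRAME_DIM, ID_DIM + MOTION_DIM))

def render(identity, motion):
    return np.tanh(W @ np.concatenate([identity, motion]))

identity_a = rng.normal(size=ID_DIM)            # constant across frames
identity_b = rng.normal(size=ID_DIM)
motion_seq = rng.normal(size=(5, MOTION_DIM))   # one code per frame

# The same motion sequence drives either identity: swap the identity
# code and the pose/expression dynamics carry over unchanged.
video_a = np.stack([render(identity_a, m) for m in motion_seq])
video_b = np.stack([render(identity_b, m) for m in motion_seq])

print(video_a.shape, video_b.shape)             # (5, 32) each
```

Because identity is held fixed across frames by construction, the frame-to-frame flicker that older per-frame pipelines produced simply cannot occur.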

Voice synthesis followed a parallel trajectory. Current systems need as little as three seconds of reference audio to generate an 85% voice match, capturing not just pitch and timbre but intonation patterns, rhythm, emphasis, pauses, and breathing noise. Researchers describe this as having crossed the “indistinguishable threshold” where perceptual tells have effectively disappeared for non-expert listeners.

Deepfake detection: the generalization crisis

Detection methods broadly fall into two categories: artifact-based and learning-based. Artifact-based detectors look for specific inconsistencies (blending boundaries, unnatural eye reflections, frequency-domain anomalies). Learning-based detectors train neural networks to classify media as real or synthetic.

Both approaches share a critical weakness: they overfit to the generation method present in their training data. This is the cross-dataset generalization problem. A CNN trained on one benchmark can achieve high accuracy on its test set but suffer significant degradation on fakes from a different generation pipeline. The detector learns to recognize the fingerprint of a specific generator, not the general property of being synthetic.

This problem is structural, not merely practical. Each new generation architecture leaves different statistical traces. A detector trained on GAN artifacts (periodic frequency patterns, truncation artifacts in the latent space) will miss diffusion-model artifacts entirely, and vice versa. The space of possible generation techniques is unbounded and expanding, while each detector is trained on a fixed, retrospective snapshot of that space.
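
A toy experiment makes the failure mode concrete. Assume, purely for illustration, that a detector keys on a single one-dimensional "fingerprint" statistic that generator A leaves strongly and a newer generator B barely leaves:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000

# Toy 1-D fingerprint feature. Real media clusters near 0; fakes from
# generator A leave a strong trace, fakes from unseen generator B don't.
real = rng.normal(0.0, 1.0, N)
fake_gen_a = rng.normal(3.0, 1.0, N)    # generator seen during training
fake_gen_b = rng.normal(0.3, 1.0, N)    # newer, unseen generator

# "Train" the detector: threshold midway between real and gen-A fakes.
threshold = (real.mean() + fake_gen_a.mean()) / 2

def accuracy(fakes):
    caught = (fakes > threshold).mean()   # fraction of fakes flagged
    passed = (real <= threshold).mean()   # fraction of real passed
    return (caught + passed) / 2

print(f"in-distribution:  {accuracy(fake_gen_a):.2f}")
print(f"cross-generator:  {accuracy(fake_gen_b):.2f}")
```

In-distribution accuracy looks excellent, while accuracy against the unseen generator collapses toward the 50% chance baseline: the detector learned generator A's fingerprint, not the property of being synthetic.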

Real-world deployment compounds the problem. AI detection tools lose 45 to 50% of their effectiveness outside controlled lab conditions. Social media re-encoding (typically JPEG compression at quality factors of 70-85 or H.264 re-encoding at variable bitrates), resolution downscaling, and screen capture all destroy the subtle statistical signatures that detectors rely on. Adversarial perturbations add another dimension: techniques like FGSM (Fast Gradient Sign Method) can significantly degrade detection accuracy in cross-dataset settings by adding imperceptible noise that exploits the detector’s learned decision boundaries.
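
The effect of re-encoding can be illustrated with a crude quantization stand-in for lossy compression. All magnitudes below are invented for illustration; real codecs quantize frequency coefficients rather than raw samples, but the destructive effect on faint signals is the same in kind.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4096

# Natural content plus a faint high-frequency generator "fingerprint".
content = rng.normal(0.0, 1.0, N)
fingerprint = 0.02 * np.sin(np.arange(N) * 2.9)
frame = content + fingerprint

def reencode(x, step=0.25):
    # Crude stand-in for lossy re-encoding: coarse quantization discards
    # low-amplitude detail, much as JPEG / H.264 quantization does.
    return np.round(x / step) * step

def fingerprint_corr(observed):
    # A detector's matched filter: correlate the residual against the
    # known fingerprint template.
    return float(np.corrcoef(observed - content, fingerprint)[0, 1])

corr_clean = fingerprint_corr(frame)            # fingerprint fully present
corr_lossy = fingerprint_corr(reencode(frame))  # mostly destroyed

print(f"before re-encoding: {corr_clean:.2f}")
print(f"after re-encoding:  {corr_lossy:.2f}")
```

The correlation is essentially perfect on the clean frame and drops sharply after quantization, even though the two frames would look identical to a viewer.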

The human detection failure

Automated detection is not the only line of defense failing. A 2025 iProov study testing 2,000 consumers found that only 0.1% could accurately identify all deepfakes and real media across image and video stimuli. Participants were primed to look for fakes, yet were 36% less likely to correctly identify a synthetic video compared to a synthetic image. For high-quality video deepfakes specifically, human detection rates are 24.5%, which is below the 50% baseline you would expect from random guessing in a balanced dataset.

This has direct operational consequences. In the Arup incident, a finance worker in Hong Kong joined a video call where the CFO and multiple colleagues were all deepfakes. Despite initial suspicion from a phishing email, the video call overrode his doubts. The result: 15 transactions totaling HK$200 million (approximately $25.6 million) sent to accounts controlled by fraudsters.

The broader fraud statistics reflect this vulnerability. Deepfake fraud attempts have increased by 2,137% over three years. Businesses lost an average of nearly $500,000 per incident in 2024. Attacks bypassing biometric authentication increased by 704% in 2023. The NSA, FBI, and CISA have jointly warned that synthetic media threats have “exponentially increased.”

The structural asymmetry

The core problem is an asymmetry in the optimization landscape. Generators are trained against a well-defined objective: minimize the statistical distance between generated and real data distributions. This is a convergent process. As training progresses, the generator’s output distribution approaches the real data distribution, and any detectable difference between synthetic and real media shrinks toward zero.

Detectors, by contrast, must solve an open-ended classification problem against an adversary that is constantly evolving. Every new architecture, training technique, or post-processing pipeline creates a new distribution of synthetic media. The detector must generalize across all of them, including ones that did not exist when it was trained. This is fundamentally harder than generation.

The market dynamics reflect this. While the market for AI detection tools grows at 28 to 42% annually, the threat expands at 900% or more. U.S. fraud losses from generative AI are projected to reach $40 billion by 2027, up from $12.3 billion in 2023.


Provenance as the architectural alternative

If post-hoc detection is structurally disadvantaged, the alternative is pre-hoc authentication. The Coalition for Content Provenance and Authenticity (C2PA) defines an open standard for cryptographically binding provenance metadata to digital assets. A C2PA Manifest contains assertions about the asset’s origin, modification history, and AI involvement, signed with the private key of the creating or editing software. The manifest is typically embedded directly in the asset, with optional soft binding through invisible watermarks for durability across format conversions.

Verification checks three properties: the manifest is structurally valid (well-formed), the content has not been modified since signing (hash integrity), and the signer is on a recognized trust list (chain of trust). If any part of the asset or manifest is tampered with, the cryptographic hash breaks and the verification fails.
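
The three checks can be sketched in a few lines of code. This is not the real C2PA format: actual manifests use COSE signatures, X.509 certificate chains, and JUMBF embedding, so the HMAC "signature" and plain JSON below are deliberate simplifications of the same logical flow.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"creator-tool-private-key"   # hypothetical key material
TRUSTED_SIGNERS = {"ExampleCam v1.0"}       # toy stand-in for a trust list

def sign_asset(content: bytes, signer: str) -> dict:
    # Build a manifest binding the content hash to its origin claims,
    # then sign the whole thing.
    manifest = {
        "signer": signer,
        "content_hash": hashlib.sha256(content).hexdigest(),
        "assertions": {"ai_generated": False},
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = hmac.new(SIGNING_KEY, payload, "sha256").hexdigest()
    return manifest

def verify(content: bytes, manifest: dict) -> str:
    m = {k: v for k, v in manifest.items() if k != "signature"}
    payload = json.dumps(m, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, "sha256").hexdigest()
    if not hmac.compare_digest(expected, manifest["signature"]):
        return "invalid: manifest tampered"          # well-formedness / signature
    if m["content_hash"] != hashlib.sha256(content).hexdigest():
        return "invalid: content modified after signing"  # hash integrity
    if m["signer"] not in TRUSTED_SIGNERS:
        return "unverified: signer not on trust list"     # chain of trust
    return "verified"

media = b"...pixel data..."
manifest = sign_asset(media, "ExampleCam v1.0")
print(verify(media, manifest))         # prints "verified"
print(verify(media + b"x", manifest))  # prints "invalid: content modified after signing"
```

Note the failure modes are distinct: a broken signature or hash yields "invalid," while an unknown signer yields "unverified," mirroring the standard's refusal to call unprovenanced media fake.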

The C2PA specification is explicit about scope: it is “not a cure-all for misinformation” and makes no value judgments about content. Media without a manifest is not flagged as fake; it is simply unverifiable. The approach shifts the question from “is this synthetic?” to “can this be traced to a trusted source?” It only works when the entire chain, from capture device to distribution platform, supports the standard.

The adoption gap remains significant. Most existing media lacks provenance data, and bad actors have no incentive to attach credentials to their output. But as Siwei Lyu notes, the meaningful line of defense will depend on “infrastructure-level protections” rather than human judgment or pixel analysis. The NSA and allied agencies have recommended content credentials as part of a layered defense strategy against synthetic media threats.

Detection will remain part of the toolkit, particularly for forensic investigation. But as the primary gatekeeping mechanism against synthetic media at scale, it is structurally overmatched. The long-term answer is not better detectors. It is an ecosystem where authenticity is the default, and media without provenance is treated with appropriate skepticism.
