Deepfakes scaled up in 2025 – here's what's going to happen next

During 2025, deepfakes improved dramatically. The quality of AI-generated faces, voices and full-body performances that mimic real people has grown far beyond what many experts expected just a few years ago. They are also increasingly being used to deceive people.

In many everyday scenarios – particularly low-resolution video calls and content shared on social media – their realism is now high enough to credibly fool non-expert viewers. In practice, synthetic media have become indistinguishable from authentic recordings to ordinary people and, in some cases, even institutions.

And this leap is not limited to quality. The volume of deepfakes has grown explosively: Cybersecurity firm DeepStrike estimates that the number of deepfakes online will increase from about 500,000 in 2023 to about 8 million in 2025, with annual growth approaching 900%.

I'm a computer scientist who researches deepfakes and other synthetic media. From my vantage point, I think the situation is likely to get worse in 2026 as deepfakes become synthetic actors capable of reacting to people in real time.

A dramatic improvement
There are many technological changes behind this dramatic improvement. First, video realism made a significant jump thanks to video generation models designed specifically to maintain temporal stability. These models produce videos with consistent motion, consistent identity of the people depicted and coherent content from one frame to the next. They separate the information that represents a person's identity from the information that represents movement, so the same movement can be mapped onto different identities, or the same identity can be driven by many different movements.

These models produce stable, consistent faces without the flicker, warping or structural distortions around the eyes and jaw that once served as reliable forensic giveaways of deepfakes.
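
To make the identity/motion separation concrete, here is a minimal, untrained sketch of the idea in PyTorch. The architecture is an illustrative stand-in of my own, not any specific production model: one encoder produces a time-invariant identity code, another produces per-frame motion codes, and a decoder recombines them – which is what lets one person's face be driven by another person's movements.

```python
import torch
import torch.nn as nn


class IdentityEncoder(nn.Module):
    """Compresses one reference frame into a time-invariant identity code."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        # frame: (batch, 3, H, W) -> (batch, dim)
        return self.net(frame)


class MotionEncoder(nn.Module):
    """Compresses each frame of a driving video into a per-frame motion code."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.frame_enc = IdentityEncoder(dim)  # reuse the same conv stack

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (batch, T, 3, H, W) -> (batch, T, dim)
        b, t = video.shape[:2]
        codes = self.frame_enc(video.flatten(0, 1))
        return codes.view(b, t, -1)


class Decoder(nn.Module):
    """Renders frames from a fixed identity code plus a motion code sequence."""
    def __init__(self, id_dim: int = 128, mo_dim: int = 64, size: int = 64):
        super().__init__()
        self.size = size
        self.net = nn.Sequential(
            nn.Linear(id_dim + mo_dim, 256), nn.ReLU(),
            nn.Linear(256, 3 * size * size),
        )

    def forward(self, id_code: torch.Tensor, motion: torch.Tensor) -> torch.Tensor:
        # id_code: (batch, id_dim); motion: (batch, T, mo_dim)
        t = motion.shape[1]
        fused = torch.cat([id_code.unsqueeze(1).expand(-1, t, -1), motion], dim=-1)
        return self.net(fused).view(-1, t, 3, self.size, self.size)


# Reenactment: animate person A's identity with person B's motion.
id_enc, mo_enc, dec = IdentityEncoder(), MotionEncoder(), Decoder()
ref_a = torch.randn(1, 3, 64, 64)           # one reference frame of person A
driving_b = torch.randn(1, 16, 3, 64, 64)   # 16-frame driving video of person B
fake = dec(id_enc(ref_a), mo_enc(driving_b))  # (1, 16, 3, 64, 64)
```

Because the identity code is computed once and held fixed across all frames, the output cannot flicker between identities – the temporal stability described above falls out of the design rather than being patched in afterward.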

Second, voice cloning has crossed what I would call the “indistinguishability threshold”: A few seconds of audio now suffice to generate a convincing clone, complete with natural tone, rhythm, loudness, emotion, pauses and breathing sounds. This capability is already fueling large-scale fraud. Some major retailers report receiving more than 1,000 AI-generated scam calls per day. The perceptible artifacts that once gave synthetic voices away have largely disappeared.
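
To see why a few seconds suffice, it helps to know the shape of the pipeline: a speaker encoder distills the sample into a fixed-size “voice fingerprint,” and a text-to-speech decoder is conditioned on that vector. The sketch below illustrates only that interface – both functions are deliberately simplistic stand-ins written for this article, not any real cloning system.

```python
import numpy as np

def speaker_embedding(wav: np.ndarray, dim: int = 64) -> np.ndarray:
    """Toy stand-in for a speaker encoder: summarize the sample's spectral
    envelope into a fixed-size vector. Real systems use a trained neural
    encoder, but the interface is the same: short audio in, one vector out."""
    spectrum = np.abs(np.fft.rfft(wav))
    bands = np.array_split(spectrum, dim)
    return np.array([band.mean() for band in bands])

def synthesize(text: str, spk: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Toy stand-in for a TTS decoder conditioned on a speaker vector. Here it
    just emits a tone whose pitch is derived from the embedding; a real model
    renders full speech in the cloned voice."""
    seconds = 0.06 * max(len(text), 1)  # crude duration model
    t = np.linspace(0, seconds, int(sr * seconds), endpoint=False)
    pitch = 120 + 60 * float(spk[0] / (spk.mean() + 1e-8))
    return 0.1 * np.sin(2 * np.pi * pitch * t)

# A few seconds of reference audio is all the interface requires.
reference = np.random.randn(3 * 16000)   # 3-second sample (random stand-in)
spk = speaker_embedding(reference)       # fixed-size voice "fingerprint"
audio = synthesize("Hi, it's me. I need a favor.", spk)
```

The key point is that the expensive part – training the encoder and decoder – happens once, offline; cloning a new voice afterward is just one cheap embedding computation, which is why the barrier for scam calls has dropped so low.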

Third, consumer tools have reduced the technical barrier to almost zero. A wave of upgrades and new apps, from OpenAI's Sora 2 to Google's Veo 3, means anyone can describe an idea, have a large language model such as OpenAI's ChatGPT or Google's Gemini draft a script, and produce polished audiovisual media in minutes. AI agents can automate the entire process. The ability to generate coherent, story-driven deepfakes at scale has effectively been democratized.
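
The workflow is short enough to sketch. In the snippet below, the script-drafting step uses OpenAI's real Python SDK; `generate_video` is a hypothetical placeholder, since each provider (Sora 2, Veo 3 and others) exposes its own video endpoint. The point is how few steps sit between an idea and a finished clip.

```python
from openai import OpenAI  # real SDK; requires an API key in the environment

client = OpenAI()

# Step 1: a large language model expands a one-line idea into a full script.
script = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": "Write a 30-second video script of a CEO announcing a merger.",
    }],
).choices[0].message.content

# Step 2: hand the script to a text-to-video model. This function is a
# hypothetical stand-in for a provider-specific video-generation endpoint.
def generate_video(prompt: str) -> bytes:
    # A real implementation would call the provider's video API here.
    return b""

clip = generate_video(script)
```

An AI agent can wrap both steps in a loop – drafting, rendering, reviewing and re-prompting – with no human in between.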

This combination of exploding volume and likenesses that are almost indistinguishable from real people creates serious challenges for detecting deepfakes, especially in a media environment where people's attention is divided and content moves faster than it can be verified. Real-world damage has already been done – from misinformation and targeted harassment to financial scams – enabled by deepfakes that spread before people have a chance to understand what is happening.

The future is real time
Looking ahead, the trajectory for the next year is clear: Deepfakes are moving toward real-time synthesis that captures the nuances of human appearance closely enough to evade detection systems. The frontier is shifting from static visual realism to temporal and behavioral coherence – models that generate live or near-live content instead of pre-rendered clips.

Identity modeling is converging toward unified systems that represent not only what a person looks like, but also how they move, sound and speak in different contexts. The result shifts from “this looks like person X” to “this behaves like person X over time.” I expect entire video-call participants to be synthesized in real time; interactive AI-powered actors whose faces, voices and mannerisms adapt instantly to a prompt; and scammers deploying reactive avatars instead of fixed videos.

As these capabilities mature, the perceptual difference between synthetic and authentic human media will continue to shrink. Human judgment will no longer be a meaningful line of defense. Instead, defense will rely on infrastructure-level safeguards. These include secure provenance, such as cryptographically signed media, and industry alliances such as the Coalition for Content Provenance and Authenticity that work to verify where content comes from and whether it is authentic. It will also rely on multimodal forensic tools like my lab's DeepFake-O-Meter.
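
At its core, cryptographically signed media reduces to a standard digital-signature check. Here is a minimal sketch using Ed25519 from Python's `cryptography` package; real provenance standards such as C2PA embed much richer, certificate-backed manifests, but the essential verification looks like this.

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# The capture device or publisher holds the private key; the public key is
# distributed so that anyone can verify the media later.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

# Stand-in for the raw bytes of a video file.
media = b"stand-in for the bytes of a video file"
signature = private_key.sign(media)  # attached as provenance metadata

# Later, a platform or viewer checks the signature before trusting the clip.
try:
    public_key.verify(signature, media)
    print("Provenance intact: bytes match the signed original.")
except InvalidSignature:
    print("Provenance check failed: content was altered or re-signed.")
```

The hard part is not the math but the ecosystem: keys must be tied to trusted devices and publishers, and signatures must survive the re-encoding that platforms routinely apply, which is exactly what the industry alliances above are working on.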

Simply looking at pixels more carefully is no longer enough.


Siwei Lyu is a professor of computer science and engineering, and director of the UB Media Forensics Lab at the University at Buffalo.
