How Does Deepfake Video Detection Work?
Video deepfake detection is the most technically demanding form of synthetic media forensics because it requires analyzing both spatial (per-frame) artifacts and temporal (frame-to-frame) inconsistencies simultaneously. AIGeneratedIt's video pipeline begins with intelligent frame sampling: rather than processing every frame (which is computationally prohibitive), the system identifies keyframes using motion vector analysis and samples at a density sufficient to detect manipulation in any segment of the video. Each sampled frame is run through EfficientNetV2-B3 and CrossEfficientViT classifiers trained on the FaceForensics++ and DFDC datasets.
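The sampling stage described above can be illustrated with a minimal sketch. The function name, stride, and motion threshold below are hypothetical illustrations, not AIGeneratedIt's actual parameters: the idea is to keep a uniform baseline of every Nth frame while adding any frame whose motion magnitude spikes (a likely cut or keyframe).

```python
import numpy as np

def sample_frames(motion_mags, stride=8, motion_thresh=0.5):
    """Pick candidate frames: every `stride`-th frame for baseline coverage,
    plus any frame whose motion magnitude exceeds `motion_thresh`."""
    n = len(motion_mags)
    picks = set(range(0, n, stride))                      # uniform baseline
    picks.update(i for i, m in enumerate(motion_mags) if m > motion_thresh)
    return sorted(picks)

# Toy motion trace: mostly calm, with a spike (scene cut) at frame 10.
motion = np.full(24, 0.1)
motion[10] = 0.9
frames = sample_frames(motion.tolist())   # → [0, 8, 10, 16]
```

Each selected frame would then be handed to the per-frame classifiers; the uniform stride guarantees no segment goes unsampled even when motion is flat.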
A ResNeXt+LSTM temporal model then analyzes the sequence of per-frame classification scores to detect patterns that indicate manipulation. Real videos have smooth temporal coherence in facial textures, lighting, and skin tone. Deepfake face-swaps introduce micro-inconsistencies at blending boundaries that fluctuate between frames in characteristic ways. The LSTM layer learns to recognize these temporal fingerprints even when individual frames look convincing in isolation.
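The role of the LSTM layer can be sketched with a from-scratch cell consuming the per-frame score sequence. This is illustrative only: the weights here are random, the hidden size is arbitrary, and the production model's parameters are learned, not hand-set.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step over a scalar per-frame fake score x (hidden size H).
    Gate order in the stacked weights: input i, forget f, candidate g, output o."""
    z = W @ np.atleast_1d(x) + U @ h + b      # all four gates at once, shape (4H,)
    H = h.shape[0]
    i = sigmoid(z[:H]); f = sigmoid(z[H:2*H])
    g = np.tanh(z[2*H:3*H]); o = sigmoid(z[3*H:])
    c = f * c + i * g                         # cell state carries temporal memory
    return o * np.tanh(c), c

rng = np.random.default_rng(0)
H = 4
W = rng.normal(size=(4*H, 1)); U = rng.normal(size=(4*H, H)); b = np.zeros(4*H)
h = np.zeros(H); c = np.zeros(H)

scores = [0.1, 0.2, 0.9, 0.15, 0.85]   # flickering per-frame scores, a deepfake tell
for s in scores:
    h, c = lstm_step(s, h, c, W, U, b)
# h now summarizes the whole score sequence; a trained head would read the
# verdict from it, distinguishing steady sequences from flickering ones.
```

The key point is that the cell state `c` accumulates evidence across frames, which is what lets the model flag frame-to-frame flicker that no single frame reveals.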
For lip-sync deepfakes — where audio is replaced and the mouth region is re-synthesized to match — AIGeneratedIt applies audio-visual synchronization analysis inspired by SyncNet. This measures the statistical alignment between jaw movement velocity in the video and the phoneme timing of the audio track. Synthesized lip movements from tools like Wav2Lip and SadTalker produce subtle mismatches in this alignment that are imperceptible to viewers but detectable by the model.
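The alignment measurement can be sketched as a lag search over normalized correlation between a mouth-opening signal and the audio energy envelope. Both signals and the function below are toy illustrations; SyncNet itself learns audio-visual embeddings rather than correlating raw signals.

```python
import numpy as np

def best_sync_offset(mouth_open, audio_env, max_lag=5):
    """Return the frame lag (within ±max_lag) that maximizes normalized
    correlation between mouth-opening height and the audio energy envelope.
    A well-synced real video peaks near lag 0; re-synthesized lips drift."""
    m = (mouth_open - mouth_open.mean()) / (mouth_open.std() + 1e-8)
    a = (audio_env - audio_env.mean()) / (audio_env.std() + 1e-8)
    return max(range(-max_lag, max_lag + 1),
               key=lambda k: np.mean(m[max(k, 0):len(m) + min(k, 0)] *
                                     a[max(-k, 0):len(a) + min(-k, 0)]))

t = np.arange(60)
audio = np.abs(np.sin(0.3 * t))   # toy speech-energy envelope
mouth = np.roll(audio, 2)         # mouth trails the audio by 2 frames
offset = best_sync_offset(mouth, audio)   # recovers the 2-frame lag
```

A consistent nonzero offset, or a low peak correlation at every lag, is the kind of audio-visual mismatch the section above describes.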
Frequently Asked Questions
What is the FaceForensics++ benchmark?
FaceForensics++ (FF++) is the most widely used academic benchmark for video deepfake detection. It contains 1,000 original YouTube videos and their corresponding deepfake versions created with five manipulation methods: DeepFakes, Face2Face, FaceSwap, FaceShifter, and NeuralTextures. Videos are available at three compression levels (raw, c23, and c40). AIGeneratedIt achieves 97% accuracy on the hardest (c40, heavily compressed) subset of this benchmark.
Can the detector handle low-quality or heavily compressed video?
Yes. Heavy video compression (such as the low-bitrate H.264 used by social media platforms) is one of the main challenges for deepfake detection because it destroys the high-frequency artifacts detectors rely on. AIGeneratedIt's compression forensics layer analyzes DCT block artifacts and macroblock boundaries to separate compression noise from manipulation artifacts, maintaining 94% accuracy on social media-quality video.
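One classic building block of this kind of compression forensics is a blockiness measure: the ratio of pixel discontinuity at 8×8 coding-block boundaries to discontinuity elsewhere. The sketch below is a generic illustration of that measure, not AIGeneratedIt's actual forensics layer.

```python
import numpy as np

def blockiness(gray, block=8):
    """Ratio of mean horizontal pixel discontinuity AT 8x8 block boundaries
    to the mean discontinuity elsewhere. Block-based codecs (JPEG, H.264)
    raise this ratio; a region re-encoded separately shifts it locally."""
    dif = np.abs(np.diff(gray.astype(float), axis=1))   # horizontal gradients
    cols = np.arange(dif.shape[1])
    at_edge = (cols + 1) % block == 0                   # columns between blocks
    return dif[:, at_edge].mean() / (dif[:, ~at_edge].mean() + 1e-8)

# Toy frame: flat 8x8 blocks at random levels, so all gradient energy
# sits exactly on block boundaries -> very high blockiness ratio.
rng = np.random.default_rng(1)
img = np.kron(rng.integers(0, 255, (4, 4)), np.ones((8, 8)))
ratio = blockiness(img)
```

Comparing such a ratio per region is one way to tell a uniformly compressed frame from one whose face region carries a different compression history.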
Does it detect full AI-generated videos like OpenAI Sora?
Yes. Fully AI-generated video from Sora, Runway Gen-3, Kling, Pika, and similar systems is detected through temporal coherence analysis and generative model fingerprinting. These systems produce subtle but consistent motion artifacts — particularly in hair, water, and background texture — that differ from natural video physics. Our models are updated within two weeks of major new video generation tool releases.
How long does a video scan take?
Videos under 3 minutes return results in under 30 seconds on the free tier. Pro users receive priority processing with results typically in under 10 seconds. Videos longer than 10 minutes are processed asynchronously and results are emailed. The system processes approximately 30 frames per second on GPU hardware.
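The throughput figures above can be sanity-checked with back-of-envelope arithmetic. The sampling stride below is a hypothetical value chosen for illustration (the actual sampling density is adaptive, per the pipeline description).

```python
def estimated_scan_seconds(duration_s, source_fps=30, sample_stride=6,
                           gpu_throughput_fps=30):
    """Rough scan-time estimate: sampled frame count divided by GPU
    classification throughput. All parameters are illustrative."""
    sampled_frames = (duration_s * source_fps) // sample_stride
    return sampled_frames / gpu_throughput_fps

# A 3-minute clip: 180 s * 30 fps / stride 6 = 900 sampled frames,
# which at 30 frames/s of GPU inference is about 30 seconds of processing.
estimate = estimated_scan_seconds(180)
```

With these assumed numbers the estimate lines up with the stated free-tier latency for videos under 3 minutes.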
Is this tool used by law enforcement or journalists?
Yes. AIGeneratedIt's video deepfake detector is used by investigative journalists, social media trust and safety teams, legal professionals, and government digital forensics units as a first-pass screening tool. Each scan generates a forensic report with per-frame scores, temporal consistency graphs, and identified manipulation regions that can be cited in formal investigations.
Deepfake Video Tools This Detector Covers
Our training dataset covers deepfake videos produced by the following manipulation and generation systems:
- Face Swap — DeepFaceLab, FaceSwap (open source), SimSwap, FaceShifter
- Face Reenactment — Face2Face, NeuralTextures, First Order Motion Model
- Lip Sync — Wav2Lip, SadTalker, DiffTalk, VideoReTalking
- Full Video Generation — OpenAI Sora, Runway Gen-2 and Gen-3, Kling, Pika Labs, Luma Dream Machine
- Avatar Generation — D-ID, HeyGen, Synthesia
- Consumer Apps — Reface, ZAO, FaceApp deep swap features