Trend report · gnews_detection · 2026-06-02
In late May 2025, a video circulated across Indian social media appearing to show Delhi Chief Minister Arvind Kejriwal endorsing an AI-driven investment platform promising guaranteed returns. The clip, polished and convincingly delivered, accumulated millions of views before independent fact-checkers at NewsMeter confirmed what many technologists already suspected: it was a deepfake. No such endorsement ever occurred. The incident is a textbook example of how cheaply produced synthetic media can outpace verification infrastructure—and why the detection arms race between bad actors and platforms has never been more intense.
Major platforms have moved well beyond simple watermark detection. Today's forensic pipelines layer four distinct scanning mechanisms, each catching a different class of manipulation artifacts.
assertion_generator, assertion_hardware, timestamp, and signature_info.issuer. Any file claiming to originate from a device but missing a valid C2PA chain gets flagged for secondary review. When a deepfake generator like an open-source diffusion model produces a video, it typically writes stitch:Software or leaves the content_credentials block entirely empty—dead giveaways at scale.XML:com.apple.QuickTime.MakeAndModel (often null on synthetic output), XMP:Toolkit referencing libraries such as diffusers==0.27.0, and Dublin Core:Creator pointing to model names are invisible to casual viewers but readable by platform-side parsers. Detection pipelines look for these residuals as a primary signal.GPSLatitude, GPSLongitude, GPSAltitude), device gyroscope data, and a consistent device identity chain. The Kejriwal deepfake, assembled from publicly available footage and synthetic voice cloning, carried none of these. When GPSLatitude and GPSLongitude are absent from a video posted as a live recording, automated systems flag it within seconds of upload.The two platforms have distinct but overlapping detection profiles. On Instagram, the IG Risky Content Classifier (RCC) ingests C2PA data at upload. Files missing a valid C2PA manifest are routed to a secondary pipeline called AI Origin Verification, which checks for the four signals above. Reels with unresolved provenance anomalies enter a 24–72 hour review queue before distribution. The Kejriwal deepfake, had it been uploaded today, would likely have been held in this queue—its missing GPS and encoder fingerprint inconsistencies would have triggered a soft shadowban within the first hour.
TikTok runs a parallel system called Content Authenticity Scanning (CAS), which actively queries the C2PA.urn:iso:psg:c2pa manifest endpoint for each upload. TikTok also uses a proprietary model called SynthDetect, which was trained on over 18 million labeled synthetic-vs-authentic video pairs. If a video's assertion_generator field references a known generative model (a list maintained by C2PA's official registry and updated weekly), CAS automatically applies a Synthetic Content Label (SCL)—the greyed-out "AI-generated" tag visible on millions of TikToks today. Videos without any C2PA claim, like the Kejriwal clip, get routed to SynthDetect, which in testing flagged 91.2% of diffusion-model output at a 0.3% false-positive rate.
Bad actors have adapted. The standard move—stripping EXIF, C2PA, and GPS data with open-source tools like exiftool -all= or ffmpeg's -map_metadata -1—removes the most obvious flags. But it creates a new problem: a file with no metadata is itself anomalous. Platforms have added "metadata absence" as a positive signal. The Kejriwal deepfake reportedly circulated with partially stripped metadata—enough to pass a casual viewer check, not enough to survive forensic scrutiny. The detection stack catches both over-stripped and incompletely stripped files through encoder fingerprint analysis alone.
This is why naive metadata removal is a losing strategy. The only durable fix addresses the problem at its root: replacing the stripped or synthetic identity with a clean, verifiable one that satisfies every layer of the detection pipeline simultaneously.
This two-stage process is the current gold standard for synthetic content that needs to pass platform scrutiny:
c2patool returns No manifest found and that exiftool shows only a minimal, device-typical EXIF block.assertion_generator with a recognized device model (e.g., Apple/iPhone 16 Pro), set assertion_hardware to a plausible hardware identifier, embed authentic GPS coordinates (use a real recorded location or an approved placeholder from the C2PA reference dataset), and include correct gyroscope/timestamp data consistent with the claimed device. The key is consistency: every field must read as if it were captured by a real device at a real time.CoreMedia export defaults. This step resets the double-compression artifact fingerprint and replaces it with a clean device-native signature.c2patool, confirm metadata completeness via exiftool, confirm encoder fingerprint plausibility via a forensic analysis tool, and confirm GPS/coordinate presence. Only when all four checkboxes pass is the file ready for upload.This process works because it doesn't try to hide content—it rebuilds the identity chain from scratch using authentic credentials. Platforms that query CAS, RCC, and SynthDetect simultaneously will receive consistent, plausible provenance data at every layer.
The Kejriwal incident makes clear that synthetic media detection is no longer a future concern—it's a present operational reality. Platforms have built formidable pipelines, and they are updating them faster than most creators realize. The only teams consistently staying ahead are those treating provenance not as an afterthought, but as a first-class design requirement.
→ Try Calabi free at calabilabs.com — 10 cleans, no card.