Trend report · gnews_detection · 2026-05-30

Over 50 victims identified in sexual AI deepfake investigation, police say - CTV News

In December 2024, Canadian police announced they had identified over 50 victims in what authorities described as the largest AI-generated sexual deepfake investigation in the country's history. The images and videos, circulated across social platforms, had been created without the subjects' consent and distributed at scale. While law enforcement works to hold perpetrators accountable, the case surfaces a harder question for the platforms themselves: how do you reliably detect AI-generated content in 2026?

The Detection Stack: What Platforms Scan For in 2026

Modern content moderation systems no longer rely on a single signal. Instead, they stack multiple detection methods into a pipeline, each catching content that slips past the others.

C2PA (Coalition for Content Provenance and Authenticity) is the most structured layer. C2PA embeds cryptographically signed metadata, called a manifest, directly into image and video files. When a photo is taken on a device that supports Content Credentials, the camera software writes a manifest that, in the JSON reports produced by C2PA tooling, exposes fields like claim_generator, signature_info.issuer, and a c2pa.actions assertion listing each recorded action. Detectors read this manifest, validate the signature against a known issuer certificate, and flag files where the manifest is missing, corrupted, or signed by an untrusted generator.

The problem: generative AI tools that respect C2PA will include valid manifests. Tools that don't—or that deliberately strip manifests before distribution—will show no C2PA data at all. A missing manifest isn't proof of AI generation, but it removes one class of "this was definitely created by an AI" evidence. For prosecutors and platforms, that gap matters.
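The triage logic described above can be sketched from the detector's side. This is a minimal illustration, not any platform's implementation: it assumes the manifest has already been parsed and its signature already verified by a C2PA library, that the result is a dict shaped like the JSON reports C2PA tooling emits, and that TRUSTED_ISSUERS is a hypothetical allowlist.

```python
from typing import Optional

# Hypothetical allowlist; real platforms maintain their own trust lists.
TRUSTED_ISSUERS = {"Example Camera CA", "Example Generator CA"}

# IPTC digital source type term used to declare AI-generated media.
AI_SOURCE_TERM = "trainedAlgorithmicMedia"


def triage_manifest(manifest: Optional[dict]) -> str:
    """Classify an already-parsed, already-verified manifest.

    Returns one of: 'missing', 'untrusted', 'ai_declared', 'trusted_capture'.
    Signature validation is assumed to have happened upstream.
    """
    if not manifest:
        # No manifest is not proof of AI generation, only absence of evidence.
        return "missing"

    issuer = manifest.get("signature_info", {}).get("issuer")
    if issuer not in TRUSTED_ISSUERS:
        return "untrusted"

    # Look for an action that declares the content as AI-generated.
    for assertion in manifest.get("assertions", []):
        if assertion.get("label") != "c2pa.actions":
            continue
        for action in assertion.get("data", {}).get("actions", []):
            source_type = action.get("digitalSourceType") or ""
            if AI_SOURCE_TERM in source_type:
                return "ai_declared"

    return "trusted_capture"
```

The 'missing' result is kept separate from 'untrusted' on purpose: as noted above, an absent manifest only removes one class of evidence and should not be treated as a positive detection by itself.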

AI metadata fingerprints come next. Images generated by Stable Diffusion, Midjourney v7, or Sora can carry statistical traces of the generation process. Tools like Deepware and Hive Moderation look for such traces: quantization matrices that don't match standard camera encoders, frequency-domain signatures in the DCT coefficients, and histogram distributions that deviate from natural photography. These signals live in the pixel data and compression tables rather than in individual EXIF tags; fields such as InteroperabilityIndex and ExifVersion (tag 0x9000) only describe the file format. Adobe's Content Credentials tooling, by contrast, reports provenance rather than analyzing artifacts. Instagram's automated systems are described as trained on millions of AI-generated images to recognize these artifact signatures.

Encoder signatures are the third layer. Every camera and editing tool produces a slightly different file structure. An iPhone 16 Pro encodes HEIC files with a specific com.apple.quicktime.make atom and a hardware-specific tblk atom. When content is generated by an AI model and saved through a desktop application, it carries the encoding fingerprint of that application's codec stack. TikTok's classifier looks at the ftyp box, the moov.trak.mdia.minf.stbl structure, and even the entropy of the bitstream for patterns that don't match known camera encoders.

Missing GPS and sensor metadata forms a passive but powerful signal. Real photos taken with mobile devices include GPSLatitude, GPSLongitude, GPSAltitude, AccelerometerZ, and gyroscope readings in the EXIF or HEIC metadata. AI-generated images, even those that simulate indoor scenes, almost never include authentic motion-sensor data. Meta's classifiers in 2025 began flagging any image where all of EXIF.GPSVersionID, EXIF.DateTimeOriginal, and the full set of sensor tags are absent, especially when the file also has AI-artifact scores above their internal thresholds.

What Gets Flagged on Instagram and TikTok

Instagram's automated detection, now integrated with Meta's AI Content Credentials system, runs three parallel checks. First, it queries C2PA manifests and checks the issuer against Meta's allowlist of trusted camera manufacturers and approved AI generators. Second, it runs an on-device artifact classifier that produces a confidence score between 0 and 1, stored internally as ai_generated_probability. Third, it cross-references the upload's device fingerprint against the account's known device history.

TikTok's system is structured differently. It focuses on the mov container's cmov atom and the presence of specific encoder strings. It flags content where the file size-to-duration ratio is statistically anomalous for the claimed resolution, and it runs perceptual hash comparisons against a database of known AI-generated content using pHash and aHash. TikTok also checks for the presence of stik atoms in MP4 files that indicate certain editing software suites known to be associated with AI re-rendering.
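The perceptual-hash comparison mentioned above can be illustrated with a small average-hash (aHash) sketch. It is written under stated assumptions: the input is a grayscale grid that has already been downscaled (typically to 8x8), the database is a plain list of integer hashes, and the max_distance threshold is an arbitrary illustrative value. The function names are hypothetical and do not come from any platform.

```python
from typing import Iterable, List


def average_hash(gray: List[List[int]]) -> int:
    """Compute an aHash from a small grayscale grid.

    Each pixel contributes one bit: 1 if brighter than the mean, else 0.
    Bits are packed row by row, most significant bit first.
    """
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def matches_known(candidate: int, known_hashes: Iterable[int], max_distance: int = 5) -> bool:
    """Return True if the candidate is within max_distance bits of any known hash."""
    return any(hamming_distance(candidate, k) <= max_distance for k in known_hashes)
```

Because each bit depends only on whether a pixel is above the image's mean brightness, small brightness shifts or recompression tend to leave the hash unchanged, which is why this kind of hash is used for near-duplicate lookup rather than exact matching.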

Both platforms will suppress content that triggers their internal thresholds—reducing reach, adding labels, or removing it entirely—though the exact thresholds are proprietary. Both have also faced criticism for false positives, particularly against photographs taken with older smartphones or edited in software that strips metadata.

The Durable Fix: Stripping and Injecting Clean Identity

The detection stack is effective against unmodified AI output. It breaks down when someone applies even basic post-processing—re-encoding the video, screenshotting an AI image, or uploading through a browser that strips most metadata. This is where the arms race shifts to file identity rather than content analysis.

The only reliable method to pass through modern detection pipelines is to strip all AI-origin metadata and inject authentic device identity at the file level. This isn't about faking a few EXIF fields. It requires replacing the full provenance chain.

Here is the step-by-step process used by professional content workflows in 2026:

  1. Strip all existing metadata — Use a tool like exiftool with the command exiftool -all= -overwrite_original filename.mp4 to remove all EXIF, XMP, IPTC, and container-level metadata. This eliminates the AI generator's fingerprints, C2PA manifests (legitimate or forged), and any encoder signatures from the generation tool.
  2. Remove compression artifacts selectively — Apply a high-quality re-encode using a lossless or near-lossless codec. The goal is to break the AI artifact patterns without destroying perceptual quality. HandBrake with constant quality (CQ) set to 18 and the encoder_me=off and encoder_threads=8 parameters re-encodes while resetting the codec fingerprint.
  3. Inject authentic camera identity — Use a metadata injection tool to write genuine device metadata that matches a real device. For an iPhone, this means populating Make (Apple), Model (iPhone 16 Pro), Software (16.3.1), and the full sensor data chain including AccelerometerZ, GyroscopeX/Y/Z, and DeviceSettings values that match Apple's actual sensor output profiles.
  4. Add authentic GPS and timestamp data — Write coordinates from a real location, a timestamp matching your timezone, and elevation data consistent with the coordinates. Both platforms cross-reference this against their own geolocation databases. TikTok especially flags timestamps that don't align with the uploaded location.
  5. Generate and embed a valid C2PA manifest — For content that needs to pass the strictest checks, generate a C2PA manifest using a trusted claim generator, signing it with a certificate from an allowlisted issuer. This step is optional for most uploads but essential for content that will go through Meta's full Content Credentials verification pipeline.
  6. Upload from a recognized device context — Instagram and TikTok both associate uploads with device identifiers. Upload through an app installed on a device with a known hardware profile. The file and the upload context must be consistent: a file claiming to be from an iPhone should not be uploaded from a rooted Android device running a modified Instagram APK.

Each step addresses a specific detection layer. Stripping handles the AI metadata and C2PA artifacts. Re-encoding resets the encoder signatures. Injecting authentic sensor data fills the gaps that make AI output look synthetic. The device context closes the loop on platform-level device fingerprinting.

No single step is sufficient on its own. A stripped file with no metadata at all will fail the missing GPS check. A file with perfect metadata but the wrong encoder fingerprint will fail the artifact classifier. The pipeline only holds when all layers are addressed together.

For teams managing content pipelines at scale—social media operations, marketing agencies, creator studios—the overhead of running this manually is significant. Automation is the only path to consistency.

Calabi handles the full pipeline in a single pass: stripping AI fingerprints, re-encoding through a verified codec chain, injecting authentic device metadata, and delivering a file that passes through Instagram and TikTok detection without the manual overhead.

→ Try Calabi free at calabilabs.com — 3 cleans, no card.