Trend report · hn_ai · 2026-06-12
In the debate over whether copyright protects open source AI the way it protects open source code, there's a practical question lurking beneath the ideological arguments: Who actually controls what happens to AI-generated content once it leaves your machine? In 2026, the answer is platforms—and they've gotten very good at knowing what you made, when, and how. If you're publishing AI content, understanding what gets scanned matters as much as the legal theory.
Modern content moderation systems run a layered detection stack. It's not a single checkbox—it's a pipeline that examines your media file from multiple angles simultaneously.
The c2pa.assertions block can include fields such as gen_source, generator_name, and software_name. When you export from Midjourney, Firefly, or Sora, these systems may write to the c2pa.claim_generator field. Platforms check for this block; if it is present and unsigned, or names a known AI generator, that's a flag.
The Iptc4xmpExt:DigitalSourceType field in the XMP namespace often reads "trainedAlgorithmicMedia" for AI content. digiKam and Adobe tools may write this automatically when exporting from generative models. The photoshop:History field can expose "Stable Diffusion" or "DALL-E 3" as action keywords.
An AI-generated file's quantization_map differs from that of natural photography. Tools like PhotoDNA (Microsoft's hash matching) have reportedly been extended with AI-DNA signatures that detect specific model families. TikTok's detection reportedly looks for the ICC profile mismatch between native camera output and AI-generated content.
Platforms reportedly treat GPSAltitude = 0 with no corresponding GPSLatitude as a moderate signal.
The platforms don't publish their scoring rubrics, but user reports and leaked documentation reveal consistent patterns.
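The metadata checks described above can be sketched as a simple rule-based scorer. This is a minimal illustration, not any platform's actual classifier: the function names, the point values, and the flat tag dictionary are all assumptions made for the example, and a real system would parse and cryptographically verify a C2PA manifest rather than read a single string.

```python
from dataclasses import dataclass

# Hypothetical generator list, taken from the tools named in the text.
KNOWN_AI_GENERATORS = ("midjourney", "firefly", "sora")


@dataclass
class Signal:
    name: str
    points: int  # hypothetical weight; platforms do not publish their rubrics


def collect_signals(tags: dict) -> list[Signal]:
    """Return the metadata signals present in a flat tag dictionary."""
    signals = []

    # Provenance: the claim generator names a known AI tool.
    generator = str(tags.get("c2pa.claim_generator", "")).lower()
    if any(name in generator for name in KNOWN_AI_GENERATORS):
        signals.append(Signal("c2pa_generator", 60))

    # IPTC digital source type marks the file as generative output.
    source_type = str(tags.get("Iptc4xmpExt:DigitalSourceType", ""))
    if source_type.endswith("trainedAlgorithmicMedia"):
        signals.append(Signal("iptc_digital_source_type", 60))

    # Edit history mentions a generative model by name.
    history = str(tags.get("photoshop:History", "")).lower()
    if "stable diffusion" in history or "dall-e" in history:
        signals.append(Signal("edit_history", 40))

    # Altitude of zero with no latitude: the "moderate signal" from the text.
    if tags.get("GPSAltitude") == 0 and "GPSLatitude" not in tags:
        signals.append(Signal("gps_altitude_only", 20))

    return signals


def score(signals: list[Signal]) -> int:
    """Combine signal points into a score capped at 100."""
    return min(100, sum(s.points for s in signals))
```

Calling `score(collect_signals(tags))` on a dictionary of extracted tags gives a single number that a later stage could compare against a review threshold. Encoder-level checks such as quantization tables and ICC profiles are left out because they need the decoded file, not just its tags.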
Instagram reportedly runs content through its AI-Generated Content (AGC) Classifier before it hits the feed. Posts with detectable C2PA blocks from known generators (Midjourney, Firefly, Sora) are reported to see initial reach throttling of 40–70% until reviewed. A post missing ExifIFD:MakerNote entirely, which is common for web downloads and screenshots, triggers secondary scrutiny. The "Made with AI" label, introduced in 2024, attaches automatically when confidence exceeds 0.7 on the C2PA check or the IPTC DigitalSourceType field marks the file as AI-generated.
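The labeling rule described here reduces to a single boolean condition. The sketch below assumes the 0.7 threshold quoted in the text; the function and parameter names are hypothetical and do not come from any published Instagram API.

```python
def should_label(c2pa_confidence: float,
                 has_ai_source_type: bool,
                 threshold: float = 0.7) -> bool:
    """Attach the label when either condition from the text holds.

    c2pa_confidence: classifier confidence on the C2PA check, from 0.0 to 1.0.
    has_ai_source_type: True if the IPTC digital source type marks AI content.
    """
    # "Exceeds" is read as strictly greater than the threshold.
    return c2pa_confidence > threshold or has_ai_source_type
```

Because the two conditions are joined with "or", either signal alone is enough: a low C2PA confidence does not prevent the label when the IPTC field is set.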
TikTok reportedly uses a multi-stage pipeline. First, Audio/Video Matching (the "C2PA Validation Layer") rejects uploads with unsigned provenance blocks. Second, its Synthetic Media Detector (SMD) analyzes encoder artifacts. If your file came from a phone but lacks the expected DeviceID in the file header, that's a red flag, because legitimate phone exports typically carry device identity in MakerNote tags.
A common misconception: "I'll just strip the EXIF and C2PA data, and I'm safe." This is half-right and dangerously incomplete.
Stripping removes intentional metadata, but platforms have moved to device fingerprinting as a secondary signal. When you upload from a phone, the platform reads the android_id (or IDFA on iOS), SIM serial, and carrier MCC-MNC codes embedded in the upload handshake—not the file metadata. If that device has previously uploaded AI content, or if the fingerprinting layer detects inconsistencies (a file claiming to be from an iPhone 15 Pro but missing the expected MakerNote Apple-specific tags), you get flagged anyway.
The fundamental problem: metadata stripping doesn't change what the file actually is. The encoder signatures, quantization patterns, and device identity signals remain.
The only robust approach is a two-step pipeline that addresses both metadata and device signals.
c2pa.claim_generator, Iptc4xmpExt:DigitalSource, GPSLatitude, GPSLongitude, and ExifIFD:Make. Don't just delete; overwrite with zeros to prevent recovery.
DeviceMake (e.g., "Apple"), DeviceModel (e.g., "iPhone 15 Pro"), and valid LensInfo. Include matching GPSLatitude/GPSLongitude from a plausible location. Write a MakerNote block that matches the device profile exactly.
This is precisely what Calabi's Sora watermark removal and AI metadata cleaning pipeline does in a single pass: it strips every detectable AI signal and injects a clean phone identity profile, including valid GPS, device tags, and ICC profile, making the file indistinguishable from native phone photography at the platform scanning layer.
The copyright question around open source AI will be settled in courts over years. The detection question is settled right now—and if you're publishing AI content, the only durable defense is making your files look exactly like what they claim to be.
→ Try Calabi free at calabilabs.com — 10 cleans, no card.