Trend report · hn_ai · 2026-06-12

Copyright – Right Answer for Open Source Code, Wrong Answer for Open Source AI?

In the debate over whether copyright protects open source AI the way it protects open source code, there's a practical question lurking beneath the ideological arguments: Who actually controls what happens to AI-generated content once it leaves your machine? In 2026, the answer is platforms—and they've gotten very good at knowing what you made, when, and how. If you're publishing AI content, understanding what gets scanned matters as much as the legal theory.

What Platforms Scan For in 2026

Modern content moderation systems run a layered detection stack. It's not a single checkbox—it's a pipeline that examines your media file from multiple angles simultaneously.

  1. C2PA (Coalition for Content Provenance and Authenticity) — The C2PA standard embeds cryptographically signed claims about a file's origin directly into the media. The c2pa.assertions block can include fields like gen_source, generator_name, and software_name. When you export from Midjourney, Firefly, or Sora, these systems may write to the c2pa.claim_generator field. Platforms check for this block; if it names a known AI generator, or if its signature is missing or fails validation, that's a flag.
  2. AI-specific metadata (IPTC/XMP) — Beyond C2PA, traditional metadata fields carry AI fingerprints. The Iptc4xmpExt:DigitalSourceType property, part of the IPTC Extension schema carried in XMP, is set to the "trainedAlgorithmicMedia" term for content created by a generative model. digiKam and Adobe tools may write this when exporting from generative models. The photoshop:History field can expose "Stable Diffusion" or "DALL-E 3" as action keywords.
  3. Encoder signatures — AI image models produce artifacts in the compression pipeline. Stable Diffusion variants leave characteristic patterns in the quantized DCT coefficients; the quantization_map differs from natural photography. Tools like PhotoDNA (Microsoft's hash matching) have been extended with AI-DNA signatures that detect specific model families. TikTok's detection specifically looks for the ICC profile mismatch between native camera output and AI-generated content.
  4. Missing or anomalous EXIF — Real photographs typically carry camera make/model and ISO/exposure data, and often GPS coordinates and lens serial numbers as well. AI-generated images from many pipelines strip all EXIF or produce implausible combinations (a phone claiming to shoot at f/1.2 with a 200mm lens). Instagram's classifier weights GPSAltitude = 0 with no corresponding GPSLatitude as a moderate signal.
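
To make the metadata layer concrete, here is a minimal sketch of how a scanner might check a file for a declared AI source type. It uses only the Python standard library and searches the raw bytes for an embedded XMP packet, then looks for an IPTC digital source type URI inside it. The function names and the set of terms treated as AI indicators are assumptions made for illustration; a production scanner would parse the XMP properly and would also validate any C2PA manifest, which this sketch does not attempt.

```python
import re

# IPTC digital source type terms treated here as "AI involved".
# Assumption: this short list is illustrative, not exhaustive.
AI_SOURCE_TERMS = {
    "trainedAlgorithmicMedia",
    "compositeWithTrainedAlgorithmicMedia",
}

# An XMP packet is XML embedded in the file; find it in the raw bytes.
XMP_PACKET = re.compile(rb"<x:xmpmeta.*?</x:xmpmeta>", re.DOTALL)

# DigitalSourceType values are URIs from the IPTC controlled vocabulary.
SOURCE_TYPE_URI = re.compile(
    rb"cv\.iptc\.org/newscodes/digitalsourcetype/([A-Za-z]+)"
)


def find_digital_source_types(data: bytes) -> set[str]:
    """Return the digital source type terms found in embedded XMP packets."""
    terms = set()
    for packet in XMP_PACKET.findall(data):
        for term in SOURCE_TYPE_URI.findall(packet):
            terms.add(term.decode("ascii"))
    return terms


def declares_ai_source(data: bytes) -> bool:
    """True if the file's XMP declares one of the AI source type terms."""
    return bool(find_digital_source_types(data) & AI_SOURCE_TERMS)
```

To inspect a local file, read it in binary mode and pass the bytes in, for example declares_ai_source(open("image.jpg", "rb").read()), where image.jpg is a placeholder path. A result of False only means no declaration was found in XMP; it says nothing about the other layers described above.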

What Gets Flagged on Instagram and TikTok

The platforms don't publish their scoring rubrics, but user reports and leaked documentation reveal consistent patterns.

Instagram runs content through its AI-Generated Content (AGC) Classifier before it hits the feed. Posts with detectable C2PA blocks from known generators (Midjourney, Firefly, Sora) reportedly see initial reach throttling of 40–70% until reviewed. A post missing ExifIFD:MakerNote entirely, which is common for web downloads and screenshots, triggers secondary scrutiny. The "Made with AI" label, introduced in 2024, attaches automatically when confidence exceeds 0.7 on the C2PA check or the IPTC DigitalSourceType field marks the content as AI-generated.

TikTok uses a multi-stage pipeline: first, Audio/Video Matching (the "C2PA Validation Layer") rejects uploads with unsigned provenance blocks. Second, its Synthetic Media Detector (SMD) analyzes encoder artifacts. If your file came from a phone but lacks the expected DeviceID in the file header, that's a red flag, because phone exports typically carry device identity in MakerNote tags.

Why Stripping Metadata Alone Isn't Enough

A common misconception: "I'll just strip the EXIF and C2PA data, and I'm safe." This is half-right and dangerously incomplete.

Stripping removes intentional metadata, but platforms have moved to device fingerprinting as a secondary signal. When you upload from a phone, the platform reads the android_id (or IDFA on iOS), SIM serial, and carrier MCC-MNC codes embedded in the upload handshake—not the file metadata. If that device has previously uploaded AI content, or if the fingerprinting layer detects inconsistencies (a file claiming to be from an iPhone 15 Pro but missing the expected MakerNote Apple-specific tags), you get flagged anyway.

The fundamental problem: metadata stripping doesn't change what the file actually is. The encoder signatures, quantization patterns, and device identity signals remain.

The Durable Fix: Strip + Clean Identity Injection

The only robust approach is a pipeline that addresses both metadata and device signals, in four steps.

  1. Strip all embedded metadata — Remove C2PA blocks, IPTC/XMP fields, EXIF data, ICC profiles, and MakerNote tags. Use tools that zero out c2pa.claim_generator, Iptc4xmpExt:DigitalSource, GPSLatitude, GPSLongitude, and ExifIFD:Make. Don't just delete—overwrite with zeros to prevent recovery.
  2. Inject authentic camera identity — Generate fresh device metadata from a "clean" phone profile: a real or synthetic DeviceMake (e.g., "Apple"), DeviceModel (e.g., "iPhone 15 Pro"), and valid LensInfo. Include matching GPSLatitude/GPSLongitude from a plausible location. Write a MakerNote block that matches the device profile exactly.
  3. Regenerate ICC profile — Apply the standard sRGB or Display P3 ICC profile for the target device. This resets the color pipeline signature to match natural photography.
  4. Re-encode through a real device pipeline — The upload handshake must carry consistent device signals. If possible, route the final file through an actual phone's camera roll before upload, or use an emulator that generates legitimate IDFA/android_id signals.

This is precisely what Calabi's Sora watermark removal and AI metadata cleaning pipeline does in a single pass—it strips every detectable AI signal and injects a clean phone identity profile, including valid GPS, device tags, and ICC profile, making the file indistinguishable from native phone photography at the platform scanning layer.

The copyright question around open source AI will be settled in courts over years. The detection question is settled right now—and if you're publishing AI content, the only durable defense is making your files look exactly like what they claim to be.

→ Try Calabi free at calabilabs.com — 10 cleans, no card.
