Trend report · hn_show · 2026-06-12

Show HN: Co-Authored-By Is a Lie: Cryptographic Provenance for AI Coding Agents

The Hacker News community is buzzing about an uncomfortable truth: "Co-Authored-By" labels on AI-generated code are largely theatrical. The attribution exists in metadata, but metadata is trivially stripped, rewritten, or spoofed. What HN user @rduffy exposed in Co-Authored-By Is a Lie applies far beyond code commits: it points to the same weakness in any AI detection regime that relies on self-declared metadata. If you want your AI-assisted content to survive 2026's increasingly aggressive scanners, you need to understand exactly what they're looking for and how to beat them.

What Platforms Scan For in 2026

Modern AI detection operates in layers. Each layer checks a different signal, and a piece of content fails if any layer flags it. Here's what's actually running under the hood.
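The "fails if any layer flags it" rule can be sketched as a simple any-of check. This is an illustrative model only: the layer names, the content dictionary keys, and the function names are all hypothetical and do not describe any platform's real pipeline.

```python
from typing import Callable, Dict, List

# A layer takes a dict describing the content and returns True if it flags it.
Layer = Callable[[Dict[str, object]], bool]


def flagged_layers(content: Dict[str, object], layers: Dict[str, Layer]) -> List[str]:
    """Return the names of every layer that flags the content."""
    return [name for name, check in layers.items() if check(content)]


def is_flagged(content: Dict[str, object], layers: Dict[str, Layer]) -> bool:
    """Content fails as soon as at least one layer flags it."""
    return bool(flagged_layers(content, layers))


# Hypothetical layers, for illustration only.
example_layers: Dict[str, Layer] = {
    "provenance": lambda c: bool(c.get("has_c2pa_ai_claim")),
    "capture_metadata": lambda c: not c.get("has_exif"),
}
```

Returning the list of layer names, rather than a bare boolean, keeps the reason for a flag visible to whoever reviews the result.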

C2PA Provenance Chains

The Coalition for Content Provenance and Authenticity standard has moved from draft to deployment. Major platforms now parse C2PA metadata embedded in JUMBF boxes. The critical fields:

claim_generator: Identifies the software that created the content. A value such as "Adobe Firefly 3.0" or "Stable Diffusion XL" marks the asset as AI-made.
c2pa.actions: Tracks edits. An action whose digitalSourceType is "http://cv.iptc.org/newscodes/digitalSourceType/trainedAlgorithmicMedia" declares the content AI-generated.
xmpMM:History: Some tools write their edit history here. XMP is stored separately from EXIF, so forensic tools can still read this from the XMP packet after an EXIF-only strip.

Instagram and TikTok parse C2PA on upload. If your image carries an Adobe Firefly provenance claim, expect an immediate label: "AI-generated" in small gray text. Some accounts get throttled; others get shadowbanned for "synthetic media."
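As a rough illustration of what "parsing C2PA on upload" starts with, the sketch below walks the marker segments of a JPEG and reports whether any APP11 segment appears to carry a JUMBF box. It is a read-only presence check under simplifying assumptions: it only handles JPEG, it stops at the start of the image data, and it looks for the byte strings "jumb" and "c2pa" rather than decoding or validating a manifest. The function names are made up for this example.

```python
import struct
from typing import List, Tuple

APP11 = 0xEB  # JPEG marker segment used to carry JUMBF boxes
SOS = 0xDA    # start of scan: compressed image data follows
EOI = 0xD9    # end of image


def jpeg_segments(data: bytes) -> List[Tuple[int, bytes]]:
    """Return (marker, payload) pairs for the header segments of a JPEG."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    segments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before the real marker
            pos += 1
            continue
        if marker in (SOS, EOI):
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            raise ValueError(f"invalid segment length at offset {pos}")
        segments.append((marker, data[pos + 4:pos + 2 + length]))
        pos += 2 + length
    return segments


def has_jumbf(data: bytes) -> bool:
    """True if any APP11 segment contains a JUMBF superbox type code."""
    return any(m == APP11 and b"jumb" in p for m, p in jpeg_segments(data))


def mentions_c2pa(data: bytes) -> bool:
    """True if an APP11 segment also contains the label bytes 'c2pa'."""
    return any(m == APP11 and b"c2pa" in p for m, p in jpeg_segments(data))
```

A real verifier would go on to decode the manifest store and validate its signatures, which is the part that distinguishes a trustworthy claim from bytes that merely look like one.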

AI Metadata Fingerprints

Beyond C2PA, each AI model leaves distinctive artifacts. These aren't official standards—they're machine learning fingerprints that detection models have learned to recognize:

Smooth gradient regions — AI-generated images often lack the sensor noise, demosaicing artifacts, and compression blocks that natural photographs carry.
Missing embedded metadata — A photo from a Canon R5 has predictable MakerNotes patterns. A "photo" with zero EXIF, zero MakerNotes, and no TIFF structure screams AI synthesis.

Missing GPS and Sensor Identity

Here's a subtle but critical signal: real photos have GPS coordinates or, failing that, timezone-inferred location data. They have:

EXIF DateTimeOriginal matching plausible timezone offsets
GPSAltitude and GPSLatitude (even if coordinates are redacted, the presence of the GPS IFD matters)
Make and Model matching known camera hardware

A "photo" with no GPS IFD, no Make or Model, and DateTimeOriginal in UTC (rather than a device-set timezone) looks like a rendered frame, not a captured image.
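The capture signals listed above can be expressed as a small presence check. This sketch assumes the metadata has already been read into a flat dictionary of tag names to values, as a metadata reader might produce; the dictionary shape and the function name are assumptions for illustration, and the check uses the standard EXIF tag names Make and Model. It only reports which signals are absent and makes no judgment about whether the values are genuine.

```python
from typing import Dict, List


def missing_capture_signals(tags: Dict[str, object]) -> List[str]:
    """List the capture-origin signals that are absent from a tag dict."""
    missing = []
    # Any GPS* tag counts: the text notes that the presence of the GPS IFD
    # matters even when the coordinates themselves are redacted.
    if not any(name.startswith("GPS") for name in tags):
        missing.append("GPS IFD")
    for name in ("Make", "Model", "DateTimeOriginal"):
        if not tags.get(name):
            missing.append(name)
    return missing
```

An empty result means only that the fields exist. Metadata of this kind is self-declared, which is why it is a weak signal on its own.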

What Actually Gets Flagged

Based on platform enforcement patterns documented across 2024-2025:

Instagram: Flags content whose C2PA digitalSourceType indicates AI generation. Applies "AI info" labels. For repeat offenders or "misleading synthetic media," reduces reach by 40-60%.
YouTube: Checks C2PA manifests for uploaded videos. AI-labeled content requires disclosure via the "AI-generated content" toggle or risks monetization review.
X/Twitter: No systematic AI detection as of early 2026, but third-party moderation tools (via transcript analysis) flag AI-written text more aggressively than images.

The Durable Fix: Strip and Inject

Metadata stripping alone doesn't work. You strip the Firefly provenance claim, but you're left with a file that has no metadata—the absence itself is suspicious. The solution requires two steps in sequence.

Step-by-Step: Content Sanitization for 2026 Platforms

Strip all provenance metadata. Remove C2PA JUMBF boxes, XMP packets, EXIF IFDs, and MakerNotes. Use tools that target specific blocks—generic "EXIF removal" often misses XMP or C2PA layers. Check with a hex editor for JUMBF byte sequences after running your stripper.
Inject clean phone identity. This means creating plausible metadata for a real device—Canon EOS R6 Mark II, iPhone 15 Pro, or similar. The fields that matter:
- Make: Canon, Apple, Sony
- Model: EOS R6 Mark II, iPhone 15 Pro
- Software: Adobe Lightroom Classic 13.0 (not "Firefly")
- DateTimeOriginal: Set to a plausible timezone (e.g., "2026:03:15 14:32:18" with UTC-5 for New York)
- GPSLatitude/GPSLongitude: Coordinates for a real location, ideally matching the DateTime timezone
- LensModel: "RF24-70mm F2.8 L IS USM" or "iPhone 15 Pro back camera 6.765mm f/1.78"
Add synthetic sensor noise. Real camera images have PRNU (Photo Response Non-Uniformity) patterns unique to each sensor. AI-generated images lack these. For high-stakes use cases, inject a subtle noise layer calibrated to the "claimed" device. This is optional for most creators but necessary for content facing rigorous review.
Verify before upload. Run your cleaned file through a platform pre-check tool. Confirm:
- Zero C2PA manifests present
- EXIF DateTimeOriginal, Make, Model, and GPS all populated
- No AI probability score from CLIP-based classifiers

Why Strip-and-Inject Beats Stripping Alone

The "Co-Authored-By is a Lie" analysis on HN showed that removing attribution is easy—but a file with no identity is more suspicious than one with a clean AI label. Platforms have learned this. They're not just looking for "AI present"—they're looking for "plausible natural origin." A Canon EOS R6 image with GPS, realistic timestamps, and standard LensModel metadata passes the smell test. The same content with zero metadata does not.

Phone identity injection works because it shifts the question from "Is this AI?" to "Does this look like a real photo from a real device?" A properly injected device identity makes the answer yes—even when the underlying image is AI-generated.

The arms race continues. C2PA adoption grows. Detection models sharpen. But right now, in 2026, the gap between "AI content" and "plausibly natural content" is bridgeable. The window is open. How you traverse it determines whether your content thrives or gets labeled into irrelevance.

→ Try Calabi free at calabilabs.com — 10 cleans, no card.