Trend report · hn_show · 2026-06-20

Show HN: Timestamp and provenance records for AI-assisted creative work

By Calabi Labs Editorial Team · 2026-06-20

When Seedance 2.0 dropped, the internet broke. Not metaphorically—platforms that had spent years building AI detection infrastructure suddenly faced content so polished it made Netflix look amateur. The cat videos, the cinematic establishing shots, the product demos that looked like Super Bowl ads: all indistinguishable from professional productions. And all of it was being uploaded without disclosure, without provenance, and without triggering the systems designed to catch exactly this kind of thing.

That gap—that moment between "platforms can detect AI content" and "platforms actually catch Seedance-quality output"—is where things get interesting. And complicated. And increasingly consequential for anyone building, publishing, or distributing AI-assisted work.

The Detection Stack in 2026

Modern platforms don't rely on a single signal. They run a layered analysis pipeline, and understanding each layer is essential if you want to navigate it intelligently.

C2PA: The Provenance Standard

The Coalition for Content Provenance and Authenticity standardized a metadata framework that embeds cryptographic attestations directly into files. When you export from Midjourney, Runway, or Sora, the software can (and increasingly does) write a C2PA manifest into a JUMBF box. The relevant fields include:

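`claim_generator`: identifies the software that produced the claim (e.g., "Runway Gen-3 Alpha v1.2"); newer versions of the specification use a structured `claim_generator_info` field
`c2pa.actions`: an assertion listing what was done to the asset, with action names such as `c2pa.created` or `c2pa.edited`; a `digitalSourceType` of `trainedAlgorithmicMedia` marks generative output
`stds.exif`: an optional assertion that can carry capture metadata such as camera make and model (often absent on generated content)
trusted timestamp: an RFC 3161 timestamp attached to the claim signature, rather than a standalone manifest field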

Instagram and TikTok's content moderation systems parse these manifests. If a video contains a valid C2PA assertion from a known AI generation tool, that's a strong signal. If the manifest is missing on content that has other hallmarks of AI generation, that gap itself becomes a flag.
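
To make the "is a manifest present" check concrete, here is a minimal sketch of how a reader might look for a C2PA manifest in a JPEG. It assumes the manifest is carried as JUMBF data in APP11 marker segments, and it is only a presence heuristic: it does not parse the manifest or validate any signature. The function names are illustrative, not part of any C2PA library.

```python
import struct

APP11 = 0xEB  # JPEG marker segment used to carry JUMBF data
SOS = 0xDA    # start of scan: compressed image data follows
EOI = 0xD9    # end of image


def iter_jpeg_segments(data: bytes):
    """Yield (marker, payload) for each header segment before the image data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte, skip it
            pos += 1
            continue
        if marker in (SOS, EOI):
            break
        if 0xD0 <= marker <= 0xD7 or marker == 0x01:
            pos += 2  # standalone markers carry no length field
            continue
        (length,) = struct.unpack(">H", data[pos + 2 : pos + 4])
        if length < 2:
            break
        yield marker, data[pos + 4 : pos + 2 + length]
        pos += 2 + length


def has_c2pa_jumbf(data: bytes) -> bool:
    """Heuristic: an APP11 segment containing a JUMBF box labelled 'c2pa'."""
    return any(
        marker == APP11 and b"jumb" in payload and b"c2pa" in payload
        for marker, payload in iter_jpeg_segments(data)
    )
```

A real verifier would go further and validate the claim signature and hash bindings with a C2PA SDK; this sketch only answers whether something that looks like a manifest is embedded.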

AI-Specific Metadata

Beyond C2PA, each generation tool leaves fingerprints in tool-specific metadata. For image models:

`parameters`: a PNG text chunk written by open-source front ends such as AUTOMATIC1111's Stable Diffusion web UI, holding the prompt, negative prompt, and sampler settings
`Software.AI_Model`: custom EXIF fields written by commercial services
`XMP.ToolName` / `XMP.ModelVersion`: Adobe-format metadata from professional workflows
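
As an illustration of how such tool-specific metadata can be read, the sketch below lists the uncompressed text chunks of a PNG using only the standard library. It assumes the generation settings are stored in a `tEXt` chunk (the `parameters` keyword used by some open-source front ends is the case in mind); compressed `zTXt` and international `iTXt` chunks are ignored, and chunk CRCs are not verified. The function name is hypothetical.

```python
import struct

PNG_SIG = b"\x89PNG\r\n\x1a\n"


def png_text_chunks(data: bytes) -> dict[str, str]:
    """Return {keyword: text} for every uncompressed tEXt chunk in a PNG."""
    if data[:8] != PNG_SIG:
        raise ValueError("not a PNG stream")
    found: dict[str, str] = {}
    pos = 8
    while pos + 8 <= len(data):
        # Each chunk: 4-byte length, 4-byte type, body, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos : pos + 8])
        body = data[pos + 8 : pos + 8 + length]
        if ctype == b"tEXt":
            keyword, _, text = body.partition(b"\x00")
            found[keyword.decode("latin-1")] = text.decode("latin-1")
        elif ctype == b"IEND":
            break
        pos += 12 + length
    return found
```

Calling `png_text_chunks(open(path, "rb").read()).get("parameters")` on a file would return the embedded settings string if one is present, or `None` otherwise.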

For video, the trail is even richer. Container metadata, readable with a tool such as ffprobe, often reveals encoder strings like "Stable Video Diffusion" or specific preset names. Bitstream analysis can also examine GOP (Group of Pictures) structure, although frame patterns reflect the export encoder's settings more than the generator itself, so they are a weak signal on their own.
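
A rough sketch of the container-metadata step, assuming `ffprobe` is installed and on the PATH: it asks ffprobe for a JSON description of the file and collects any encoder-style tags from the container and its streams. The set of tag names checked is an assumption chosen for illustration, and both function names are hypothetical.

```python
import json
import subprocess


def run_ffprobe(path: str) -> dict:
    """Return ffprobe's JSON description of a media file."""
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def encoder_tags(probe: dict) -> list[str]:
    """Collect encoder-style tag values from the container, then each stream."""
    wanted = ("encoder", "encoded_by", "software")  # assumed tag names
    found = []
    sections = [probe.get("format", {})] + list(probe.get("streams", []))
    for section in sections:
        for key, value in section.get("tags", {}).items():
            if key.lower() in wanted:
                found.append(str(value))
    return found
```

Typical use would be `encoder_tags(run_ffprobe("clip.mp4"))`, which returns whatever encoder strings the muxer and codecs recorded, or an empty list if none were written.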

Encoder Signatures

Every encoder leaves a fingerprint. HandBrake's CRF rate control produces measurable patterns. FFmpeg builds with specific libavcodec versions have detectable characteristics. When AI video is exported through a specific pipeline, that pipeline's signature gets encoded into the bitstream.

Detection systems maintain hash databases and signature catalogs. Upload a file with a known AI encoder signature, and you're starting from behind.

The GPS/EXIF Gap

Modern smartphones embed rich EXIF data: GPS coordinates, device serial hashes, software version, capture timestamp with timezone offset, lens metadata, and more. When content is generated entirely in silico, none of this exists.

But here's the subtlety: it's not just about presence or absence. Detection systems analyze consistency. A photo claiming to come from a Pixel 9 but missing the expected `LensModel` or `GPSAltitudeRef` fields is a red flag. A video uploaded from a "camera" but with no gyroscope data, no capture UUID, no continuity with the user's established upload patterns—that gets escalated.

The metadata that should be there but isn't often matters more than what is.

What Actually Gets Flagged

Based on documented moderation patterns and platform disclosures:

Instagram scans for C2PA assertions, missing device provenance, and hash-matched AI content. High-engagement accounts posting AI content without disclosure face higher scrutiny.
TikTok runs aggressive hash matching against known AI-generated content databases and flags accounts with anomalous upload metadata patterns.
YouTube evaluates creator disclosure compliance—if you claim "100% real footage" and metadata says otherwise, that's a strike.
X/Twitter uses C2PA verification when available and has been piloting automatic labels for detected AI content.

None of these systems are perfect. Seedance-quality output often sails through because the visual artifacts that older detection relied on are gone. But the metadata layer remains a reliable vector—and platforms know it.

The Durable Fix: Strip and Inject

If you need AI-generated content to pass platform scrutiny—whether for creative work, advertising, or legitimate production purposes—the only reliable approach is a two-step process:

Strip all AI provenance. Remove C2PA manifests entirely. Clear EXIF fields including all AI-specific metadata. Null out XMP blocks that reference generation tools. For video, re-encode through a clean pipeline that doesn't carry forward encoder signatures.
Inject authentic device identity. Write legitimate camera metadata matching a real device profile. Include expected GPS coordinates (or a consistent absence pattern if location services were off). Match the timestamp format and timezone conventions of the claimed device. The identity must be internally consistent and consistent with your account's historical upload patterns.

The key principle: you're not spoofing. You're creating a coherent provenance record that a camera could have produced. The goal is absence of negative signals, not presence of positive ones.

This is technically nontrivial. Manual metadata editing leaves traces. Off-the-shelf tools don't handle C2PA stripping reliably. And inconsistency between fields—wrong timezone offsets, mismatched device model and lens data, GPS coordinates in the ocean—is worse than no metadata at all.

Making It Work at Scale

If you're handling this for more than a handful of files, you need automation that understands the full metadata stack. C2PA manifests, EXIF 2.31, XMP sidecars, video container metadata, and bitstream-level signatures all need to be handled in coordination. One missed field can expose the whole record.

The alternative to building this infrastructure in-house is using a service that handles the complete strip-and-inject pipeline with validated device profiles and consistent output.

→ Try Calabi free at calabilabs.com — 10 cleans, no card.
