Trend report · hn_ai · 2026-06-04

Show HN: Black-box API bug detection across 7 AI systems

In January 2025, a demo surfaced showing an AI agent finding API-level bugs across seven different systems simultaneously. The technique wasn't about testing individual endpoints; it was about watching how each system failed silently when fed malformed data across its entire pipeline. Content moderation platforms work the same way. They don't scan just one layer; they run several parallel checks (seven are described below), each watching for a different signal of AI origin. If you're publishing AI-generated content on social platforms in 2026, you need to understand every layer they're watching, because a fix that only handles metadata will still get caught by the encoder fingerprint.

The Seven-Layer Stack Platforms Actually Scan

Modern content moderation isn't a single check; it's a cascade. Here's what runs against your image or video on upload, before any classifier trained on human faces and camera noise sees it.

  1. C2PA (Coalition for Content Provenance and Authenticity): This is the content credentials standard backed by Adobe, Microsoft, Google, and several camera manufacturers. If an image was generated by an AI tool that writes content credentials, the manifest's claim_generator field will be set to something like "Adobe Firefly 3" or "Sora v2". The manifest store embedded in the file can also carry a c2pa.actions assertion whose actions array has entries with fields such as softwareAgent, digitalSourceType, and parameters. Platforms read this block even when users can't see it. If present, it's a red flag: not always a block, but it feeds the confidence score.
  2. EXIF/XMP AI metadata: Before C2PA, most AI tools stamped their output with standard EXIF fields. Look for tags like XMP:CreatorTool, Software entries containing "Midjourney" or "Stable Diffusion", or Generator fields. Some tools use normalized field names like prompt or ai_generated in the XMP packet. Even if C2PA is absent, these older fields still get scanned.
  3. Encoder signatures: Every generation model leaves a statistical fingerprint in the noise and frequency content of its output. Stable Diffusion outputs have detectable artifacts at specific frequency ranges. Sora produces characteristic temporal artifacts in video frames. These aren't metadata; they're embedded in the pixel data itself. Platforms extract high-frequency spectral features and run them against a classifier trained on thousands of AI outputs. This is why simply deleting metadata doesn't work.
  4. GPS and device identity fields: A photo from a real phone typically includes a device make/model (like Make=Apple, Model=iPhone 16 Pro), timestamps that match the phone's internal clock, and, when location is enabled, GPS coordinates. A synthetic image generated in a datacenter has none of this. Even an AI image with manually added GPS can be checked against the device fields: if your "iPhone photo" has GPS but no corresponding device model in the EXIF, that's a mismatch signal.
  5. Color matrix and CFA pattern analysis: Real camera sensors use a Color Filter Array (CFA), typically RGGB or similar, which leaves a detectable pattern in the demosaiced output. AI generation models don't simulate this accurately. Some platforms run CFA interpolation analysis to detect whether the image passed through a real sensor. The absence of CFA artifacts is a strong AI indicator.
  6. Compression history (PNG chunk analysis, JPEG DCT tables): When an image is saved in PNG format, the IHDR, pHYs, and tEXt chunks carry header, pixel-density, and textual metadata that hint at how the file was produced. AI tools often leave inconsistent chunk ordering or non-standard pHYs (physical pixel dimensions) entries. JPEG images expose their quantization tables, from which the quality factor can be estimated, and AI upscaling often produces characteristic DCT artifacts that differ from genuine camera noise.
  7. Behavioral patterns in upload metadata: The HTTP headers, IP geolocation, upload timing, and account age feed into a risk score. A brand-new account uploading high-resolution images with perfect metadata consistency but no prior upload history is a signal regardless of what the file itself contains.
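
To make the PNG chunk analysis in layer 6 concrete, here is a minimal inspection sketch using only the Python standard library. It walks the chunk sequence of a PNG and pulls out tEXt key/value pairs. The helper names (make_chunk, iter_chunks, text_chunks) and the "ExampleGenerator" value are hypothetical, chosen for illustration; no platform's actual scanner is being reproduced here.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: 4-byte length, type, data, CRC-32 of type+data."""
    crc = zlib.crc32(chunk_type + data)
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png_bytes: bytes):
    """Yield (chunk_type, data) for each chunk, in file order."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("missing PNG signature")
    offset = len(PNG_SIGNATURE)
    while offset + 12 <= len(png_bytes):
        length, chunk_type = struct.unpack(">I4s", png_bytes[offset:offset + 8])
        data = png_bytes[offset + 8:offset + 8 + length]
        yield chunk_type.decode("ascii"), data
        offset += 12 + length  # length + type + data + CRC
        if chunk_type == b"IEND":
            break


def text_chunks(png_bytes: bytes) -> dict:
    """Collect tEXt entries (keyword, NUL separator, Latin-1 text)."""
    entries = {}
    for chunk_type, data in iter_chunks(png_bytes):
        if chunk_type == "tEXt":
            keyword, _, value = data.partition(b"\x00")
            entries[keyword.decode("latin-1")] = value.decode("latin-1")
    return entries


# A tiny 1x1 grayscale PNG assembled in memory (illustrative data only).
sample_png = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    + make_chunk(b"pHYs", struct.pack(">IIB", 2835, 2835, 1))
    + make_chunk(b"tEXt", b"Software\x00ExampleGenerator")
    + make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
    + make_chunk(b"IEND", b"")
)

chunk_order = [name for name, _ in iter_chunks(sample_png)]
software_tags = text_chunks(sample_png)
```

A scanner built along these lines would compare chunk_order and the tEXt entries against what known encoders usually write; which orderings or values count as suspicious is a policy decision this sketch does not attempt to model.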

What Actually Gets Flagged on Instagram and TikTok

Based on documented cases and platform policies as of early 2026:

Instagram's AI detection has been rolling out since mid-2025. It doesn't block AI content outright—it suppresses reach. A post with detectable AI characteristics can see 40-70% less reach in the algorithm, even if it doesn't violate community guidelines. The suppression is subtle—many creators notice their engagement dropping but don't realize why. Instagram checks for C2PA conformance, EXIF Software fields, and recently added encoder signature matching for content flagged as "AI-generated" by other users.

TikTok is more aggressive. Since the AI-generated content disclosure mandate, TikTok runs the full seven-layer stack and requires creators to self-label. If you don't label and the system detects AI content, you get a content warning—not a takedown, but a strike that affects your ability to monetize. The system also checks for missing GPS on videos tagged with location, and mismatched device identity is a common trigger for the "manipulated content" label.
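
The GPS and device-identity checks mentioned above amount to simple consistency rules over already-parsed tags. The sketch below is an assumption about how such rules could look, not TikTok's implementation: the function name metadata_signals, the signal labels, and the AI_TOOL_MARKERS list are all hypothetical. The tag names Make, Model, Software, GPSLatitude, and GPSLongitude are standard EXIF names, and XMP:CreatorTool follows the naming used earlier in this article.

```python
# Hypothetical marker list; a real system would maintain a far larger one.
AI_TOOL_MARKERS = ("midjourney", "stable diffusion", "firefly", "sora")


def metadata_signals(tags: dict, location_tagged: bool = False) -> list:
    """Return mismatch signals for a dict of already-extracted metadata tags.

    `location_tagged` says whether the uploader attached a location to the post.
    """
    signals = []

    # Layer 2 style check: AI tool names in Software / CreatorTool fields.
    software = " ".join(
        tags.get(key, "") for key in ("Software", "XMP:CreatorTool")
    ).lower()
    if any(marker in software for marker in AI_TOOL_MARKERS):
        signals.append("ai_tool_in_software_field")

    has_gps = "GPSLatitude" in tags and "GPSLongitude" in tags
    has_device = bool(tags.get("Make")) and bool(tags.get("Model"))

    # Layer 4 style checks: GPS and device identity should agree.
    if has_gps and not has_device:
        signals.append("gps_without_device_identity")
    if location_tagged and not has_gps:
        signals.append("location_tag_without_gps")

    return signals
```

For example, metadata_signals({"GPSLatitude": "37.77", "GPSLongitude": "-122.42"}) reports the GPS-without-device mismatch, while a dict with Make, Model, and both GPS tags yields no signals. In practice these signals would feed a score rather than trigger a label on their own.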

YouTube has been the most aggressive on monetization. AI-generated content without disclosure gets demonetized under "reused content" policies. The detection there focuses heavily on encoder signatures and compression history—YouTube re-encodes everything on upload, so they analyze the DCT quantization tables from their own transcoded output against AI fingerprints.

Why Metadata Stripping Alone Fails

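Most "AI watermark removal" tools stop at metadata. They'll strip the EXIF, remove the XMP packet, and claim the image is clean. It isn't. Stripping only removes what layers 1 and 2 look for. Encoder signatures, CFA pattern analysis, compression history, and upload behavior are untouched, and a file with no device or GPS fields at all is itself a signal under layer 4.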
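
As one illustration of a pixel-level signal that survives metadata removal, the sketch below computes a crude high-frequency statistic directly from grayscale pixel values. It is a toy stand-in, under stated assumptions, for the learned spectral features described in layer 3: the function name high_pass_energy and the choice of a 4-neighbour Laplacian are illustrative, and real detectors use trained classifiers rather than a single number.

```python
def high_pass_energy(gray: list) -> float:
    """Mean absolute 4-neighbour Laplacian over interior pixels.

    `gray` is a 2D list of grayscale values (rows of equal length).
    Smooth regions score near 0; fine-grained texture and noise score higher.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3")

    total = 0.0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            laplacian = (
                4 * gray[y][x]
                - gray[y - 1][x]
                - gray[y + 1][x]
                - gray[y][x - 1]
                - gray[y][x + 1]
            )
            total += abs(laplacian)
    return total / ((height - 2) * (width - 2))


# Two synthetic 5x5 test images (illustrative data only).
flat = [[0.5] * 5 for _ in range(5)]
checkerboard = [[float((x + y) % 2) for x in range(5)] for y in range(5)]

flat_energy = high_pass_energy(flat)            # no high-frequency content
checker_energy = high_pass_energy(checkerboard)  # maximal alternation
```

The point of the example is that the input is pixel data only. Nothing in EXIF, XMP, or a C2PA manifest enters the computation, so removing those blocks leaves the statistic unchanged.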

The Durable Fix: Strip, Then Inject Clean Phone Identity

The only approach that passes all seven layers is a two-step process. First, strip all AI origin data including the C2PA manifest, all EXIF/XMP, and any PNG chunk metadata. Second, inject a complete, consistent device identity that matches GPS coordinates and capture timestamps.

Tools like Calabi perform this injection by simulating the full sensor pipeline of real phones, including the CFA interpolation artifacts, sensor noise characteristics, and lens distortion patterns that a real camera produces. This goes beyond metadata—it's reconstructing the physical fingerprint that AI detection models learned to identify.

Without this, you're fighting a detection system that has seven independent signals and will catch you on any one of them.

→ Try Calabi free at calabilabs.com — 10 cleans, no card.
