Calabi Labs · Guide · 2026-06-14
A YouTube video summarizer creates AI-generated text or audio from video content—but when you upload that output to YouTube, the platform's automated systems will often label it as AI-generated regardless of whether you wrote the summary yourself. That's because the detection isn't based on what the content looks like or says. It's based on invisible metadata signals embedded in your file.
YouTube is no longer relying on creators to self-disclose AI usage. Starting in May 2026, the platform runs automated detection across uploaded files and applies an "AI-generated" label automatically—permanently attached to the video page. This means a perfectly polished AI-assisted summary or faceless video you generated can get flagged without you knowing why.
The label isn't applied based on what's visible in the video. It's applied based on data embedded in the file itself: cryptographic manifests, metadata tags, and encoder fingerprints that exist entirely outside the visual layer a viewer sees.
Three invisible signal categories trigger YouTube's AI detection:
C2PA / Content Credentials (JUMBF metadata)
The Coalition for Content Provenance and Authenticity (C2PA) defines a standard that AI tools use to embed a structured, cryptographically signed manifest inside the files they export. The manifest is stored in JUMBF (JPEG Universal Metadata Box Format) boxes, referred to here as atoms, and can assert that the content was created or significantly modified by AI. YouTube reads this directly and applies a permanent label. A single AI export can carry 18 or more of these JUMBF atoms; Calabi's cleaning reduces that count to zero.
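To make the idea of JUMBF atoms concrete, here is a minimal read-only sketch that counts marker strings in a file's raw bytes. It assumes the box type codes `jumb` and `jumd` and the manifest store label `c2pa` appear as plain ASCII in the file; the function name is hypothetical, and this is a byte-level heuristic, not a real C2PA parser, so it can over- or under-count.

```python
def count_provenance_markers(data: bytes) -> dict:
    """Count raw occurrences of JUMBF/C2PA marker strings in a byte buffer.

    Heuristic only: it does not walk the box structure or verify signatures.
    """
    return {
        "jumb": data.count(b"jumb"),  # JUMBF superbox type code
        "jumd": data.count(b"jumd"),  # JUMBF description box type code
        "c2pa": data.count(b"c2pa"),  # label used for a C2PA manifest store
    }


# Synthetic bytes standing in for part of a file, for illustration only.
sample = b"\x00\x00\x00\x20jumb\x00\x00\x00\x18jumdc2pa\x00"
counts = count_provenance_markers(sample)
print(counts)
```

For a real file, the same function could be applied to `open(path, "rb").read()`, with the caveat that large videos are better read in chunks.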
XMP and EXIF AI flags
When you export from Midjourney, Sora, Runway, Pictory, or any AI video tool, the file's XMP metadata layer gets tagged with fields like DigitalSourceType: trainedAlgorithmicMedia, GeneratorSoftware, and AIModel. These fields are readable with ExifTool, and they are the same ones forensic auditors and platform scanners check. A raw AI export carries roughly 144 metadata tags; after cleaning, that drops to around 94 neutral structural tags.
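As a rough illustration of how such tags can be audited, the sketch below scans a dictionary of tag names and values, shaped like one object from ExifTool's JSON output (`exiftool -j -G file`), and lists the AI-related fields named above. The function name and the sample values are hypothetical, and real tag names and value formats vary by tool.

```python
# Field names taken from the paragraph above; real exports may use others.
AI_FIELD_NAMES = {"GeneratorSoftware", "AIModel"}


def find_ai_flags(tags: dict) -> list:
    """Return the sorted tag keys that look like AI-provenance flags."""
    flagged = []
    for key, value in tags.items():
        name = key.split(":")[-1]  # drop a group prefix such as "XMP:"
        normalized = str(value).replace(" ", "").lower()
        if name == "DigitalSourceType" and "trainedalgorithmicmedia" in normalized:
            flagged.append(key)
        elif name in AI_FIELD_NAMES:
            flagged.append(key)
    return sorted(flagged)


# Hypothetical tag dictionary for illustration.
sample_tags = {
    "XMP:DigitalSourceType": "trainedAlgorithmicMedia",
    "XMP:GeneratorSoftware": "ExampleGen 1.0",
    "QuickTime:Duration": "12.5 s",
}
flags = find_ai_flags(sample_tags)
print(flags)
```

The counts quoted above imply roughly 144 - 94 = 50 tags removed by cleaning.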
Encoder fingerprints
Software encoders like Lavc (FFmpeg's libavcodec), x264, or x265 embed SEI (Supplemental Enhancement Information) messages in the video bitstream. These aren't visible and don't appear in standard metadata viewers, but they are readable by forensic tools. Many AI video generators rely on the same handful of encoders, which creates a detectable pattern in the bitstream itself, separate from any visible watermark.
These are the approaches most creators try first, and none of them reliably removes the detection layer:
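Encoder identification strings are often stored as plain ASCII, so a simple byte scan can surface them. The sketch below is a read-only heuristic under stated assumptions: the regular expression reflects commonly seen ident formats (`Lavc60.31.102`, `x264 - core 164`, `x265 (build 199)`), the function name is hypothetical, and it does not parse SEI NAL units properly.

```python
import re

# Assumed ident formats; real encoders and versions may differ.
ENCODER_PATTERN = re.compile(
    rb"(Lavc[0-9]+(?:\.[0-9]+)*|x264 - core [0-9]+|x265 \(build [0-9]+\))"
)


def find_encoder_strings(data: bytes) -> list:
    """Return the sorted, de-duplicated encoder ident strings found in data."""
    return sorted({match.decode("ascii") for match in ENCODER_PATTERN.findall(data)})


# Synthetic bytes for illustration only.
sample = b"\x00\x00x264 - core 164 r3095\x00Lavc60.31.102 libx264"
found = find_encoder_strings(sample)
print(found)
```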
Cropping or trimming removes visible content but leaves the metadata intact. The C2PA manifest and XMP flags survive any geometric operation on the file because they live in the file header, not the pixel grid. YouTube's scanner reads the file data, not what you see on screen.
Re-encoding through HandBrake or FFmpeg can strip some metadata but leaves C2PA atoms embedded in the file structure. YouTube's scanner specifically looks for these. A simple re-encode without targeted stripping won't get you to zero.
Screen recording (or taking screenshots) is the most common workaround. It does remove most metadata, but it also degrades your video to a fraction of the original quality, and platforms like YouTube have started scanning perceptual hash signatures that survive re-compression. You lose quality and may still get flagged on high-confidence detections.
Calabi runs a one-pass pipeline that strips the detection signals and injects authentic phone-capture identity, so the file reads as a normal recording on a real device.
Step 1 — Upload your AI-generated file. You drop your video directly on the Calabi interface. No settings, no manual processing. The file stays private and runs through the pipeline automatically.
Step 2 — Automatic strip. Calabi removes every C2PA / Content Credentials atom, all DigitalSourceType: trainedAlgorithmicMedia XMP flags, generator/tool metadata tags, and encoder SEI fingerprints (Lavc, x264) from the bitstream. This isn't editing pixels—it's removing the invisible detection layer that platform scanners actually read.
Step 3 — Inject authentic phone identity. Calabi writes real device metadata into the file: a specific phone make and model (iPhone 15 Pro, Pixel 8 Pro, Galaxy S24 Ultra), software version, GPS coordinates, and a capture timestamp. It also assigns a real-phone encoder name. The file now looks like it was recorded on that device.
Step 4 — Review the forensic proof card. Before download, you receive a Calabi forensic proof card—a scan using ExifTool, the same tool newsrooms and forensic auditors use. It shows exactly what was stripped and what was injected: JUMBF atoms reduced from 18 to 0, C2PA references from 16 to 0, and AI flags removed. You see what YouTube's scanner will see.
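The before-and-after figures on a proof card amount to simple differences. As a small sketch using the counts quoted in this guide (the dictionary keys and function name are hypothetical, not Calabi's actual report format):

```python
def summarize_changes(before: dict, after: dict) -> dict:
    """Return how many items of each kind were removed between two scans."""
    return {key: before[key] - after.get(key, 0) for key in before}


# Counts quoted in this guide.
before = {"jumbf_atoms": 18, "c2pa_references": 16, "metadata_tags": 144}
after = {"jumbf_atoms": 0, "c2pa_references": 0, "metadata_tags": 94}
removed = summarize_changes(before, after)
print(removed)
```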
Step 5 — Download and upload. The cleaned file is yours. Upload it to YouTube with the same workflow as any other video.
Does Calabi remove the visible AI label YouTube already attached to my video?
Not directly. If YouTube has already applied an AI label, whether from a past upload or from Content Credentials embedded in a previous version, you would need to upload a freshly cleaned file. Calabi's pipeline is designed for pre-upload use: clean the file before your first YouTube upload so the label is not applied in the first place.
Can I use Calabi on video summaries created with an AI summarizer tool?
Yes. If your summarizer exports a video file (e.g., an AI-generated voiceover with stock footage, a static image with text overlay, or an animated summary), that export carries the same metadata signals as any other AI-generated content. Calabi strips those signals and injects phone-capture identity so the file reads as a conventional recording.
Try Calabi free at calabilabs.com — 10 cleans, no card.