Calabi Labs · Guide · 2026-06-18

What the AI Diversity Reports Actually Say About Race and Gender Bias in Generated Videos

Three major research efforts (Kapwing's 2026 AI Diversity Report, a peer-reviewed study published in Nature Scientific Reports, and a Brookings Institution analysis) point in the same direction: generative AI tools, video generators included, exhibit significant race and gender bias. For creators, the problem is not only visual. AI-generated files also carry a metadata layer that marks them as AI-made, and that layer is exactly what platforms scan to detect AI content in the first place.

What the Research Actually Found

Kapwing's AI Diversity Report, released January 2026, analyzed a large sample of videos from the leading AI video platforms (Google's Veo 3, OpenAI's Sora 2, Kling, and Hailuo) to examine gender and racial representation across occupation types. The numbers are stark. Women appeared in high-paying jobs in only 8.67% of AI-generated workplace videos. In low-paying jobs, that figure jumped to 53.73%. The disparity is not subtle: it is a systematic skew baked into what the models consider "normal" to generate.
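To put those two percentages side by side, the gap can be worked out directly. This is a minimal sketch that uses only the two figures quoted above; the variable names are illustrative, and nothing here reflects Kapwing's own methodology.

```python
# Shares of women quoted in the report summary above (percent).
HIGH_PAY_SHARE = 8.67
LOW_PAY_SHARE = 53.73

# Absolute gap in percentage points and relative ratio between the two shares.
gap_points = round(LOW_PAY_SHARE - HIGH_PAY_SHARE, 2)
ratio = round(LOW_PAY_SHARE / HIGH_PAY_SHARE, 1)

print(f"Gap: {gap_points} percentage points")
print(f"Ratio: {ratio}x")
```

On these figures the gap is about 45 percentage points, a ratio of roughly six to one.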

A Nature Scientific Reports study (April 2025) found something equally important: inclusive AI-generated faces measurably reduce people's existing racial and gender biases, while non-inclusive faces reinforce them. This means the problem compounds — AI models that overrepresent one group in professional roles and underrepresent others shape real-world perception, not just digital content.

The Brookings Institution analysis (April 2024) of AI image generators found a dual failure: models either reproduce existing stereotypes with a lack of diversity, or overcorrect in ways that feel artificial and inauthentic. Both outcomes are harmful. UNESCO's separate study (July 2024) found that large language models and generative AI tools produce homophobic content and racial stereotypes as baseline behavior, not as an edge case.

Why This Matters Beyond the Visual Output

Here is what most people miss: an AI-generated video is judged on two layers at once. There is the visible layer (who appears on screen, in what roles, with what skin tones), and there is an invisible layer: the metadata that travels with the file and identifies it as AI-generated at a forensic level.

Platforms like Instagram, TikTok, YouTube, and Reddit do not flag content only by watching it. They also scan files automatically with forensic tooling of the kind newsrooms and researchers use. Specifically, they look for:

C2PA / Content Credentials: A cryptographic manifest (stored as JUMBF atoms) that records "this file was generated by AI, here is the tool and version." Platforms like TikTok and Instagram now read this automatically. Kapwing's own report notes that AI video platforms are attaching these credentials to exported files.
XMP AI flags: Tags like DigitalSourceType: trainedAlgorithmicMedia explicitly label a file as AI-generated in its metadata structure.
Encoder fingerprints: Video files generated by AI models carry specific encoder signatures. Lavc (FFmpeg's libavcodec) and x264 SEI (Supplemental Enhancement Information) NAL units are common fingerprints in AI-exported video that generally do not appear in native phone recordings.
Missing capture metadata: Real phone recordings typically contain capture timestamps, device make/model, and (when location is enabled) GPS coordinates, stored in EXIF headers for photos and in container metadata for video. AI exports typically lack all three, which itself is a detection signal.
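
The four checks above can be sketched as a small scanner over a tag dictionary of the kind ExifTool can emit as JSON. This is an illustration under stated assumptions, not how any platform actually implements detection: the tag names (Encoder, GPSLatitude, CreateDate, Make, Model), the substring matching, and the sample dictionaries are all chosen for the example.

```python
from typing import Any, Mapping

AI_SOURCE_TYPE = "trainedalgorithmicmedia"
ENCODER_MARKERS = ("lavc", "x264")                          # signatures named in the list above
CAPTURE_TAGS = ("GPSLatitude", "CreateDate", "Make", "Model")  # assumed tag names


def detect_signals(tags: Mapping[str, Any]) -> list[str]:
    """Return the names of the AI-detection signals found in a tag readout."""
    text = {key.lower(): str(value).lower() for key, value in tags.items()}
    signals = []

    # 1. C2PA / Content Credentials: any JUMBF or C2PA reference in keys or values.
    if any("jumbf" in k or "c2pa" in k or "c2pa" in v for k, v in text.items()):
        signals.append("c2pa")

    # 2. XMP AI flag: DigitalSourceType ending in trainedAlgorithmicMedia.
    if any(AI_SOURCE_TYPE in v for v in text.values()):
        signals.append("xmp_ai_flag")

    # 3. Encoder fingerprints: Lavc or x264 strings anywhere in the values.
    if any(marker in v for v in text.values() for marker in ENCODER_MARKERS):
        signals.append("encoder_fingerprint")

    # 4. Missing capture metadata: any expected capture tag is absent.
    if any(tag not in tags for tag in CAPTURE_TAGS):
        signals.append("missing_capture_metadata")

    return signals


# Hypothetical readouts, for illustration only.
ai_export = {
    "JUMBFLabel": "c2pa",
    "DigitalSourceType": "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia",
    "Encoder": "Lavc60.3.100 libx264",
}
phone_clip = {
    "Make": "Apple",
    "Model": "iPhone 15 Pro",
    "CreateDate": "2026:01:01 10:00:00",
    "GPSLatitude": "40 deg 0' 0.00\" N",
}

print(detect_signals(ai_export))
print(detect_signals(phone_clip))
```

The point of the sketch is that none of the four checks looks at a single pixel: every signal comes from the tag readout alone.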

When a creator posts an AI-generated video showing diverse representation — exactly the kind of content the research says the field needs more of — the platform may still flag it as AI-generated because of these invisible signals, regardless of what appears on screen.

Why the Obvious Fixes Do Not Work

If you have tried to post an AI video only to have it flagged, you may have attempted some common workarounds:

Cropping: Removes a visible watermark from the frame, but the metadata layer can survive intact. The C2PA manifest and encoder fingerprints are not in the pixels you crop away, so a platform scanning the file's metadata structure can still detect the AI origin.
Screenshotting or screen recording: This creates a new file, so the original's C2PA manifest and XMP tags are generally not carried over, but quality drops and the capture carries its own re-encoder fingerprint, which is itself a signal platforms catalog.
Re-exporting or re-encoding: A standard re-encode through HandBrake or FFmpeg often preserves or partially preserves container metadata unless you explicitly tell the tool to drop it. Because both tools encode through libavcodec and, for H.264, typically x264, the output also carries fresh Lavc and x264 SEI signatures, so those fingerprints persist through many re-encode passes.

The core issue is that the detection metadata — C2PA, XMP flags, encoder fingerprints — is structural, not visual. Cropping a frame does not change the file's manifest. Re-encoding does not automatically purge XMP namespaces. Only targeted metadata stripping addresses it.
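
Why cropping leaves these signals in place is easier to see from the container layout. An MP4 file is a sequence of boxes, each with a size and a four-character type, and the frame data sits in a different box from the metadata. The sketch below lists the top-level boxes of a byte string; the sample file is synthetic, and the uuid box payload is only a stand-in for an embedded manifest, not a real C2PA structure.

```python
import struct


def list_top_level_boxes(data: bytes) -> list[tuple[str, int]]:
    """Walk an ISO base media file and return (box type, box size) pairs."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        # Each box starts with a 32-bit big-endian size and a 4-byte type.
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # size == 1 means a 64-bit size follows the type field.
            if offset + 16 > len(data):
                break
            size = struct.unpack_from(">Q", data, offset + 8)[0]
            header = 16
        elif size == 0:
            # size == 0 means the box runs to the end of the file.
            size = len(data) - offset
        if size < header or offset + size > len(data):
            break  # malformed or truncated box; stop rather than guess
        boxes.append((box_type.decode("latin-1"), size))
        offset += size
    return boxes


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a minimal box for the synthetic example below."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic file: file type, a metadata-style box, then the media data.
sample = (
    make_box(b"ftyp", b"isom")
    + make_box(b"uuid", b"\x00" * 16 + b"manifest")  # stand-in payload
    + make_box(b"mdat", b"frames")
)

print(list_top_level_boxes(sample))
```

Editing pixels changes what is inside mdat; the other boxes are only altered if the exporting tool chooses to rewrite or drop them.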

How to Actually Clean an AI Video File Before Posting

Calabi is a one-pass web tool that strips the detection metadata from AI-generated video and image files and replaces it with authentic phone-capture identity. Here is how the process works:

Upload your AI-generated video or image file. The pipeline begins automatically — there is no manual configuration.
Calabi strips the detection signals. This means removing C2PA / Content Credentials JUMBF atoms (18 JUMBF atoms reduced to 0, 16 C2PA references reduced to 0 in testing), the DigitalSourceType: trainedAlgorithmicMedia XMP flag, and encoder fingerprints like Lavc and x264 SEI from video bitstreams. A raw AI export carrying 144 metadata tags is reduced to roughly 94 neutral structural tags.
Calabi injects authentic phone-capture identity. The tool writes GPS coordinates, capture timestamp, device make and model, and software version matching real phone profiles — iPhone 15 Pro, Pixel 8 Pro, Galaxy S24 Ultra, and others.
You receive a forensic proof card. This is an ExifTool readout showing exactly what was stripped and what was injected, the same kind of metadata readout an automated scanner would see. You can verify the clean result before downloading.
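
A before-and-after comparison of this kind can be sketched as a plain dictionary diff over two tag readouts. This is a hypothetical illustration, not Calabi's implementation; the function name and the sample tags are assumptions made for the example.

```python
from typing import Any, Mapping


def diff_tags(before: Mapping[str, Any], after: Mapping[str, Any]) -> dict[str, list[str]]:
    """Summarize which tags were removed, added, or changed between two readouts."""
    return {
        "removed": sorted(k for k in before if k not in after),
        "added": sorted(k for k in after if k not in before),
        "changed": sorted(k for k in before if k in after and before[k] != after[k]),
    }


# Hypothetical readouts, for illustration only.
before = {"JUMBFLabel": "c2pa", "Encoder": "Lavc60.3.100", "ImageWidth": 1920}
after = {"ImageWidth": 1920, "Make": "Apple", "Model": "iPhone 15 Pro"}

report = diff_tags(before, after)
print(report)
```

Structural tags such as ImageWidth appear in neither list, which matches the idea of neutral tags that are left in place.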

This matters for creators specifically because platforms like Instagram, TikTok, YouTube, and Reddit all run automated scans within seconds of upload. A diverse, well-crafted AI-generated video can still be flagged based on metadata alone. Calabi addresses the detection layer rather than the visual content.

Frequently Asked Questions

Does stripping metadata change how the video looks?

No. Calabi works entirely on invisible file signals and metadata. The visual content — the people, scenes, and diversity represented on screen — is untouched. This is not an editing tool; it operates at the forensic metadata layer.

I already crop my AI videos before posting. Is that enough?

Cropping removes visible elements from the frame, but the C2PA manifest, XMP flags, and encoder fingerprints are embedded in the file structure, not the pixels. Those signals survive cropping and will be detected by platform scanners. Targeted metadata stripping is required to remove them.

Will this guarantee my video is not flagged on TikTok or Instagram?

No tool can guarantee platform outcomes, which vary by platform and source model. Calabi fully removes the metadata detection layer (C2PA, XMP flags, encoder fingerprints) that automated scanners read. Platform policies on disclosing AI-generated content are separate, still apply, and must be managed independently.

Try Calabi free at calabilabs.com — 10 cleans, no card.
