Calabi Labs · Guide · 2026-06-19
If you're using Veo 3 and wondering why your videos aren't turning out like the samples you've seen, you're probably making one of a handful of predictable mistakes. The good news: most of them are fixable once you know what to look for. The bigger picture most creators miss: even a perfectly crafted, great-looking Veo 3 video carries invisible metadata that platforms like YouTube, TikTok, and Instagram scan for — and that's a separate problem entirely from your prompt engineering.
Before diving into prompt mistakes, here's what trips up creators who think they've done everything right: platform detectors aren't only looking at your video's content. They also read the invisible layer underneath it, the metadata. When you export a video from Veo 3, it carries a C2PA (Content Credentials) manifest stored in JUMBF boxes inside the file. This is a cryptographically signed record that states, in machine-readable form, "this was generated by an AI model."
That manifest travels with your file after you download it. Because it is embedded in the file structure rather than in the picture, it can survive cropping and re-export, depending on the tool. On top of that, XMP metadata fields such as DigitalSourceType: trainedAlgorithmicMedia get stamped into the file. Video exports from AI generation tools can also carry encoder fingerprints, such as the Lavc (libavcodec) encoder tag and x264 SEI (Supplemental Enhancement Information) strings, which detection models may be trained to weigh as signals. Add in the absence of GPS coordinates, capture timestamps, and real device metadata, and you have a file that reads as "AI-generated" to automated systems even if it looks completely natural to the human eye.
This matters because platforms run these checks in the seconds after upload, before any human moderator sees your content. A video that looks flawless can still get flagged, restricted, or shadowbanned purely from the metadata layer.
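To make "the invisible layer" concrete, here is a minimal, read-only Python sketch that checks a file's raw bytes for the markers named above. It is a heuristic, not a C2PA validator: it assumes each marker appears as a plain byte string (the `jumb` JUMBF box type, the `c2pa` manifest-store label, the `trainedAlgorithmicMedia` term in XMP, a `Lavc` encoder tag, and the `x264 - core` SEI string), and short patterns can occasionally match by chance inside compressed video data. The function names are hypothetical; for real verification, the C2PA project's `c2patool` parses and validates manifests properly.

```python
from pathlib import Path

# Heuristic byte patterns for the metadata layers described above.
# A match only shows the bytes occur somewhere in the file; nothing
# here parses or validates a C2PA manifest.
MARKERS = {
    "jumbf_box": b"jumb",                               # JUMBF superbox type used by C2PA stores
    "c2pa_label": b"c2pa",                              # label of the C2PA manifest store
    "xmp_ai_source_type": b"trainedAlgorithmicMedia",   # IPTC DigitalSourceType term in XMP
    "lavc_encoder_tag": b"Lavc",                        # libavcodec encoder tag
    "x264_sei": b"x264 - core",                         # start of x264's SEI user-data string
}


def scan_provenance_markers(data: bytes) -> dict[str, bool]:
    """Return, for each marker name, whether its byte pattern occurs in data."""
    return {name: pattern in data for name, pattern in MARKERS.items()}


def scan_file(path: str) -> dict[str, bool]:
    """Read the whole file into memory (acceptable for a sketch) and scan it."""
    return scan_provenance_markers(Path(path).read_bytes())
```

Calling `scan_file("export.mp4")` (a placeholder filename) returns a dict mapping each marker name to `True` or `False`; a `True` means only that the pattern appears somewhere in the bytes, not that a valid signed manifest is present.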
Creators often treat Veo 3 prompts like they're writing a novel, packing detailed scene descriptions, camera movement notes, dialogue snippets, and mood tags into a single prompt. The result is a confused output: the model tries to satisfy too many constraints at once and compromises on all of them. A better approach is to start with one strong visual direction, nail that, and then layer in camera movement and pacing as separate refinements. Think of it like directing: you don't yell every instruction at once; you give focused direction for each take.
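The layering approach can be sketched as a small Python helper. `LayeredPrompt` is a hypothetical illustration, not part of any Veo tooling: the core visual direction stays fixed, and each round adds exactly one refinement, so you can compare outputs round by round.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayeredPrompt:
    """Hypothetical helper: a fixed core visual direction plus refinements
    added one per round (for example camera movement, then pacing)."""

    visual_direction: str
    refinements: tuple[str, ...] = ()

    def refine(self, layer: str) -> "LayeredPrompt":
        # Return a new object so earlier rounds stay available for comparison.
        return LayeredPrompt(self.visual_direction, self.refinements + (layer,))

    def render(self) -> str:
        # Emit the core direction and each refinement as separate sentences.
        parts = [self.visual_direction, *self.refinements]
        return " ".join(p.strip().rstrip(".") + "." for p in parts)


round1 = LayeredPrompt("A lighthouse on a rocky coast at dusk, waves breaking below")
round2 = round1.refine("Slow aerial push-in from the sea")
round3 = round2.refine("Calm, unhurried pacing")
print(round3.render())
# A lighthouse on a rocky coast at dusk, waves breaking below. Slow aerial push-in from the sea. Calm, unhurried pacing.
```

Because each round is its own immutable object, you can go back and regenerate from `round2` if a pacing refinement makes the output worse.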
Exporting in a single format and assuming it works everywhere is a fast path to mediocre results. A 16:9 landscape export may look great on your monitor but gets cropped awkwardly in TikTok's vertical feed. Veo 3 can handle different framings, but you need to regenerate, or at least preview, with the target platform's aspect ratio in mind before you commit to the full generation. A vertical cut optimized for mobile-first platforms can perform 2–3x better in retention metrics than a repurposed horizontal video.
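The arithmetic shows why a landscape export crops so badly. The sketch below (`crop_box` and `kept_fraction` are illustrative names) computes the largest centered crop of a given aspect ratio. For a 1920×1080 frame cut to 9:16, the strip is 1080 × 9/16 = 607.5 pixels wide, rounded to 608, so only about 32% of the original frame survives.

```python
def crop_box(src_w: int, src_h: int, target_w: int, target_h: int) -> tuple[int, int, int, int]:
    """Largest centered crop of aspect target_w:target_h inside a src_w x src_h frame.

    Returns (x, y, width, height) in pixels.
    """
    target_ratio = target_w / target_h
    if src_w / src_h > target_ratio:
        # Source is wider than the target: keep full height, trim the sides.
        w, h = round(src_h * target_ratio), src_h
    else:
        # Source is taller (or equal): keep full width, trim top and bottom.
        w, h = src_w, round(src_w / target_ratio)
    return (src_w - w) // 2, (src_h - h) // 2, w, h


def kept_fraction(src_w: int, src_h: int, target_w: int, target_h: int) -> float:
    """Share of the original frame area that survives the crop."""
    _, _, w, h = crop_box(src_w, src_h, target_w, target_h)
    return (w * h) / (src_w * src_h)


# A 16:9 landscape frame recut for a 9:16 vertical feed.
print(crop_box(1920, 1080, 9, 16))                # (656, 0, 608, 1080)
print(f"{kept_fraction(1920, 1080, 9, 16):.0%}")  # 32%
```

Roughly two thirds of every frame is discarded, which is why subjects framed for landscape so often end up half out of shot.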
Most creators focus entirely on what they want and never tell the model what to avoid. If you're getting unwanted artifacts, text overlays, distorted faces, or unrealistic physics, a well-constructed negative prompt can suppress many of them without sacrificing your main subject. Common negative prompt terms include "blurry," "watermark," "logo," "text overlay," "distorted hands," and "unrealistic lighting." This is especially important for product shots and talking-head content, where realism matters.
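One way to keep negative prompts consistent across projects is to assemble them from a reusable list. This hypothetical helper joins the common terms above, drops duplicates, and skips any term the main prompt explicitly asks for, so you don't negate your own subject. How the resulting string is submitted depends on your setup; some Veo interfaces accept a separate negative-prompt field, so check the documentation for the one you use.

```python
# Common negative terms from the list above; extend per project.
DEFAULT_NEGATIVES = (
    "blurry",
    "watermark",
    "logo",
    "text overlay",
    "distorted hands",
    "unrealistic lighting",
)


def build_negative_prompt(prompt: str, extra: tuple[str, ...] = (),
                          base: tuple[str, ...] = DEFAULT_NEGATIVES) -> str:
    """Join negative terms into one comma-separated string.

    Duplicates are dropped (case-insensitively), and any term that appears
    verbatim in the main prompt is skipped. This is a plain substring check,
    so it will not catch paraphrases.
    """
    lowered_prompt = prompt.lower()
    seen: set[str] = set()
    terms: list[str] = []
    for term in (*base, *extra):
        t = term.strip().lower()
        if not t or t in seen or t in lowered_prompt:
            continue
        seen.add(t)
        terms.append(t)
    return ", ".join(terms)


print(build_negative_prompt("Title card with a bold text overlay", extra=("Blurry", "cartoon")))
# blurry, watermark, logo, distorted hands, unrealistic lighting, cartoon
```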
Veo 3 handles a range of styles, but pushing it toward hyper-photorealism while also asking for stylized animation often produces uncanny results. If you want a specific visual style, say a cinematic documentary look, commit to that direction fully rather than trying to hybridize. Describe the reference style, lighting temperature, lens character, and color grade explicitly rather than hoping the model reads between the lines of a mixed description.
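A simple way to force yourself to state every style dimension is to make each one a required field. In this sketch, `StyleSpec` and `describe_shot` are hypothetical names; the four fields mirror the list above, and an empty field raises an error instead of being left for the model to guess.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class StyleSpec:
    """Hypothetical container for the four style dimensions named above."""

    reference_style: str
    lighting_temperature: str
    lens_character: str
    color_grade: str

    def __post_init__(self) -> None:
        # Refuse to leave any dimension implicit.
        for f in fields(self):
            if not getattr(self, f.name).strip():
                raise ValueError(f"{f.name} must be stated explicitly")

    def render(self) -> str:
        return (
            f"Style: {self.reference_style}. "
            f"Lighting: {self.lighting_temperature}. "
            f"Lens: {self.lens_character}. "
            f"Color grade: {self.color_grade}."
        )


def describe_shot(subject: str, style: StyleSpec) -> str:
    """Put the subject first, then the single, fully specified style."""
    return f"{subject.strip().rstrip('.')}. {style.render()}"


documentary = StyleSpec(
    reference_style="cinematic documentary",
    lighting_temperature="warm, low-angle late-afternoon sun",
    lens_character="35mm, shallow depth of field",
    color_grade="muted, low-contrast with soft highlights",
)
print(describe_shot("A fisherman mending nets on a pier", documentary))
```

Using one `StyleSpec` per shot also makes hybridizing harder by construction: there is exactly one slot for the reference style.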
First-prompt perfection is rare, and treating Veo 3 as a one-shot tool instead of an iterative process is one of the biggest productivity mistakes. Generate a rough cut, identify what's wrong (motion looks stiff, lighting is flat, the subject is off-model), and refine the prompt specifically for those issues in round two. Three to five focused iterations typically produce better work than one exhaustive mega-prompt.
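The round-two step can be sketched as a lookup from issues spotted in the rough cut to targeted prompt additions. The issue names and fix phrasings below are illustrative assumptions, not tested Veo prompts; the point is that each round adds only the fixes for what actually went wrong.

```python
# Illustrative mapping from issues spotted in a rough cut to targeted additions.
# The phrasings are assumptions to adapt, not tested Veo prompts.
FIXES = {
    "stiff motion": "Fluid, natural motion with subtle secondary movement",
    "flat lighting": "Directional key light with soft falloff and clear contrast",
    "off-model subject": "The subject keeps the described appearance in every frame",
}


def next_round(prompt: str, issues: list[str]) -> str:
    """Append one targeted refinement per issue found in the previous round."""
    unknown = [i for i in issues if i not in FIXES]
    if unknown:
        # Force a hand-written fix rather than a generic catch-all phrase.
        raise ValueError(f"no refinement defined for: {', '.join(unknown)}")
    if not issues:
        return prompt
    additions = " ".join(FIXES[i] + "." for i in issues)
    return f"{prompt.strip().rstrip('.')}. {additions}"


round1 = "A chef plating pasta in a busy restaurant kitchen"
round2 = next_round(round1, ["stiff motion"])
round3 = next_round(round2, ["flat lighting"])
print(round3)
```

Keeping each round as its own string makes it easy to fall back to an earlier version if a refinement introduces a new problem.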
Even if you've nailed the visuals and the prompt, your Veo 3 export still carries the detection fingerprint. You've solved the creative problem but created a distribution problem. Cropping doesn't help: it trims pixels from the visible frame but leaves the C2PA atoms and XMP flags intact in the file structure. Re-encoding through a video editor doesn't strip them reliably either, since the metadata can survive a transcode depending on the tool. The only reliable approach is processing the file through a tool that strips the detection signals, injects authentic phone-capture identity, and produces a forensic proof card showing exactly what was removed.
The process isn't manual editing — it's file-level metadata processing. Here's how it works:
Calabi processes videos in one pass. No manual editing, no quality loss from re-encoding.
Will cropping my Veo 3 video remove the AI detection signals?
No. Cropping removes pixels from the visible frame, but the underlying metadata (C2PA atoms, XMP flags, encoder fingerprints) can survive untouched. The detection signals are in the file structure, not the visual content. A cropped AI video can still read as AI-generated to platform scanners.
Does re-exporting through a video editor strip the metadata?
It can reduce some metadata, but it's unreliable. Re-encoding through DaVinci Resolve or FFmpeg may remove some fields depending on your export settings, but C2PA manifests and encoder fingerprints often persist. You also lose quality from the transcode. A dedicated metadata strip tool is more thorough and doesn't touch your visual content.
Can platforms still detect my video if I've cleaned the metadata?
Calabi removes the metadata signals that automated systems scan for — C2PA, XMP flags, encoder fingerprints, and device identity gaps. No tool can guarantee a platform will never flag you, since platforms update their detection methods continuously. What Calabi does is remove the structured metadata layer that the current generation of automated scanners relies on.
Try Calabi free at calabilabs.com — 10 cleans, no card.