FilmGen Benchmarks Land: Runway, Pika, and Google Veo Report 12–35% Gains in AI Film Making
A new wave of standardized AI film-making benchmarks dropped this month, with MLCommons’ FilmGen suite and updated VBench scores setting a clearer pecking order. Runway, Pika, and Google Veo published double‑digit improvements in speed and quality, while NVIDIA and cloud providers detailed hardware gains that reshape cost-per-minute economics.
Executive Summary
- MLCommons introduced FilmGen, a standardized AI film-making benchmark suite, while VBench published updated scoring guidelines to align model quality with production use cases.
- Runway, Pika, and Google Veo reported 12–35% improvements in quality and throughput on recent releases, citing better temporal consistency and faster renders.
- NVIDIA detailed 30–55% throughput gains for generative video on B200/H200 GPUs via TensorRT optimizations; cloud providers highlighted cost-per-minute declines.
- Early studio pilots emphasize compliance checks and content provenance, as performance metrics increasingly cover safety and watermarking alongside speed and quality.
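The link between the throughput gains and the cost-per-minute declines in the summary can be checked with simple arithmetic. The sketch below assumes the hourly GPU price stays fixed and that cost per generated minute scales inversely with throughput. Both are simplifying assumptions for illustration, not figures from NVIDIA or any cloud provider, and the function name is hypothetical.

```python
def cost_reduction(throughput_gain: float) -> float:
    """Fractional drop in cost per generated minute for a fractional throughput gain.

    Assumes an unchanged hourly GPU price, so cost per minute is
    proportional to 1 / throughput.
    """
    if throughput_gain <= -1.0:
        raise ValueError("throughput_gain must be greater than -1.0")
    return 1.0 - 1.0 / (1.0 + throughput_gain)


# The 30-55% throughput range quoted in the summary above.
for gain in (0.30, 0.55):
    print(f"{gain:.0%} throughput gain -> {cost_reduction(gain):.1%} lower cost per minute")
# 30% throughput gain -> 23.1% lower cost per minute
# 55% throughput gain -> 35.5% lower cost per minute
```

Under these assumptions, a throughput gain always yields a smaller percentage drop in cost, so a 55% speedup trims roughly a third from the per-minute bill rather than half.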
Benchmarking Arrives for AI Film Pipelines
Over the past four weeks, benchmarking for AI film-making has moved from ad hoc metrics to standardized suites. MLCommons unveiled FilmGen, a research-led benchmark for text-to-video and video-to-video tasks. Its composite measures include Fréchet Video Distance (FVD), VBench categories, render throughput, and energy per minute generated, balancing quality against operational efficiency. The organization said the goal is to support procurement and production decisions across studios and agencies, and it outlined reproducible protocols and dataset governance in its December releases. MLCommons provided an overview of the methodology and submission process in its latest update.
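The article does not give FilmGen's actual scoring formula, so the following is only a hedged sketch of how a composite of this kind could combine quality and efficiency metrics. The weights, field names, and the choice of a weighted geometric mean over baseline-relative ratios are all assumptions made for illustration, not the MLCommons method.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RunMetrics:
    fvd: float               # Frechet Video Distance; lower is better
    vbench: float            # aggregate VBench score; higher is better
    minutes_per_hour: float  # render throughput; higher is better
    kwh_per_minute: float    # energy per generated minute; lower is better


# Hypothetical weights (sum to 1.0); not taken from any published protocol.
WEIGHTS = {"fvd": 0.3, "vbench": 0.3, "throughput": 0.2, "energy": 0.2}


def composite_score(run: RunMetrics, baseline: RunMetrics) -> float:
    """Weighted geometric mean of improvement ratios versus a baseline run.

    Each ratio is oriented so that values above 1.0 mean "better than
    baseline"; lower-is-better metrics are inverted. All metrics must be
    strictly positive.
    """
    ratios = {
        "fvd": baseline.fvd / run.fvd,
        "vbench": run.vbench / baseline.vbench,
        "throughput": run.minutes_per_hour / baseline.minutes_per_hour,
        "energy": baseline.kwh_per_minute / run.kwh_per_minute,
    }
    return math.exp(sum(WEIGHTS[k] * math.log(r) for k, r in ratios.items()))


# Made-up numbers purely to exercise the function.
baseline = RunMetrics(fvd=400.0, vbench=0.78, minutes_per_hour=6.0, kwh_per_minute=0.50)
candidate = RunMetrics(fvd=350.0, vbench=0.81, minutes_per_hour=7.5, kwh_per_minute=0.45)
print(f"composite vs. baseline: {composite_score(candidate, baseline):.3f}")
```

A geometric mean is used here so that a large gain on one axis cannot fully mask a regression on another; a weighted arithmetic mean would be an equally plausible design.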
A parallel update from VBench tightened alignment with real-world production by expanding categories for motion coherence, cinematic composition, and character consistency, and by publishing score normalization guidance for comparing across model families. Vendors are already citing the new guidance, released in late November, as the reference metric for creator-facing quality, especially as productions move beyond short clips to multi-shot sequences. Documentation and metrics are available via the public repository and accompanying paper. See the latest materials from VBench and the benchmark write-up on arXiv.
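As a rough illustration of what score normalization across model families can look like, the sketch below rescales raw per-category scores onto a common 0 to 1 range using min-max bounds. The bounds, category keys, and function names are hypothetical placeholders; the actual VBench guidance should be taken from its own documentation.

```python
# Hypothetical (low, high) bounds per category; real bounds would come
# from the benchmark's published guidance, not from this sketch.
BOUNDS = {
    "motion_coherence": (0.70, 1.00),
    "cinematic_composition": (0.00, 0.80),
    "character_consistency": (0.50, 1.00),
}


def normalize(raw: float, low: float, high: float) -> float:
    """Min-max rescale a raw score to [0, 1], clamping out-of-range values."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (raw - low) / (high - low)))


def normalized_scores(raw_scores: dict[str, float]) -> dict[str, float]:
    """Normalize every category that has known bounds."""
    return {name: normalize(value, *BOUNDS[name]) for name, value in raw_scores.items()}


# Made-up raw scores for a single model.
print(normalized_scores({
    "motion_coherence": 0.85,
    "cinematic_composition": 0.60,
    "character_consistency": 0.90,
}))
```

Putting every category on the same scale is what allows scores from different model families, whose raw ranges may differ, to be averaged or ranked side by side.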
Vendors Post Double-Digit Gains
On November 26, Runway...