FFP-300K: Scaling First-Frame Propagation for Generalizable Video Editing
1 Fudan University
2 Tencent Youtu Lab
* Equal Contribution
† Corresponding Author
Abstract
We introduce FFP-300K, a large-scale dataset of 300K high-fidelity video pairs at 720p resolution and 81 frames, constructed via a scalable two-track pipeline that supports both FFP-based and instruction-based video editing. Building on this dataset, we propose a guidance-free FFP framework with Adaptive Spatio-Temporal RoPE (AST-RoPE) and an identity propagation self-distillation objective, which together balance first-frame appearance preservation and source video motion consistency. Comprehensive experiments on the EditVerseBench benchmark demonstrate that our method significantly outperforms existing academic and commercial models, improving PickScore by about 0.2 and VLM score by about 0.3 over these competitors.