Train Short, Inference Long: Training-free Horizon Extension for Autoregressive Video Generation
More Consistent, More Dynamic!
1The University of Hong Kong · 2ByteDance · 3Institute of Information Engineering, Chinese Academy of Sciences
*Equal contribution · †Corresponding authors
Autoregressive video diffusion models have emerged as a scalable paradigm for long video generation. However, they often suffer from severe extrapolation failure, where rapid error accumulation leads to significant temporal degradation when generating beyond the training horizon. In this paper, we propose FLEX (Frequency-aware Length EXtension), a training-free, inference-time framework that bridges the gap between short-horizon training and long-horizon inference.
On VBench-Long, FLEX outperforms state-of-the-art methods at 6x extrapolation (30s) and remains competitive with long-video fine-tuned baselines at the 12x scale (60s). As a plug-and-play augmentation, FLEX extends the generation horizon of existing pipelines (e.g., LongLive), enabling consistent, dynamic synthesis at the multi-minute level.
Frequency-aware 3D RoPE Modulation: Adaptively interpolates under-trained low-frequency components while extrapolating high-frequency components to preserve multi-scale temporal discriminability (sketched below).
A structured noise initialization strategy that injects high-frequency dynamic priors to encourage rich temporal variations (see the second sketch below).
Adds global structural anchors at inference time for long-range stability, without modifying the training procedure or fine-tuning model weights.
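The frequency-aware modulation can be pictured as a per-frequency rescaling of the temporal RoPE angles. Below is a minimal PyTorch sketch, assuming a rule in which rotary components whose period exceeds the training horizon are interpolated back into the training range while faster components extrapolate unchanged; the function name, the `cutoff` threshold, and the concrete lengths are illustrative and not necessarily the exact rule used by FLEX.

```python
import math
import torch

def flex_temporal_rope_freqs(dim: int, train_len: int, target_len: int,
                             base: float = 10000.0, cutoff: float = 1.0):
    """Per-frequency temporal RoPE scaling (illustrative sketch, not FLEX's exact rule).

    Components whose period exceeds `cutoff * train_len` frames are under-trained
    at long horizons, so their positions are interpolated (compressed into the
    training range). Faster components keep their original positions and
    extrapolate, preserving fine-grained temporal discriminability.
    """
    # Standard RoPE inverse frequencies for the temporal axis.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    period = 2 * math.pi / inv_freq          # period of each rotary component, in frames

    scale = train_len / target_len           # interpolation factor (< 1 when extrapolating)
    is_low_freq = period > cutoff * train_len

    # Interpolate the slow, under-trained components; leave fast ones untouched.
    per_freq_scale = torch.where(is_low_freq,
                                 torch.full_like(inv_freq, scale),
                                 torch.ones_like(inv_freq))
    return inv_freq * per_freq_scale

# Example: temporal angles for a 12x roll-out (e.g., 120 training frames -> 1440 frames).
inv_freq = flex_temporal_rope_freqs(dim=64, train_len=120, target_len=1440)
angles = torch.outer(torch.arange(1440, dtype=torch.float32), inv_freq)
# `angles` then enters the usual cos/sin rotation of the temporal RoPE.
```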
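For the structured noise initialization, one simple way to inject a high-frequency dynamic prior is to mix a frame-shared base noise with independent per-frame residuals under a variance-preserving ratio. The function name and the `alpha` knob below are assumptions for illustration; FLEX's actual initialization may differ.

```python
import torch

def structured_init_noise(num_frames: int, latent_shape, alpha: float = 0.5,
                          generator=None) -> torch.Tensor:
    """Illustrative structured initialization (names and mixing rule are assumptions).

    A frame-shared base keeps long-range appearance coherent, while independent
    per-frame residuals act as a high-frequency prior that encourages temporal
    variation. alpha -> 1 recovers i.i.d. noise; alpha -> 0 freezes dynamics.
    """
    base = torch.randn(1, *latent_shape, generator=generator)            # shared component
    resid = torch.randn(num_frames, *latent_shape, generator=generator)  # per-frame component
    # Variance-preserving mix so the diffusion model still sees unit-variance noise.
    return (1.0 - alpha) ** 0.5 * base + alpha ** 0.5 * resid

# Example: initial latents for a 1440-frame roll-out with 16x60x104 latents per frame.
noise = structured_init_noise(1440, (16, 60, 104), alpha=0.7)
```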
Scenes & Objects
Sports
Portraits
Animation
FLEX is compatible with existing long-video inference pipelines, further extending their generation horizon. For instance, it enables more stable multi-minute video generation with LongLive.
Appearance Consistency: LongLive vs. LongLive+FLEX
Scene Dynamics: LongLive vs. LongLive+FLEX
@misc{li2026trainshortinferencelong,
title={Train Short, Inference Long: Training-free Horizon Extension for Autoregressive Video Generation},
author={Jia Li and Xiaomeng Fu and Xurui Peng and Weifeng Chen and Youwei Zheng and Tianyu Zhao and Jiexi Wang and Fangmin Chen and Xing Wang and Hayden Kwok-Hay So},
year={2026},
eprint={2602.14027},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2602.14027},
}