FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis

Written by kinetograph | Published 2024/10/09
Tech Story Tags: diffusion-models | image-to-image-synthesis | video-to-video-synthesis | temporal-consistency | v2v-synthesis-framework | spatial-conditions | temporal-optical-flow | flowvid

TL;DR: This paper proposes a consistent V2V synthesis framework by jointly leveraging spatial conditions and temporal optical flow clues within the source video.

(1) Feng Liang, The University of Texas at Austin (work partially done during an internship at Meta GenAI) (Email: [email protected]);

(2) Bichen Wu, Meta GenAI (corresponding author);

(3) Jialiang Wang, Meta GenAI;

(4) Licheng Yu, Meta GenAI;

(5) Kunpeng Li, Meta GenAI;

(6) Yinan Zhao, Meta GenAI;

(7) Ishan Misra, Meta GenAI;

(8) Jia-Bin Huang, Meta GenAI;

(9) Peizhao Zhang, Meta GenAI (Email: [email protected]);

(10) Peter Vajda, Meta GenAI (Email: [email protected]);

(11) Diana Marculescu, The University of Texas at Austin (Email: [email protected]).

Table of Links

4. FlowVid

For video-to-video generation, given an input video with N frames I = {I1, . . . , IN} and a text prompt τ, the goal is to transfer it to a new video I′ = {I′1, . . . , I′N} that adheres to the provided prompt τ′ while keeping consistency across frames. We first discuss how we inflate an image-to-image diffusion model, such as ControlNet, to video with spatial-temporal attention [6, 25, 35, 46] (Section 4.1). Then, we introduce how to incorporate imperfect optical flow as a condition into our model (Section 4.2). Lastly, we introduce the edit-propagate design for generation (Section 4.3).
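To make the inflation step concrete, below is a minimal NumPy sketch of spatial-temporal self-attention, where each frame's queries attend to keys and values gathered from the first frame and the previous frame. This is a common way to extend image self-attention to video in the cited works; the function name, shapes, and first-plus-previous-frame choice are illustrative assumptions, not FlowVid's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatio_temporal_attention(frames, d=8):
    """Toy spatial-temporal self-attention over video frames.

    frames: array of shape (N, P, d) -- N frames, P spatial tokens each.
    For frame i, queries come from frame i, while keys/values are
    concatenated from the first frame and the previous frame, so each
    frame stays anchored to the same reference content across time.
    (Illustrative sketch only, not the paper's exact code.)
    """
    N, P, _ = frames.shape
    out = np.empty_like(frames)
    for i in range(N):
        q = frames[i]                                                    # (P, d)
        kv = np.concatenate([frames[0], frames[max(i - 1, 0)]], axis=0)  # (2P, d)
        attn = softmax(q @ kv.T / np.sqrt(d), axis=-1)                   # (P, 2P)
        out[i] = attn @ kv
    return out

rng = np.random.default_rng(0)
video = rng.normal(size=(4, 6, 8))   # N=4 frames, P=6 tokens, d=8 channels
result = spatio_temporal_attention(video)
print(result.shape)
```

In a real diffusion U-Net this operation replaces the per-frame self-attention layers, which is why it is described as "inflating" an image model to video rather than retraining from scratch.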

This paper is available on arxiv under CC 4.0 license.
