SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation

¹University of Toronto, ²Vector Institute

arXiv 2024


Ablation on U-Net feature maps (Paper Fig. 13)

[Figure: each example shows the following panels, in order: Input Trajectory; Modified self-attn out (our choice); Self-attn out; Self-attn query; Self-attn key; Self-attn value; Upsample block out; Temporal-attn out; Temporal-attn query; Temporal-attn key; Temporal-attn value.]