A Diffusion-Based Framework for
Occluded Object Movement
1VCIP, CS, Nankai University
2SenseTime Research
3PBVR
4NKIARI, Shenzhen Futian
5BNRist, Department of Computer Science and Technology, Tsinghua University
†Corresponding Authors
AAAI 2025
Abstract
Seamlessly moving objects within a scene is a common requirement for image editing, but it remains a challenge for existing editing methods. The main difficulty is that the occluded portion of the object must be completed before it can be moved. To leverage the real-world knowledge embedded in pre-trained diffusion models, we propose a diffusion-based framework specifically designed for occluded object movement, consisting of two parallel branches that perform object deocclusion and movement simultaneously. The deocclusion branch uses a background color-fill strategy and a continuously updated object mask to focus the diffusion process on completing the obscured portion of the target object. Concurrently, the movement branch employs latent optimization to place the completed object at the target location and adopts local text-conditioned guidance to integrate the object appropriately into its new surroundings. Extensive evaluation across multiple metrics demonstrates the superior performance of our method, which is further affirmed by a user study.
Method
Overview of our method (a) and LoRA tuning process (b).
(a) We decouple the task of occluded object movement into deocclusion and movement, handled by two parallel branches. Both branches are built upon Stable Diffusion and operate simultaneously. The deocclusion branch leverages the prior knowledge embedded in the diffusion model to complete the occluded portion, while the movement branch primarily places the restored object at the target position.
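As a rough sketch, the per-step interplay of the two branches might look like the following. Note that `deocclusion_blend` and `movement_init` are our own simplified assumptions for illustration, not the paper's exact operations:

```python
import numpy as np

def deocclusion_blend(latent, denoised, object_mask):
    """Hypothetical per-step update for the deocclusion branch: apply the
    diffusion update only inside the (continuously updated) object mask,
    keeping the color-filled background latent fixed elsewhere."""
    return object_mask * denoised + (1 - object_mask) * latent

def movement_init(scene_latent, object_latent, target_mask):
    """Hypothetical initialization for the movement branch: paste the
    completed object's latent into the target region, before refinement
    by latent optimization and local text-conditioned guidance."""
    return np.where(target_mask > 0, object_latent, scene_latent)
```

In this sketch, the mask blend keeps generation focused on the obscured object, while the paste step merely supplies a starting point for the movement branch's latent optimization.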
(b) To ensure the content generated by the deocclusion branch aligns with the characteristics of the target object, we equip this branch with LoRA, which is fine-tuned using a masked diffusion loss that applies exclusively to the visible portions of the object.
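A minimal sketch of such a masked diffusion loss is shown below; the function name, array shapes, and reduction are our assumptions, and the paper's exact formulation may differ:

```python
import numpy as np

def masked_diffusion_loss(eps_pred, eps_true, visible_mask):
    """Noise-prediction MSE restricted to the object's visible pixels
    (mask == 1); occluded and background regions contribute nothing,
    so the LoRA weights fit only the observed object appearance."""
    sq_err = (eps_pred - eps_true) ** 2 * visible_mask
    return sq_err.sum() / max(visible_mask.sum(), 1.0)
```

Restricting the loss to the visible region prevents the LoRA from being supervised on occluder or background content that does not belong to the target object.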
Integration with other Methods
Our dual-branch framework decomposes the object from its background, enabling focused edits directly on the object. The deocclusion branch is compatible with most existing editing methods, enhancing their ability to perform precise edits in complex scenes. The main advantages our framework offers these methods are:
(1) In scenarios with multiple similar objects, our framework allows for the editing of a specific object without affecting others.
(2) By isolating the object from a complex background, our method permits more precise control over the object.
BibTex
@inproceedings{duan2025diffoom,
  title={A Diffusion-Based Framework for Occluded Object Movement},
  author={Duan, Zheng-Peng and Zhang, Jiawei and Liu, Siyu and Lin, Zheng and Guo, Chun-Le and Zou, Dongqing and Ren, Jimmy and Li, Chongyi},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2025}
}
Contact
Feel free to contact us at adamduan0211[AT]gmail.com!