A Diffusion-Based Framework for
Occluded Object Movement

Zheng-Peng Duan1,2    Jiawei Zhang2    Siyu Liu1    Zheng Lin5,†   
Chun-Le Guo1,4    Dongqing Zou2,3    Jimmy S. Ren2    Chongyi Li1,4,†   
1VCIP, CS, Nankai University    2SenseTime Research    3PBVR    4NKIARI, Shenzhen Futian   
5BNRist, Department of Computer Science and Technology, Tsinghua University   
†Corresponding Authors

AAAI 2025

🔥Results

Hover your mouse 🖱 over an image to reveal our results!




Abstract

Seamlessly moving objects within a scene is a common requirement for image editing, but it remains a challenge for existing editing methods. The main difficulty is that the occluded portion of the object needs to be completed before it can be moved. To leverage the real-world knowledge embedded in pre-trained diffusion models, we propose a diffusion-based framework specifically designed for occluded object movement, which consists of two parallel branches that perform object deocclusion and movement simultaneously. The deocclusion branch utilizes a background color-fill strategy and a continuously updated object mask to focus the diffusion process on completing the obscured portion of the target object. Concurrently, the movement branch employs latent optimization to place the completed object at the target location and adopts local text-conditioned guidance to appropriately integrate the object into its new surroundings. Extensive evaluation using different metrics demonstrates the superior performance of our method, which is further affirmed by a user study.

Method

Overview of our method (a) and LoRA tuning process (b).
(a) We decouple the task of occluded object movement into deocclusion and movement, handled by parallel branches. Both branches are built upon Stable Diffusion and operate simultaneously. The deocclusion branch leverages the prior knowledge within the diffusion models to complete the occluded portion, while the movement branch mainly places the restored object at the target position.
(b) To ensure the content generated by the deocclusion branch aligns with the characteristics of the target object, we equip this branch with LoRA, which is fine-tuned using a masked diffusion loss that applies exclusively to the visible portions of the object.
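The masked diffusion loss in (b) can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the function name, tensor shapes, and normalization are our assumptions; the key idea is that the standard noise-prediction loss is weighted by a binary mask so that LoRA tuning only fits the visible portion of the object.

```python
import torch
import torch.nn.functional as F


def masked_diffusion_loss(noise_pred, noise_target, visible_mask):
    """Noise-prediction loss restricted to the visible object region.

    noise_pred, noise_target: (B, C, H, W) latent-space tensors.
    visible_mask: (B, 1, H, W) binary mask, 1 = visible object pixels.
    Occluded and background pixels contribute nothing to the gradient.
    """
    # Per-element squared error, kept unreduced so the mask can weight it.
    per_pixel = F.mse_loss(noise_pred, noise_target, reduction="none")
    masked = per_pixel * visible_mask
    # Normalize by the number of visible elements (mask area x channels)
    # so the loss scale does not depend on how much of the object is visible.
    denom = visible_mask.sum() * per_pixel.shape[1] + 1e-8
    return masked.sum() / denom
```

Values outside the mask are ignored entirely, so the LoRA weights are never penalized for whatever the model hallucinates in the occluded region during tuning.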

Integration with other Methods


Our dual-branch framework allows for the decomposition of the object from its background, enabling focused edits directly on the object. The deocclusion branch is compatible with most existing editing methods, enhancing their ability to perform precise edits in complex scenes. The main advantages of our framework in enhancing these methods are:
(1) In scenarios with multiple similar objects, our framework allows for the editing of a specific object without affecting others.
(2) By isolating the object from a complex background, our method permits more precise control over the object.
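Once the deocclusion branch has separated the object from its background, an edited object can be pasted back by simple mask compositing. The sketch below is an illustrative helper (names and shapes are our assumptions, and real pipelines would blend in latent space or with soft masks rather than this hard composite):

```python
import numpy as np


def composite_object(edited_obj, background, mask):
    """Paste an isolated, edited object back onto the background.

    edited_obj, background: float arrays in [0, 1], shape (H, W, 3).
    mask: float array (H, W, 1), 1 inside the object, 0 elsewhere.
    """
    return edited_obj * mask + background * (1.0 - mask)
```

Because the mask selects exactly one object, edits applied to `edited_obj` leave other, similar-looking objects in the scene untouched.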

BibTex


@inproceedings{duan2025diffoom,
    title={A Diffusion-Based Framework for Occluded Object Movement},
    author={Duan, Zheng-Peng and Zhang, Jiawei and Liu, Siyu and Lin, Zheng and Guo, Chun-Le and Zou, Dongqing and Ren, Jimmy and Li, Chongyi},
    booktitle={AAAI},
    year={2025}
}

Contact

Feel free to contact us at adamduan0211[AT]gmail.com!