
SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent

Yandan Yang¹✶, Baoxiong Jia¹✶✉️, Shujie Zhang¹,², Siyuan Huang¹✉️
¹State Key Laboratory of General Artificial Intelligence, BIGAI; ²Tsinghua University

✶ indicates equal contribution; ✉️ indicates corresponding author.

NeurIPS 2025

Abstract

Indoor scene synthesis has become increasingly important with the rise of Embodied AI, which requires 3D environments that are not only visually realistic but also physically plausible and functionally diverse. While recent approaches have advanced visual fidelity, they often remain constrained to fixed scene categories, lack sufficient object-level detail and physical consistency, and struggle to align with complex user instructions. In this work, we present SceneWeaver, a reflective agentic framework that unifies diverse scene synthesis paradigms through tool-based iterative refinement. At its core, SceneWeaver employs a language model-based planner to select from a suite of extensible scene generation tools, ranging from data-driven generative models to visual- and LLM-based methods, guided by self-evaluation of physical plausibility, visual realism, and semantic alignment with user input. This closed-loop reason-act-reflect design enables the agent to identify semantic inconsistencies, invoke targeted tools, and update the environment over successive iterations. Extensive experiments on both common and open-vocabulary room types demonstrate that SceneWeaver not only outperforms prior methods on physical, visual, and semantic metrics, but also generalizes effectively to complex scenes with diverse instructions, marking a step toward general-purpose 3D environment generation.
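The abstract describes a closed reason-act-reflect loop in which a planner picks tools based on self-evaluation. The toy sketch below illustrates only that control flow, under loud assumptions: the scene is a dictionary of 2D footprints, there are two hypothetical tools, and rule-based scores stand in for the agent's evaluator. Visual realism is omitted because scoring it would need rendering and a vision-language judge. None of the names (`Tool`, `evaluate`, `plan`, `synthesize`, `add_missing_objects`, `resolve_collisions`) come from SceneWeaver, and the real planner is a language model, not a `min` over scores.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical scene: object name -> 2D footprint (x, y, width, depth) in metres.
# Names follow an assumed "<category>_<index>" convention.
Scene = Dict[str, Tuple[float, float, float, float]]


def overlaps(a, b) -> bool:
    """True if two axis-aligned footprints intersect (touching edges do not count)."""
    ax, ay, aw, ad = a
    bx, by, bw, bd = b
    return ax < bx + bw and bx < ax + aw and ay < by + bd and by < ay + ad


def evaluate(scene: Scene, required: List[str]) -> Dict[str, float]:
    """Rule-based stand-in for self-evaluation (visual realism omitted)."""
    names = list(scene)
    colliding = {n for n in names for m in names
                 if n != m and overlaps(scene[n], scene[m])}
    present = {n.split("_")[0] for n in names}
    return {
        "physical": 1.0 - len(colliding) / max(len(names), 1),
        "semantic": sum(r in present for r in required) / max(len(required), 1),
    }


@dataclass
class Tool:
    name: str
    improves: str  # metric the planner expects this tool to raise
    run: Callable[[Scene, List[str]], Scene]


def add_missing_objects(scene: Scene, required: List[str]) -> Scene:
    """Hypothetical tool: add one object per missing category at the origin (may collide)."""
    new = dict(scene)
    present = {n.split("_")[0] for n in new}
    for cat in required:
        if cat not in present:
            new[f"{cat}_0"] = (0.0, 0.0, 1.0, 1.0)
    return new


def resolve_collisions(scene: Scene, required: List[str]) -> Scene:
    """Hypothetical tool: lay objects out side by side along x with a 0.1 m gap."""
    new, cursor = {}, 0.0
    for name, (_, y, w, d) in sorted(scene.items()):
        new[name] = (cursor, y, w, d)
        cursor += w + 0.1
    return new


def plan(scores: Dict[str, float], tools: List[Tool]) -> Tool:
    """Reason: pick a tool that targets the weakest metric (an LLM in the real agent)."""
    weakest = min(scores, key=scores.get)
    return next(t for t in tools if t.improves == weakest)


def synthesize(scene: Scene, required: List[str], tools: List[Tool],
               threshold: float = 1.0, max_steps: int = 5):
    history: List[str] = []
    for _ in range(max_steps):
        scores = evaluate(scene, required)   # reflect on the current scene
        if min(scores.values()) >= threshold:
            break
        tool = plan(scores, tools)           # reason about what to fix next
        scene = tool.run(scene, required)    # act by invoking the chosen tool
        history.append(tool.name)
    return scene, history


tools = [Tool("add_missing_objects", "semantic", add_missing_objects),
         Tool("resolve_collisions", "physical", resolve_collisions)]
final_scene, steps = synthesize({"bed_0": (0.0, 0.0, 2.0, 1.6)},
                                ["bed", "nightstand", "wardrobe"], tools)
print(steps)  # ['add_missing_objects', 'resolve_collisions']
```

Note how the first tool fixes the semantic gap but creates collisions, which the next evaluation catches and a second tool repairs. This is the kind of iterative correction the closed loop is meant to enable.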

Model

Generated Scenes of Different Room Types

Kitchen

Bedroom

Children Room

Meeting Room

Kitchen

Bathroom

Office

Office

Restaurant

Gym

Meeting Room

Living Room

Generated Scenes from Complex Prompts

A bedroom rich in furniture, decorations on the wall, and small objects.

A laundromat with 10 machines. Add washing supplies on each machine. Add other related objects, such as baskets and a washtub, in the room.

A garage with a car in the center. Add a workbench and a shelf with related tools.

With Room Structure

We show some examples of scene generation with room structures such as windows and doors, which we omit in the main experiments.

Robot Interaction

Our code exports generated scenes as USD files, which can be loaded directly into Isaac Sim. Using an Apple Vision Pro, we teleoperate a Unitree G1 humanoid robot to interact with objects in these scenes.
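The page does not show the exporter itself. As a hedged illustration of what a USD export can look like, the sketch below writes a minimal USD ASCII (`.usda`) layer by hand: one `Xform` prim per object, each referencing a per-object asset file and carrying a translation and a yaw rotation. `PlacedObject`, `scene_to_usda`, and the `./assets/...` paths are hypothetical, and a Z-up, metre-scaled stage is assumed. SceneWeaver's actual exporter may instead use the `pxr` Python bindings and write full meshes and materials.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PlacedObject:
    name: str                             # must be a valid USD prim name, e.g. "bed_0"
    asset_path: str                       # hypothetical asset file that defines a defaultPrim
    position: Tuple[float, float, float]  # (x, y, z) in metres
    yaw_deg: float = 0.0                  # rotation about the Z (up) axis


def scene_to_usda(objects: List[PlacedObject]) -> str:
    """Serialize placed objects into a minimal .usda layer (sketch only)."""
    lines = [
        "#usda 1.0",
        "(",
        '    defaultPrim = "World"',
        "    metersPerUnit = 1",
        '    upAxis = "Z"',
        ")",
        "",
        'def Xform "World"',
        "{",
    ]
    for obj in objects:
        x, y, z = obj.position
        lines += [
            # Each object references its asset and overrides only the transform.
            f'    def Xform "{obj.name}" (',
            f"        prepend references = @{obj.asset_path}@",
            "    )",
            "    {",
            f"        double3 xformOp:translate = ({x}, {y}, {z})",
            f"        float xformOp:rotateZ = {obj.yaw_deg}",
            '        uniform token[] xformOpOrder = ["xformOp:translate", "xformOp:rotateZ"]',
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


usda_text = scene_to_usda([
    PlacedObject("bed_0", "./assets/bed.usd", (1.0, 2.0, 0.0), 90.0),
    PlacedObject("nightstand_0", "./assets/nightstand.usd", (2.6, 2.5, 0.0)),
])
print(usda_text)
```

Keeping geometry in separate referenced assets and writing only transforms in the scene layer keeps the scene file small; the trade-off is that the asset files must travel with it.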
Three key advantages of SceneWeaver for embodied AI applications:
✓ High-fidelity simulation with preserved textures and geometric details.
✓ Robust physical interactions guaranteed by collision-free and boundary-constrained object placement.
✓ Task-aligned scene layouts that adapt to diverse embodied AI requirements through controllable synthesis.
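The page states that object placement is collision-free and boundary-constrained, but not how these constraints are enforced. The sketch below shows the two checks under simplifying assumptions: rectangular, axis-aligned footprints in a rectangular room, with plain rejection sampling as a swapped-in placement strategy. SceneWeaver's own tools may also handle rotations, 3D extents, and support relations. `place_objects`, `inside_room`, and `overlaps` are hypothetical names.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical footprint: (x, y, width, depth), with (x, y) the min corner, in metres.
Box = Tuple[float, float, float, float]


def inside_room(box: Box, room_w: float, room_d: float) -> bool:
    """Boundary constraint: the footprint must lie entirely inside the room."""
    x, y, w, d = box
    return x >= 0.0 and y >= 0.0 and x + w <= room_w and y + d <= room_d


def overlaps(a: Box, b: Box) -> bool:
    """Collision test for axis-aligned footprints (shared edges are allowed)."""
    ax, ay, aw, ad = a
    bx, by, bw, bd = b
    return ax < bx + bw and bx < ax + aw and ay < by + bd and by < ay + ad


def place_objects(sizes: List[Tuple[float, float]], room_w: float, room_d: float,
                  seed: int = 0, max_tries: int = 1000) -> List[Optional[Box]]:
    """Greedily rejection-sample an in-bounds, collision-free spot for each object.

    Returns None for any object that could not be placed within max_tries.
    """
    rng = random.Random(seed)
    placed: List[Box] = []
    result: List[Optional[Box]] = []
    for w, d in sizes:
        box: Optional[Box] = None
        if w <= room_w and d <= room_d:
            for _ in range(max_tries):
                cand = (rng.uniform(0.0, room_w - w), rng.uniform(0.0, room_d - d), w, d)
                # Re-check bounds to guard against floating-point rounding at the edges.
                if inside_room(cand, room_w, room_d) and not any(overlaps(cand, p) for p in placed):
                    box = cand
                    break
        if box is not None:
            placed.append(box)
        result.append(box)
    return result


# Usage: a 4 m x 3 m bedroom with a bed, two nightstands, and a desk.
layout = place_objects([(2.0, 1.6), (0.5, 0.5), (0.5, 0.5), (1.2, 0.6)], 4.0, 3.0)
for box in layout:
    print(box)
```

Because placement is greedy, early objects can block later ones, which is why the function reports unplaced objects as `None` rather than violating a constraint.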

Tool Cards

Examples of Different Tools

Results of different initializers for bedroom generation.

Results of Add Crowd. We show samples of crowded shelf generation in two scenes: bookstore and kitchen.

Results of Add 2D Guidance. We show samples of crowded shelf generation in two scenes: bookstore and kitchen.

Comparison with Previous Methods

BibTeX


@inproceedings{yang2025sceneweaver,
  title     = {SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent},
  author    = {Yang, Yandan and Jia, Baoxiong and Zhang, Shujie and Huang, Siyuan},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year      = {2025}
}