SparseWorld: Enhancing End-to-End Autonomous Driving via World Models with Sparse Scene Representation

Ruoyu Wang¹,*, Jingke Wang¹,*, Yukai Ma¹,†, Yuehao Huang¹, Shuangming Lei¹, Guanglin Xu², Aixue Ye², Yong Liu¹,‡

¹ Zhejiang University, ² 2012 Labs, Huawei
Accepted at IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2026)
*Equal Contribution, †Project Leader, ‡Corresponding Author


SparseWorld in Action: SparseWorld generates only agents and map layouts, which improves efficiency while enhancing end-to-end planning.

Abstract

Recently, world models have made significant progress in enhancing end-to-end driving systems, both by forecasting future scenarios and by improving scene understanding. However, existing driving world models are typically built upon dense scene representations, which incur high computational costs and carry redundant information. In this paper, we present SparseWorld, a lightweight world model that predicts only the critical layout of the scene, enabling efficient future forecasting for end-to-end driving systems. SparseWorld first performs an autoregressive rollout to forecast future map elements and surrounding agents, enabling the model to learn how driving scenarios evolve over time. It then leverages these predicted futures to refine downstream motion prediction and trajectory planning. Specifically, we propose a Sparse Dreamer that anticipates future instances in the latent space through joint temporal and spatial attention. By interacting with the predicted future instances, the motion planner captures more accurate motion patterns and generates more informed, safety-aware trajectories. Extensive experiments demonstrate that SparseWorld significantly reduces collision risk and achieves state-of-the-art performance on the open-loop planning metrics of the nuScenes dataset, with a collision rate of 0.05%. Moreover, it substantially outperforms the baseline method on the closed-loop planning metrics of the Bench2Drive benchmark.

Methodology


The SparseWorld framework operates in three sequential stages: (a) the Instance-Aware End-to-End Driving Baseline, (b) Future Forecasting with the Sparse Dreamer, and (c) Motion Planning Refinement.
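The data flow between the three stages can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the feature width `D`, the random stand-in for stage (a), the single-head attention update used as the forecaster in stage (b), and the residual cross-attention used for refinement in stage (c) are all hypothetical choices. The sketch only shows how sparse instance features pass from the baseline, through a future rollout, into the refinement step.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 32  # hypothetical feature width for every sparse instance

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    """Single-head scaled dot-product attention: queries (Q, D) attend to keys_values (K, D)."""
    scores = queries @ keys_values.T / np.sqrt(queries.shape[-1])
    return softmax(scores, axis=-1) @ keys_values

def stage_a_baseline(num_agents=6, num_map=10):
    """(a) Hypothetical stand-in for the instance-aware baseline:
    returns random agent and map-element instance features."""
    return rng.normal(size=(num_agents, D)), rng.normal(size=(num_map, D))

def stage_b_forecast(agents, map_elems, horizon=3):
    """(b) Hypothetical forecaster: each future frame is computed from the previous one
    (an autoregressive rollout); a real model would use a learned network here."""
    instances = np.concatenate([agents, map_elems], axis=0)  # (N, D)
    futures = []
    for _ in range(horizon):
        instances = instances + cross_attention(instances, instances)  # residual update
        futures.append(instances)
    return np.stack(futures)  # (horizon, N, D)

def stage_c_refine(motion_queries, futures):
    """(c) Refinement: motion/planning queries attend to every predicted future
    instance and add the result as a residual update."""
    memory = futures.reshape(-1, futures.shape[-1])  # flatten time and instances
    return motion_queries + cross_attention(motion_queries, memory)

agents, map_elems = stage_a_baseline()
futures = stage_b_forecast(agents, map_elems)
refined = stage_c_refine(agents, futures)
print(futures.shape, refined.shape)  # (3, 16, 32) (6, 32)
```

Keeping the scene as a few dozen instance vectors (here 6 agents and 10 map elements) rather than a dense grid is what keeps both the rollout and the refinement attention cheap.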

Detailed structure of the Sparse Dreamer, which autoregressively predicts the instances of the next frame based on historical instances and expected action conditions.
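A single Sparse Dreamer step and its autoregressive rollout might look roughly like the sketch below. The page states only that the Dreamer uses joint temporal and spatial attention and conditions on expected actions. Everything else here is an assumption for illustration: the action is a hypothetical embedding added to every instance, temporal attention (each instance over its own past) and spatial attention (instances over each other) are applied one after the other as residual updates, and there are no learned weights.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, kv):
    """Scaled dot-product attention with shared keys/values; supports leading batch dims."""
    scores = q @ np.swapaxes(kv, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ kv

def dreamer_step(history, action):
    """Predict next-frame instances from history (T, N, D) and an action embedding (D,).

    Hypothetical design: condition the latest instances on the action, then apply
    temporal attention followed by spatial attention, each as a residual update.
    """
    current = history[-1] + action                                    # (N, D)
    per_instance = np.swapaxes(history, 0, 1)                         # (N, T, D)
    temporal = attention(current[:, None, :], per_instance)[:, 0, :]  # (N, D)
    x = current + temporal
    return x + attention(x, x)  # spatial attention across instances in one frame

def rollout(history, actions):
    """Autoregressive rollout: each prediction is appended to the history and fed back in."""
    preds = []
    for action in actions:  # one expected action per future step
        nxt = dreamer_step(history, action)
        preds.append(nxt)
        history = np.concatenate([history, nxt[None]], axis=0)
    return np.stack(preds)  # (steps, N, D)

rng = np.random.default_rng(0)
history = rng.normal(size=(4, 5, 16))  # 4 past frames, 5 instances, feature width 16
actions = rng.normal(size=(6, 16))     # hypothetical action embeddings for 6 future steps
preds = rollout(history, actions)
print(preds.shape)  # (6, 5, 16)
```

Because each predicted frame is appended to the history, later steps can attend to earlier predictions. This is what makes the rollout autoregressive rather than a one-shot multi-step forecast.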

Quantitative Results

Quantitative results for future instance forecasting, motion prediction, and trajectory planning tasks.

Future Forecasting Visualization

Qualitative results of future instance forecasting. The results are presented at various future timestamps.

Qualitative results of future instance forecasting under various weather and command conditions.

Motion Prediction Refinement Visualization

Qualitative results of motion prediction refinement on the nuScenes validation set. We visualize the most confident trajectory among the six predicted ones. The baseline result from SparseDrive is marked in orange, the refined result from SparseWorld-S is shown in purple, and the ground truth is indicated in black.
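Picking the trajectory to draw from a multi-modal prediction reduces to an argmax over mode confidences. The sketch below assumes a hypothetical layout of six modes, each a sequence of xy waypoints with one confidence score per mode. It illustrates only the selection step described in the caption and is not code from SparseDrive or SparseWorld.

```python
import numpy as np

def most_confident_mode(trajectories, scores):
    """Return the highest-confidence trajectory.

    trajectories: (num_modes, T, 2) future xy waypoints predicted for one agent.
    scores: (num_modes,) confidence of each mode.
    """
    if trajectories.shape[0] != scores.shape[0]:
        raise ValueError("one score per mode is required")
    return trajectories[int(np.argmax(scores))]

# Hypothetical example: six modes that differ only in their lateral offset.
T = 12
modes = np.zeros((6, T, 2))
modes[:, :, 0] = np.linspace(1.0, 12.0, T)  # same forward progress for every mode
modes[:, :, 1] = np.arange(6)[:, None]      # mode k stays at lateral offset y = k
scores = np.array([0.10, 0.05, 0.40, 0.20, 0.15, 0.10])
best = most_confident_mode(modes, scores)   # mode 2 has the highest score
```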

Trajectory Planning Refinement Visualization

Qualitative results of trajectory planning refinement on the nuScenes validation set. When a potential collision is detected, SparseWorld selects the safest of all candidate trajectories for execution.
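This page does not specify how SparseWorld detects a potential collision or ranks candidates by safety, so the sketch below rests on two labeled assumptions. First, a "potential collision" means the planner's top-scoring candidate passes within a hypothetical `safety_radius` of some predicted agent position at the same timestep. Second, "safest" means the candidate with the largest minimum clearance. Agents are treated as points, which a real check would replace with bounding boxes.

```python
import numpy as np

def min_clearance(ego_traj, agent_futures):
    """Smallest ego-to-agent distance over all agents and timesteps.

    ego_traj: (T, 2) ego xy waypoints.
    agent_futures: (A, T, 2) predicted agent xy positions at the same timesteps.
    """
    dists = np.linalg.norm(agent_futures - ego_traj[None], axis=-1)  # (A, T)
    return float(dists.min())

def select_trajectory(candidates, scores, agent_futures, safety_radius=2.0):
    """Pick a candidate index, overriding the planner's top choice if it is unsafe.

    candidates: (K, T, 2) candidate ego trajectories; scores: (K,) planner confidence.
    safety_radius (metres) is a hypothetical threshold for a "potential collision".
    """
    best = int(np.argmax(scores))
    clearances = np.array([min_clearance(c, agent_futures) for c in candidates])
    if clearances[best] >= safety_radius:
        return best                    # top-scoring plan keeps a safe distance
    return int(np.argmax(clearances))  # otherwise take the plan with the most clearance

# Toy scene: one predicted agent standing still at (5, 0) for three steps.
agent_futures = np.array([[[5.0, 0.0], [5.0, 0.0], [5.0, 0.0]]])
candidates = np.array([
    [[1.0, 0.0], [3.0, 0.0], [5.0, 0.0]],  # straight ahead: drives into the agent
    [[1.0, 1.0], [3.0, 3.0], [5.0, 5.0]],  # swerve left: keeps its distance
])
scores = np.array([0.9, 0.1])
chosen = select_trajectory(candidates, scores, agent_futures)
print(chosen)  # 1
```

Falling back only when the top-scoring plan is unsafe leaves normal driving to the planner's learned preference. The predicted agent futures then act as a safety filter rather than a replacement for the planner's scoring.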

BibTeX


@article{wang2026sparseworld,
  title={SparseWorld: Enhancing End-to-End Autonomous Driving via World Models with Sparse Scene Representation},
  author={Wang, Ruoyu and Wang, Jingke and Ma, Yukai and Huang, Yuehao and Lei, Shuangming and Xu, Guanglin and Ye, Aixue and Liu, Yong},
  journal={arXiv preprint arXiv:2605.24354},
  year={2026}
}
