ExploreVLA

Dense World Modeling and Exploration for End-to-End Autonomous Driving

Zihao Sheng1,2,★, Xin Ye1,†, Jingru Luo1, Sikai Chen2, Liu Ren1
1Bosch Research North America & Bosch Center for Artificial Intelligence (BCAI)
2University of Wisconsin-Madison
★Work was done during internship at Bosch   †Corresponding Author

ExploreVLA uses the world model's prediction uncertainty as an intrinsic reward to guide safe, novelty-aware exploration via GRPO, achieving SOTA on NAVSIM.


🤩 Highlights

1. World Model as Exploration Compass: We repurpose the world model's image prediction uncertainty as an intrinsic exploration signal, naturally measuring trajectory novelty relative to the training distribution.


2. Safety-Gated Exploration: A PDMS-gated reward ensures only safe out-of-distribution trajectories receive exploration bonuses, enabling the policy to expand its behavioral repertoire without compromising driving safety.


3. SOTA Performance: ExploreVLA achieves 93.7 PDMS and 88.8 EPDMS on NAVSIM using only a single front-view camera, outperforming multi-sensor methods.
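
The page does not give the reward formula, so the following is only a minimal sketch of how a safety-gated exploration bonus could feed GRPO-style group-relative advantages. Several things here are assumptions made for illustration, not the authors' implementation: the function names, the additive form of the reward, the values of `safety_threshold` and `bonus_weight`, and the convention that PDMS is scaled to [0, 1].

```python
from statistics import mean, pstdev


def gated_reward(pdms, uncertainty, safety_threshold=0.8, bonus_weight=0.1):
    """Task reward plus an exploration bonus paid only to safe trajectories.

    pdms: driving score of the trajectory, assumed to lie in [0, 1].
    uncertainty: world-model prediction uncertainty, used as a novelty proxy.
    safety_threshold, bonus_weight: hypothetical hyperparameters.
    """
    # Gate: an unsafe trajectory gets no bonus, however novel it is.
    is_safe = pdms >= safety_threshold
    bonus = bonus_weight * uncertainty if is_safe else 0.0
    return pdms + bonus


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group, as GRPO does."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of sampled trajectories for the same scene: (pdms, uncertainty).
group = [(0.9, 0.5), (0.9, 0.0), (0.4, 1.0)]
rewards = [gated_reward(p, u) for p, u in group]
advantages = group_relative_advantages(rewards)
```

In this toy group, the first two trajectories are equally safe, so the more novel one ends up with the larger advantage. The third is the most novel but falls below the safety threshold, so it receives no bonus and the lowest advantage.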



ExploreVLA Architecture

Image Generation and Trajectory Planning

Comparison

Each video shows our model on the top and the baseline without dense world modeling on the bottom.

Performance Results

Table 1: Quantitative comparison on NAVSIM v1. The best performance is marked in bold, and the second best is underlined.

Table 1

Table 2: Quantitative comparison on NAVSIM v2. The best performance is marked in bold, and the second best is underlined.

Table 2

BibTeX

@article{sheng2026explorevla,
  title={ExploreVLA: Dense World Modeling and Exploration for End-to-End Autonomous Driving},
  author={Sheng, Zihao and Ye, Xin and Luo, Jingru and Chen, Sikai and Ren, Liu},
  journal={arXiv},
  year={2026}
}