ExploreVLA

Dense World Modeling and Exploration for End-to-End Autonomous Driving

Zihao Sheng¹,²,★, Xin Ye¹,†, Jingru Luo¹, Sikai Chen², Liu Ren¹

¹Bosch Research North America & Bosch Center for Artificial Intelligence (BCAI)
²University of Wisconsin-Madison

★Work was done during an internship at Bosch. †Corresponding author.

ExploreVLA uses the world model's prediction uncertainty as an intrinsic reward to guide safe, novelty-aware exploration via Group Relative Policy Optimization (GRPO), achieving state-of-the-art (SOTA) results on NAVSIM.

🤩 Highlights

1. World Model as Exploration Compass: We repurpose the world model's image prediction uncertainty as an intrinsic exploration signal, naturally measuring trajectory novelty relative to the training distribution.

2. Safety-Gated Exploration: A reward gated by PDMS (NAVSIM's Predictive Driver Model Score) ensures that only safe out-of-distribution trajectories receive exploration bonuses, letting the policy expand its behavioral repertoire without compromising driving safety.

3. SOTA Performance: ExploreVLA achieves 93.7 PDMS and 88.8 EPDMS on NAVSIM using only a single front-view camera, outperforming multi-sensor methods.
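The first two highlights can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the safety threshold, the bonus weight, the additive form of the reward, and the assumption that PDMS and the uncertainty score are both scalars in [0, 1] are all hypothetical choices made for illustration. The only standard part is the GRPO-style advantage, which normalizes each reward by the mean and standard deviation of its sampled group.

```python
from statistics import mean, pstdev


def gated_reward(pdms, uncertainty, safety_threshold=0.8, bonus_weight=0.1):
    """Driving score plus a novelty bonus that is paid only to safe trajectories.

    pdms             -- driving-quality score, assumed to lie in [0, 1]
    uncertainty      -- world-model prediction uncertainty, assumed in [0, 1]
    safety_threshold -- hypothetical gate; below it no bonus is granted
    bonus_weight     -- hypothetical scale of the exploration bonus
    """
    is_safe = pdms >= safety_threshold
    bonus = bonus_weight * uncertainty if is_safe else 0.0
    return pdms + bonus


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of trajectories sampled for the same scene: (pdms, uncertainty).
# The values are made up for illustration.
group = [(0.95, 0.10), (0.90, 0.60), (0.40, 0.90), (0.85, 0.05)]

rewards = [gated_reward(p, u) for p, u in group]
advantages = group_relative_advantages(rewards)

for (p, u), r, a in zip(group, rewards, advantages):
    print(f"pdms={p:.2f} uncertainty={u:.2f} reward={r:.3f} advantage={a:+.3f}")
```

In this toy group, the third trajectory is the most novel but falls below the gate, so it receives no bonus and ends up with the lowest advantage. The second trajectory is both safe and novel, so its bonus lifts it level with the highest-scoring but less novel first trajectory.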

Comparison

Each video shows our model on top and the baseline (without dense world modeling) on the bottom.

Performance Results

Table 1: Quantitative comparison on NAVSIM v1. The best performance is marked in bold, and the second best is underlined.

Table 2: Quantitative comparison on NAVSIM v2. The best performance is marked in bold, and the second best is underlined.

BibTeX

@article{sheng2026explorevla,
  title={ExploreVLA: Dense World Modeling and Exploration for End-to-End Autonomous Driving},
  author={Sheng, Zihao and Ye, Xin and Luo, Jingru and Chen, Sikai and Ren, Liu},
  journal={arXiv preprint arXiv:2604.02714},
  year={2026}
}