Talk2Traffic

Interactive and Editable Traffic Scenario Generation for Autonomous Driving with Multimodal Large Language Model

CVPR 2025 WDFM-AD

Zihao Sheng (1), Zilin Huang (1), Yansong Qu (2), Yue Leng (3), Sikai Chen (1)
(1) University of Wisconsin-Madison, (2) Purdue University, (3) Google
Corresponding Author

Case 1 (abstract command)

Language command: "Create a busy intersection with random weather and time."

Case 2 (edit road layout)


Initial scenario generation with Talk2Traffic

After editing command with Talk2Traffic

Case 3 (modify car behavior)


Initial scenario generation with Talk2Traffic

After editing command with Talk2Traffic

Case 4 (modify weather and time)


Initial scenario generation with Talk2Traffic

After editing command with Talk2Traffic

Abstract

Testing autonomous vehicles (AVs) requires diverse and escalating traffic scenarios, yet collecting real-world data remains prohibitively expensive. While simulation-based approaches offer cost-effective alternatives, most existing methods lack sufficient support for intuitive, interactive editing of generated scenarios. This paper presents Talk2Traffic, a novel framework that leverages multimodal large language models (MLLMs) to enable interactive and editable traffic scenario generation. Talk2Traffic allows human users to generate various traffic scenarios through multimodal inputs (text, speech, and sketches). Our approach employs an MLLM-based interpreter to extract structured representations from these inputs. These representations are then translated into executable Scenic code using a retrieval-augmented generation mechanism to reduce hallucinations and ensure syntactic correctness. A human feedback guidance component enables iterative refinement and editing of scenarios through natural language instructions. Experiments demonstrate that Talk2Traffic outperforms state-of-the-art methods in generating challenging scenarios across multiple dimensions. Qualitative evaluations further illustrate that the framework can handle diverse input modalities and supports scenario editing toward specific testing objectives.

Talk2Traffic Architecture
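
The data flow described in the abstract (interpret the input, retrieve reference snippets, generate code, then edit on feedback) can be illustrated with a toy Python sketch. This is not the authors' implementation: `ScenarioSpec`, `SNIPPETS`, `retrieve`, `generate_code`, and `apply_edit` are hypothetical names, the MLLM interpreter is replaced by a hand-written spec, retrieval is a plain dictionary lookup rather than a real retrieval-augmented generation step, and the emitted lines are placeholder comments rather than valid Scenic code.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScenarioSpec:
    """Hypothetical structured representation an interpreter might emit."""
    road_layout: str
    weather: str
    time_of_day: str
    ego_behavior: str


# Hypothetical snippet store standing in for a retrieval corpus.
# The values are placeholder comments, not real Scenic code.
SNIPPETS = {
    "intersection": "# layout: four-way intersection (placeholder snippet)",
    "roundabout": "# layout: roundabout (placeholder snippet)",
    "follow_lane": "# behavior: ego follows its lane (placeholder snippet)",
    "sudden_brake": "# behavior: lead car brakes suddenly (placeholder snippet)",
}


def retrieve(spec: ScenarioSpec) -> list[str]:
    """Exact-key lookup; a real system would rank snippets by similarity."""
    keys = [spec.road_layout, spec.ego_behavior]
    return [SNIPPETS[key] for key in keys if key in SNIPPETS]


def generate_code(spec: ScenarioSpec) -> str:
    """Assemble a scenario script from the spec and the retrieved snippets."""
    header = f"# weather: {spec.weather}, time: {spec.time_of_day}"
    return "\n".join([header, *retrieve(spec)])


def apply_edit(spec: ScenarioSpec, **changes: str) -> ScenarioSpec:
    """Return a new spec with only the fields named in the edit changed."""
    return replace(spec, **changes)


# Initial generation, then an edit that touches only weather and time.
spec = ScenarioSpec("intersection", "rain", "night", "follow_lane")
initial = generate_code(spec)
edited = generate_code(apply_edit(spec, weather="clear", time_of_day="noon"))
```

Keeping the spec immutable and applying edits as field-level changes means an instruction such as "modify weather and time" regenerates the script without disturbing the road layout or vehicle behavior, which mirrors the editing cases shown above.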

BibTeX

@inproceedings{sheng2025talk2traffic,
  title={Talk2Traffic: Interactive and Editable Traffic Scenario Generation for Autonomous Driving with Multimodal Large Language Model},
  author={Sheng, Zihao and Huang, Zilin and Qu, Yansong and Leng, Yue and Chen, Sikai},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  year={2025}
}