I am a Research Engineer at Sprinklr working on Conversational AI chatbots.
I graduated from IIIT-Hyderabad with a dual degree (B.Tech + MS) in Computer Science. I worked under Prof. Madhava Krishna at the Robotics Research Center on Autonomous Driving, in collaboration with Prof. Arun K. Singh. Later, I collaborated with Dr. Krishna Murthy Jatavallabhula to study the application of Vision-Language Models to Autonomous Driving. I also explored Swarm Robotics with Prof. Harikumar Kandath.
If you wish to connect, please drop me an email at vikrant.dewangan@research.iiit.ac.in.
News
Mar, 2024 | Served as a reviewer for IROS-2024
Jan, 2024 | Our paper on Vision-Language Models in Autonomous Driving, titled “Talk2BEV: Language-Enhanced Bird’s Eye View Maps”, was accepted to ICRA-2024
Dec, 2023 | Defended my Master’s thesis at IIIT-Hyderabad. |
Nov, 2023 | Our paper on Swarm Robotics, titled “MPC-Based Obstacle Aware Multi-UAV Formation Control Under Imperfect Communication”, was accepted to ICARA-2024
Oct, 2023 | Served as a reviewer for ICRA-2024 and IEEE ICVGIP-2024
May, 2023 | Our paper on uncertainty-aware planning, titled “UAP-BEV: Uncertainty Aware Planning in Bird’s Eye View Representations”, was accepted to CASE-2023
2022 | Joined the Robotics Research Center as a researcher
2018 | Started at IIIT-Hyderabad as an undergraduate student
Selected Publications
Talk2BEV: Language-Enhanced Bird's Eye View (BEV) Maps
ICRA 2024
Talk2BEV is a large vision-language model (LVLM) interface for bird's-eye view (BEV) maps in autonomous driving contexts. While existing perception systems for autonomous driving have largely focused on a pre-defined (closed) set of object categories and driving scenarios, Talk2BEV blends recent advances in general-purpose language and vision models with BEV-structured map representations, eliminating the need for task-specific models. This enables a single system to cater to a variety of autonomous driving tasks encompassing visual and spatial reasoning, predicting the intents of traffic actors, and decision-making based on visual cues. We extensively evaluate Talk2BEV on a large number of scene understanding tasks that rely both on the ability to interpret free-form natural language queries and on the ability to ground these queries in the visual context embedded in the language-enhanced BEV map. To enable further research in LVLMs for autonomous driving, we develop and release Talk2BEV-Bench, a benchmark encompassing 1,000 human-annotated BEV scenarios, with more than 20,000 questions and ground-truth responses from the NuScenes dataset.
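To give a concrete feel for what a language-enhanced BEV map interface might look like, here is a minimal, hypothetical Python sketch. The data layout, the class names (BEVObject, LanguageEnhancedBEV), and the answer_query helper are my own illustrative assumptions and are not taken from the Talk2BEV implementation; the idea is simply that each scene object carries a BEV footprint plus a free-form caption from a vision-language model, and a language model answers spatial queries over that serialized context.

```python
# Hypothetical sketch only -- none of these names come from the Talk2BEV codebase.
# Each object in the scene carries its BEV-frame geometry plus a free-form caption
# produced by a general-purpose vision-language model; a text-only LLM then answers
# spatial/visual queries by reasoning over the serialized map.
from dataclasses import dataclass, field


@dataclass
class BEVObject:
    object_id: int
    centroid_xy: tuple[float, float]   # position in the ego-centric BEV frame (metres)
    area_m2: float                     # footprint area of the object's BEV mask
    description: str                   # caption from a vision-language model


@dataclass
class LanguageEnhancedBEV:
    objects: list[BEVObject] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Serialize the map into plain text so a language model can reason over it."""
        return "\n".join(
            f"Object {o.object_id}: at {o.centroid_xy}, area {o.area_m2:.1f} m^2, "
            f"described as '{o.description}'"
            for o in self.objects
        )


def answer_query(bev: LanguageEnhancedBEV, query: str, llm) -> str:
    """Ground a free-form query in the BEV context using any chat-style LLM callable."""
    prompt = (
        "You are assisting an autonomous vehicle. Scene objects in bird's-eye view:\n"
        f"{bev.to_prompt()}\n\nQuestion: {query}\nAnswer concisely."
    )
    return llm(prompt)  # `llm` is any callable mapping a prompt string to a response


if __name__ == "__main__":
    scene = LanguageEnhancedBEV(objects=[
        BEVObject(1, (12.0, -3.5), 8.4, "white delivery van parked on the shoulder"),
        BEVObject(2, (4.2, 1.0), 0.7, "pedestrian about to step off the curb"),
    ])
    # Stub LLM for demonstration; swap in a real model client in practice.
    print(answer_query(scene, "Is it safe to merge into the right lane?",
                       llm=lambda p: "<LLM response>"))
```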