Vikrant Dewangan

Research

My research studies Bird’s Eye View (BEV) perception methods in autonomous driving under the supervision of under Prof. Madhava Krishna. With Dr. Krishna Murthy Jatavallabhula, I studied vision-language models (VLMs) for BEV methods in autonomous driving and extended the networks with semantic understanding and visual reasoning capabilities. Under Prof. Arun K. Singh, I also devised novel uncertainty-aware planners within end-to-end BEV frameworks.

I briefly explored distributed Model Predictive Control (MPC) algorithms for multi-UAV formation control under Prof. Harikumar Kandath. Within the MPC framework, I integrated collision avoidance and robustness to imperfect communication.

At Sprinklr AI, I studied methods efficient tokenization strategies to adapt large-language models (LLMs) to low-resource languages.

Publications

When Every Token Counts: Optimal Segmentation for Low-Resource Language Models

Bharath Raj S* , Garvit Suri* , Vikrant Dewangan* , Raghav Sonavane

Language Models for Low-Resource Languages (LoResLM) Workshop @ COLING 2025

HTML PDF VIDEO

Traditional greedy tokenization methods have been a critical step in Natural Language Processing (NLP), influencing how text is converted into tokens and directly impacting model performance. While subword tokenizers like Byte-Pair Encoding (BPE) are widely used, questions remain about their optimality across model scales and languages. In this work, we demonstrate through extensive experiments that an optimal BPE configuration significantly reduces token count compared to greedy segmentation, yielding improvements in token-saving percentages and performance benefits, particularly for smaller models. We evaluate tokenization performance across various intrinsic and extrinsic tasks, including generation and classification. Our findings suggest that compression-optimized tokenization strategies could provide substantial advantages for multilingual and low-resource language applications, highlighting a promising direction for further research and inclusive NLP.

Talk2BEV: Language-Enhanced Bird's Eye View (BEV) Maps

Vikrant Dewangan* , Tushar Choudhary* , Shivam Chandhok* , Shubham Priyadarshan , Anushka Jain , Arun K. Singh , Siddharth Srivastava , Krishna Murthy Jatavallabhula† , K. Madhava Krishna†

ICRA 2024

HTML PDF VIDEO

Talk2BEV is a large vision-language model (LVLM) interface for bird's-eye view (BEV) maps in autonomous driving contexts. While existing perception systems for autonomous driving scenarios have largely focused on a pre-defined (closed) set of object categories and driving scenarios, Talk2BEV blends recent advances in general-purpose language and vision models with BEV-structured map representations, eliminating the need for task-specific models. This enables a single system to cater to a variety of autonomous driving tasks encompassing visual and spatial reasoning, predicting the intents of traffic actors, and decision-making based on visual cues. We extensively evaluate Talk2BEV on a large number of scene understanding tasks that rely on both the ability to interpret free-form natural language queries, and in grounding these queries to the visual context embedded into the language-enhanced BEV map. To enable further research in LVLMs for autonomous driving scenarios, we develop and release Talk2BEV-Bench, a benchmark encompassing 1000 human-annotated BEV scenarios, with more than 20,000 questions and ground-truth responses from the NuScenes dataset.

MPC-Based Obstacle Aware Multi-UAV Formation Control Under Imperfect Communication

Vikrant Dewangan , Harikumar Kandath

ICARA 2024

HTML PDF VIDEO

We propose a distributed Model Predictive Controller to solve the problem of multiple fixed-wing UAV formation control and collision avoidance under time-varying communication constraints. Under a graph theoretic leader-followed based formation control pattern, we exploit the Artifical Potential Fields (APF) collision avoidance. We then investigate the model by simulating time-delays, missing linkages and noise to the existing topology, and seek to improve the performance using Kalman filters with Gaussian assumption. We introduce holistic cost function which incorporates formation, reference objective and controls, we are able to guarantee early convergence for the UAV system. We perform simulations in Microsoft AirSim, and analyze the results to study the validity and effectiveness of our implemented model.

UAP-BEV: Uncertainty Aware Planning using Bird's Eye View generated from Surround Monocular Images

Vikrant Dewangan , Basant Sharma , Tushar Choudhary , Sarthak Sharma , Aakash Aanegola , Arun K. Singh , K. Madhava Krishna†

CASE 2023

HTML PDF VIDEO

Autonomous driving requires accurate reasoning of the location of objects from raw sensor data. Recent end-to-end learning methods go from raw sensor data to a trajectory output via Bird's Eye View(BEV) segmentation as an interpretable intermediate representation. Motion planning over cost maps generated via Birds Eye View (BEV) segmentation has emerged as a prominent approach in autonomous driving. However, the current approaches have two critical gaps. First, the optimization process is simplistic and involves just evaluating a fixed set of trajectories over the cost map. The trajectory samples are not adapted based on their associated cost values. Second, the existing cost maps do not account for the uncertainty in the cost maps that can arise due to noise in RGB images, and BEV annotations. As a result, these approaches can struggle in challenging scenarios where there is abrupt cut-in, stopping, overtaking, merging, etc from the neighboring vehicles. In this paper, we propose UAP-BEV: A novel approach that models the noise in Spatio-Temporal BEV predictions to create an uncertainty-aware occupancy grid map. Using queries of the distance to the closest occupied cell, we obtain a sample estimate of the collision probability of the ego-vehicle. Subsequently, our approach uses gradient-free sampling-based optimization to compute low-cost trajectories over the cost map. Importantly, the sampling distribution is adapted based on the optimal cost values of the sampled trajectories. By explicitly modeling probabilistic collision avoidance in the BEV space, our approach is able to outperform the cost-map-based baselines in collision avoidance, route completion, time to completion, and smoothness. To further validate our method, we also show results on the real-world dataset NuScenes, where we report improvements in collision avoidance and smoothness.