Bhavik Shangari | Embodied AI & Robotics

Researcher working at the intersection of perception, policy learning, and robotic action.

I completed my B.Tech in Data Science & Artificial Intelligence at IIT Bhilai, with coursework spanning machine learning, NLP, adversarial ML, robotic systems, and algorithms.

I am deeply passionate about intelligent robotic systems and human-robot interaction, with a strong background in artificial intelligence, deep learning, and robotics.

My work with large language models and vision-language models drives my interest in using multimodal AI to help robots perceive the world, adapt to new tasks, and collaborate naturally with people.

I will be joining Prof. Harold Soh's CLEAR Lab as a Research Assistant on the CNRS@CREATE Embodied AI project, which focuses on human-robot collaborative task execution by connecting perception, reasoning, planning, and manipulation.

I received IIT Bhilai's Young Researcher Award for my research paper accepted at EACL 2026. My research profile is available on Google Scholar.

I was also Coordinator of the Data Science & Artificial Intelligence Club at IIT Bhilai, where I organized workshops and hackathons to foster a culture of innovation and learning.

Reinforcement learning for adaptive robotic control

Vision-language models for perception and reasoning

Human-robot interaction and embodied intelligence

Technical skills for real-time AI systems, robotics, and multimodal learning.

Languages

Python, C, data structures, algorithms, and production-oriented ML scripting.

Frameworks

PyTorch, TensorFlow, Keras, Transformers, Torchvision, OpenCV, scikit-learn, and ROS2.

Deployment

Docker, ONNX, TensorRT, Streamlit, Linux, Git, Jetson Nano, and edge AI optimization.

Publications.

Work on multimodal models that understand visual structure, reason about context, and support action in physical environments.

TechING: Towards Real World Technical Image Understanding via VLMs

Bhavik Shangari with Dr. Ashutosh Modi and Dr. Gagan Raj Gupta · Accepted at EACL 2026 · Google Scholar

Contributed TechING, a dataset for technical image generation and four downstream tasks.
Contributed LLaMA-VL-TUG, which surpassed GPT-4o-mini on technical image generation tasks.
Led synthetic technical diagram data generation, methodology design, and model training.

Work experience.

Research and industry experience across computer vision, robotics, multimodal LLMs, medical foundation models, and real-time AI systems.

Scientific Image Understanding

Benchmarked open-source and closed-source models for scientific image understanding and knowledge extraction.

Multimodal Alignment

Working on vision-code alignment to improve visual-textual data integration and in-context task adaptation.

AI Security

Implemented jailbreak prompt attacks and perception-based jailbreak pipelines to study generative AI vulnerabilities.

Incoming Research Assistant · CLEAR Lab

Prof. Harold Soh · NUS · CNRS@CREATE Embodied AI project

Joining a project focused on embodied AI for human-robot collaborative task execution.
Continuing work on systems that connect perception, reasoning, planning, manipulation, and action.

Visiting Scholar · National University of Singapore

Dr. Dianbo Liu · May 2025 - Present

Built an anomaly-aware retinal foundation model using self-supervised learning.
Validated the learned representations on linear-probing and fine-tuning benchmarks, outperforming prior state-of-the-art methods.

AI / Data Science Intern · Assurant

Fortune 500 · Atlanta / Remote · May 2025 - Jul 2025

Built components of a multi-agent multimodal system integrating last-mile logistics data for predictive AI.

Research Collaborator · S3 Labs

Under Dr. Gagan Raj Gupta

Benchmarked model families for scientific image understanding and knowledge extraction.
Explored multimodal vision-code alignment for visual-textual data integration.
Targeted few-shot and single-shot in-context learning for robotic task adaptation.

Deep Learning Research Intern · Secure Your Hacks

Analyzed and implemented recent papers on large language models and image generation security.
Integrated jailbreak prompt attacks with perception-based jailbreak methods.

Research Intern · IIT Delhi

Under Dr. Vamsi Chalamalla

Led image enhancement integration for remotely operated vehicles in underwater inspection tasks.
Benchmarked image enhancement and restoration techniques, including GAN-based approaches.

Projects spanning robot action policies, edge VLMs, perception, and applied ML.

Vision Language Action Model for Robot Action Control

Jetson-VLA

Implemented the OpenVLA approach inside Jetson-VLM for robotic action policy learning.
Used Bridge Data V2 for vision-language-action training toward manipulation tasks.

Vision Language Model Pretraining & Instruction Tuning

Jetson-VLM

Built a lightweight multimodal model using SigLIP, DINOv2, and LLaMA 3.2 1B.
Trained with LLaVA 595K v1.5 data for edge deployment on low-powered devices.

YOLO Implementation from Scratch

Implemented YOLO on Pascal VOC and built a Streamlit GUI application.
Packaged the training and visualization workflow with Docker.

Gesture Controlled Robotic Arm

Built a robotic arm using computer vision, Arduino, and linear algebra.
Generated motor signals from hand gestures and collected data for imitation learning.

Stock Market Forecasting with Transformers

Implemented a PyTorch transformer for forecasting with integrated sentiment analysis.
Developed Date&Time2Vec for temporal modeling across stock and news data.

CloudPhysician

Built a computer vision and OCR pipeline to extract patient monitor vital signs from CCTV feeds.
Used YOLO-based segmentation and threading for faster real-time processing.

Achievements.

IIT Bhilai Young Researcher Award for a research paper accepted at EACL 2026
Global Rank 6 in International Robo Cup Challenge
Drishti Fellowship, TIH IIT Indore, for smart physiotherapy device development
JEE Advanced AIR 3989 and JEE Main AIR 4160
Qualified for the Kishore Vaigyanik Protsahan Yojana fellowship (KVPY'21)

Building intelligence that can act.