Bhavik Shangari
Embodied AI · Robotics · Machine Learning

Building intelligence that can act.

Researcher working at the intersection of perception, policy learning, and robotic action.

I completed my B.Tech in Data Science & Artificial Intelligence at IIT Bhilai, with coursework spanning machine learning, NLP, adversarial ML, robotic systems, and algorithms.

My focus is on intelligent robotic systems and human-robot interaction, building on a background in artificial intelligence, deep learning, and robotics.

My work with large language models and vision-language models drives my interest in using multimodal AI to help robots perceive the world, adapt to new tasks, and collaborate naturally with people.

I will be joining Prof. Harold Soh's CLEAR Lab as a Research Assistant on the CNRS@CREATE Embodied AI project, which focuses on human-robot collaborative task execution by connecting perception, reasoning, planning, and manipulation.

I received IIT Bhilai's Young Researcher Award for my research paper accepted at EACL 2026. My research profile is also available on Google Scholar.

I also served as Coordinator of the Data Science & Artificial Intelligence (DSAI) Club at IIT Bhilai, where I organized workshops and hackathons.

Reinforcement learning for adaptive robotic control
Vision-language models for perception and reasoning
Human-robot interaction and embodied intelligence

Technical skills for real-time AI systems, robotics, and multimodal learning.

Languages

Python, C, data structures, algorithms, and production-oriented ML scripting.

Frameworks

PyTorch, TensorFlow, Keras, Transformers, Torchvision, OpenCV, scikit-learn, and ROS2.

Deployment

Docker, ONNX, TensorRT, Streamlit, Linux, Git, Jetson Nano, and edge AI optimization.

Publications.

Work on multimodal models that understand visual structure, reason about context, and support action in physical environments.

TechING: Towards Real World Technical Image Understanding via VLMs

Bhavik Shangari with Dr. Ashutosh Modi and Dr. Gagan Raj Gupta · Accepted at EACL 2026 · Google Scholar

  • Contributed TechING, a dataset for technical image generation and four downstream tasks.
  • Contributed LLaMA-VL-TUG, which surpassed GPT-4o-mini on technical image generation tasks.
  • Led synthetic technical diagram data generation, methodology design, and model training.

Work experience.

Research and industry experience across computer vision, robotics, multimodal LLMs, medical foundation models, and real-time AI systems.

Scientific Image Understanding

Benchmarked open-source and closed-source models for scientific image understanding and knowledge extraction.

Multimodal Alignment

Working on vision-code alignment to improve visual-textual data integration and in-context task adaptation.

AI Security

Implemented jailbreak prompt attacks and perception-based jailbreak pipelines to study generative AI vulnerabilities.

Incoming Research Assistant · CLEAR Lab

Prof. Harold Soh · NUS · CNRS@CREATE Embodied AI project

  • Joining a project focused on embodied AI for human-robot collaborative task execution.
  • Continuing work on systems that connect perception, reasoning, planning, manipulation, and action.

Visiting Scholar · National University of Singapore

Dr. Dianbo Liu · May 2025 - Present

  • Built an anomaly-aware retinal foundation model using self-supervised learning.
  • Validated the learned representations on linear-probing and fine-tuning benchmarks, where they outperformed prior state-of-the-art methods.

AI / Data Science Intern · Assurant

Fortune 500 · Atlanta / Remote · May 2025 - Jul 2025

  • Built components of a multi-agent multimodal system integrating last-mile logistics data for predictive AI.

Research Collaborator · S3 Labs

Under Dr. Gagan Raj Gupta

  • Benchmarked model families for scientific image understanding and knowledge extraction.
  • Explored multimodal vision-code alignment for visual-textual data integration.
  • Targeted few-shot and single-shot in-context learning for robotic task adaptation.

Deep Learning Research Intern · Secure Your Hacks

  • Analyzed and implemented recent papers on large language models and image generation security.
  • Integrated jailbreak prompt attacks with perception-based jailbreak methods.

Research Intern · IIT Delhi

Under Dr. Vamsi Chalamalla

  • Led image enhancement integration for remotely operated vehicles in underwater inspection tasks.
  • Benchmarked image enhancement and restoration techniques, including GAN-based approaches.

Projects spanning robot action policies, edge VLMs, perception, and applied ML.

YOLO Implementation from Scratch

  • Implemented YOLO on Pascal VOC and built a Streamlit GUI application.
  • Packaged the training and visualization workflow with Docker.

Gesture Controlled Robotic Arm

  • Built a robotic arm using computer vision, Arduino, and linear algebra.
  • Generated motor signals from hand gestures and collected data for imitation learning.

CloudPhysician

  • Built a computer vision and OCR pipeline to extract patient monitor vital signs from CCTV feeds.
  • Used YOLO-based segmentation and threading for faster real-time processing.

Achievements.

Recent talks and DSAI Club sessions.