Python, C, data structures, algorithms, and production-oriented ML scripting.
Researcher working at the intersection of perception, policy learning, and robotic action.
I completed my B.Tech in Data Science & Artificial Intelligence at IIT Bhilai, with coursework spanning machine learning, NLP, adversarial ML, robotic systems, and algorithms.
I am deeply passionate about intelligent robotic systems and human-robot interaction, with a strong background in artificial intelligence, deep learning, and robotics.
My work with large language models and vision-language models drives my interest in using multimodal AI to help robots perceive the world, adapt to new tasks, and collaborate naturally with people.
I will be joining Prof. Harold Soh's CLEAR Lab as a Research Assistant on the CNRS@CREATE Embodied AI project, which focuses on human-robot collaborative task execution by connecting perception, reasoning, planning, and manipulation.
I received IIT Bhilai's Young Researcher Award for my research paper accepted at EACL. My research profile is also available on Google Scholar.
I was also Coordinator of the Data Science & Artificial Intelligence Club at IIT Bhilai, where I organized workshops and hackathons to foster a culture of innovation and learning.
Technical skills for real-time AI systems, robotics, and multimodal learning.
PyTorch, TensorFlow, Keras, Transformers, Torchvision, OpenCV, scikit-learn, and ROS2.
Docker, ONNX, TensorRT, Streamlit, Linux, Git, Jetson Nano, and edge AI optimization.
Publications.
Work on multimodal models that understand visual structure, reason about context, and support action in physical environments.
TechING: Towards Real World Technical Image Understanding via VLMs
- Contributed TechING, a dataset for technical image generation and four downstream tasks.
- Contributed LLaMA-VL-TUG, which surpassed GPT-4o-mini on technical image generation tasks.
- Led synthetic technical diagram data generation, methodology design, and model training.
Work experience.
Research and industry experience across computer vision, robotics, multimodal LLMs, medical foundation models, and real-time AI systems.
Benchmarked open-source and closed-source models for scientific image understanding and knowledge extraction.
Working on vision-code alignment to improve visual-textual data integration and in-context task adaptation.
Implemented jailbreak prompt attacks and perception-based jailbreak pipelines to study generative AI vulnerabilities.
Incoming Research Assistant · CLEAR Lab
- Joining a project focused on embodied AI for human-robot collaborative task execution.
- Continuing work on systems that connect perception, reasoning, planning, manipulation, and action.
Visiting Scholar · National University of Singapore
- Built an anomaly-aware retinal foundation model using self-supervised learning.
- Validated learned representations through linear-probing and fine-tuning benchmarks, outperforming prior state-of-the-art methods.
AI / Data Science Intern · Assurant
- Built components of a multi-agent multimodal system integrating last-mile logistics data for predictive AI.
Research Collaborator · S3 Labs
- Benchmarked model families for scientific image understanding and knowledge extraction.
- Explored multimodal vision-code alignment for visual-textual data integration.
- Targeted few-shot and single-shot in-context learning for robotic task adaptation.
Deep Learning Research Intern · Secure Your Hacks
- Analyzed and implemented recent papers on large language models and image generation security.
- Integrated jailbreak prompt attacks with perception-based jailbreak methods.
Research Intern · IIT Delhi
- Led image enhancement integration for remotely operated vehicles in underwater inspection tasks.
- Benchmarked image enhancement and restoration techniques, including GAN-based approaches.
Projects spanning robot action policies, edge VLMs, perception, and applied ML.
Vision Language Action Model for Robot Action Control
- Implemented the OpenVLA approach inside Jetson-VLM for robotic action policy learning.
- Used Bridge Data V2 for vision-language-action training toward manipulation tasks.
Vision Language Model Pretraining & Instruction Tuning
- Built a lightweight multimodal model using SigLIP, DINOv2, and LLaMA 3.2 1B.
- Trained with LLaVA 595K v1.5 data for edge deployment on low-powered devices.
YOLO Implementation from Scratch
- Implemented YOLO on Pascal VOC and built a Streamlit GUI application.
- Packaged the training and visualization workflow with Docker.
Gesture Controlled Robotic Arm
- Built a robotic arm using computer vision, Arduino, and linear algebra.
- Generated motor signals from hand gestures and collected data for imitation learning.
Stock Market Forecasting with Transformers
- Implemented a PyTorch transformer for forecasting with integrated sentiment analysis.
- Developed Date&Time2Vec for temporal modeling across stock and news data.
CloudPhysician
- Built a computer vision and OCR pipeline to extract patient monitor vital signs from CCTV feeds.
- Used YOLO-based segmentation and threading for faster real-time processing.
Achievements.
- IIT Bhilai Young Researcher Award for a research paper accepted at EACL
- Global Rank 6 in International Robo Cup Challenge
- Drishti Fellowship, TIH IIT Indore, for smart physiotherapy device development
- JEE Advanced AIR 3989 and JEE Main AIR 4160
- Qualified for the Kishore Vaigyanik Protsahan Yojana (KVPY 2021)