Publications

Selected Publications

MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs

International Conference on Learning Representations (ICLR) -- 2025

This work investigates how well Multimodal Large Language Models (MLLMs) perceive small versus large visual details in question answering tasks. The study shows that MLLMs' accuracy is sensitive to the size of the visual subject and that it can be improved with training-free visual cropping methods. These findings call for caution in detail-sensitive applications and point to ways of improving MLLMs for them.

FIRE: Food Image to REcipe generation

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) -- 2024

This paper introduces FIRE, a novel multimodal methodology for generating recipes from food images, contributing to the growing field of food computing. FIRE produces food titles, ingredients, and cooking instructions using the BLIP model, a Vision Transformer with a decoder, and the T5 model, respectively. The paper also explores practical applications such as recipe customization and recipe-to-code generation for automated cooking.

Knowledge-enhanced Agents for Interactive Text Games

International Conference on Knowledge Capture (KCap) -- 2023

🏆 Best Student Paper Award 🏆

This paper introduces a knowledge-injection framework that improves the functional grounding of agents in text-based games, addressing existing limitations in coherence, contextual awareness, and learning. The framework incorporates domain knowledge in the form of memory of past actions and object affordances, and is applied to two types of agents: reinforcement learning agents and language model agents. It injects this knowledge through strategies such as knowledge graphs and augmentations of the input encoding. Tested on 10 tasks in the ScienceWorld environment, the study shows how task properties, model architectures, and domain knowledge interact in interactive settings.

Privacy Aware Question-Answering System for Online Mental Health Risk Assessment

ACL Workshop on Biomedical Natural Language Processing (BioNLP) -- 2023

This paper explores the use of pre-trained Language Models (LMs) for assessing mental health risk from social media data. It proposes a Question-Answering (QA) approach, based on the UnifiedQA model, for analyzing two large mental health datasets. To protect user privacy, the model is trained with differential privacy techniques. The results show that framing risk assessment as a QA task is effective in mental health scenarios, and that the privacy safeguards cost less than 1% in performance. This approach points to a promising direction for building privacy-conscious diagnostic systems in mental health.

Visual Cropping Improves Zero-Shot Question Answering of Multimodal Large Language Models

NeurIPS Workshop on Robustness of Few-shot and Zero-shot Learning in Foundation Models -- 2023

This paper examines a limitation of Multimodal Large Language Models (MLLMs) in visual question answering (VQA): their sensitivity to the size of visual details in images. The study finds that the zero-shot accuracy of these models drops by up to 46% when the visual subject is small. Human visual cropping mitigates this drop, indicating that subject size causally affects accuracy. The paper proposes three automatic visual cropping methods to improve the zero-shot performance of multimodal LLMs and evaluates them on four VQA datasets and on a subset of VQAv2 focused on fine details. The results call for caution when using multimodal LLMs for detail-sensitive VQA tasks and suggest visual cropping as a viable way to improve their performance.

DIGITOUR: Automatic Digital Tours for Real-Estate Properties

International Conference on Data Science & Management of Data (CODS-COMAD) -- 2023

This paper presents an automated pipeline for creating 3D virtual tours of real-estate properties, addressing the time and cost of the manual annotation that traditional methods require. It introduces a novel HSV-based coloring scheme for paper tags, which are placed at locations in the property before 360° equirectangular images are captured. The tags are uniquely numbered and bi-colored, which improves tag detection with YOLOv5 and digit recognition with a custom MobileNet architecture. The method then links the equirectangular images based on the detected tags, and its efficiency is demonstrated on a real-world dataset from Housing.com.

RE-Tagger: A light-weight Real-Estate Image Classifier

European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD) -- 2022

Real-estate image tagging is an essential use case for reducing manual annotation effort and enhancing the user experience. This paper proposes an end-to-end pipeline, RE-Tagger, for the real-estate image classification problem. It presents a two-stage transfer learning approach that uses a custom InceptionV3 architecture to classify images into categories (bedroom, bathroom, kitchen, balcony, hall, and others).

All Publications

You can find all my publications at Google Scholar or ResearchGate.

Conferences / Workshops

CORE A*

MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs

International Conference on Learning Representations (ICLR) -- 2025

J Zhang, M Khayatkhoei, P Chhikara, and F Ilievski

Visual Cropping Improves Zero-Shot Question Answering of Multimodal Large Language Models

NeurIPS Workshop on Robustness of Few-shot and Zero-shot Learning in Foundation Models -- 2023

J Zhang, M Khayatkhoei, P Chhikara, and F Ilievski

Privacy Aware Question-Answering System for Online Mental Health Risk Assessment

ACL Proceedings of the 22nd Workshop on Biomedical Natural Language Processing -- 2023

P Chhikara, U Pasupulety, J Marshall, D Chaurasia, and S Kumari

Federated Learning-based Aerial Image Segmentation for collision-free Movement and Landing

Proceedings of the 4th ACM MobiCom Workshops -- 2021

P Chhikara, R Tekchandani, N Kumar, and S Tanwar

CORE A

FIRE: Food Image to REcipe generation

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) -- 2024

P Chhikara, D Chaurasia, Y Jiang, O Masur, and F Ilievski

RE-Tagger: A light-weight Real-Estate Image Classifier

European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD) -- 2022

P Chhikara, A Goyal, and C Sharma

CORE B

Knowledge-enhanced Agents for Interactive Text Games

International Conference on Knowledge Capture (KCap) -- 2023

P Chhikara, J Zhang, F Ilievski, J Francis, and K Ma

🏆 Best Student Paper Award 🏆

DIGITOUR: Automatic Digital Tours for Real-Estate Properties

ACM CODS-COMAD -- 2023

P Chhikara, H Kuhar, A Goyal, and C Sharma

Federated Learning for Air Quality Index Prediction using UAV Swarm Networks

IEEE Global Communications Conference (GLOBECOM) -- 2021

P Chhikara, R Tekchandani, N Kumar, S Tanwar, and JJPC Rodrigues

An efficient scheme for wireless charging of electric vehicles using RFID with an optimal path planning

IEEE Globecom Workshops (GC Wkshps) -- 2019

S Arora, S Goel, P Chhikara, H Singh, N Kumar, and PS Rana

Unranked

An ensemble approach for extractive text summarization

International Conference on Emerging Trends in Information Technology and Engineering -- 2020

P Singh, P Chhikara, and J Singh

Journals

Quartile 1 (Q1)

A Differentially Privacy Assisted Federated Learning Scheme to Preserve Data Privacy for IoMT Applications

IEEE Transactions on Network and Service Management -- 2024

A Barnawi, P Chhikara, R Tekchandani, N Kumar, and B Alzahrani

Sea-Pix-GAN: Underwater image enhancement using adversarial neural network

Journal of Visual Communication and Image Representation -- 2024

D Chaurasia and P Chhikara

Federated learning and autonomous UAVs for hazardous zone detection and AQI prediction in IoT environment

IEEE Internet of Things Journal -- 2021

P Chhikara, R Tekchandani, N Kumar, M Guizani, and MM Hassan

Artificial intelligence-enabled Internet of Things-based system for COVID-19 screening using aerial thermal imaging

Future Generation Computer Systems -- 2021

A Barnawi, P Chhikara, R Tekchandani, N Kumar, and B Alzahrani

DCNN-GA: A Deep Neural Net Architecture for Navigation of UAV in Indoor Environment

IEEE Internet of Things Journal -- 2020

P Chhikara, R Tekchandani, N Kumar, V Chamola, and M Guizani

Federated Learning meets Human Emotions: a Decentralized Framework for Human-Computer Interaction for IoT Applications

IEEE Internet of Things Journal -- 2020

P Chhikara, P Singh, R Tekchandani, N Kumar, and M Guizani

An Efficient Container Management Scheme for Resource Constrained Intelligent IoT Devices

IEEE Internet of Things Journal -- 2020

P Chhikara, R Tekchandani, N Kumar, and MS Obaidat

Quartile 2 (Q2)

Adaptive federated learning scheme for recognition of malicious attacks in an IoT network

Computing -- 2023

P Chhikara, R Tekchandani, and N Kumar

A CNN-based scheme for COVID-19 detection with emergency services provisions using an optimal path planning

Multimedia Systems -- 2021

A Barnawi, P Chhikara, R Tekchandani, N Kumar, and M Boulares

Data dimensionality reduction techniques for Industry 4.0: Research results, challenges, and future research directions

Software: Practice and Experience -- 2020

P Chhikara, N Jain, R Tekchandani, and N Kumar

Quartile 3 (Q3)

A Deep Transfer Learning based model for Automatic Detection of COVID-19 from Chest X-Rays

Turkish Journal of Electrical Engineering and Computer Sciences -- 2021

P Chhikara, P Gupta, P Singh, and T Bhatia

Book Chapter

Deep convolutional neural network with transfer learning for detecting pneumonia on chest X-rays

Advances in Bioinformatics, Multimedia, and Electronics Circuits and Signals -- 2019

P Chhikara, P Singh, P Gupta, and T Bhatia

Prateek Chhikara
