Publications

Selected Publications

International Conference on Learning Representations (ICLR) -- 2025
This work investigates the ability of Multimodal Large Language Models (MLLMs) to perceive small versus large visual details in question answering tasks. The study shows that MLLM accuracy is sensitive to the size of the visual subject and can be improved with visual cropping methods. These findings suggest caution, and point to potential improvements, for detail-sensitive applications.
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) -- 2024
This paper introduces FIRE, a novel multimodal methodology for generating recipes from food images, contributing to the growing field of food computing. FIRE produces food titles, ingredients, and cooking instructions using, respectively, the BLIP model, a Vision Transformer with a decoder, and the T5 model. The paper also explores practical applications such as recipe customization and recipe-to-code generation for automated cooking.
International Conference on Knowledge Capture (KCap) -- 2023
🏆 Best Student Paper Award 🏆
This paper introduces a knowledge-injection framework to enhance the functional grounding of agents in text-based games, addressing existing limitations in coherence, contextual awareness, and learning. It incorporates domain knowledge through memory of past actions and object affordances, aiding two types of agents: reinforcement learning and language model agents. The framework employs strategies like knowledge graphs and input encoding augmentations. Tested on 10 tasks in the ScienceWorld environment, the study reveals how task properties, model architectures, and domain knowledge interact in interactive contexts.
ACL Workshop on Biomedical Natural Language Processing (BioNLP) -- 2023
This paper explores using pre-trained Language Models (LMs) for assessing mental health risk from social media data. A Question-Answering (QA) approach, utilizing the Unified-QA model, is proposed for analyzing two large mental health datasets. To ensure user privacy, the model is trained with differential privacy techniques. The results show that treating risk assessment as a QA task is effective for mental health scenarios, with minimal performance loss (less than 1%) due to privacy safeguards. This approach signifies a promising direction for creating privacy-conscious diagnostic systems in mental health.
NeurIPS Workshop on Robustness of Few-shot and Zero-shot Learning in Foundation Models -- 2023
This paper examines the limitations of multimodal Large Language Models (LLMs) in visual question answering (VQA), particularly their sensitivity to the size of visual details in images. The study finds that the zero-shot accuracy of these models decreases by up to 46% with smaller visual subjects. Human visual cropping is shown to mitigate this issue, indicating that the effect of subject size is causal. The paper proposes three automatic visual cropping methods to enhance the zero-shot performance of multimodal LLMs. These methods are evaluated on four VQA datasets and a VQAv2 subset focused on fine details. The results highlight the need for caution in using multimodal LLMs for detail-sensitive VQA tasks and suggest visual cropping as a viable way to improve performance.
International Conference on Data Science & Management of Data (CODS-COMAD) -- 2023
This paper presents an automated pipeline for creating 3D virtual tours in real estate, addressing the time and cost of manual annotation in traditional methods. It introduces a novel HSV-based coloring scheme for paper tags, which are placed at locations in the property before 360° equirectangular images are captured. The tags are uniquely numbered and bi-colored, which aids tag detection with YOLOv5 and digit recognition with a custom MobileNet architecture. The method links equirectangular images based on the detected tags, and its efficiency is demonstrated on a real-world dataset from Housing.com.
European Conference on Machine Learning (ECML) -- 2022
Real-estate image tagging is an essential use case for reducing the effort involved in manual annotation and enhancing the user experience. This paper proposes an end-to-end pipeline (referred to as RE-Tagger) for the real-estate image classification problem. We present a two-stage transfer learning approach using a custom InceptionV3 architecture to classify images into different categories (i.e., bedroom, bathroom, kitchen, balcony, hall, and others).

All Publications

You can find all my publications at Google Scholar or ResearchGate.

Conferences / Workshops

CORE A*

MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
International Conference on Learning Representations (ICLR) -- 2025
J Zhang, M Khayatkhoei, P Chhikara, and F Ilievski
Main Conference Paper
Visual Cropping Improves Zero-Shot Question Answering of Multimodal Large Language Models
NeurIPS Workshop on Robustness of Few-shot and Zero-shot Learning in Foundation Models -- 2023
J Zhang, M Khayatkhoei, P Chhikara, and F Ilievski
Workshop Paper
Privacy Aware Question-Answering System for Online Mental Health Risk Assessment
ACL Proceedings of the 22nd Workshop on Biomedical Natural Language Processing (BioNLP) -- 2023
P Chhikara, U Pasupulety, J Marshall, D Chaurasia, and S Kumari
Workshop Paper
Federated Learning-based Aerial Image Segmentation for collision-free Movement and Landing
Proceedings of the 4th ACM MobiCom Workshops -- 2021
P Chhikara, R Tekchandani, N Kumar, and S Tanwar
Workshop Paper

CORE A

FIRE: Food Image to REcipe generation
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) -- 2024
P Chhikara, D Chaurasia, Y Jiang, O Masur, and F Ilievski
Main Conference Paper
RE-Tagger: A light-weight Real-Estate Image Classifier
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD) -- 2022
P Chhikara, A Goyal, and C Sharma
Main Conference Paper (Demo Track)

CORE B

Knowledge-enhanced Agents for Interactive Text Games
International Conference on Knowledge Capture (KCap) -- 2023
P Chhikara, J Zhang, F Ilievski, J Francis, and K Ma
🏆 Best Student Paper Award 🏆 Main Conference Paper
DIGITOUR: Automatic Digital Tours for Real-Estate Properties
ACM CODS-COMAD -- 2023
P Chhikara, H Kuhar, A Goyal, and C Sharma
Main Conference Paper
Federated Learning for Air Quality Index Prediction using UAV Swarm Networks
IEEE Global Communications Conference (GLOBECOM) -- 2021
P Chhikara, R Tekchandani, N Kumar, S Tanwar, and JJPC Rodrigues
Main Conference Paper
An efficient scheme for wireless charging of electric vehicles using RFID with an optimal path planning
IEEE Globecom Workshops (GC Wkshps) -- 2019
S Arora, S Goel, P Chhikara, H Singh, N Kumar, and PS Rana
Workshop Paper

Unranked

An ensemble approach for extractive text summarization
International Conference on Emerging Trends in Information Technology and Engineering -- 2020
P Singh, P Chhikara, and J Singh
Main Conference Paper


Journals

Quartile 1 (Q1)

A Differentially Privacy Assisted Federated Learning Scheme to Preserve Data Privacy for IoMT Applications
IEEE Transactions on Network and Service Management -- 2024
A Barnawi, P Chhikara, R Tekchandani, N Kumar, and B Alzahrani
Impact Factor: 4.7
Sea-Pix-GAN: Underwater image enhancement using adversarial neural network
Journal of Visual Communication and Image Representation -- 2024
D Chaurasia and P Chhikara
Impact Factor: 2.6
Federated learning and autonomous UAVs for hazardous zone detection and AQI prediction in IoT environment
IEEE Internet of Things Journal -- 2021
P Chhikara, R Tekchandani, N Kumar, M Guizani, and MM Hassan
Impact Factor: 8.2
Artificial intelligence-enabled Internet of Things-based system for COVID-19 screening using aerial thermal imaging
Future Generation Computer Systems -- 2021
A Barnawi, P Chhikara, R Tekchandani, N Kumar, and B Alzahrani
Impact Factor: 6.2
DCNN-GA: A Deep Neural Net Architecture for Navigation of UAV in Indoor Environment
IEEE Internet of Things Journal -- 2020
P Chhikara, R Tekchandani, N Kumar, V Chamola, and M Guizani
Impact Factor: 8.2
Federated Learning meets Human Emotions: a Decentralized Framework for Human-Computer Interaction for IoT Applications
IEEE Internet of Things Journal -- 2020
P Chhikara, P Singh, R Tekchandani, N Kumar, and M Guizani
Impact Factor: 8.2
An Efficient Container Management Scheme for Resource Constrained Intelligent IoT Devices
IEEE Internet of Things Journal -- 2020
P Chhikara, R Tekchandani, N Kumar, and MS Obaidat
Impact Factor: 8.2

Quartile 2 (Q2)

Adaptive federated learning scheme for recognition of malicious attacks in an IoT network
Computing -- 2023
P Chhikara, R Tekchandani, and N Kumar
Impact Factor: 3.3
A CNN-based scheme for COVID-19 detection with emergency services provisions using an optimal path planning
Multimedia Systems -- 2021
A Barnawi, P Chhikara, R Tekchandani, N Kumar, and M Boulares
Impact Factor: 3.5
Data dimensionality reduction techniques for Industry 4.0: Research results, challenges, and future research directions
Software: Practice and Experience -- 2020
P Chhikara, N Jain, R Tekchandani, and N Kumar
Impact Factor: 2.6

Quartile 3 (Q3)

A Deep Transfer Learning based model for Automatic Detection of COVID-19 from Chest X-Rays
Turkish Journal of Electrical Engineering and Computer Sciences -- 2021
P Chhikara, P Gupta, P Singh, and T Bhatia
Impact Factor: 1.2


Book Chapter

Deep convolutional neural network with transfer learning for detecting pneumonia on chest X-rays
Advances in Bioinformatics, Multimedia, and Electronics Circuits and Signals -- 2019
P Chhikara, P Singh, P Gupta, and T Bhatia