CommCP: Efficient Multi-Agent Coordination via LLM-Based Communication with Conformal Prediction

  • Xiaopan Zhang*
  • Zejin Wang*
  • Zhixu Li
  • Jianpeng Yao
  • Jiachen Li

  • *Equal contribution, ‡Corresponding author
    IEEE International Conference on Robotics and Automation (ICRA 2026)


CommCP is an LLM-based decentralized communication framework for multi-agent multi-task Embodied Question Answering (MM-EQA). It employs conformal prediction (CP) to calibrate the generated messages, which minimizes receiver distractions and makes communication more reliable.
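The calibration step can be illustrated with standard split conformal prediction. The sketch below is an assumption, not CommCP's released code: it treats the LLM's probability for each multiple-choice option as a confidence score, uses 1 - p(correct option) as the nonconformity score, and computes a threshold `q_hat` from a held-out calibration set. The function name `calibrate_q_hat` and the calibration numbers are hypothetical.

```python
from __future__ import annotations

import math


def calibrate_q_hat(true_option_probs: list[float], alpha: float) -> float:
    """Split conformal calibration (hypothetical helper, not the authors' code).

    true_option_probs: probability the LLM assigned to the *correct* option on
        each held-out calibration question.
    alpha: target miscoverage rate, e.g. 0.1 for roughly 90% coverage.
    Returns q_hat: options whose score 1 - p is <= q_hat enter the prediction set.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    # Nonconformity score: 1 - probability of the true option (lower = more conforming).
    scores = sorted(1.0 - p for p in true_option_probs)
    n = len(scores)
    # Finite-sample-corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        # Too few calibration points for this alpha: keep every option.
        return 1.0
    return scores[k - 1]


# Illustrative calibration data (hypothetical numbers, not from the paper).
calib = [0.9, 0.8, 0.95, 0.7, 0.85, 0.6, 0.92, 0.88, 0.75, 0.3]
q_hat = calibrate_q_hat(calib, alpha=0.1)
print(round(q_hat, 3))  # 0.7
```

With 10 calibration points and `alpha=0.1`, the rank is ceil(11 × 0.9) = 10, so `q_hat` is the largest score; more calibration data would let the threshold tighten.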


Visualizations of two-robot collaboration for MM-EQA tasks across diverse scenarios in Habitat:


Robot1: Where is the red bear cushion?

  • A. On the armchair
  • B. On the sofa
  • C. On the bed
  • D. On the dining chairs

Robot2: Is there water in the bathtub?

  • A. Yes
  • B. No

Robot1: Is there hot water in the bathtub?

  • A. Yes
  • B. No

Robot2: Are my gloves on the desk in the cloakroom?

  • A. Yes
  • B. No
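Each robot in these examples holds its own multiple-choice question about a shared scene. A minimal Python representation of such an MM-EQA episode might look like the sketch below; the class and field names (`EmbodiedQuestion`, `MMEQAEpisode`, `open_questions`, `scene_id`) are hypothetical and not the benchmark's actual data format, and grouping these two questions into one episode is only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EmbodiedQuestion:
    """One multiple-choice question owned by one robot (hypothetical schema)."""
    robot_id: str
    text: str
    options: dict[str, str]          # option letter -> option text
    answer: Optional[str] = None     # set once the robot commits to an option


@dataclass
class MMEQAEpisode:
    """Several robots explore one scene; each robot owns its own question."""
    scene_id: str
    questions: list[EmbodiedQuestion] = field(default_factory=list)

    def open_questions(self) -> list[EmbodiedQuestion]:
        # Unanswered questions: the ones a teammate's message could still help with.
        return [q for q in self.questions if q.answer is None]

    def done(self) -> bool:
        # In this sketch, the episode is done only when every robot has answered.
        return not self.open_questions()


episode = MMEQAEpisode(
    scene_id="example_scene",  # hypothetical placeholder ID
    questions=[
        EmbodiedQuestion("robot1", "Where is the red bear cushion?",
                         {"A": "On the armchair", "B": "On the sofa",
                          "C": "On the bed", "D": "On the dining chairs"}),
        EmbodiedQuestion("robot2", "Is there water in the bathtub?",
                         {"A": "Yes", "B": "No"}),
    ],
)
print([q.robot_id for q in episode.open_questions()])  # ['robot1', 'robot2']
```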

Visualizations of two-robot collaboration for MM-EQA tasks across diverse scenarios in the real world:


Abstract

To complete assignments provided by humans in natural language, robots must interpret commands, generate and answer relevant questions for scene understanding, and manipulate target objects. Real-world deployments often require multiple heterogeneous robots with different manipulation capabilities to handle different assignments cooperatively. Beyond specialized manipulation skills, effective information gathering is important for completing these assignments. To address this component of the problem, we formalize the information-gathering process in a fully cooperative setting as an underexplored multi-agent multi-task Embodied Question Answering (MM-EQA) problem, a novel extension of canonical Embodied Question Answering (EQA) in which effective communication is crucial for coordinating efforts without redundancy. We then propose CommCP, a novel LLM-based decentralized communication framework designed for MM-EQA. Our framework employs conformal prediction to calibrate the generated messages, thereby minimizing receiver distractions and enhancing communication reliability. To evaluate our framework, we introduce an MM-EQA benchmark featuring diverse, photo-realistic household scenarios with embodied questions. Experimental results demonstrate that CommCP significantly improves the task success rate and exploration efficiency compared with baselines.


Key Ideas and Contributions


1) Multi-Agent Multi-Task EQA Formulation: We formulate the information-gathering process as a novel multi-agent multi-task Embodied Question Answering (MM-EQA) problem. This involves multiple heterogeneous robots collaborating to interpret commands, gather information, and answer questions in a shared environment without redundancy.
2) CommCP Framework: We propose CommCP, a decentralized communication framework that uses Large Language Models (LLMs) to generate natural language messages. A key innovation is the use of Conformal Prediction (CP) to calibrate these messages so that robots share information only when they are confident, which reduces distractions and improves reliability.
3) Novel Benchmark and Performance: We introduce a new MM-EQA benchmark featuring diverse, photo-realistic household scenarios based on the Habitat-Matterport 3D (HM3D) dataset. Experimental results demonstrate that CommCP significantly improves task success rates and exploration efficiency compared to baselines, particularly in larger environments.
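The confidence-gated messaging in contribution 2 can be sketched as follows, under assumptions that are ours rather than the paper's: a sending robot has its LLM score the options of a teammate's multiple-choice question given what it has observed, forms a conformal prediction set using a threshold `q_hat` obtained from calibration, and sends a message only when that set contains exactly one option. The function names `conformal_set` and `gate_message`, the message format, and the probabilities are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def conformal_set(option_probs: dict[str, float], q_hat: float) -> set[str]:
    # Keep every option whose nonconformity score 1 - p is within the calibrated threshold.
    return {opt for opt, p in option_probs.items() if 1.0 - p <= q_hat}


def gate_message(question: str, option_probs: dict[str, float],
                 q_hat: float) -> Optional[str]:
    """Hypothetical send rule: speak only when the prediction set is a singleton."""
    pred = conformal_set(option_probs, q_hat)
    if len(pred) != 1:
        # Empty or multi-option set: the sender is not confident, so stay silent
        # rather than distract the receiver with an unreliable message.
        return None
    (option,) = pred
    return f"About '{question}': my observations support option {option}."


q_hat = 0.7  # assumed output of a calibration step
question = "Is there water in the bathtub?"
print(gate_message(question, {"A": 0.9, "B": 0.1}, q_hat))  # message supporting option A
print(gate_message(question, {"A": 0.5, "B": 0.5}, q_hat))  # None: both options kept
```

Requiring a singleton set is one simple choice. Because `q_hat` grows as `alpha` shrinks, a smaller `alpha` tends to produce larger sets and therefore fewer, more conservative messages.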



Evaluation of CommCP and baselines on MM-EQA tasks


Quantitative Analysis: We validated CommCP against several baselines, including multi-agent frontier-based exploration (MMFBE) and independent exploration (MMEuC). CommCP achieves a higher success rate (0.68 vs. 0.65 for MMFBE) at a substantially lower normalized time cost (0.4 vs. 0.8), roughly halving the time required. Ablation studies confirm that both the communication module and the conformal prediction calibration are essential for this improvement.
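The efficiency comparison can be checked directly from the reported numbers, reading efficiency as the inverse of normalized time cost (our interpretation, not a definition given here):

```latex
\frac{0.8}{0.4} = 2 \;\;(\text{time-cost ratio, compared baseline vs. CommCP}),
\qquad
\frac{0.8 - 0.4}{0.8} = 50\% \;\;(\text{reduction in normalized time}),
\qquad
0.68 - 0.65 = 0.03 \;\;(\text{absolute success-rate gain over MMFBE}).
```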


Citation