Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models

Zhenyu Pan, Haozheng Luo, Manling Li, Han Liu

2024-03-26 | Question Answering | Hallucination | Information Retrieval | Retrieval

Abstract

We present a Chain-of-Action (CoA) framework for multimodal and retrieval-augmented Question-Answering (QA). Compared to the literature, CoA overcomes two major challenges of current QA applications: (i) unfaithful hallucination that is inconsistent with real-time or domain facts and (ii) weak reasoning performance over compositional information. Our key contribution is a novel reasoning-retrieval mechanism that decomposes a complex question into a reasoning chain via systematic prompting and pre-designed actions. Methodologically, we propose three types of domain-adaptable "Plug-and-Play" actions for retrieving real-time information from heterogeneous sources. We also propose a multi-reference faith score (MRFS) to verify and resolve conflicts in the answers. Empirically, we exploit both public benchmarks and a Web3 case study to demonstrate the capability of CoA over other methods.
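The decompose, retrieve, and verify flow described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Step` structure, the `run_chain` function, the `toy_search` action, and the `threshold` value are hypothetical, and a plain token-overlap score stands in for MRFS, whose actual definition is not given on this page. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical: an action maps a sub-question to retrieved reference passages.
Action = Callable[[str], List[str]]


@dataclass
class Step:
    sub_question: str   # one node of the reasoning chain
    action: str         # name of the pluggable action to call
    draft_answer: str   # the model's initial guess for this sub-question


def overlap_score(answer: str, reference: str) -> float:
    """Fraction of answer tokens found in the reference (a stand-in for MRFS)."""
    tokens = answer.lower().split()
    ref_tokens = set(reference.lower().split())
    return sum(t in ref_tokens for t in tokens) / len(tokens) if tokens else 0.0


def run_chain(steps: List[Step], actions: Dict[str, Action],
              threshold: float = 0.5) -> List[str]:
    """Keep each draft answer if a reference supports it, else use the reference."""
    answers = []
    for step in steps:
        refs = actions[step.action](step.sub_question)
        # Pick the reference that best supports the draft answer.
        best = max(refs, key=lambda r: overlap_score(step.draft_answer, r), default="")
        score = overlap_score(step.draft_answer, best)
        # Assumed conflict rule: fall back to the retrieved text when support is weak.
        answers.append(step.draft_answer if score >= threshold else (best or step.draft_answer))
    return answers


def toy_search(question: str) -> List[str]:
    # Hypothetical retrieval action returning a fixed passage.
    return ["paris is the capital of france"]


steps = [
    Step("What is the capital of France?", "search", "paris"),
    Step("What is the capital of France?", "search", "lyon"),
]
answers = run_chain(steps, {"search": toy_search})
```

In this toy run the supported draft ("paris") is kept, while the unsupported draft ("lyon") is replaced by the retrieved passage. A real system would generate the steps by prompting a language model and would use the paper's own scoring rule.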

Results

Task                Dataset        Metric  Value  Model
Question Answering  StrategyQA     EM      79.2   CoA
Question Answering  StrategyQA     EM      77     SearchChain
Question Answering  StrategyQA     EM      70.6   CoA w/o actions
Question Answering  StrategyQA     EM      65.8   Least-to-Most
Question Answering  WebQuestions   EM      70.7   CoA
Question Answering  WebQuestions   EM      64.7   CoA w/o actions
Question Answering  WebQuestions   EM      59.4   DSP
Question Answering  WebQuestions   EM      44.7   Few-shot
Question Answering  WebQuestions   EM      43     Zero-shot
Question Answering  WebQuestions   EM      42.5   CoT
Question Answering  WebQuestions   EM      38.3   React
Question Answering  WebQuestions   EM      31.1   Self-Ask
Question Answering  WebQuestions   EM      26.3   ToT
Question Answering  TruthfulQA     EM      67.3   CoA
Question Answering  TruthfulQA     EM      63.3   CoA w/o actions
Question Answering  FEVER          EM      68.9   CoA
Question Answering  FEVER          EM      64.2   Self-Ask
Question Answering  FEVER          EM      62.2   DSP
Question Answering  FEVER          EM      54.2   CoA w/o actions
Question Answering  FEVER          EM      50     Zero-shot
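
The EM (exact match) values above are percentages of questions answered exactly. As a rough illustration of how such a score is commonly computed, the sketch below uses the widely used SQuAD-style normalization (lowercasing, stripping punctuation and English articles). This is an assumption: the page does not state which normalization the authors applied, and the function names are hypothetical.

```python
import re
import string
from typing import List


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: List[str]) -> float:
    """Return 1.0 if the prediction matches any gold answer after normalization."""
    pred = normalize(prediction)
    return float(any(pred == normalize(gold) for gold in gold_answers))


def em_score(predictions: List[str], gold_lists: List[List[str]]) -> float:
    """Average exact match over a dataset, reported as a percentage."""
    total = sum(exact_match(p, g) for p, g in zip(predictions, gold_lists))
    return 100.0 * total / len(predictions)
```

For example, `em_score(["Paris", "Rome"], [["paris"], ["Milan"]])` counts one match out of two predictions.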
