2025, Volume 40, Issue 5, p. A-P21_1-13
Multi-hop question answering refers to tasks that involve iteratively gathering clues from multiple information sources to arrive at a final answer. A recent research direction focuses on using large language models (LLMs) combined with chain-of-thought prompting and retrieval-augmented generation (RAG) to address multi-hop QA tasks. These methods have significantly improved LLM performance. However, hallucination by LLMs negatively impacts task accuracy. The self-consistency approach reduces hallucination by having LLMs generate multiple reasoning chains and choosing the majority answer. This method requires that the majority of reasoning chains lead to the correct answer, which may not hold for complex tasks such as multi-hop QA. Since LLMs may make a mistake at each reasoning step, the likelihood of a fully correct reasoning chain is lower than that of a single correct step. Therefore, we propose a method that shifts the focus from selecting reasoning chains to selecting steps. The method involves three phases: (1) have the LLM break down a question into step-by-step sub-questions and use RAG to derive the final answer, repeating this process to generate multiple reasoning chains; (2) use a verifier to select the most reliable steps from all reasoning chains; (3) based on the selected steps, let another independent LLM deduce the final answer. We conducted experiments on four multi-hop QA datasets, and the results show that our method outperforms strong baselines.
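To make the three-phase pipeline concrete, the following Python sketch outlines one possible realization. It is illustrative only, not the paper's implementation: the callables for the decomposition LLM, the RAG answerer, the step verifier, and the independent reader LLM (`decompose`, `answer_llm`, `retrieve`, `verify`, `reader_llm`), the `"DONE"` stopping convention, and the score `threshold` are all assumptions introduced for this example.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (not from the paper): stand-ins for an LLM call,
# a retriever, and a trained step verifier.
LLM = Callable[[str], str]
Retriever = Callable[[str], str]
Verifier = Callable[[str, str], float]  # scores a (sub-question, answer) step


def generate_chain(question: str, decompose: LLM, answer_llm: LLM,
                   retrieve: Retriever, max_hops: int = 4) -> List[Tuple[str, str]]:
    """Phase 1: break the question into sub-questions and answer each one
    with retrieval-augmented generation, yielding one reasoning chain."""
    chain: List[Tuple[str, str]] = []
    context = ""
    for _ in range(max_hops):
        sub_q = decompose(f"Question: {question}\n"
                          f"Solved so far: {context}\nNext sub-question:")
        if sub_q.strip().upper() == "DONE":  # assumed stopping signal
            break
        evidence = retrieve(sub_q)
        sub_a = answer_llm(f"Evidence: {evidence}\n"
                           f"Sub-question: {sub_q}\nAnswer:")
        chain.append((sub_q, sub_a))
        context += f" {sub_q} -> {sub_a}."
    return chain


def select_steps(chains: List[List[Tuple[str, str]]], verify: Verifier,
                 threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Phase 2: pool the steps of all sampled chains and keep only those
    the verifier scores as reliable (threshold is an assumed knob)."""
    pooled = [step for chain in chains for step in chain]
    return [step for step in pooled if verify(*step) >= threshold]


def answer_from_steps(question: str, steps: List[Tuple[str, str]],
                      reader_llm: LLM) -> str:
    """Phase 3: an independent LLM deduces the final answer from the
    selected steps alone."""
    facts = "\n".join(f"- {q} {a}" for q, a in steps)
    return reader_llm(f"Facts:\n{facts}\nQuestion: {question}\nFinal answer:")


def step_selected_qa(question: str, n_chains: int, decompose: LLM,
                     answer_llm: LLM, retrieve: Retriever,
                     verify: Verifier, reader_llm: LLM) -> str:
    """End-to-end: sample several chains, select steps, then read out."""
    chains = [generate_chain(question, decompose, answer_llm, retrieve)
              for _ in range(n_chains)]
    steps = select_steps(chains, verify)
    return answer_from_steps(question, steps, reader_llm)
```

The key design point this sketch highlights is that majority voting happens over individual steps rather than whole chains: a chain with one wrong hop can still contribute its correct hops to the pool the reader LLM sees.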