This paper introduces an adaptive multi-agent framework to enhance collaborative reasoning in large language models (LLMs). The authors address the challenge of effectively scaling collaboration and reasoning in multi-agent systems (MAS), which remains an open question despite recent advances in test-time scaling (TTS) for single-agent performance.
The core methodology revolves around three key contributions:
1. **Dataset Construction:** The authors build M500, a high-quality dataset of 500 multi-agent collaborative reasoning traces. It is generated automatically with an open-source MAS framework (AgentVerse) driven by a strong reasoning model (DeepSeek-R1). To ensure quality, source questions are selected for difficulty, diversity, and interdisciplinarity, and multiple agents with different roles collaborate to solve each problem. Generated traces are then filtered on three criteria: Consensus Reached (the agents agree on a final answer), Format Compliance (e.g., the final answer is wrapped in the required tags such as `\boxed{}`), and Correctness of the final answer. The full generation procedure is given in Algorithm 1 in the Appendix; a minimal sketch of the filtering step follows this item.
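To make the three-criterion filter concrete, here is a minimal Python sketch. It is not the authors' released code: the trace layout (`turns`, `final_answer`, `content`), the helper name `passes_filters`, and the string-matching logic are illustrative assumptions; the paper's Algorithm 1 defines the actual procedure.

```python
import re


def passes_filters(trace: dict, reference_answer: str) -> bool:
    """Hypothetical filter mirroring the three criteria described in the paper:
    Consensus Reached, Format Compliance, and Correctness."""
    # Consensus Reached: every agent that proposed a final answer must agree.
    final_answers = [
        turn["final_answer"] for turn in trace["turns"] if turn.get("final_answer")
    ]
    if not final_answers or len(set(final_answers)) != 1:
        return False

    # Format Compliance: the last message must wrap the answer in \boxed{}.
    solution_text = trace["turns"][-1]["content"]
    match = re.search(r"\\boxed\{(.+?)\}", solution_text)
    if match is None:
        return False

    # Correctness: the boxed answer must match the reference answer.
    return match.group(1).strip() == str(reference_answer).strip()


# Usage sketch: keep only traces that satisfy all three criteria.
# m500 = [t for t, ref in zip(raw_traces, references) if passes_filters(t, ref)]
```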