Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs
X. Zhang, C. Du, et al.
Chain-of-thought (CoT) prompting generates explicit reasoning paths, but the path it follows can be suboptimal; tree-of-thought (ToT) search finds better paths at a high inference cost. This work shows that fine-tuning LLMs on ToT search trees via Chain of Preference Optimization (CPO) lets CoT match or surpass ToT performance without the heavy inference cost.
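The summary only says that CPO fine-tunes on ToT search trees. The Python below is a minimal sketch built on two assumptions that the summary does not state. First, each reasoning step that ToT kept on its chosen path is treated as preferred over the sibling steps it pruned. Second, these pairs are trained with a DPO-style loss. `Thought`, `preference_pairs`, `dpo_loss`, and `beta=0.1` are hypothetical names and values for illustration, not the authors' code. The log-probabilities passed to `dpo_loss` would come from the model being tuned and a frozen reference copy.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Thought:
    """One reasoning step (node) in a hypothetical ToT search tree."""
    text: str
    selected: bool = False  # True if ToT kept this step on its final path
    children: list[Thought] = field(default_factory=list)


def preference_pairs(root: Thought) -> list[tuple[str, str, str]]:
    """Collect (prefix, preferred_step, dispreferred_step) triples.

    Assumption: at each depth, the step ToT selected is preferred over
    every sibling step that ToT pruned.
    """
    pairs = []
    node, prefix = root, root.text
    while node.children:
        chosen = [c for c in node.children if c.selected]
        if not chosen:
            break  # search was cut off here; no preference signal below
        best = chosen[0]
        for sibling in node.children:
            if not sibling.selected:
                pairs.append((prefix, best.text, sibling.text))
        prefix = prefix + "\n" + best.text  # extend the context along the chosen path
        node = best
    return pairs


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one pair, given summed log-probs of each step."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)), computed stably for either sign of margin
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Toy tree: ToT kept "Add..." then "Answer: 7" and pruned the other branches.
root = Thought("Q: 3 apples + 4 apples?", children=[
    Thought("Add 3 and 4 to get 7.", selected=True, children=[
        Thought("Answer: 7", selected=True),
        Thought("Answer: 12"),
    ]),
    Thought("Multiply 3 by 4 to get 12."),
])
pairs = preference_pairs(root)
print(len(pairs), "preference pairs")
print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
```

Under these assumptions, a single tree yields a pair at every depth where ToT pruned alternatives, rather than one comparison of complete answers.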

