Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs

Computer Science

X. Zhang, C. Du, et al.

Chain-of-thought (CoT) prompting has an LLM generate an explicit reasoning path, but that single path can be suboptimal. Tree-of-thought (ToT) searches over a tree of reasoning steps and finds better paths, at a much higher inference cost. This work shows that fine-tuning an LLM on preferences drawn from ToT search trees, a method called Chain of Preference Optimization (CPO), lets plain CoT match or surpass ToT performance without ToT's heavy inference cost.
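The abstract does not say how a ToT search tree becomes training data. The following is a minimal sketch under stated assumptions, not the authors' code. It assumes that each reasoning step kept on ToT's selected path is treated as preferred over its sibling steps that the search discarded, with the question plus the earlier chosen steps as shared context, and that each pair is scored with a standard DPO-style preference loss. `ThoughtNode`, `collect_preference_pairs`, `dpo_loss`, and the default `beta=0.1` are hypothetical names and values chosen for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ThoughtNode:
    """One reasoning step in a hypothetical ToT search tree (root holds the question)."""
    text: str
    on_selected_path: bool = False  # assumption: True if ToT kept this step in its final path
    children: List["ThoughtNode"] = field(default_factory=list)


def collect_preference_pairs(root: ThoughtNode) -> List[Tuple[str, str, str]]:
    """Walk the selected path and pair each chosen step with its discarded siblings.

    Returns (context, preferred_step, dispreferred_step) triples, where context is
    the question followed by the chosen steps that precede the pair.
    """
    pairs: List[Tuple[str, str, str]] = []
    context: List[str] = [root.text]
    node = root
    while node.children:
        chosen = next((c for c in node.children if c.on_selected_path), None)
        if chosen is None:  # the selected path ends here
            break
        for sibling in node.children:
            if sibling is not chosen:
                pairs.append(("\n".join(context), chosen.text, sibling.text))
        context.append(chosen.text)
        node = chosen
    return pairs


def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float, beta: float = 0.1) -> float:
    """DPO-style loss for one pair: -log(sigmoid(beta * log-ratio margin)).

    logp_* are policy log-probs of the preferred (w) and dispreferred (l) step;
    ref_logp_* are the same quantities under a frozen reference model.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Numerically stable softplus(-margin), equal to -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Toy tree: the correct first step (multiply first) is on the selected path.
    root = ThoughtNode("Q: 3 + 4 * 2 = ?", True, [
        ThoughtNode("4 * 2 = 8", True, [
            ThoughtNode("3 + 8 = 11", True),
            ThoughtNode("3 + 8 = 12"),
        ]),
        ThoughtNode("3 + 4 = 7"),
    ])
    for ctx, good, bad in collect_preference_pairs(root):
        print(repr(ctx), "| prefer:", good, "| over:", bad)
```

Building pairs per step, rather than only comparing whole reasoning chains, is the design choice this sketch illustrates: every discarded branch in the tree yields a training signal, and at inference time the fine-tuned model still decodes a single CoT path.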
Citation Metrics
Citations: 6
Influential Citations: 7
Reference Count: 77

Note: The citation metrics presented here have been sourced from Semantic Scholar and OpenAlex.