Computer Science
NeurIPS 2024
Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs
X. Zhang, C. Du, et al.
The recent chain-of-thought (CoT) method generates an explicit reasoning path, but the path it follows can be suboptimal; tree-of-thought (ToT) searches for better paths at a high inference cost. This work shows that fine-tuning LLMs on ToT search trees with Chain of Preference Optimization (CPO) lets CoT match or surpass ToT performance without the heavy inference cost.
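The idea of turning a ToT search tree into fine-tuning signal can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `Thought`, `on_best_path`, `preference_pairs`, and `dpo_loss` are hypothetical, and the DPO-style per-pair loss is an assumption, since the summary above does not spell out the objective. The sketch pairs each thought kept by the search with its discarded siblings at the same step, then scores one pair from the log-probabilities of the policy and a frozen reference model.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Thought:
    """One node of a (hypothetical) ToT search tree."""
    text: str
    on_best_path: bool = False  # True if the search kept this thought
    children: List["Thought"] = field(default_factory=list)


def preference_pairs(root: Thought) -> List[Tuple[str, str, str]]:
    """Walk the selected path and, at each step, pair the kept thought
    with every discarded sibling: (context, preferred, dispreferred)."""
    pairs = []
    node, context = root, root.text
    while node.children:
        kept = [c for c in node.children if c.on_best_path]
        if not kept:
            break  # the search stopped here
        winner = kept[0]
        for sibling in node.children:
            if sibling is not winner:
                pairs.append((context, winner.text, sibling.text))
        context += " " + winner.text
        node = winner
    return pairs


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Assumed DPO-style loss for one pair: -log(sigmoid(margin)),
    written as log(1 + exp(-margin))."""
    margin = beta * ((logp_chosen - ref_chosen)
                     - (logp_rejected - ref_rejected))
    return math.log1p(math.exp(-margin))
```

Because the pairs are built per reasoning step rather than per full answer, the discarded branches of the tree also contribute training signal; at inference time the fine-tuned model would decode a single CoT path with no search.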

