CM-DPO: Constraint-Margin Direct Preference Optimization for LLM Planning

Jan 1, 2026 · Rabimba Karanjai, Qun Gu, Hemanth Hegadehalii Madhavarao, Wenhuan Sun, Xiaojiao Yu, Suryabhan Singh Hada, Libin N. George, Uma Kona, Richard Williamson, Linsey Pang, Prakhar Mehrotra
Abstract
Preprint: A preference optimization framework that forces planning paths to obey hard system limits.
Type
Publication
NeurIPS 2026 (Under Review)