Aligning Vision Language Models via anchor

Jan 1, 2026 · Yilin Yang, Yuke Wang, Rabimba Karanjai, Weidong Shi, Chengming Zhang
Abstract
Preprint: Aligning visual tokens with semantic text anchors to enhance multi-modal reasoning capabilities.
Type
Publication
NeurIPS 2026 (Under Review)