Allocation Before Ranking: Decoupled Token Compression for OmniLLMs

Jan 1, 2026
Zhenghui Guo, Yilin Yang, Yuanbin Man, Miao Yin, Weidong Shi, Rabimba Karanjai, Omprakash Gnawali, Chengming Zhang
Abstract
Preprint: We decouple token compression from routing in multimodal systems, improving processing speed by up to 30%.
Publication
NeurIPS 2026 (Under Review)