Allocation Before Ranking: Decoupled Token Compression for OmniLLMs
Jan 1, 2026
Zhenghui Guo
Yilin Yang
Yuanbin Man
Miao Yin
Weidong Shi
Rabimba Karanjai
Omprakash Gnawali
Chengming Zhang

Abstract
Preprint. We decouple token compression from routing in multimodal systems, improving processing speed by up to 30%.
Type
Publication
NeurIPS 2026 (Under Review)