Open MAGVIT2 - An Open-Source Project Toward Democratizing Auto-regressive Visual Generation
Abstract
We present Open-MAGVIT2, a family of auto-regressive image generation models ranging from 300M to 1.5B. The Open-MAGVIT2 project produces an open-source replication of Google's MAGVIT-v2 tokenizer, a tokenizer with a super-large codebook (i.e., codes), and achieves the state-of-the-art reconstruction performance (1.17 rFID) on ImageNet 256×256. Furthermore, we explore its application in plain auto-regressive models and validate scalability properties. To assist auto-regressive models in predicting with a super-large vocabulary, we factorize it into two sub-vocabulary of different sizes by asymmetric token factorization, and further introduce "next sub-token prediction" to enhance sub-token interaction for better generation quality. We release all models and codes to foster innovation and creativity in the field of auto-regressive visual generation.
Authors
Zhuoyan Luo, Fengyuan Shi, Yixiao Ge, Yujiu Yang, Limin Wang, Ying Shan
For more information go here
Subscribe to Kavour
Get the latest posts delivered right to your inbox