
Title:
Mask autoencoder for enhanced image reconstruction with position coding offset and combined masking.
Authors:
Wang, Yuenan (nanwang623@gmail.com); Wang, Hua (hua.wang@ldu.edu.cn); Zhang, Fan (zhangfan@sdtbu.edu.cn)
Source:
Visual Computer, Aug. 2025, Vol. 41, Issue 10, pp. 7477-7491. 15p.
Database:
Academic Search Index

Abstract:

Existing masked image modeling (MIM) methods reconstruct and enhance images mainly by modeling and filling masked regions. These techniques borrow principles from computer graphics, where similar methods are used to synthesize missing parts of images or meshes. However, current MIM methods still capture local information incompletely and fail to capture dynamic information. To address these issues, we propose a multi-mask autoencoder (M-MAE). M-MAE borrows smooth-transition techniques from computer graphics, combines patch masking with random masking, and improves model stability by optimizing how masked regions are processed during training. In addition, we introduce a scaling layer (LScale) to improve training dynamics, analogous to the scaling and distortion transformations used in graphics to adapt to different spatial distributions. To further improve the model's accuracy in capturing spatial relationships, we propose a position encoding offset method that generates more spatially aware encodings, thereby strengthening the model's spatial expressiveness. Comparative experiments show that M-MAE achieves a top-1 accuracy (Acc1) of 84.5 on the ImageNet-1K dataset, an Acc1 of 58.2 on the Places365 dataset, and an mIoU of 49.4 on the ADE20K dataset. These results indicate that our contributions improve performance across multiple visual tasks and provide a tighter integration between vision and graphics methods. Code is released at https://github.com/zoomba35/mmae. [ABSTRACT FROM AUTHOR]
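The abstract describes combining patch (block) masking with random masking. As a rough illustration of that idea, the sketch below builds such a combined mask over a ViT-style patch grid in PyTorch. The function name combined_mask, the masking ratios, and the square block shape are illustrative assumptions and are not taken from the authors' released M-MAE code.

```python
# Minimal sketch (assumption, not the authors' code): combine block-wise
# "patch masking" with additional "random masking" over a ViT patch grid.
import torch


def combined_mask(num_patches_per_side: int,
                  block_ratio: float = 0.3,
                  random_ratio: float = 0.3,
                  device: str = "cpu") -> torch.Tensor:
    """Return a boolean (n, n) mask over the patch grid; True means masked."""
    n = num_patches_per_side
    mask = torch.zeros(n, n, dtype=torch.bool, device=device)

    # Patch (block) masking: hide one contiguous square block of patches.
    block_side = max(1, int(n * block_ratio ** 0.5))
    top = torch.randint(0, n - block_side + 1, (1,)).item()
    left = torch.randint(0, n - block_side + 1, (1,)).item()
    mask[top:top + block_side, left:left + block_side] = True

    # Random masking: additionally hide randomly chosen still-visible patches,
    # up to a budget of random_ratio * n * n patches.
    flat = mask.flatten()
    visible = (~flat).nonzero(as_tuple=True)[0]
    extra = min(int(random_ratio * n * n), visible.numel())
    perm = torch.randperm(visible.numel(), device=flat.device)[:extra]
    flat[visible[perm]] = True

    return flat.view(n, n)


if __name__ == "__main__":
    m = combined_mask(14)            # 14x14 grid, e.g. ViT-B/16 on 224px input
    print(m.float().mean().item())   # overall fraction of masked patches
```

In a standard MAE-style pipeline, a mask like this would select which patch embeddings are dropped before the encoder and later reconstructed by the decoder; how M-MAE integrates the mask with its LScale layer and position encoding offset is detailed in the paper itself.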