Texturing is a crucial step in the 3D asset production workflow, which enhances the visual appeal and diversity of 3D assets. Despite recent advancements in Text-to-Texture (T2T) generation, existing methods often yield subpar results, primarily due to local discontinuities, inconsistencies across multiple views, and their heavy dependence on UV unwrapping outcomes. To tackle these challenges, we propose a novel generation-refinement 3D texturing framework called MVPaint, which can generate high-resolution, seamless textures while emphasizing multi-view consistency. MVPaint mainly consists of three key modules. 1) Synchronized Multi-view Generation (SMG). Given a 3D mesh model, MVPaint first simultaneously generates multi-view images by employing an SMG model, which leads to coarse texturing results with unpainted parts due to missing observations. 2) Spatial-aware 3D Inpainting (S3I). To ensure complete 3D texturing, we introduce the S3I method, specifically designed to effectively texture previously unobserved areas. 3) UV Refinement (UVR). Furthermore, MVPaint employs a UVR module to improve the texture quality in the UV space, which first performs a UV-space Super-Resolution, followed by a Spatial-aware Seam-Smoothing algorithm for revising spatial texturing discontinuities caused by UV unwrapping. Moreover, we establish two T2T evaluation benchmarks: the Objaverse T2T benchmark and the GSO T2T benchmark, based on selected high-quality 3D meshes from the Objaverse dataset and the entire GSO dataset, respectively. Extensive experimental results demonstrate that MVPaint surpasses existing state-of-the-art methods. Notably, MVPaint could generate high-fidelity textures with minimal Janus issues and highly enhanced cross-view consistency.
We use GPT4 to generate random prompts and these prompts are applied to the same unicorn model in GSO benchmark.
We thank the following excellent open-source works:
MVDream: generates multi-view images with 3D attention, a controlled version can generate texture without Janus problem.
TEXTure: can generate textures with text and image prompt in iterative manner.SyncMVD: generates high-fidelity textures by synchronizing multiple "single" view diffusion on UV latent spaces.
Paint3D: can either generate or inpaint the texture map by a positon map controlled diffusion model.
We thank the following colleagues for the discussion and proofreading: Sijin Chen, Xiangzi Hu, Xuanyang Zhang, Liao Wang
@article{cheng2024mvpaint,
title={MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D},
author={Wei Cheng and Juncheng Mu and Xianfang Zeng and Xin Chen and Anqi Pang and Chi Zhang and Zhibin Wang and Bin Fu and Gang Yu and Ziwei Liu and Liang Pan},
journal={arXiv preprint arxiv:2411.02336},
year={2024}
}