GenPoly: Learning Generalized and Tessellated Shape Priors via
3D Polymorphic Evolving

Bangzhen Liu1, Yuyang Yu1, Xuemiao Xu1, Cheng Xu2,
Chenxi Zheng1, Haoxin Yang1, Shaoyu Huang3, Shengfeng He4
1South China University of Technology, 2The Hong Kong Polytechnic University, 3Guangzhou Yichuang Information Technology Co., Ltd., 4Singapore Management University
teaser figure

We introduce a novel 3D prior model for generalized, fine-detailed 3D generation. (a) Current methods decompose objects into coarse-grained parts and focus on reassembling them into complete shapes, sacrificing local geometric details. (b) In contrast, our prior model explicitly excavates multi-level intricate local geometry variations and progressively refines shape details in a coarse-to-fine manner. This process yields versatile, tessellated priors that enable high-fidelity 3D generation across various downstream tasks.


Abstract

We introduce GenPoly, a novel generalized 3D prior model designed for multiple 3D generation tasks, with a focus on preserving fine details. While previous works learn generalizable representations by decomposing objects into coarse-grained components and reassembling them into a coherent global structure, this approach sacrifices small-scale details. In this paper, we take a different perspective, formulating 3D prior modeling as a bottom-up polymorphic evolving process. Our key insight is that, beyond global structures, intricate local geometry variations hold rich contextual information that should be incorporated into the modeling process to learn fine-grained, generalizable representations. This allows coarse shapes to progressively evolve through multi-granular local geometry refinements, enabling high-fidelity 3D generation. To this end, we first introduce a polymorphic variational autoencoder (Poly-VAE), which constructs a versatile shape residual codebook via a polymorphic quantization mechanism. This codebook strategically encodes intricate local geometry representations from tessellated shapes within the latent space. Building on these representations, a 3D polymorphic evolving scheme is further developed to progressively refine local details in a coarse-to-fine manner. In this way, visually compelling 3D shapes with rich and complex details can ultimately be generated. The effectiveness of our method is demonstrated through extensive qualitative and quantitative evaluations, where GenPoly consistently surpasses state-of-the-art methods across various downstream tasks, particularly in local detail preservation.


Method

method figure

Overview of the proposed Poly-VAE framework. Starting from an input shape \( \mathcal{X} \), the shape feature \( Z \) is first extracted by a 3D encoder \( \mathit{E} \) and progressively quantized into polymorphic geometric representations to construct a diverse polymorphic residual codebook \( \mathcal{C} \). This is achieved by an \( n \)-branch polymorphic quantization mechanism, in which the first branch quantizes the coarse shape features \( Z_1 \) and the subsequent branches capture the polymorphic residuals \( \left\{Z_2, ..., Z_n\right\} \). These local geometric features, carrying diverse geometric contexts, are maintained in a unified polymorphic residual codebook \( \mathcal{C} \) to facilitate the generation of 3D shapes with rich details. Finally, the quantized features \( \{\hat{Z}_1, ..., \hat{Z}_n\} \) are aggregated and decoded to reconstruct the input 3D shape via a shared decoder \( \mathit{D} \).
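The multi-branch quantization described above follows the general pattern of residual vector quantization: each branch quantizes what the previous branches failed to capture, and the branch outputs are summed before decoding. The following numpy sketch illustrates this pattern only; the function names, codebook size, and feature dimensions are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def quantize(z, codebook):
    """Nearest-neighbor lookup: snap each feature vector to its closest code."""
    # z: (m, d), codebook: (k, d) -> pairwise squared distances: (m, k)
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return codebook[dists.argmin(axis=1)]

def polymorphic_residual_quantize(z, codebook, n_branches=3):
    """n-branch residual quantization: branch 1 encodes the coarse features,
    and each later branch encodes the residual left by the branches before it."""
    residual = z
    aggregated = np.zeros_like(z)
    per_branch = []
    for _ in range(n_branches):
        q = quantize(residual, codebook)  # quantized features of this branch
        per_branch.append(q)
        aggregated += q                   # sum of all branch outputs so far
        residual = residual - q           # what remains for the next branch
    return aggregated, per_branch

rng = np.random.default_rng(0)
codebook = rng.normal(size=(64, 8))   # a shared codebook (64 codes, 8 dims) -- illustrative
z = rng.normal(size=(16, 8))          # stand-in for the encoder output Z
z_hat, branches = polymorphic_residual_quantize(z, codebook, n_branches=4)
```

A decoder would then map the aggregated `z_hat` back to a shape, with the deeper branches contributing progressively finer geometric corrections.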


Unconditional 3D Shape Generation

[Gallery: generated chair shapes]
[Gallery: generated car shapes]
[Gallery: generated airplane shapes]
(a) TIGER
(b) SDF-Diff
(c) 3DQD
(d) Ours
Qualitative comparisons for unconditional generation. For a fairer evaluation, we select visually similar shapes generated by each method. Our generated shapes are of high quality, with fine geometric details and smooth surfaces. Best viewed zoomed in, in the digital version.
[Gallery: additional generated chair shapes]
[Gallery: additional generated car shapes]
[Gallery: additional generated airplane shapes]
Here are more qualitative results of unconditional generation for chairs, cars, and airplanes on ShapeNet. Our Poly-VAE sufficiently captures geometric information at various scales, facilitating detail-preserving 3D shape generation with diverse styles and fine details. Best viewed zoomed in, in the digital version.

Text-conditioned Generation

A chair with circular seat.
[Gallery: generated results]
A chair that has a really long backrest.
[Gallery: generated results]
A chair with an opening at the bottom of the backrest.
[Gallery: generated results]
A wooden chair featuring slats on all sides, including the armrests.
[Gallery: generated results]
(a) 3DQD
(b) SDFusion
(c) Ours
Qualitative comparisons for text-conditioned generation. Our generated shapes better preserve details and align more closely with the text descriptions. Best viewed zoomed in, in the digital version.

Single-view Shape Reconstruction

[Gallery: input images and corresponding reconstructed shapes]
Examples of single-view shape reconstruction. With low-cost fine-tuning, our method quickly and flexibly generalizes to image-conditioned reconstruction tasks in real-world scenes.