Coin3D Optimizes Training and Evaluation for High-Fidelity 3D Generation

Too Long; Didn't Read

Coin3D is trained on the LVIS subset of Objaverse with 16-view renderings and voxelized proxies. Evaluation covers CLIP score, ImageReward, and GPTEval3D, along with user studies comparing its output against existing methods. The model is trained with a dual 3D UNet and a volume-SDS loss on an NVIDIA A100-80G, enabling high-fidelity 3D reconstruction in just minutes while balancing quality and efficiency.
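The summary above lists CLIP score among the evaluation metrics. As a minimal sketch only, and not the authors' evaluation code, the image-to-image CLIP similarity between a rendered view of a generated object and its reference image could be computed as below; the checkpoint name and preprocessing are assumptions.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed checkpoint; any CLIP variant that exposes image embeddings would work.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_similarity(rendered_path: str, reference_path: str) -> float:
    """Cosine similarity between the CLIP embeddings of two images."""
    images = [Image.open(rendered_path).convert("RGB"),
              Image.open(reference_path).convert("RGB")]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # unit-normalize embeddings
    return float((feats[0] @ feats[1]).item())

A score closer to 1 means the render agrees better with the reference image; in practice such a metric is averaged over many views and objects.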

Abstract and 1 Introduction

2 Related Works

3 Method and 3.1 Proxy-Guided 3D Conditioning for Diffusion

3.2 Interactive Generation Workflow and 3.3 Volume Conditioned Reconstruction

4 Experiment and 4.1 Comparison on Proxy-based and Image-based 3D Generation

4.2 Comparison on Controllable 3D Object Generation, 4.3 Interactive Generation with Part Editing & 4.4 Ablation Studies

5 Conclusions, Acknowledgments, and References


SUPPLEMENTARY MATERIAL

A. Implementation Details

B. More Discussions

C. More Experiments

Supplementary Material

In this supplementary material, we provide more details of our method in Sec. A and present additional experiments in Sec. C. More qualitative results can be found in our supplementary video, and the source code will be released upon acceptance of this paper.

A IMPLEMENTATION DETAILS

A.1 Dataset Preparation

In our experiments, we train the model on the LVIS subset of Objaverse [Deitke et al. 2023], which contains 28,000+ objects after a heuristic cleanup process following Long et al. [2023]. For training-view rendering, we render 16 views per object at a -30° pitch, evenly spaced over 360° of azimuth and facing the object.
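For concreteness, the camera layout described above can be sketched as follows. This is an illustrative snippet with assumed conventions (world Z up, an arbitrary camera radius, OpenGL-style camera axes), not the released rendering script.

import numpy as np

def look_at(eye, target, up=np.array([0.0, 0.0, 1.0])):
    """Camera-to-world rotation whose columns are the camera right, up, and
    backward axes, for a camera at `eye` looking toward `target`."""
    forward = (target - eye) / np.linalg.norm(target - eye)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, forward)
    return np.stack([right, true_up, -forward], axis=1)

def training_views(num_views=16, elevation_deg=30.0, radius=2.0):
    """Cameras at a fixed elevation (i.e., a -30 degree camera pitch), with
    azimuths evenly spaced over 360 degrees, all looking at the origin."""
    elev = np.deg2rad(elevation_deg)
    poses = []
    for azim in np.linspace(0.0, 2.0 * np.pi, num_views, endpoint=False):
        eye = radius * np.array([np.cos(elev) * np.cos(azim),
                                 np.cos(elev) * np.sin(azim),
                                 np.sin(elev)])
        poses.append((eye, look_at(eye, np.zeros(3))))
    return poses

Each returned pose is a camera center and orientation that can be handed to a renderer; intrinsics, resolution, and lighting are left to the rendering setup.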

A.2 Evaluation Data Preparation and User Study

A.3 Training and Network Details


Figure I: We compare our method with the texture synthesis method TEXTure [Richardson et al. 2023].


Figure J: We compare our proxy-bounded part editing with Fantasia3D [Chen et al. 2023a] fine-tuning.


Authors:

(1) Wenqi Dong, Zhejiang University; he conducted this work during his internship at PICO, ByteDance;

(2) Bangbang Yang, ByteDance; contributed equally to this work with Wenqi Dong;

(3) Lin Ma, ByteDance;

(4) Xiao Liu, ByteDance;

(5) Liyuan Cui, Zhejiang University;

(6) Hujun Bao, Zhejiang University;

(7) Yuewen Ma, ByteDance;

(8) Zhaopeng Cui, corresponding author, Zhejiang University.


This paper is available on arXiv under the CC BY 4.0 DEED license.