3D Object Classification with Selective Multi-View Fusion and Shape Rendering
Abstract
3D classification is complex and challenging because of high-dimensional data, the intricate nature of their spatial relationships, and viewpoint variations. We fill the gap in view-based 3D object classification by examining the factors that influence classification's effectiveness via determining their respective merits in feature extraction for 3D object recognition by comparing CNN-based and Transformer-based backbone networks side-by-side. Our research extends to evaluating various fusion strategies to determine the most effective method for integrating multiple views and ascertain the optimal number of views that balances classification and computation. We also probe into the effectiveness of different feature types from rendering techniques in accurately depicting 3D objects. This investigation is supported by an extensive experimental framework, incorporating a diverse set of 3D objects from the ModelNet40 dataset. Finally, based on the analysis, we present a Selective Multi-View deep model (SelectiveMV) that shows efficient performance and provides high accuracy given a few views.