3D-Reconstruction in Challenging Sparse View Setup

As part of the ETH Summer Research Fellowship 2025 at ETH Zürich, I worked on this research project.

Get Presentation Slides of Project here
Get Project Report here

Pipeline of VGGT+BA

The Structure-from-Motion (SfM) paradigm has been widely used for 3D reconstruction, but it fails in challenging scenes, especially when only a sparse set of views is available. The recent deep learning model VGGT can jointly predict camera parameters, depth maps, point maps, and feature tracks, but its predictions typically lack global alignment even when the local structure is predicted correctly. In this project, we use VGGT predictions as priors for Bundle Adjustment (BA) optimization, an approach we refer to as VGGT+BA. Because VGGT's predictions are already close to the correct solution, they serve as a strong initialization for BA, leading to more reliable and efficient optimization. VGGT+BA performs better than standalone VGGT, and we experiment with different methods to improve on this baseline, either by improving the inputs to BA, such as the tracks, or by tuning the BA parameters themselves.
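The idea of seeding BA with predicted cameras and points can be illustrated with a small, self-contained sketch. This is not the project's implementation: it uses SciPy's generic `least_squares` solver in place of a dedicated BA library, assumes a simple pinhole camera with a single known focal length, and stands in for VGGT outputs with noisy copies of synthetic ground truth. All function and variable names (`project`, `bundle_adjust`, `run_synthetic_example`, `init_poses`, `init_points`) are hypothetical and chosen for illustration only.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation


def project(points, rvecs, tvecs, focal):
    """Pinhole projection with one pose (rotation vector + translation) per point."""
    cam = Rotation.from_rotvec(rvecs).apply(points) + tvecs
    return focal * cam[:, :2] / cam[:, 2:3]


def reprojection_residuals(params, n_cams, n_pts, cam_idx, pt_idx, observed, focal):
    """Flattened 2D reprojection errors over all (camera, point) observations."""
    poses = params[: n_cams * 6].reshape(n_cams, 6)
    points = params[n_cams * 6:].reshape(n_pts, 3)
    pred = project(points[pt_idx], poses[cam_idx, :3], poses[cam_idx, 3:], focal)
    return (pred - observed).ravel()


def bundle_adjust(init_poses, init_points, cam_idx, pt_idx, observed, focal):
    """Jointly refine poses and points, starting from the given priors."""
    n_cams, n_pts = len(init_poses), len(init_points)
    x0 = np.hstack([init_poses.ravel(), init_points.ravel()])
    args = (n_cams, n_pts, cam_idx, pt_idx, observed, focal)
    # Huber loss down-weights large residuals, e.g. from bad tracks.
    result = least_squares(reprojection_residuals, x0, loss="huber",
                           f_scale=1.0, args=args)
    poses = result.x[: n_cams * 6].reshape(n_cams, 6)
    points = result.x[n_cams * 6:].reshape(n_pts, 3)
    return poses, points


def rms_error(poses, points, cam_idx, pt_idx, observed, focal):
    """Root-mean-square reprojection error in pixels."""
    params = np.hstack([poses.ravel(), points.ravel()])
    r = reprojection_residuals(params, len(poses), len(points),
                               cam_idx, pt_idx, observed, focal)
    return float(np.sqrt(np.mean(r ** 2)))


def run_synthetic_example(seed=0):
    """Build a tiny sparse-view scene and compare error before and after BA."""
    rng = np.random.default_rng(seed)
    n_cams, n_pts, focal = 3, 20, 500.0
    true_points = rng.uniform(-1.0, 1.0, (n_pts, 3)) + np.array([0.0, 0.0, 5.0])
    true_poses = np.hstack([rng.normal(0.0, 0.1, (n_cams, 3)),   # rotation vectors
                            rng.normal(0.0, 0.2, (n_cams, 3))])  # translations
    # Every point is observed in every camera (the "tracks").
    cam_idx = np.repeat(np.arange(n_cams), n_pts)
    pt_idx = np.tile(np.arange(n_pts), n_cams)
    observed = project(true_points[pt_idx], true_poses[cam_idx, :3],
                       true_poses[cam_idx, 3:], focal)
    # Assumption: noisy ground truth stands in for network predictions.
    init_poses = true_poses + rng.normal(0.0, 0.02, true_poses.shape)
    init_points = true_points + rng.normal(0.0, 0.05, true_points.shape)

    before = rms_error(init_poses, init_points, cam_idx, pt_idx, observed, focal)
    poses, points = bundle_adjust(init_poses, init_points, cam_idx, pt_idx,
                                  observed, focal)
    after = rms_error(poses, points, cam_idx, pt_idx, observed, focal)
    return before, after


if __name__ == "__main__":
    before, after = run_synthetic_example()
    print(f"RMS reprojection error: {before:.3f} px -> {after:.3f} px")
```

The two levers mentioned above map onto this sketch directly: better tracks change `observed`, `cam_idx`, and `pt_idx`, while BA parameters correspond to solver options such as the robust loss and its scale (`loss`, `f_scale`). A real pipeline would also need to handle intrinsics, outlier filtering, and the gauge freedom of the reconstruction, which this toy example ignores.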