Researchers create more precise 3D reconstructions using only two camera perspectives

In recent years, neural methods have become widespread in camera-based reconstruction. In most cases, however, they need hundreds of camera perspectives. Conventional photometric methods, by contrast, can compute highly precise reconstructions even of objects with textureless surfaces, but they typically work only under controlled lab conditions.

Daniel Cremers, professor of Computer Vision and Artificial Intelligence at TUM, leader of the Munich Center for Machine Learning (MCML), and a director of the Munich Data Science Institute (MDSI), has developed a method together with his team that makes use of both approaches.

It combines a neural network that represents the surface with a precise model of the illumination process that accounts for light absorption and the distance between the object and the light source. The brightness in the images is used to determine the angle and distance of the surface relative to the light source.
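
The relationship between brightness, angle, and distance can be illustrated with a small sketch. It is not the team's actual model: it assumes a simple Lambertian surface lit by a single nearby point light with inverse-square falloff, and the function name `point_light_brightness` and its parameters (`albedo`, `intensity`) are hypothetical names chosen for this example.

```python
import math


def point_light_brightness(surface_point, normal, light_pos,
                           albedo=1.0, intensity=1.0):
    """Brightness of a surface point under a nearby point light.

    Assumption for illustration only: Lambertian reflectance with
    inverse-square distance falloff. `albedo` stands in for how much
    light the surface reflects rather than absorbs.
    """
    # Vector from the surface point to the light, and its length.
    to_light = [l - p for l, p in zip(light_pos, surface_point)]
    dist = math.sqrt(sum(c * c for c in to_light))
    direction = [c / dist for c in to_light]

    # Normalize the surface normal so the dot product is a cosine.
    n_len = math.sqrt(sum(c * c for c in normal))
    unit_normal = [c / n_len for c in normal]

    # Cosine of the angle between normal and light direction,
    # clamped at zero for surfaces facing away from the light.
    cos_angle = max(0.0, sum(n * d for n, d in zip(unit_normal, direction)))

    return albedo * intensity * cos_angle / dist ** 2


# A surface facing the light directly, at distance 1 and at distance 2.
near = point_light_brightness((0, 0, 0), (0, 0, 1), (0, 0, 1))
far = point_light_brightness((0, 0, 0), (0, 0, 1), (0, 0, 2))
print(near, far)  # 1.0 0.25
```

In this simplified picture, doubling the distance to the light quarters the brightness, and tilting the surface away from the light dims it by the cosine of the angle. Because both effects show up in the observed brightness, images taken under a nearby light carry information about surface orientation and distance.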

“That enables us to model the objects with much greater precision than existing processes. We can use the natural surroundings and can reconstruct relatively textureless objects for our reconstructions,” says Cremers.

The paper is published on the arXiv preprint server and will be presented at the Conference on Computer Vision and Pattern Recognition (CVPR 2024) held in Seattle from June 17 to June 21, 2024.

Applications in autonomous driving and preservation of historical artifacts

The method can be used to preserve historical monuments or digitize museum exhibits. If these are destroyed or decay over time, photographic images can be used to reconstruct the originals and create authentic replicas.

Prof. Cremers' team also develops neural camera-based reconstruction methods for autonomous driving, where a camera films the vehicle’s surroundings. The autonomous car can model its surroundings in real time, develop a three-dimensional representation of the scene, and use it to make decisions.

The process is based on neural networks that predict 3D point clouds for individual video frames, which are then merged into a large-scale model of the roads traveled.
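
The merging step can be sketched as follows. This is a minimal illustration, not the team's pipeline: it assumes that each frame's point cloud comes with a known camera pose (a 3x3 rotation and a translation), which the article does not specify, and the helper names `transform_points` and `merge_point_clouds` are hypothetical.

```python
def transform_points(points, rotation, translation):
    """Map camera-frame points into the world frame: p_world = R * p_cam + t."""
    return [
        tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        for p in points
    ]


def merge_point_clouds(frames):
    """Merge per-frame point clouds into one world-frame cloud.

    `frames` is an iterable of (points, rotation, translation) tuples,
    where the pose is assumed to be known for every frame.
    """
    merged = []
    for points, rotation, translation in frames:
        merged.extend(transform_points(points, rotation, translation))
    return merged


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Two frames see a point one unit ahead; the camera moved one unit along x.
frames = [
    ([(0, 0, 1)], identity, (0, 0, 0)),
    ([(0, 0, 1)], identity, (1, 0, 0)),
]
print(merge_point_clouds(frames))  # [(0, 0, 1), (1, 0, 1)]
```

A real system would also have to estimate the poses and deal with overlapping or noisy points; the sketch only shows how per-frame predictions end up in one shared coordinate system.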

More information:
Mohammed Brahimi et al, Sparse Views, Near Light: A Practical Paradigm for Uncalibrated Point-light Photometric Stereo, arXiv (2024). DOI: 10.48550/arxiv.2404.00098