VAIR: Visuo-Acoustic Implicit Representations for Low-Cost, Multi-Modal Transparent Surface Reconstruction in Indoor Scenes

Department of Robotics, University of Michigan


We present VAIR, an implicit visuo-acoustic sensor fusion method for mapping indoor scenes with transparent surfaces.

Abstract

Mobile robots operating indoors must be prepared to navigate challenging scenes that contain transparent surfaces. This paper proposes a novel method that fuses acoustic and visual sensing modalities through implicit neural representations to enable dense reconstruction of transparent surfaces in indoor scenes. Our model leverages generative latent optimization to learn an implicit representation of indoor scenes containing transparent surfaces. We demonstrate that the learned representation can be queried for volumetric rendering in image space or for 3D geometry reconstruction (point clouds or meshes) with transparent surface prediction. We evaluate our method's effectiveness qualitatively and quantitatively on a new dataset collected using a custom, low-cost sensing platform featuring RGB-D cameras and ultrasonic sensors. Our method exhibits significant improvement over the state of the art in transparent surface reconstruction.
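
To make the two ideas in the abstract concrete, below is a minimal PyTorch sketch of an implicit scene representation trained with generative latent optimization: a free latent code per scene is optimized jointly with a shared MLP, and at inference time a dense grid query thresholds the predicted transparency channel to recover transparent geometry. Every name, layer size, output head, and loss here is an illustrative assumption, not the authors' architecture, and the toy occupancy supervision stands in for the paper's real visuo-acoustic measurements.

    # Minimal sketch of an implicit representation trained with generative
    # latent optimization (GLO). All architectural choices (layer sizes,
    # output heads, loss) are illustrative assumptions.
    import torch
    import torch.nn as nn

    class ImplicitScene(nn.Module):
        """MLP mapping (3D point, scene latent) -> (density, transparency)."""
        def __init__(self, latent_dim=64, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 2),  # [density logit, transparency logit]
            )

        def forward(self, points, latent):
            # points: (N, 3); latent: (latent_dim,) broadcast to every point
            z = latent.expand(points.shape[0], -1)
            out = self.net(torch.cat([points, z], dim=-1))
            density = torch.relu(out[:, 0])          # non-negative density
            transparency = torch.sigmoid(out[:, 1])  # probability in [0, 1]
            return density, transparency

    # GLO: one free latent code per scene, optimized jointly with the MLP.
    num_scenes, latent_dim = 4, 64
    model = ImplicitScene(latent_dim)
    latents = nn.Parameter(torch.randn(num_scenes, latent_dim) * 0.01)
    opt = torch.optim.Adam(list(model.parameters()) + [latents], lr=1e-3)

    for step in range(1000):
        scene = torch.randint(num_scenes, (1,)).item()
        pts = torch.rand(1024, 3) * 2 - 1                # stand-in 3D samples
        occ_target = (pts.norm(dim=-1) < 0.5).float()    # toy supervision
        density, transparency = model(pts, latents[scene])
        loss = nn.functional.mse_loss(density, occ_target)
        opt.zero_grad(); loss.backward(); opt.step()

    # Inference: query a dense grid and threshold the transparency channel
    # to recover transparent geometry as a point cloud.
    with torch.no_grad():
        axis = torch.linspace(-1.0, 1.0, 32)
        gx, gy, gz = torch.meshgrid(axis, axis, axis, indexing="ij")
        grid = torch.stack([gx, gy, gz], dim=-1).reshape(-1, 3)
        density, transparency = model(grid, latents[0])
        transparent_pts = grid[(density > 0.5) & (transparency > 0.5)]

The same grid query could instead feed a marching-cubes step to produce a mesh, mirroring the point-cloud-or-mesh reconstruction queries described in the abstract.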

BibTeX

@misc{sethuraman2024vairvisuoacousticimplicitrepresentations,
      title={VAIR: Visuo-Acoustic Implicit Representations for Low-Cost, Multi-Modal Transparent Surface Reconstruction in Indoor Scenes}, 
      author={Advaith V. Sethuraman and Onur Bagoren and Harikrishnan Seetharaman and Dalton Richardson and Joseph Taylor and Katherine A. Skinner},
      year={2024},
      eprint={2411.04963},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.04963}, 
}