Published in IEEE Transactions on Visualization and Computer Graphics (TVCG, Proc. IEEE VR), 2026
Presented at IEEE Virtual Reality (VR), March 2026
Our ability to perceive and navigate the spatial world is a cornerstone of human experience, relying on the integration of visual and auditory cues to form a coherent sense of depth and distance. In stereoscopic 3D vision, depth perception requires fixating both eyes on a target object, which is achieved through vergence movements: convergence for near objects and divergence for distant ones. In contrast, auditory cues provide complementary depth information through variations in loudness, interaural differences (IAD), and the frequency spectrum. We investigate the interaction between visual and auditory cues and examine how contradictory auditory information can overdrive visual depth perception in virtual reality (VR). When a new visual target appears, we introduce a spatial discrepancy between the visual and auditory cues: the visual target is shifted closer to the previously fixated object, while the corresponding sound source is displaced in the opposite direction. When these conflicting cues are integrated through multimodal processing, the resulting percept is biased toward the intended depth location. This audiovisual fusion counteracts depth compression, reducing the required vergence magnitude and enabling faster gaze retargeting. Such audio-driven depth enhancement may further help mitigate the vergence–accommodation conflict (VAC) in scenarios where physical depth must be compressed. In a series of psychophysical studies, we first assess the effectiveness of depth overdriving for various VR-relevant combinations of initial fixations and shifted target locations, considering different audio displacement scenarios together with their loudness and frequency parameters. Next, we quantify the resulting speedup in gaze retargeting for target shifts that can be successfully overdriven by sound manipulations. Finally, we apply our method in a naturalistic VR scenario in which user interface interactions with the scene exhibit extended perceptual depth.
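The abstract does not specify how the conflicting cues are combined; a common way to reason about such biasing is maximum-likelihood (inverse-variance weighted) cue integration. The sketch below is purely illustrative, not the paper's model: the depths, variance values, and the Gaussian fusion assumption are ours, chosen only to show how a displaced auditory cue can pull the fused depth percept back toward the intended location.

```python
# Illustrative sketch (assumed model, not the paper's method): reliability-
# weighted audiovisual depth fusion. The visual target is rendered closer
# than intended (depth compressed), the sound source is displaced in the
# opposite direction, and the fused estimate is biased toward the intended
# depth. All numbers below are made up for illustration.

def fuse_depth(d_visual, var_visual, d_audio, var_audio):
    """Inverse-variance weighted combination of two depth estimates."""
    w_v = (1.0 / var_visual) / (1.0 / var_visual + 1.0 / var_audio)
    w_a = 1.0 - w_v
    return w_v * d_visual + w_a * d_audio

intended_depth = 2.0    # meters: where the target should be perceived
rendered_visual = 1.6   # visual target shifted closer to the previous fixation
displaced_audio = 3.0   # sound source displaced in the opposite direction

# Vision is typically the more reliable depth cue, hence the smaller variance.
perceived = fuse_depth(d_visual=rendered_visual, var_visual=0.05,
                       d_audio=displaced_audio, var_audio=0.20)
print(f"perceived depth ~ {perceived:.2f} m (visual-only: {rendered_visual} m)")
```

With these assumed values the fused estimate lands near 1.9 m, closer to the intended 2 m than the compressed visual placement, which is the intuition behind reducing the required vergence magnitude.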
@article{navarro2026overdriving,
title = {Overdriving Visual Depth Perception via Sound Modulation in VR},
author = {Daniel Jiménez-Navarro and Colin Groth and Xi Peng and Jorge Pina and Qi Sun and Praneeth Chakravarthula and Karol Myszkowski and Hans-Peter Seidel and Ana Serrano},
journal = {{IEEE} Transactions on Visualization and Computer Graphics ({TVCG}, Proc. {IEEE} {VR})},
year = {2026}
}