Monocular Depth Estimation Using Laplacian Pyramid-Based Depth Residuals

8 Jan 2021 · Minsoo Song, Seokjae Lim, Wonjun Kim

With the great success of generative models based on deep neural networks, monocular depth estimation has been actively studied using various encoder-decoder architectures. However, the decoding process in most previous methods repeats simple up-sampling operations and may therefore fail to fully exploit the underlying properties of well-encoded features. To resolve this problem, we propose a simple but effective scheme that incorporates the Laplacian pyramid into the decoder architecture. Specifically, encoded features are fed into different streams for decoding depth residuals, which are defined by the decomposition of the Laplacian pyramid, and the corresponding outputs are progressively combined to reconstruct the final depth map from coarse to fine scales. This design is well suited to precisely estimating depth boundaries as well as the global layout. We also propose applying weight standardization to the pre-activation convolution blocks of the decoder, which improves the flow of gradients and thus makes optimization easier. Experimental results on benchmark datasets constructed under various indoor and outdoor environments demonstrate that the proposed method is effective for monocular depth estimation compared to state-of-the-art models. The code and model are publicly available at: https://github.com/tjqansthd/LapDepth-release.
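The coarse-to-fine reconstruction described in the abstract can be illustrated with a minimal NumPy sketch. It only shows how a known depth map decomposes into Laplacian residuals and is rebuilt from them; in the paper the residuals are predicted by decoder streams from encoded features, which is not modelled here. The function names, the 2x2 average-pooling down-sampler, and the nearest-neighbour up-sampler are assumptions chosen for brevity, not the operators used in the released code.

```python
import numpy as np


def downsample(x):
    """Halve the resolution by averaging each 2x2 block (assumed operator)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample(x):
    """Double the resolution by nearest-neighbour repetition (assumed operator)."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)


def decompose(depth, levels):
    """Split a depth map into residuals (fine to coarse) plus the coarsest map."""
    pyramid = []
    current = depth
    for _ in range(levels - 1):
        coarse = downsample(current)
        # Residual: detail lost when going to the coarser scale.
        pyramid.append(current - upsample(coarse))
        current = coarse
    pyramid.append(current)  # coarsest level carries the global layout
    return pyramid


def reconstruct(pyramid):
    """Combine the levels progressively from coarse to fine."""
    depth = pyramid[-1]
    for residual in reversed(pyramid[:-1]):
        depth = upsample(depth) + residual
    return depth


# Hypothetical depth map; height and width must be divisible by 2**(levels - 1).
rng = np.random.default_rng(0)
depth = rng.uniform(1.0, 80.0, size=(16, 32))
pyramid = decompose(depth, levels=4)
restored = reconstruct(pyramid)
```

With these operators the reconstruction is exact, and each residual averages to zero within every 2x2 block, so the residual levels hold only local detail such as depth boundaries while the coarsest level holds the global layout.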

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
| --- | --- | --- | --- | --- | --- |
| Monocular Depth Estimation | KITTI Eigen split | LapDepth | absolute relative error | 0.059 | # 32 |
| | | | RMSE | 2.446 | # 31 |
| | | | RMSE log | 0.091 | # 30 |
| | | | Delta < 1.25 | 0.962 | # 31 |
| | | | Delta < 1.25^2 | 0.994 | # 32 |
| | | | Delta < 1.25^3 | 0.999 | # 11 |
| Monocular Depth Estimation | NYU-Depth V2 | LapDepth | RMSE | 0.384 | # 45 |
| | | | absolute relative error | 0.105 | # 44 |
| | | | Delta < 1.25 | 0.895 | # 45 |
| | | | Delta < 1.25^2 | 0.983 | # 41 |
| | | | Delta < 1.25^3 | 0.996 | # 37 |
| | | | log 10 | 0.045 | # 43 |
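The metric names in the results above follow definitions that are widely used in depth estimation. The sketch below computes them with NumPy under those common definitions; the function name `depth_metrics`, the dictionary keys, and the rule of masking out pixels with non-positive ground truth are assumptions for illustration, and the benchmarks' exact evaluation protocol (depth caps, crops) is not specified here.

```python
import numpy as np


def depth_metrics(pred, gt):
    """Compute common depth-estimation metrics over pixels with valid ground truth.

    Assumes predictions are strictly positive; pixels with gt <= 0 are ignored.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > 0
    pred, gt = pred[valid], gt[valid]

    # Threshold accuracy uses the larger of the two ratios per pixel.
    ratio = np.maximum(pred / gt, gt / pred)
    return {
        "abs_rel": np.mean(np.abs(pred - gt) / gt),
        "rmse": np.sqrt(np.mean((pred - gt) ** 2)),
        "rmse_log": np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2)),
        "log10": np.mean(np.abs(np.log10(pred) - np.log10(gt))),
        "delta1": np.mean(ratio < 1.25),
        "delta2": np.mean(ratio < 1.25 ** 2),
        "delta3": np.mean(ratio < 1.25 ** 3),
    }


# Hypothetical values; the first pixel has no ground truth and is skipped.
gt = np.array([0.0, 2.0, 4.0, 10.0])
pred = np.array([3.0, 2.0, 5.0, 8.0])
metrics = depth_metrics(pred, gt)
perfect = depth_metrics(gt[1:], gt[1:])
```

Lower is better for the error metrics (absolute relative error, RMSE, RMSE log, log 10), while higher is better for the Delta thresholds, which report the fraction of pixels whose prediction is within the given ratio of the ground truth.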
