Single Image Intrinsic Decomposition with
Discriminative Feature Encoding

Zongji Wang

Feng Lu


Intrinsic image decomposition is an important and long-standing computer vision problem. Recovering the physical scene properties from a single input image is ill-posed. In this work, we take advantage of deep learning, which has proven highly effective for challenging computer vision problems, including intrinsic image decomposition. Our focus lies in the feature encoding phase, where we extract discriminative features for the different intrinsic layers from a single input image. To this end, we explore the distinctive characteristics of the different intrinsic components in the high-dimensional feature embedding space. We propose a feature divergence loss that forces their high-dimensional embedding feature vectors to be separated effectively. The feature distributions are also constrained to fit the real ones. In addition, we provide an approach to remove the data inconsistency in the MPI Sintel dataset, making it more suitable for intrinsic image decomposition. Experimental results indicate that the proposed network structure outperforms state-of-the-art methods.


Network Architecture 

In this paper, we propose a novel two-stream framework for efficient intrinsic image decomposition. The input image is passed through two sub-network streams, one reconstructing the albedo image and the other the shading image. Each stream uses the feature extractor of VGG-19 as its encoder to extract multi-scale feature maps. These feature maps are then aggregated by (upsampling, concatenation, convolution) sequences. Finally, three residual dilated blocks serve as the decoder, reconstructing the intrinsic images from the fused feature maps. In the architecture figure, the rounded boxes represent loss computations: 'cycle' denotes the cycle loss, 'FDC' the feature distribution constraint, and 'FDV' the feature divergence loss.
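As a rough illustration of one (upsampling, concatenation, convolution) step, the sketch below fuses a coarse feature map into a finer one with NumPy. It is only a toy under stated assumptions: nearest-neighbour upsampling, a 1x1 convolution, random weights, and the channel and spatial sizes are all illustrative choices, not details taken from the paper.

```python
import numpy as np

def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map (assumed mode)."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def conv1x1(feat, weight):
    """1x1 convolution: mixes channels independently at every pixel.

    feat has shape (C_in, H, W); weight has shape (C_out, C_in).
    """
    return np.einsum("oc,chw->ohw", weight, feat)

def fuse(fine, coarse, weight):
    """One (upsampling, concatenation, convolution) step."""
    factor = fine.shape[1] // coarse.shape[1]
    up = upsample_nearest(coarse, factor)          # match the finer resolution
    stacked = np.concatenate([fine, up], axis=0)   # concatenate along channels
    return conv1x1(stacked, weight)                # fuse back to C_out channels

# Toy feature maps at two scales (sizes are hypothetical).
rng = np.random.default_rng(0)
fine = rng.standard_normal((64, 32, 32))
coarse = rng.standard_normal((128, 16, 16))
weight = rng.standard_normal((64, 64 + 128)) * 0.01

fused = fuse(fine, coarse, weight)
print(fused.shape)  # (64, 32, 32)
```

Repeating this step from the coarsest scale to the finest gives a single fused feature map that a decoder can consume.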

MPI Sintel data refinement

MPI Sintel [1] is a publicly available, densely labelled dataset containing complex indoor and outdoor scenes. It was originally designed for optical flow evaluation. For intrinsic image decomposition research, the ground truth shading images were rendered with a constant gray albedo, taking illumination effects into account. However, because of the way the data was created, the original input frames cannot be reconstructed from the ground truth albedo and shading layers through I = A*S.
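The inconsistency can be quantified as the residual of the image formation model I = A*S, where * is the element-wise product. The following is a minimal sketch, assuming images stored as floating-point NumPy arrays; the function name and the synthetic arrays are hypothetical and stand in for real dataset frames.

```python
import numpy as np

def reconstruction_error(image, albedo, shading):
    """Mean absolute residual of the model I = A * S (element-wise)."""
    return float(np.mean(np.abs(image - albedo * shading)))

# Synthetic stand-ins for one frame (values are illustrative only).
albedo = np.full((4, 4, 3), 0.5)
shading = np.full((4, 4, 1), 0.8)       # gray shading, broadcast over RGB

consistent = albedo * shading           # obeys I = A * S exactly
inconsistent = consistent + 0.1         # violates the model at every pixel

print(reconstruction_error(consistent, albedo, shading))
print(reconstruction_error(inconsistent, albedo, shading))
```

A frame that obeys the model gives a residual of zero, while any frame whose residual is clearly above numerical noise cannot be explained by its albedo and shading layers alone.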

The comparison of the refined MPI dataset (MPI_RD) and the original MPI dataset (MPI) is shown above. The refined MPI Sintel dataset (MPI_RD) obeys the image formation model I = A*S, and its shading layers contain no color information (gray shading). In addition, the shading layers in MPI_RD remain consistent with the original images in two respects. First, the specular component is removed from the shading layers. Second, the shape details observed in the original images are preserved in the shading layers.

We briefly describe our data refinement algorithm. First, we shift the distribution of the albedo layer to a higher mean value and reconstruct the shading layer from the original image and the shifted albedo. Next, invalid pixels in the reconstructed shading layer are recomputed using Local Linear Embedding (LLE), with the input I serving as the guidance image from which the embedding is constructed. Finally, the input image is resynthesized from the processed albedo and shading images.
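The steps above can be sketched in NumPy as follows. This is not the released Matlab code, and several details are assumptions made only for illustration: the albedo shift is taken to be additive with clipping, the gray shading is taken as a ratio of channel means, a pixel is called invalid when its shading is non-finite or exceeds a hypothetical `max_shading` bound, and the LLE step is replaced by a much simpler stand-in (the median of the valid shading values).

```python
import numpy as np

def refine(image, albedo, shift=0.1, max_shading=2.0):
    """Toy refinement; returns (image', albedo', shading', invalid mask)."""
    # 1. Shift the albedo towards a higher mean (assumed: additive shift + clip).
    albedo_s = np.clip(albedo + shift, 0.0, 1.0)

    # 2. Reconstruct a gray shading layer from the image and the shifted albedo.
    shading = image.mean(axis=-1) / albedo_s.mean(axis=-1)

    # 3. Flag invalid pixels and fill them. The median fill below is only a
    #    stand-in for the LLE-based step described in the text.
    invalid = ~np.isfinite(shading) | (shading > max_shading)
    if invalid.any() and (~invalid).any():
        shading = shading.copy()
        shading[invalid] = np.median(shading[~invalid])

    # 4. Resynthesize the input so that I = A * S holds by construction.
    image_r = albedo_s * shading[..., None]
    return image_r, albedo_s, shading, invalid

# Synthetic frame with one outlier pixel (all values are illustrative).
rng = np.random.default_rng(0)
albedo = rng.uniform(0.0, 0.8, size=(8, 8, 3))
image = albedo * rng.uniform(0.2, 1.0, size=(8, 8, 1))
image[0, 0] = 5.0

image_r, albedo_r, shading_r, invalid = refine(image, albedo)
print(int(invalid.sum()), float(shading_r.max()) <= 2.0)
```

Because the image is resynthesized in the last step, the output triple satisfies I = A*S exactly and the shading has a single channel, matching the two properties stated for the refined dataset.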

The data preprocessing code (Matlab) and the refined MPI dataset (RD_MPI) are available for download.

Intrinsic Decomposition Results

We compare our method with two state-of-the-art methods on the refined MPI dataset (RD_MPI): Direct Intrinsics (MSCR) [2] and Revisiting [3].





  title={Single Image Intrinsic Decomposition with Discriminative Feature Encoding},
  author={Zongji Wang and Feng Lu},
  booktitle={The IEEE International Conference on Computer Vision (ICCV) Workshops},





Special thanks to MPI Sintel Dataset


  title={A naturalistic open source movie for optical flow evaluation},
  author={Butler, D. J. and Wulff, J. and Stanley, G. B. and Black, M. J.},
  booktitle={European Conf. on Computer Vision (ECCV)},
  editor={{A. Fitzgibbon et al. (Eds.)}},
  publisher={Springer-Verlag},
  series={Part IV, LNCS 7577},
  pages={611--625},



MPI Dataset Homepage


[1] D. J. Butler, J. Wulff, G. B. Stanley, and M. J. Black. A naturalistic open source movie for optical flow evaluation. In A. Fitzgibbon et al. (Eds.), European Conf. on Computer Vision (ECCV), Part IV, LNCS 7577, pages 611–625. Springer-Verlag, Oct. 2012.

[2] T. Narihira, M. Maire, and S. X. Yu. Direct intrinsics: Learning albedo-shading decomposition by convolutional regression. In International Conference on Computer Vision (ICCV), 2015.

[3] Q. Fan, J. Yang, G. Hua, B. Chen, and D. Wipf. Revisiting deep intrinsic image decompositions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 8944–8952, 2018.

[4] S. Bi, X. Han, and Y. Yu. An L1 image transform for edge-preserving smoothing and scene-level intrinsic decomposition. ACM Trans. Graph. (Proc. SIGGRAPH), 34(4), 2015.


If you have any further questions, please do not hesitate to contact us at {wzjgintoki, lufeng} AT