CMU-CS-21-138.pdf (68.68 MB)

Training Deep Networks with Material-Aware Supervision

posted on 04.05.2022, 18:59 authored by Tiancheng Zhi

Deep learning is a powerful tool for predicting scene properties from images. Typical supervised methods require large-scale real data with ground truth, which is hard to obtain. This situation calls for techniques that need little ground truth real data. Without annotations, an obvious question is: where does the supervision signal for training deep networks come from? In this thesis, we demonstrate that awareness of materials provides such easy-to-obtain signals. We also present a framework that can be applied to different tasks to exploit material-aware supervision. We consider four forms of supervision signal in the framework: ground truth and photometric supervision from appearance models, and adversarial and confidence supervision from appearance locations. Specifically, given a task, an approximate appearance model can be built to describe the whole or part of the scene. With this model, we can render synthetic images for ground truth supervision or optimize the networks using photometric supervision. The scene may also contain spatially-varying materials that provide additional appearance location information. Such information can be used to separate special appearances using adversarial supervision, or to fix failure cases using confidence supervision.

We present four applications to demonstrate the effectiveness of the proposed framework. In the first application, we introduce an approach for fine-grained recognition of powders on complex backgrounds, as an example of synthetic ground truth supervision from translucent material awareness. We build a blending model for synthesizing images of translucent powders on various backgrounds. In the second application, we demonstrate a method for recovering human texture and geometry from an RGB-D video, as an example of photometric supervision from a Lambertian material model. In the third application, we propose a floor appearance decomposition approach for realistic object insertion, as an example of adversarial supervision for diffuse-specular separation and direct sunlight detection. We obtain coarse locations of specular and sunlight appearances based on layout geometry and the awareness of emissive and transparent materials. Lastly, we present a cross-spectral stereo matching method for road scenes, to show that confidence supervision from non-Lambertian appearance locations helps fix regions of failure.

We believe that the method proposed in this thesis can be used in more real-world applications, including interior design, medical imaging, and autonomous driving, especially when ground truth real data are not easy to obtain.




Degree Type



Computer Science

Degree Name

  • Doctor of Philosophy (PhD)


Srinivasa G. Narasimhan, Martial Hebert
