Fusion of monocular cues to detect man-made structures in aerial imagery

DOI: 10.1184/R1/6605906.v1
Authors: Jefferey A. Shufelt; David M. McKeown
Publisher: Carnegie Mellon University
Date: 2004-05-01
Type: Journal contribution
Keywords: Photographic interpretation Data processing; Computer vision; Image processing Digital techniques; Vision Monocular; Information and Computing Sciences not elsewhere classified
URL: https://kilthub.cmu.edu/articles/journal_contribution/Fusion_of_monocular_cues_to_detect_man-made_structures_in_aerial_imagery/6605906

Abstract: "The detection and delineation of man-made structures from aerial imagery is a complex computer vision problem. It requires locating regions in imagery that possess properties distinguishing them as man-made objects in the scene, as opposed to naturally occurring terrain features. The building extraction process requires techniques that exploit knowledge about the structure of man-made objects. Techniques do exist that take advantage of this knowledge; various methods use edge-line analysis, shadow analysis, and stereo imagery analysis to produce building hypotheses. It is reasonable, however, to assume that no single detection method will correctly delineate or verify buildings in every scene. As an example, a feature extraction system that relies on the analysis of cast shadows to predict building locations is likely to fail in cases where the sun is directly above the scene. In this paper, we introduce a cooperative-methods paradigm for information fusion that is shown to be highly effective in improving the system performance over that achieved by individual building extraction methods.
Using this paradigm, each extraction technique provides information that can be added or assimilated into an overall interpretation of the scene. Thus, our research focus is to explore the development of a computer vision system that integrates the results of various scene analysis techniques into an accurate and robust interpretation of the underlying three-dimensional scene. We briefly survey four monocular building extraction, verification, and clustering systems that form the basis for the research described here. A method for fusing the symbolic data generated by these systems is described, and it is applied to both monocular image and stereo image data sets. A set of performance evaluation metrics are developed, described, and applied to the fusion result. Several detailed analyses are presented, as well as a summary of results on 23 monocular and 5 stereo scenes. These experiments show that a significant improvement in building detection is achieved using these techniques."