Applications in health care, public safety, home security, and self-driving cars rely on the automatic identification and interpretation of sound events. For example, abnormal respiratory sounds indicate respiratory problems, a gunshot or breaking glass signals a safety alert, and a wailing ambulance siren implies that vehicles should stop or pull over. Systems that can automatically recognize sound events and extract meaning that helps us react accordingly are
systems capable of Sound Understanding. Sound Understanding is an emerging field of Machine Hearing, which aims to build both systems that perform sound-related tasks unrelated to hearing (such as sonography, seismology, and sonar) and systems that hear the way humans do and distinguish between music, speech, and other sounds [1]. To understand sounds as humans do, hearing machines require computational programs that can learn from years of accumulated, diverse acoustics. They must use
associated knowledge to guide subsequent learning and organize what they hear, learn names for recognizable events, scenes, objects, actions, materials, and places, and retrieve sounds by reference to those names. These machines must also continuously improve their hearing competence to encompass the full diversity and scale of the world's acoustics. Therefore, this thesis proposes the Never-Ending Learning of Sounds (NELS), a computational
program that aims to build hearing machines that understand sounds under a never-ending learning paradigm. NELS continuously hears the Web to learn meaningful categories of sounds and the relationships among them, and uses this knowledge to index and organize the crawled audio. The indexed content is made available for people to query and retrieve. To enhance
NELS's quality of expression of acoustic phenomena, we introduce a new interdisciplinary solution that draws on domain knowledge from Psychology to build Machine Learning models. NELS breaks new ground on several challenges in Sound Understanding: collecting datasets with different types of labels and annotation processes, designing and improving sound recognition models, defining knowledge about sounds, and retrieving sounds under different types of similarity.