Noisemes: Manual Annotation of Environmental Noise in Audio Streams

Audio information retrieval is a difficult problem due to the highly unstructured nature of the data. A general labeling system for identifying audio patterns could unite research efforts in the field. This paper introduces 42 distinct labels, the “noisemes”, developed for the manual annotation of noise segments as they occur in the audio streams of consumer-captured and semiprofessionally produced videos. The labels describe distinct noise units based on audio concepts and are, as far as possible, independent of visual concepts. We trained a recognition system using 5.6 hours of manually labeled data and present recognition results.