Carnegie Mellon University

In-the-wild detection of speech affecting diseases

posted on 2024-01-17, 21:57 authored by Maria Joana Ribeiro Folgado Correia

Speech is a complex bio-signal that is intrinsically related to human physiology and cognition. It has the potential to provide a rich biomarker for health, allowing a non-invasive route to early diagnosis and monitoring of a range of conditions that affect speech. The scientific community has shown consistent interest in automating the diagnosis and monitoring of speech-affecting diseases, but advances in this area have been limited by the small size of the available speech medical corpora, which can be prohibitively difficult and expensive to collect.

At the same time, the problem of diagnosing and monitoring speech-affecting diseases specifically in in-the-wild contexts has been neglected, as the few existing speech medical corpora only contain recordings made in controlled conditions. These are typically conditions in which the channel is known, the background noise is minimized, or the content of the recordings is controlled by either speaking exercises or clinical interviews. Such recordings do not provide a good representation of real-life scenarios.

In this thesis we address the problem of detecting speech-affecting (SA) diseases in in-the-wild contexts in two ways. First, we propose novel strategies to collect and annotate speech medical corpora of arbitrary size, for arbitrary SA diseases, from pre-existing massive online multimedia repositories. Second, we propose novel strategies to detect SA diseases in both controlled and in-the-wild conditions, thus expanding the scenarios in which the detection of such diseases is possible.

In addition, we perform the first study of the limitations of both the existing speech medical corpora and current techniques for detecting speech-affecting diseases when faced with in-the-wild data.

In the scope of this thesis we also collect and annotate the in-the-wild speech medical (WSM) corpus, a first-of-its-kind, ever-growing corpus of in-the-wild multimodal recordings featuring examples of several speech-affecting diseases, including depression and Parkinson’s disease.




Degree Type

  • Dissertation


  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)


  • Bhiksha Raj
  • Isabel Trancoso
