Computational Models for the Forensic Analysis of Human Voice

posted on 2023-03-01, 17:48 authored by Wenbo Zhao

Voice-based forensic profiling of humans refers to deducing information about a speaker, including their characteristics, from samples of their voice. Specifically, it refers to the set of methodologies, technologies, and tools that represent and model human voices and infer the physical, physiological, psychological, medical, demographic, sociological, and other bio-parametric traits (bio-relevant parameters) of a person from their voice.

Voice-based forensic profiling of humans relies on objective evidence that relates measurements made from the voice signal to a person's bio-relevant parameters. These relations are gauged using a broad spectrum of interdisciplinary technologies and investigative procedures, each of which offers insight into these parameters from a different perspective.
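As a deliberately simple illustration of relating voice measurements to a bio-relevant parameter, the sketch below fits a closed-form ridge regression from a few per-speaker acoustic features to height. Everything here is an assumption for illustration: the features and heights are synthetic, the helper names `fit_ridge` and `predict` are hypothetical, and nothing in this abstract says the thesis uses linear regression.

```python
import numpy as np


def fit_ridge(features: np.ndarray, targets: np.ndarray, lam: float = 1e-3) -> np.ndarray:
    """Closed-form ridge regression with an unpenalized bias term.

    features: (n_samples, n_features) voice measurements (hypothetical)
    targets:  (n_samples,) a bio-relevant parameter, e.g. height in cm
    Returns weights of shape (n_features + 1,); the last entry is the bias.
    """
    n = features.shape[0]
    X = np.hstack([features, np.ones((n, 1))])  # append a bias column
    reg = lam * np.eye(X.shape[1])
    reg[-1, -1] = 0.0  # do not shrink the bias toward zero
    return np.linalg.solve(X.T @ X + reg, X.T @ targets)


def predict(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Apply the fitted linear map (including the bias) to new feature rows."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    return X @ weights


# Synthetic stand-in data: 200 "speakers", 3 features, height linear in the features.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 3))
true_w = np.array([4.0, -2.0, 0.5])
heights = 170.0 + features @ true_w + rng.normal(scale=0.1, size=200)

weights = fit_ridge(features, heights)
# Pearson correlation between predicted and actual heights.
r = np.corrcoef(predict(features, weights), heights)[0, 1]
print("weights:", np.round(weights, 2), "r =", round(float(r), 4))
```

A linear model is used only because its closed form makes the feature-to-parameter relation easy to inspect; the target-specific models the thesis describes are supervised machine learning and deep learning models.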

Numerous studies in acoustics, speech processing, signal processing, medicine, and psychology have revealed that the human voice carries an enormous number of bio-markers that are unique to the speaker and correlated with the speaker’s bio-relevant parameters. These parameters include physical traits such as age, height, weight, and facial skeletal contour; physiological traits such as heart rate, blood pressure, and illness; and psychological traits such as emotions, mental diseases, and deviations from normal mental states, to name a few. These traits are inherent in the physical articulatory apparatus and the phonation process, as well as in the cognitive and mental processes that influence voice production. As a result, evidence derived from voice can be distinctive and can accurately represent bio-relevant parameters. Profiling aims to deduce these parameters in a manner that is language- and context-agnostic and robust to disguise or fabrication.

To deduce bio-relevant parameters from voice, one must develop an appropriate set of voice processing and modeling methodologies. With recent advances in speech processing technologies, many methods and tools have emerged that could be used successfully in this context. For instance, signal processing techniques are used to process raw speech, represent it, and derive acoustic features from it; machine learning and deep learning models are used to model speech and predict the speaker’s identity, age, and emotion; and dynamical systems are used to model voice production and characterize changes and abnormalities in the voice.
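The signal processing step described above can be sketched with a few standard operations: framing, a windowed log power spectrum, and an autocorrelation estimate of the fundamental frequency (F0). This is a minimal sketch assuming a 16 kHz mono signal, 40 ms frames with a 10 ms hop, and an F0 search range of 60 to 400 Hz; the helper names are hypothetical, and the feature set is not taken from the thesis.

```python
import numpy as np


def frame_signal(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split a 1-D signal into overlapping frames of shape (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def log_power_spectrum(frames: np.ndarray) -> np.ndarray:
    """Hann-windowed log power spectrum of each frame (small floor avoids log(0))."""
    window = np.hanning(frames.shape[1])
    spec = np.fft.rfft(frames * window, axis=1)
    return np.log(np.abs(spec) ** 2 + 1e-10)


def estimate_f0(frame: np.ndarray, sr: int, fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Autocorrelation F0 estimate for one voiced frame.

    Picks the lag with the largest autocorrelation inside the lag range
    that corresponds to [fmin, fmax] Hz.
    """
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    lag_min = int(sr / fmax)
    lag_max = int(sr / fmin)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return float(sr / lag)


# Synthetic "voiced" signal: 200 Hz fundamental plus a weaker 400 Hz harmonic.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 400 * t)

frames = frame_signal(x, frame_len=640, hop=160)  # 40 ms frames, 10 ms hop
spec = log_power_spectrum(frames)
f0 = estimate_f0(frames[0], sr)
print(frames.shape, spec.shape, f0)
```

The synthetic tone stands in for a voiced recording; real speech would also need a voicing decision before F0 estimation, since unvoiced frames have no meaningful pitch period.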

This thesis aims to develop computational models of voice characterization that are more powerful, more efficient, and more effective in extracting and representing useful information from the voice for forensic profiling. In this thesis, we investigate three categories of models: (1) target-specific models, (2) data-specific models, and (3) process-specific models. Target-specific models are tied to a specific task, e.g., predicting a speaker’s identity, age, or height from their voice. In this category, we develop supervised machine learning and deep learning models that represent and model human voices such that the target can be predicted as accurately as possible. Data-specific models are not bound to a specific task; instead, they aim to extract generic information from the voice that can be applied to multiple profiling tasks. In this category, we develop generative models to distill intrinsic data representations, called “latent features,” from the voice signal. We also explore how the algebraic and geometric structure of the corresponding latent feature manifolds aids target-specific tasks. Process-specific models attempt to represent and model the process of voice production through physical (bio-mechanical) means. In this category, we develop dynamical systems of differential equations that explain or emulate the biomechanics of voice production. This approach examines the phase-space behavior and bifurcation maps of the associated dynamical systems to characterize many physiological aspects of the human voice. We aim to develop theoretical formulations and practical algorithms for all three categories of models and to validate them through simulations or experiments.
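The abstract does not give the thesis's voice-production equations, so the sketch below substitutes the van der Pol oscillator, x'' - mu(1 - x^2)x' + x = 0, purely as a generic example of the self-sustained oscillation that vocal-fold models must reproduce. It integrates the system with fourth-order Runge-Kutta, records the phase-space trajectory (x, x'), and sweeps the control parameter mu as a crude bifurcation map: for mu < 0 trajectories spiral into the origin, while for mu > 0 they settle onto a limit cycle of amplitude close to 2 (a Hopf bifurcation at mu = 0). The model choice, function names, step size, run length, and initial state are all illustrative assumptions.

```python
import numpy as np


def van_der_pol(x: float, v: float, mu: float) -> tuple[float, float]:
    """Right-hand side of x'' - mu*(1 - x**2)*x' + x = 0 as a first-order system in (x, v)."""
    return v, mu * (1.0 - x * x) * v - x


def rk4_trajectory(mu: float, x0: float = 0.1, v0: float = 0.0,
                   dt: float = 0.01, n_steps: int = 20000) -> np.ndarray:
    """Integrate with classic fourth-order Runge-Kutta.

    Returns the phase-space path (x, v) with shape (n_steps + 1, 2).
    """
    traj = np.empty((n_steps + 1, 2))
    x, v = x0, v0
    traj[0] = (x, v)
    for i in range(1, n_steps + 1):
        k1x, k1v = van_der_pol(x, v, mu)
        k2x, k2v = van_der_pol(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v, mu)
        k3x, k3v = van_der_pol(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v, mu)
        k4x, k4v = van_der_pol(x + dt * k3x, v + dt * k3v, mu)
        x += dt / 6.0 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += dt / 6.0 * (k1v + 2 * k2v + 2 * k3v + k4v)
        traj[i] = (x, v)
    return traj


def steady_amplitude(mu: float) -> float:
    """Peak |x| over the second half of the run, after transients have largely decayed."""
    traj = rk4_trajectory(mu)
    return float(np.max(np.abs(traj[len(traj) // 2:, 0])))


# Crude bifurcation sweep over the control parameter mu.
amplitudes = {mu: steady_amplitude(mu) for mu in (-0.5, -0.25, 0.25, 0.5, 1.0)}
for mu, amp in amplitudes.items():
    print(f"mu = {mu:+.2f}  steady-state amplitude = {amp:.3f}")
```

Plotting `traj[:, 0]` against `traj[:, 1]` for a positive mu shows the closed limit-cycle orbit in phase space; a finer sweep of mu gives a fuller picture of the bifurcation.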

Department

Electrical and Computer Engineering

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Rita Singh