Integrating Language Technologies and Social Theories
In recent years, natural language processing (NLP) has advanced rapidly on standardized tasks and carefully curated data sets. However, the resulting models and benchmarks often fail to generalize to the diverse types of data and questions prevalent in society. In this thesis, I aim to develop technology capable of addressing diverse socially oriented questions in text by integrating social theories from related disciplines into NLP models. This work spans five primary social phenomena: stereotypes and prejudice in narrative text, global opinion manipulation strategies, toxicity on social media, public policy, and AI ethics. For each domain, I develop NLP models and frameworks that are grounded in relevant theories from other disciplines, including social psychology, political science, causal inference, and fairness. These methods are designed to require minimal text annotation, relying instead on unsupervised or distantly supervised approaches. Several of them are language-agnostic or supported by cross-lingual models in order to facilitate analyses of languages beyond English, including Russian, Spanish, and Hindi. Overall, this thesis aims to shift NLP research beyond standard tasks and data sets to real-world data and challenges, where research questions and methodology are guided by relevant theories and incorporate social context.
- Language Technologies Institute
- Doctor of Philosophy (PhD)