Carnegie Mellon University

Knowledge-Enhanced Social Content Analysis in Generative Modeling

thesis
posted on 2025-06-24, 16:20 authored by Haoyang Wen

With the rapid advancement of language modeling, we have witnessed great success in building natural language processing and multimedia analysis models capable of increasingly complex reasoning and inference. Large models also possess parameterized knowledge and can perform knowledge-centric tasks without using externally stored knowledge. However, this paradigm is accompanied by issues such as outdated or inaccurate knowledge and hallucinations. Automated social content analysis, on the other hand, is an area closely intertwined with relevant and, in many cases, up-to-date background knowledge. Since the goals of automated social content analysis include inferring or discovering opinions, interests, trends, and insights from text or multimedia content, access to background knowledge is usually essential for a comprehensive understanding of the content. Therefore, with the advent of recent generative modeling methods, it is crucial to investigate whether it is still necessary to explicitly use and model knowledge for social content analysis tasks, and to identify effective methods for incorporating that background knowledge.


In this thesis, we demonstrate that generative modeling can be effectively used to perform various social content analysis tasks. We also show that external knowledge is a powerful resource that can benefit generative social content analysis models during both the training and inference stages. For the training stage, we discuss methods that leverage knowledge to enhance generative model training, including transforming an external knowledge base into distant supervision for Twitter profile inference, and using abstract knowledge as training constraints to enhance entity-to-entity stance detection. For the inference stage, we discuss methods that incorporate knowledge into the analysis at inference time. We first explore methods to find appropriate knowledge for inference with generative modeling, including multimodal reranking and generative retrieval on a domain-specific corpus. We then discuss specific cases involving knowledge-seeking and knowledge-enhanced inference with generative modeling on social content analysis tasks, including zero-shot and few-shot stance detection, and extensions to multimodal content analysis.

History

Date

2025-05-09

Degree Type

  • Dissertation

Thesis Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Alexander Hauptmann
