Carnegie Mellon University

Weak Supervision and Numerical Commonsense for Modeling Climate-related Text Documents

Thesis posted on 2025-04-14, 20:20, authored by Daniel Spokoyny

Large pretrained language models have shown remarkable versatility across a range of NLP tasks and domains, yet limited attention has been paid to applying these models to the climate domain. An ever-growing body of unstructured climate-related documents contains crucial quantitative measurements on carbon emissions, reduction commitments (e.g., reduce CO2 emissions per kilometre from passenger cars by 37.5%), and other climate-related information such as policy goals. However, current NLP systems struggle to comprehend the semantic meaning of numbers and their units, and they generalize poorly to concepts like policy goals in the climate domain. To address these issues, in the first part of the thesis we propose new model architectures that serve as a useful inductive bias for predicting numbers as continuous values, extend these to predict units and quantities jointly, and introduce a new task of predicting the correlation of multiple quantities in texts. In the second part of the thesis, we introduce a new benchmark for climate policy goal classification and demonstrate that current climate-adapted NLP models perform no better than their general-domain counterparts. To address this shortcoming, we use existing semi-structured climate questionnaires to train QA models with better transfer learning capabilities on climate documents. Finally, we tackle the alignment of unstructured climate documents head-on, using models we fine-tuned through weak supervision alongside modern full-fledged LLMs via prompting and in-context learning. Together, this thesis attempts to lay a foundation for future work that brings numerical commonsense models to the climate domain, paving the way for novel applications on climate documents such as extracting critical climate measurements, mining correlative relationships between quantities, and using retrieval augmentation for numerical query answering.

History

Date

2024-04-24

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Taylor Berg-Kirkpatrick
