Weak Supervision and Numerical Commonsense for Modeling Climate-related Text Documents
Large pretrained language models have shown remarkable versatility across a range of NLP tasks and domains, yet there has been limited attention to applying these models to the climate domain. An ever-growing body of unstructured climate text documents contains crucial quantitative measurements on carbon emissions, reduction commitments (e.g., reducing CO2 emissions per kilometre from passenger cars by 37.5%), and other climate-related information such as policy goals. However, current NLP systems struggle to comprehend the semantic meaning of numbers and their units, and they generalize poorly to concepts such as policy goals in the climate domain. To address these issues, in the first part of the thesis we propose new model architectures that serve as a useful inductive bias for predicting numbers as continuous values, extend these to predict units and quantities jointly, and introduce a new task of predicting the correlation of multiple quantities in text. In the second part of the thesis, we introduce a new benchmark for climate policy goal classification tasks and demonstrate that current climate-adapted NLP models perform no better than their general-purpose counterparts. To address this shortcoming, we use existing semi-structured climate questionnaires to train QA models with better transfer learning capabilities on climate documents. Finally, we tackle the alignment of unstructured climate documents head-on, both with models we fine-tuned through weak supervision and with modern, full-fledged LLMs via prompting and in-context learning. Taken together, this thesis attempts to lay a foundation for future work that brings numerical commonsense models to the climate domain, paving the way for novel applications on climate documents such as extracting critical climate measurements, mining correlative relationships between quantities, and using retrieval augmentation for numerical query answering.
History
Date
2024-04-24
Degree Type
- Dissertation
Department
- Language Technologies Institute
Degree Name
- Doctor of Philosophy (PhD)