Exploring Language Structured Prediction in Resource-limited Scenarios
In natural language processing (NLP), many tasks involve structured prediction: predicting structured outputs consisting of a group of interdependent variables. This allows extracting useful information from unstructured raw texts, which can benefit downstream tasks and analyses for both humans and machines. To obtain automatic models, the dominant paradigm is data-driven supervised learning. In this paradigm, the main bottleneck is the availability of manually annotated data, which is usually expensive and time-consuming to obtain. Moreover, we often want to extend models to new scenarios, such as different domains or languages. Model performance can drop dramatically if the training instances are insufficient to cover the target scenarios, while it is costly and inefficient to annotate large amounts of data in all these new cases.
To mitigate this problem and ease the reliance of structured prediction models on large amounts of annotation, we need to consider both the model and the data, the two main driving forces of data-driven machine learning. Related to these core aspects, we examine three directions. First, we investigate structured modeling in model design, which concerns how the complex structured outputs are modeled and predicted. This is especially important for structured prediction tasks, which usually have large output spaces. Second, at the interaction of model and data, we examine transfer learning, where related data is utilized to help low-resource target tasks. Here, designing models that are more agnostic to the discrepancies between the source and target data resources is crucial for the success of the transfer. Finally, we explore active learning, with a specific focus on the data itself. When resources are limited, it is difficult to obtain large numbers of annotated instances, but annotating a small set can be feasible. With a strategy to select an informative set of instances, far fewer manual annotations may be required to achieve satisfactory performance.
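To make the active learning idea concrete, the following is a minimal sketch (not the thesis's actual method) of uncertainty-based instance selection for a token-labeling task: unlabeled sentences are ranked by the mean entropy of the model's per-token label distributions, and the most uncertain ones are chosen for annotation. The helper names (`token_entropy`, `select_for_annotation`, `toy_predict`) and the toy model are hypothetical illustrations.

```python
import math

def token_entropy(probs):
    """Entropy of one token's predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_annotation(pool, predict_probs, budget):
    """Rank unlabeled sentences by mean token entropy; keep the top `budget`."""
    scored = []
    for sent in pool:
        probs_per_token = predict_probs(sent)  # one distribution per token
        score = sum(token_entropy(p) for p in probs_per_token) / len(probs_per_token)
        scored.append((score, sent))
    scored.sort(key=lambda x: x[0], reverse=True)
    return [sent for _, sent in scored[:budget]]

# Hypothetical model: uniform distributions signal maximal uncertainty.
def toy_predict(sent):
    if sent == "ambiguous":
        return [[0.5, 0.5]]    # high entropy -> informative to annotate
    return [[0.99, 0.01]]      # low entropy -> model is already confident

pool = ["easy one", "ambiguous", "another easy"]
print(select_for_annotation(pool, toy_predict, budget=1))  # -> ['ambiguous']
```

In a real setting the scoring function would come from the structured model itself (e.g., marginals from a sequence labeler), but the selection loop has this same shape.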
This thesis consists of three parts, corresponding to these three directions. In the first part, we investigate the influence of structured output modeling in deep neural models. We find that structured modeling brings benefits in sentence-level complete matches and enables more efficient models. We further extend the analyses to low-resource scenarios and investigate the interactions of structural constraints and training data sizes. In the second part, we investigate a series of related structured tasks and find that supervision from related data, such as data from the same task in different languages (cross-lingual learning) or from related tasks (multitask learning), can be beneficial, especially when utilizing models that care less about the source and target differences. Finally, in the third part, we perform a systematic investigation of active learning for structured prediction in NLP. In particular, we analyze the effectiveness of annotating and learning with partial structures, which can improve data efficiency for active learning. Moreover, we show that combining active learning with self-training on the unlabeled instances remaining in the active learning data pool can bring further improvements.
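The self-training combination mentioned above can be sketched as follows: after each active learning round, the model pseudo-labels the instances still left in the unlabeled pool, and only predictions above a confidence threshold are added to the training set. This is a minimal illustration under assumed names (`self_training_augment`, `toy_predict_with_conf`), not the procedure actually used in the thesis.

```python
def self_training_augment(pool, predict_with_conf, threshold=0.9):
    """Pseudo-label remaining unlabeled instances with confident predictions."""
    pseudo = []
    for sent in pool:
        labels, conf = predict_with_conf(sent)
        if conf >= threshold:
            pseudo.append((sent, labels))  # treat as extra (noisy) training data
    return pseudo

# Hypothetical tagger: confident on short sentences, unsure otherwise.
def toy_predict_with_conf(sent):
    tokens = sent.split()
    conf = 0.95 if len(tokens) <= 2 else 0.6
    return ["TAG"] * len(tokens), conf

pool = ["short one", "a much longer unlabeled sentence"]
print(self_training_augment(pool, toy_predict_with_conf))
# -> [('short one', ['TAG', 'TAG'])]
```

The confidence filter is what makes the combination work: the manually annotated instances selected by active learning correct the model where it is uncertain, while self-training cheaply exploits the regions where it is already reliable.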
- Language Technologies Institute
- Doctor of Philosophy (PhD)