Artificial Intelligence (AI) researchers often disagree about the best strategy for training a machine learning system, but one belief is widely shared: humans are still much better learners than machines. Unlike
AI systems, humans do not learn challenging new tasks (e.g., solving differential equations) from scratch, by looking at independent and identically distributed examples. Instead, humans often follow sequences of steps that
allow them to incrementally build up the necessary skills for performing these new tasks. Curriculum Learning (CL) is a line of work that tries to incorporate this human approach to learning into machine learning, with the hope that machines trained in this manner can learn faster and perform better.
However, biological brains are different from silicon brains, and are not trained using gradient descent, which has become the norm in machine learning. So, can we expect human learning strategies to work for computers, too? Evidence from various studies over the past two decades suggests that CL can indeed benefit machine learning in some cases, while in others it may in fact hinder performance (Elman, 1993; Rohde and Plaut, 2003;
Bengio et al., 2009; Bojar et al., 2017b). In this thesis we aim to discover the problem settings in which different forms of CL are beneficial, and the types of benefits they provide. We posit the following thesis statement: AI systems that learn like humans, starting with easy problems and gradually tackling more and more difficult ones, have the potential to reach better local optima and/or converge faster. Furthermore, the learning benefits gained using a curriculum depend on the choice of curriculum, the size and type of data, and the model architecture. In this work, we provide evidence for this statement, as well as investigate
what types of data and models can benefit from CL. We start by introducing a definition of CL and identifying three broad categories of CL methods. We then provide a literature review of the main CL approaches of the past three decades. Moreover, we propose new CL methods and apply them to a variety of models and problem settings, from teaching an LSTM to solve basic arithmetic problems, to neural machine translation using Transformers, image classification using convolutional neural networks, and compositional multitask learning problems. Through these experiments, we observe that CL can be very beneficial in certain settings (e.g., on sequential data such as sentences) if well-designed, but it can also harm the efficiency of learning if performed poorly (e.g., if the curriculum spends too much time on easy problems). Finally, we conduct analyses to understand why CL leads to the observed effects.
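The easy-to-hard training regime described above can be sketched as a simple pacing scheme. The following is a minimal illustration, not the specific methods proposed in this thesis: the function names, the linear pacing rule, and the use of sentence length as a difficulty proxy are all assumptions made for the example.

```python
# Sketch of a common curriculum learning scheme: training examples are
# sorted by a difficulty score, and the pool of available examples grows
# as training progresses (the model's "competence" increases).

def competence(step, total_steps, c0=0.1):
    """Fraction of the sorted data available at a given training step.
    Linear pacing from an initial competence c0 up to 1.0."""
    return min(1.0, c0 + (1.0 - c0) * step / total_steps)

def curriculum_pool(examples, difficulty, step, total_steps):
    """Return the easiest examples allowed at this training step."""
    ranked = sorted(examples, key=difficulty)
    cutoff = max(1, int(competence(step, total_steps) * len(ranked)))
    return ranked[:cutoff]

# Toy usage: sentences ranked by length, a crude difficulty proxy.
data = ["a b", "a", "a b c d", "a b c"]
by_length = lambda s: len(s.split())
pool_early = curriculum_pool(data, by_length, step=0, total_steps=100)
pool_late = curriculum_pool(data, by_length, step=100, total_steps=100)
```

Early in training only the shortest sentence is sampled; by the final step the full (sorted) dataset is available. A poorly chosen pacing function, e.g., one that keeps competence low for too long, corresponds to the failure mode noted above of spending too much time on easy problems.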