Classical machine learning assumes that data are independently and identically distributed, and arrive in a single format, usually the same as that of the test data. In modern applications, however, additional information in other formats is often available freely or at a lower cost. For example, in data crowdsourcing we can collect preferences over pairs of data points, which is cheaper than directly asking for the label of a single data point. In natural language understanding problems, we might have only a limited amount of data in the target domain, but can use a large amount of general-domain data for free. The main topic of this thesis is how to efficiently incorporate these diverse forms of information into the learning and decision-making process.
We study two representative paradigms in this thesis:

• First, we study learning and decision-making problems with both direct labels and comparisons. In many applications, such as clinical settings and materials science, comparisons are much cheaper to obtain than direct labels. We show that comparisons can greatly reduce problem complexity: using comparisons as input, our algorithms require exponentially fewer labels than traditional label-only algorithms, while keeping a total query complexity similar to that of previous algorithms. We consider various learning problems in this setting, including classification, regression, multi-armed bandits, nonconvex optimization, and reinforcement learning. (A minimal sketch of the label-saving effect appears after this list.)

• Second, we study multi-task learning and transfer learning to learn from data across different domains and tasks. Here, our algorithms use previously collected data from similar tasks or domains, which are essentially free to use. We propose simple yet effective ways to transfer knowledge from other domains and tasks, and achieve state-of-the-art results on several natural language understanding benchmarks. (A sketch of one common transfer setup also follows below.)
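To make the label-saving effect concrete, here is a minimal sketch for a one-dimensional threshold classifier with noiseless oracles; the thesis's actual algorithms handle noisy comparisons and richer settings, and the names `learn_threshold`, `compare`, and `query_label` are illustrative rather than taken from the thesis. Comparisons first rank the points, after which a binary search locates the decision boundary with only logarithmically many label queries.

```python
import functools

def learn_threshold(points, compare, query_label):
    """Learn a threshold classifier over `points`.

    compare(a, b)  -> -1 / 0 / 1, saying whether `a` ranks below `b`
                      (the cheap comparison oracle).
    query_label(x) -> 0 or 1, the true label of `x`
                      (the expensive label oracle).

    Assumes labels are monotone along the comparison ranking:
    all 0s first, then all 1s.
    """
    # O(n log n) cheap comparison queries to rank the points.
    ranked = sorted(points, key=functools.cmp_to_key(compare))

    # O(log n) expensive label queries: binary-search for the
    # first point whose label is 1.
    lo, hi = 0, len(ranked)
    while lo < hi:
        mid = (lo + hi) // 2
        if query_label(ranked[mid]) == 1:
            hi = mid          # boundary is at or before mid
        else:
            lo = mid + 1      # boundary is strictly after mid
    boundary = lo             # points ranked[boundary:] are labeled 1

    def classifier(x):
        # Label 1 iff x ranks at or above the first positive point.
        return int(boundary < len(ranked)
                   and compare(x, ranked[boundary]) >= 0)

    return classifier
```

With n points, the sort uses O(n log n) comparison queries and the search uses O(log n) label queries, so the total number of queries stays comparable to label-only approaches while the number of expensive label queries drops from linear to logarithmic.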
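For the second paradigm, one simple and widely used transfer mechanism is a shared encoder with task-specific heads: data from auxiliary tasks updates the shared parameters, and the target task benefits through them. The sketch below is a hedged illustration of that setup in PyTorch, not the thesis's exact models; `MultiTaskModel` and its arguments are hypothetical names, and the encoder is assumed to map inputs to fixed-size feature vectors.

```python
from torch import nn

class MultiTaskModel(nn.Module):
    """Shared encoder plus one lightweight head per task.

    Hypothetical sketch: training on batches from auxiliary tasks
    updates the shared encoder, transferring knowledge from their
    (essentially free) data to the target task.
    """

    def __init__(self, encoder, hidden_dim, num_labels_per_task):
        super().__init__()
        self.encoder = encoder  # e.g. a pretrained sentence encoder
        self.heads = nn.ModuleDict({
            task: nn.Linear(hidden_dim, n)
            for task, n in num_labels_per_task.items()
        })

    def forward(self, inputs, task):
        features = self.encoder(inputs)    # shared representation
        return self.heads[task](features)  # task-specific prediction
```

During training, batches from the auxiliary tasks and the target task are interleaved, and each batch's loss is computed through its own head; only the small heads are task-specific, so most parameters are shared across tasks.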
This thesis offers both theoretical and practical insights. Theoretically, we establish performance guarantees for our algorithms, as well as their statistical minimax optimality through information-theoretic lower bounds. On the practical side, we demonstrate promising experimental results on price estimation and natural language understanding tasks.