<p dir="ltr">The recent success of large neural network models trained on massive amounts of unlabeled data has, yet again, highlighted the power of scaling, in terms of both model capacity and data quantity. However, despite its successes, scaling up also introduces new challenges: (i) the large size and complexity of these models make it difficult to understand their learning dynamics and global behaviour, and how these change with the data distribution; (ii) their complexity also makes it difficult to extract explanations of how they arrive at individual predictions, and to know how to leverage these explanations; (iii) this, coupled with the fact that the data used to train these models is often noisy, can lead to models that do not align with their intended use case. </p><p dir="ltr">This thesis tackles all of the challenges above, with a particular focus on models trained for machine translation (MT). In the first part, we study the scaling behaviour of MT models trained to translate between multiple high-resource language pairs (LPs), showing that the way performance for an individual LP scales with model capacity depends only on the properties of that “task” and is invariant to cross-lingual interactions. We also investigate how much context information MT models leverage in document-level MT, and propose ways to increase it. In the second part, we start by proposing a novel method for extracting explanations of how neural models arrive at their predictions, relying on the assumption that good explanations should help other models learn, and leveraging bilevel optimization/meta-learning to learn explainers that teach well. We then show that reasoning models are not only state-of-the-art systems for low-resource MT, but that their chain-of-thought rationales are also good explanations for teaching smaller LLMs the same task. 
In the third and final part, we start by showing that, by including metrics of translation quality (trained on human feedback annotations) in the inference stage of a trained MT model, we can improve the quality of the translations and align the system towards human-like translations. We then show that we can almost completely remove the dependency on human annotations by leveraging large language models to provide (fine-grained) translation quality feedback.</p>
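<p dir="ltr">The inference-stage use of a quality metric described above can be pictured as reranking: generate several candidate translations, score each with the metric, and return the highest-scoring one. The sketch below illustrates this shape only; <code>generate_candidates</code> and <code>quality_score</code> are hypothetical stand-ins (in practice the candidates would come from sampling or beam search over the MT model, and the metric would be a learned model trained on human quality annotations, not the toy heuristic used here).</p>

```python
def generate_candidates(source: str) -> list[str]:
    # Stand-in for drawing N hypotheses from a trained MT model
    # (e.g. via sampling or beam search).
    return [
        "The cat sat on the mat.",
        "The cat sits on the mat.",
        "Cat on mat sat the.",
    ]

def quality_score(source: str, hypothesis: str) -> float:
    # Stand-in for a learned quality metric conditioned on the source.
    # Toy heuristic: reward more tokens, penalize a lowercase start.
    return len(hypothesis.split()) - (0.0 if hypothesis[0].isupper() else 1.0)

def rerank_translate(source: str) -> str:
    # Quality-aware inference: pick the candidate the metric scores highest.
    candidates = generate_candidates(source)
    return max(candidates, key=lambda hyp: quality_score(source, hyp))

print(rerank_translate("Le chat s'est assis sur le tapis."))
```

<p dir="ltr">Reranking is only one way to fold a metric into inference; the same scoring function can also serve as the utility in minimum Bayes risk decoding, where candidates are compared against each other rather than scored in isolation.</p>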