<p dir="ltr">This thesis presents methodological and theoretical advances in causal inference and missing data across a range of complex data settings. The first part focuses on semi-supervised causal inference, where treatment and/or outcome data may be missing for a subset of observations. When both treatment and outcome are missing, the problem aligns with generalization and transportation of causal effects, where estimation is complicated by distributional shifts between the source and target populations. When only outcomes are missing, auxiliary surrogate outcomes, which are often easier to collect, can provide valuable information about the final outcome of interest. We develop general frameworks that leverage both labeled and unlabeled data to estimate causal effects, and we establish conditions under which consistent and efficient estimation is achievable.</p><p dir="ltr">The second part addresses challenges arising in applications where a large number of confounders must be adjusted for. We focus on a high-dimensional discrete covariate setting and analyze the statistical properties of commonly used causal effect estimators. We further establish the fundamental limits of treatment effect estimation in this regime, providing insights into when and how reliable estimation is possible.</p><p dir="ltr">The third part investigates the identification and estimation of local treatment effects using a continuous instrumental variable, a common approach for addressing unmeasured confounding. We develop nonparametric estimators for the marginalized local instrumental variable (LIV) curve and for treatment effects within the maximal complier class. Our methods are supported by asymptotic guarantees under minimal smoothness assumptions, enabling flexible and robust inference in complex settings.</p><p dir="ltr">The final part examines the estimation of mean outcomes with missingness under clustered sampling schemes. We establish the asymptotic normality of the doubly robust estimator in the
presence of cluster dependence and propose variance estimation procedures that ensure valid inference. The methodology is illustrated through an application to language model evaluation.</p>