<p dir="ltr">This thesis addresses three aspects of the problem of multi-operator learning (MOL). First, we establish theoretical guarantees for a novel architecture in forward MOL, rigorously deriving its convergence properties and demonstrating its practical efficacy. Next, we propose a transformer-based framework that generalizes across distinct families of operators, validated on partial differential equations (PDEs) spanning low- to high-dimensional regimes. Finally, we develop an adaptive training methodology that enables single-operator architectures to solve multi-operator tasks, bridging the gap between specialized models and broader applicability while maintaining computational efficiency. Together, these contributions systematically address the architecture, analysis, and applications of MOL.</p>