Modeling, Analysis, and Design of Influence in Multi-Agent Systems
Many natural and engineered systems arise from strategic, interacting agents making decisions within physical or digital environments. To understand how group behavior evolves, one must understand how these heterogeneous entities, with differing goals, levels of impact, and methods of learning, choose to act as individuals. While characterizing the long-term behavior of such agents is an important goal, a further objective is to analyze how external influences can shape such outcomes. If an influencer aspires to achieve control objectives related to the group’s behavior, they must understand the agents’ decision processes and deduce how applied controls will produce change, enabling controls to be selected in a principled manner.
First, a centrally controlled multi-agent system model is presented based on a Markov decision process (MDP), building on commonly accepted assumptions in reinforcement learning. While basic control policies may be derived from this model, the central planner (CP) seeks formulations that yield high performance subject to pragmatic considerations of scale. One source of structure comes from independence assumptions on agents’ decision processes, which enable the use of transition-independent MDPs (TI-MDPs). This model is used to develop clustered control policies that balance the trade-off between control performance and computational cost. A separable structure within the value function is established, and approximate solution methods are proposed that converge to fixed points with bounded error, and to optimal solutions for certain classes of problems.
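As an illustrative sketch only (not the thesis’s clustered formulation), the CP’s planning problem over a finite MDP can be solved by standard value iteration; the transition tensor `P`, reward matrix `R`, and discount `gamma` below are hypothetical placeholders:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Compute an (approximately) optimal value function and greedy policy.

    P: (A, S, S) transition tensor, P[a, s, s'] = Pr(s' | s, a)
    R: (A, S) expected immediate reward for taking action a in state s
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    while True:
        # Bellman optimality backup: Q[a, s] = r(a, s) + gamma * E[V(s')]
        Q = R + gamma * (P @ V)        # shape (A, S)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new
```

For example, in a two-state MDP where state 1 is absorbing with reward 1 and state 0 can transition to state 1, the iteration recovers V ≈ (9, 10) under gamma = 0.9.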
Assigning agents to clusters to maximize the value function is a separate and similarly difficult problem. To make this combinatorial problem tractable, it is shown that transition-independent MDPs with submodular reward functions yield submodular value functions, enabling ideas from submodular optimization to be applied to the cluster assignment problem. The resulting method is a greedy clustering approach that guarantees monotonic submodular improvement at each iteration, with implied stopping conditions. The alternative clustering objective of maximizing the size of an MDP’s reachable state space is also shown to be submodular, giving a second potential construction.
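The generic greedy scheme for a monotone submodular objective can be sketched as follows; the coverage function in the example is a hypothetical stand-in for the submodular value functions studied in the thesis:

```python
def greedy_select(ground, f, k):
    """Greedily build a set of at most k elements for a monotone
    submodular set function f, stopping early when no element has a
    positive marginal gain (the implied stopping condition)."""
    chosen = set()
    for _ in range(k):
        gains = {e: f(chosen | {e}) - f(chosen) for e in ground - chosen}
        best = max(gains, key=gains.get, default=None)
        if best is None or gains[best] <= 0:
            break  # no improving element remains
        chosen.add(best)
    return chosen

# Hypothetical example: set coverage, a classic submodular function.
covers = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5}, 3: {1}}
f = lambda S: len(set().union(*(covers[e] for e in S)))
```

For monotone submodular objectives under a cardinality constraint, this greedy rule carries the classical (1 − 1/e) approximation guarantee.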
A second consideration in policy design is robustness, as real-world systems violate stationarity assumptions. Continuing the theme of structural considerations, a policy robust to structural change is formulated as one that maximizes the value of the expected system. This strategy is used to design policies that are robust to agent dropout, and to propose a policy importance sampling method for estimating robust policies before dropout occurs.
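As a hedged sketch of the importance-sampling idea (the thesis’s actual estimator may differ), the value of a post-dropout target policy can be estimated from trajectories collected under the current behavior policy by reweighting observed returns:

```python
import numpy as np

def ois_value(rewards, behavior_probs, target_probs):
    """Ordinary importance-sampling estimate of a target policy's value
    from trajectories generated under a behavior policy.

    rewards:        (n, T) per-step rewards of n length-T trajectories
    behavior_probs: (n, T) behavior-policy probability of each taken action
    target_probs:   (n, T) target-policy probability of the same action
    """
    # Per-trajectory likelihood ratio of target vs. behavior policy.
    weights = np.prod(target_probs / behavior_probs, axis=1)
    return float(np.mean(weights * rewards.sum(axis=1)))
```

When the target policy coincides with the behavior policy, every weight equals one and the estimate reduces to the empirical mean return.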
In addition to agent dropout, non-stationarity may be caused by agents changing their behavior. In the worst case, an agent may become an adversary and actively work against the CP’s control objectives. Using a zero-sum game framework, the CP can bound the maximum negative impact of potential adversarial agents and compute optimal response strategies in such situations. Finally, this thesis concludes with analyses that contribute to the general study of dynamic programming and MDPs.
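The bounding idea can be illustrated with a pure-strategy sketch on a hypothetical payoff matrix (mixed-strategy equilibria would require solving a linear program):

```python
import numpy as np

def pure_strategy_bounds(G):
    """Pure-strategy security levels of a zero-sum matrix game G, with
    the row player (the CP) maximizing and the column player (the
    adversary) minimizing.  The CP can guarantee at least `maximin`
    regardless of the adversary's action, while `minimax` caps the CP's
    achievable payoff; maximin <= minimax always holds."""
    maximin = G.min(axis=1).max()  # CP's guaranteed worst-case payoff
    minimax = G.max(axis=0).min()  # adversary's cap on the CP's payoff
    return maximin, minimax
```

When the two bounds coincide, the game has a saddle point in pure strategies and that common value is the value of the game.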
- Electrical and Computer Engineering
- Doctor of Philosophy (PhD)