Carnegie Mellon University

Meta-Management of Collections of Autonomic Systems

thesis
posted on 2024-02-09, 19:28, authored by Thomas Glazier

To meet the demands of high availability and optimal performance in dynamic environments, modern systems deploy autonomic, or self-adaptation, mechanisms. Increasingly, however, today’s enterprise systems are compositions of many subsystems, each of which is itself an adaptive system. Currently, each autonomic manager operates to maintain locally defined quality-of-service (QoS) objectives, but their independent actions often lead to globally sub-optimal results. Commonly, human administrators handle situations in which the collection of autonomic systems is behaving sub-optimally. Generating a plan to change the configurations of the constituent autonomic managers is a complex and challenging task even in the management of a single autonomous system, and the challenge is exacerbated where there may be tradeoffs in how to balance configuration options across the collection of autonomic subsystems.

These challenges can be addressed by introducing an automated approach, referred to as meta-management, that provides a formal basis for reasoning about changes to the configurations of autonomic subsystems. The automated approach to meta-management is then established as part of a framework that can be used to instantiate a higher-level autonomic manager, referred to as a meta-manager, that provides assurance about, and improves the performance of, a collection of autonomic systems. The approach and framework include a MAPE-K control loop specialized to the needs of meta-management; a domain-specific language, SEAM, that enables the practical specification of adaptation policies; and a taxonomy of strategy synthesis techniques. The practicality, effectiveness, and applicability of the approach are then evaluated against three case studies.

The first is an AWS Shopping Cart system in which a meta-manager is established to manage a collection of autonomic systems represented by a front-end user interface, a middleware services tier, and a database services tier. This case study was selected to evaluate the ability of the meta-manager to improve the homeostatic operations of the collection of autonomic systems on a popular architectural pattern, code base, and operations platform that is in wide industrial use.

The second is the Google Control plane, in which a meta-manager was established to manage a collection of autonomic systems that suffered a significant outage. This case study was selected because it presented a well-documented, specific failure scenario that occurred during the research period of this thesis, the cause of which was, in part, human-centric management of a collection of autonomic systems.

Finally, the third is a simulation of an electrical grid cascade failure that represents the Northeast Blackout of 2003. This case study was selected because it presents an exhaustively documented example of a failure of human-centric management of a collection of autonomic systems, one that occurred in a context outside of information technology or cloud-based providers. This lends credibility to the applicability claim of the thesis.

History

Date

2023-12-14

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

David Garlan
