Carnegie Mellon University

On Scalable Algorithms and Algorithms with Predictions

thesis
posted on 2022-07-11, 20:47 authored by Thomas Lavastida

In recent years the massively increased availability of data has created significant interest in data-driven methods in engineering and business. This has created a need both for more scalable algorithms to process larger data sets and for new methods for leveraging data in decision making and optimization. In this dissertation we consider both of these broader questions for a variety of relevant problems. The first part of this dissertation is concerned with scalability via parallel and distributed algorithms, while the second part considers questions within the growing field of algorithms with predictions.

In Chapter 1, we consider the weighted longest common subsequence problem and its all-substrings generalization. This problem arises in computational biology applications, where there is a need to scale to larger inputs. We give efficient parallel algorithms which yield nearly optimal solutions for both problems.

Hierarchical clustering is a fundamental data analysis tool and one of the most widely used clustering methods in practice. Typical approaches suffer from scalability issues, either because they require at least quadratic time or because they are inherently sequential. To get around these barriers, we consider approximations. Our main results in Chapter 2 are efficient distributed algorithms which approximate the wide class of divisive k-clustering methods.

In algorithms with predictions, we augment our usual algorithms with additional information representing (potentially erroneous) predictions about the input or desired solution. These predictions can be useful in resolving future uncertainty in the online setting, or in improving the average-case running time of an algorithm when a problem must be solved repeatedly on similar inputs. In general, we want to design and analyze algorithms whose performance is tied to the prediction error, as well as to show how to appropriately construct these predictions from past data.
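For intuition, the sequential baseline that parallel weighted-LCS algorithms improve upon is a quadratic dynamic program. The sketch below assumes a per-symbol weight function for matched characters; the exact weighting scheme varies by formulation, and this is only an illustrative baseline, not the dissertation's parallel algorithm.

```python
def weighted_lcs(x, y, w):
    """Sequential DP for weighted LCS: D[i][j] is the best weight of a
    common subsequence of x[:i] and y[:j], where a match on symbol c
    contributes w(c). Runs in O(len(x) * len(y)) time."""
    n, m = len(x), len(y)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(D[i - 1][j], D[i][j - 1])  # skip a symbol of x or y
            if x[i - 1] == y[j - 1]:
                best = max(best, D[i - 1][j - 1] + w(x[i - 1]))  # match
            D[i][j] = best
    return D[n][m]
```

With unit weights this reduces to the classical LCS length, e.g. `weighted_lcs("abcde", "ace", lambda c: 1.0)` gives `3.0`.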
Chapter 3 considers online load balancing with restricted assignments. We show the existence of predictions which can help guide the online algorithm in constructing a fractional assignment. These predictions are robust to small errors and they can be efficiently learned from past instances of the problem.
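One simple way such predictions can guide a fractional assignment, sketched under the assumption that the prediction is a single weight per machine (the function and variable names here are illustrative, not the thesis's exact construction): each arriving job is split among its allowed machines in proportion to the predicted weights.

```python
from collections import defaultdict

def fractional_loads(jobs, weights):
    """jobs: list of (size, allowed_machines) pairs in arrival order;
    weights: dict mapping machine -> predicted weight.
    Each job is split among its allowed machines in proportion to the
    predicted weights; returns the resulting fractional load per machine."""
    load = defaultdict(float)
    for size, allowed in jobs:
        total = sum(weights[m] for m in allowed)
        for m in allowed:
            load[m] += size * weights[m] / total
    return dict(load)
```

Because the split depends only on the fixed weights, the assignment is online: each job is placed on arrival without knowledge of future jobs, and small errors in the weights perturb the loads only slightly.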

To complete the result, we give an online rounding algorithm. Finally, Chapter 4 considers the Hungarian algorithm for minimum cost bipartite matching. Usually this algorithm is given a naïve initial dual solution.

We show, both theoretically and practically, that a learned initial dual solution can significantly improve the running time of this algorithm. This can be seen as using past instances of the problem to set a “warm-start” solution, which is more likely to be close to an optimal solution, and thus require fewer iterations to reach optimality.
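A minimal sketch of the warm-start idea: the Hungarian algorithm maintains duals (u, v) satisfying u[i] + v[j] <= C[i, j], and learned row duals from past instances can be repaired into a feasible starting point for a new cost matrix by recomputing the column duals. The repair step and names below are illustrative, not the dissertation's exact procedure.

```python
import numpy as np

def warm_start_duals(C, u_learned):
    """Given a cost matrix C and learned row duals u_learned (e.g. taken
    from optimal duals of past, similar instances), produce a dual-feasible
    starting point by setting v[j] = min_i (C[i, j] - u[i]), which makes
    every reduced cost C[i, j] - u[i] - v[j] nonnegative."""
    u = np.asarray(u_learned, dtype=float)
    v = (C - u[:, None]).min(axis=0)
    assert np.all(C - u[:, None] - v[None, :] >= -1e-9)  # dual feasibility
    return u, v
```

If the learned duals are close to the new instance's optimal duals, the algorithm starts near optimality and needs fewer augmenting phases than from the naïve all-zeros start.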

History

Date

2022-04-08

Degree Type

  • Dissertation

Department

  • Tepper School of Business

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Benjamin Moseley
