Carnegie Mellon University
Browse

Sketching and Sampling Algorithms for High-Dimensional Data

Download (4.5 MB)
thesis
posted on 2021-10-22, 19:47 authored by Rajesh JayaramRajesh Jayaram
Sketching is a powerful technique which compresses high-dimensional datasets into lower-dimensional sketches, while preserving properties of interest. Sampling is a quintessential form of sketching, where one generates samples from the data to use as the basis for a sketch. Sketching and sampling methods have become ubiquitous
across computer science, and are a central component of modern algorithm design. This thesis studies the design and analysis of sketching and sampling algorithms for a variety of computational tasks. In particular, we make contributions to three closely related areas: Streaming and Distributed Algorithms, Numerical Linear Algebra, and Database Query Evaluation. Our contributions to these areas include:
Streaming and Distributed Algorithms:
• The first perfect Lp samplers, near-optimal algorithms for Lp estimation for p 2 (0, 1), improved sketches for the Earth Mover’s Distance, and the introduction of the adversarially robust and bounded deletion streaming models.
Numerical Linear Algebra:
• The first property testing algorithms for matrix positive semi-definiteness, and the first input-sparsity solvers for linear regression on design matrices which are (1) a Kronecker product of several smaller matrices or (2) a database join of several smaller tables.
Database Query Evaluation:
• A characterization of the classes of conjunctive queries for which approximate counting and uniformly sampling can be accomplished in polynomial time. Along the way, we obtain the first polynomial time algorithm for estimating the
number of strings of length n accepted by a non-deterministic finite automaton.

History

Date

2021-07-02

Degree Type

  • Dissertation

Department

  • Computer Science

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

David Woodruff

Usage metrics

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC