# Sketching and Sampling Algorithms for High-Dimensional Data

Sketching is a powerful technique which compresses high-dimensional datasets into lower-dimensional sketches, while preserving properties of interest. Sampling is a quintessential form of sketching, where one generates samples from the data to use as the basis for a sketch. Sketching and sampling methods have become ubiquitous

across computer science, and are a central component of modern algorithm design. This thesis studies the design and analysis of sketching and sampling algorithms for a variety of computational tasks. In particular, we make contributions to three closely related areas: Streaming and Distributed Algorithms, Numerical Linear Algebra, and Database Query Evaluation. Our contributions to these areas include:

Streaming and Distributed Algorithms:

• The first perfect Lp samplers, near-optimal algorithms for Lp estimation for p 2 (0, 1), improved sketches for the Earth Mover’s Distance, and the introduction of the adversarially robust and bounded deletion streaming models.

Numerical Linear Algebra:

• The first property testing algorithms for matrix positive semi-definiteness, and the first input-sparsity solvers for linear regression on design matrices which are (1) a Kronecker product of several smaller matrices or (2) a database join of several smaller tables.

Database Query Evaluation:

• A characterization of the classes of conjunctive queries for which approximate counting and uniformly sampling can be accomplished in polynomial time. Along the way, we obtain the first polynomial time algorithm for estimating the

number of strings of length n accepted by a non-deterministic finite automaton.

## History

## Date

2021-07-02## Degree Type

- Dissertation

## Department

- Computer Science

## Degree Name

- Doctor of Philosophy (PhD)