An Information Theoretic Approach for Privacy Preservation in Distance-based Machine Learning

Jimenez Gajardo, Abelino Enrique

doi:10.1184/R1/9735614.v1

posted on 2019-10-09, 19:40 authored by Abelino Enrique Jimenez Gajardo

As cloud-based services grow in popularity as platforms for storage and computation, the privacy issues surrounding their use have become increasingly important. Much of the data stored on cloud platforms is private. It belongs to individuals or institutions who want to use the facilities these platforms provide but do not want to expose their data to the platform itself.
Encrypting the data before storing it in the cloud helps protect private information. However, encryption causes problems when computations must be performed on the data, for instance to train a machine learning algorithm. Such computations require the server to observe the content, so the data must be decrypted. This gives rise to privacy concerns in a range of cloud computing settings. Several solutions based on cryptographic techniques have been proposed to address the issue. However, they have high computational cost and bandwidth requirements, and in practice they are difficult to scale.
In this work, we propose an alternative approach: a privacy mechanism based on limited leakage transformations, which have two key properties:
1. Individual transformed vectors are uninformative about their preimages; and
2. Comparing transformed data points can provide information about the similarity of their preimages, but only if the preimages are sufficiently close; otherwise, the comparison provides no information about them.
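
To make properties 1 and 2 concrete, here is a minimal Python sketch of one known construction with this behaviour. It applies a dithered Gaussian random projection followed by a periodic one-bit quantizer, a construction often called a secure binary embedding. It is an illustration under stated assumptions, not necessarily the transformation developed in this thesis. The names `make_key`, `embed` and `hamming`, and the parameters `n_bits` and `delta`, are hypothetical.

Over the random dither, each output bit is equally likely to be 0 or 1, so a single embedding reveals little on its own. The normalized Hamming distance between two embeddings tracks the Euclidean distance between their preimages only when that distance is small relative to `delta`.

```python
import numpy as np


def make_key(dim_in, n_bits, delta, seed=0):
    """Hypothetical secret key: Gaussian projection A, uniform dither w, step delta."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n_bits, dim_in))
    w = rng.uniform(0.0, delta, size=n_bits)
    return A, w, delta


def embed(x, key):
    """Dithered random projection followed by a periodic 1-bit quantizer.

    Each bit is the parity of floor((a_i . x + w_i) / delta): nearby inputs
    tend to share bits, while distant inputs agree only about half the time.
    """
    A, w, delta = key
    levels = np.floor((A @ x + w) / delta).astype(np.int64)
    return (levels % 2).astype(np.uint8)  # numpy % is non-negative for a positive divisor


def hamming(b1, b2):
    """Fraction of positions where two bit vectors differ."""
    return float(np.mean(b1 != b2))


rng = np.random.default_rng(1)
key = make_key(dim_in=64, n_bits=2048, delta=1.0)
x = rng.standard_normal(64)
x_near = x + 0.01 * rng.standard_normal(64)  # ||x - x_near|| is roughly 0.08
x_far = rng.standard_normal(64)              # ||x - x_far|| is roughly 11

bx = embed(x, key)
d_near = hamming(bx, embed(x_near, key))
d_far = hamming(bx, embed(x_far, key))
print(f"bit mean={bx.mean():.3f} near={d_near:.3f} far={d_far:.3f}")
```

Increasing `delta` widens the radius within which distances leak. For pairs much farther apart than `delta`, the Hamming distance saturates near 0.5 and says little about how far apart the points are.
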
We use tools from information theory to establish the theoretical properties of this kind of scheme and describe how to use it in practical scenarios.
We study the implications of using the proposed method in distance-based machine learning, the family of algorithms that depend directly on distance computations. The objective is to develop privacy mechanisms that enable the use of such methods without revealing private data. We discuss how to perform both the training and inference phases in a private setting. Our goal is to show that fast, private computation on the cloud is feasible and useful for this class of techniques. We present our progress in this research and the future directions to be addressed.
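
The following hypothetical sketch shows private inference for one distance-based method, k-nearest neighbours, built on an assumed dithered, periodically quantized random projection. Data owners embed their vectors locally with a secret key and upload only bit vectors and labels. The server then classifies an embedded query by Hamming distance. The protocol, the function names (`make_key`, `embed`, `knn_predict`) and the choice to reveal labels to the server are all assumptions made for illustration. They are not the training and inference procedures described in the thesis.

```python
import numpy as np


def make_key(dim_in, n_bits, delta, seed=0):
    # Hypothetical secret key held by the data owners; the server never sees it.
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n_bits, dim_in))
    w = rng.uniform(0.0, delta, size=n_bits)
    return A, w, delta


def embed(X, key):
    # Embeds one vector or a batch of row vectors into 0/1 bit vectors.
    A, w, delta = key
    levels = np.floor((np.atleast_2d(X) @ A.T + w) / delta).astype(np.int64)
    return (levels % 2).astype(np.uint8)


def knn_predict(train_bits, train_labels, query_bits, k=1):
    """Server side: majority vote among the k embeddings closest in Hamming distance."""
    dists = np.mean(train_bits != query_bits, axis=1)
    nearest = np.argsort(dists)[:k]
    values, counts = np.unique(train_labels[nearest], return_counts=True)
    return values[np.argmax(counts)], nearest


rng = np.random.default_rng(2)
key = make_key(dim_in=16, n_bits=1024, delta=1.0)
centers = np.array([np.zeros(16), np.full(16, 5.0)])
y_train = np.repeat([0, 1], 10)
X_train = centers[y_train] + 0.05 * rng.standard_normal((20, 16))

# Client side: embed locally, then upload only the bits (and labels).
train_bits = embed(X_train, key)
query = X_train[3] + 0.005 * rng.standard_normal(16)
label, nearest = knn_predict(train_bits, y_train, embed(query, key), k=1)
print(label, nearest)
```

This works for nearest-neighbour decisions because a query's true neighbours are close and therefore measurable. All distant points look roughly equally far, with a Hamming distance near 0.5.
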

History

Date

2019-08-18

Degree Type

Dissertation

Department

Electrical and Computer Engineering

Degree Name

Doctor of Philosophy (PhD)

Advisor(s)

Bhiksha Raj

Keywords

Information theory, machine learning, privacy, security

Licence

In Copyright
