An Information Theoretic Approach for Privacy Preservation in Distance-based Machine Learning

2019-10-09T19:40:42Z (GMT) by Abelino Enrique Jimenez Gajardo
As cloud-based services become increasingly popular as platforms for storage and computation, privacy issues
relating to their use have become increasingly important. Much of the data stored on cloud platforms are
private, belonging to individuals or institutions who wish to utilize the facilities these platforms
provide but, at the same time, do not want to expose their data to the platform itself.
Encrypting the data prior to storage on the cloud helps to protect private information. However, this
causes problems if we need to perform computations on the data, for instance, to train a machine learning
algorithm: the server must observe the content, so decryption becomes necessary. This gives rise to
privacy concerns in different cloud computing settings. Several solutions based on cryptographic techniques
have been proposed to address the issue; however, they carry high computational cost and high bandwidth
requirements, and in practice are difficult to scale.
In this work, we propose an alternative approach: a privacy mechanism based on limited-leakage
transformations, which have two key properties:
1. Individual transformed vectors are uninformative about their preimages; and
2. Comparisons between transformed data points can provide information about the similarity of their
preimages, but only if the preimages are sufficiently close; otherwise, the comparison provides no
information about them.
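One well-known construction that exhibits both properties above (offered here only as an illustration, not necessarily the mechanism developed in this work) is a universal quantized embedding: project the data with a secret random matrix, add a secret random dither, quantize, and keep one bit per projection. Normalized Hamming distance between codes then tracks Euclidean distance between preimages only up to a radius set by the quantization step; beyond that radius it saturates near 1/2, revealing nothing further. A minimal sketch, with illustrative parameter choices:

```python
import math
import random

random.seed(0)
d, m, delta = 16, 512, 4.0   # input dim, code length, quantization step (illustrative)

# Secret parameters, shared among data owners but hidden from the server:
A = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]  # random projections
w = [random.uniform(0.0, delta) for _ in range(m)]                  # random dither

def embed(x):
    """One-bit universal quantized embedding: b_i = floor((<a_i, x> + w_i) / delta) mod 2."""
    return [math.floor((sum(a * v for a, v in zip(row, x)) + wi) / delta) % 2
            for row, wi in zip(A, w)]

def hamming(p, q):
    """Normalized Hamming distance between two bit strings."""
    return sum(pi != qi for pi, qi in zip(p, q)) / len(p)

x = [random.gauss(0.0, 1.0) for _ in range(d)]
near = [v + 0.05 * random.gauss(0.0, 1.0) for v in x]    # a close neighbor of x
far = [10.0 * random.gauss(0.0, 1.0) for _ in range(d)]  # a distant point

# Codes of nearby points remain close; codes of distant points look
# independent, so their comparison leaks nothing beyond "not close".
print(hamming(embed(x), embed(near)))   # small
print(hamming(embed(x), embed(far)))    # near 0.5
```

Without the secret projection and dither, a single code is uninformative about its preimage, while pairs of codes support distance comparisons only within the prescribed radius.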
We use tools from information theory to establish theoretical properties of such transformations and describe
how to use this kind of scheme in practical scenarios.
We study the implications of using our proposed method in distance-based machine learning, the family of
algorithms that depend directly on distance computations, with the objective of developing privacy
mechanisms that enable the use of such methods without revealing private data. We discuss how to perform
both the training and inference phases in a private setting. Our goal is to show that fast and private
computations on the cloud are feasible and useful for this class of techniques. We present our progress in
this research and future directions to be addressed.