Cloud services with powerful resources are widely used to manage exponentially growing data and to run data mining on it. However, data mining that involves a query can cause privacy problems by disclosing both the data and the query. Classification is a data mining task used in a wide range of applications, and in this study we focus on k-nearest neighbor (kNN) classification. Although several studies have attempted to address the privacy problems associated with kNN computation in a cloud environment, their solutions remain inefficient. In this paper, we propose an efficient, privacy-preserving kNN classification (PkNC) over encrypted data. While the computation (encryptions/decryptions and exponentiations) and communication of the most efficient kNN classification proposed in prior studies are bounded by O(kln), those of the proposed PkNC are bounded by O(ln), where l is the domain size of the data and n is the number of data items. In experiments on the same dataset, the prior kNN classification took 12.02 to 55.5 minutes, whereas PkNC took 4.16 minutes. Furthermore, since PkNC can be carried out in parallel for each data item, its performance can be improved substantially on a machine that supports more threads. PkNC protects the privacy of the dataset and the input query, including the kNN result, and does not disclose any data access patterns. We propose several protocols that serve as building blocks for PkNC and formally prove their security. In particular, we propose efficient protocols that privately find the k largest or smallest elements in an array.
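For orientation, the plaintext computation that a privacy-preserving kNN classifier has to reproduce can be sketched as follows. This is only an illustrative baseline on unencrypted data, not the PkNC protocol: the function names, the squared Euclidean distance, and the majority-vote rule are assumptions chosen for the sketch, and no encryption or access-pattern hiding is performed.

```python
import heapq
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_classify(records, labels, query, k):
    """Plaintext kNN: label the query by majority vote of its k nearest records."""
    # One distance per record; each is independent, so this step
    # could be computed in parallel per data item.
    distances = [(squared_distance(r, query), i) for i, r in enumerate(records)]
    # Select the k smallest distances (the step that a private
    # "k smallest elements in an array" protocol would replace).
    nearest = heapq.nsmallest(k, distances)
    votes = Counter(labels[i] for _, i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy dataset with two classes.
records = [(1, 1), (2, 1), (8, 9), (9, 8), (1, 2)]
labels = ["a", "a", "b", "b", "a"]
predicted = knn_classify(records, labels, (2, 2), 3)
```

In the encrypted setting, both the distance computation and the selection of the k smallest values would have to run without revealing the records, the query, or which records were selected.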
- cloud computing
- k-nearest neighbor classification
- privacy-preserving data mining
ASJC Scopus subject areas
- Computer Science(all)
- Materials Science(all)