Protecting User’s Information Based on Clustering Method in Data Mining
Albahir journal
Article 1, Volume 2, Issue 3, December 2015, Pages 23-34
Author
Heba Adnan Raheem
Abstract
Privacy-preserving data mining (PPDM) is a recent research area within data mining, concerned with protecting users' information. Privacy protection has become important in data mining research because of the growing ability to store personal data about users and the development of data mining algorithms that can infer such information. The main goal of PPDM is to develop a system that modifies the original data so that private data and knowledge remain private even after the mining process. In this paper we propose a system that applies the PAM (Partitioning Around Medoids) clustering algorithm to health datasets to generate a set of clusters, and then protects the sensitive attributes in each cluster to increase the privacy of users' information. The sensitive attributes are protected with privacy techniques that modify the data values (attributes) in the dataset. We suggest using a randomization technique, Data Copying (a new technique proposed in this paper), to prevent an attacker from inferring users' private information. After modification, the same clustering algorithm is applied to the modified dataset to verify whether the sensitive attributes are hidden. Experimental results indicate that the PAM algorithm clusters all the datasets efficiently and that the selected clusters are protected efficiently by the Data Copying technique. The technique is applied to the Wisconsin breast cancer and diabetes datasets. Finally, the results of the proposed system show that data distortion can be reduced as the privacy ratio is increased. These are important issues in PPDM, and the proposed system is therefore highly successful in protecting privacy.
Keywords
clustering; PAM; privacy; Data Copying
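
The pipeline outlined in the abstract (cluster with PAM, modify a sensitive attribute inside a chosen cluster, then cluster again to compare) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy two-attribute records, the function names `pam` and `perturb_column`, the random initialisation with greedy swaps (a simplified stand-in for PAM's BUILD and SWAP phases), and the additive Gaussian noise. The noise step is a generic randomization placeholder and is not the paper's Data Copying technique, which the abstract does not specify.

```python
import math
import random


def cost(points, medoids):
    """Total distance from every point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def assign(points, medoids):
    """For each point, the position (in `medoids`) of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda j: math.dist(p, points[medoids[j]]))
        for p in points
    ]


def pam(points, k, seed=0):
    """Simplified PAM: random start, then greedy medoid/non-medoid swaps."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    best = cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                c = cost(points, trial)
                if c < best - 1e-12:  # accept only strict improvements
                    medoids, best, improved = trial, c, True
    return medoids, assign(points, medoids), best


def perturb_column(points, labels, cluster, column, scale, seed=0):
    """Add Gaussian noise to one attribute of the records in one cluster.

    Placeholder randomization only; NOT the paper's Data Copying technique.
    """
    rng = random.Random(seed)
    out = []
    for p, lab in zip(points, labels):
        row = list(p)
        if lab == cluster:
            row[column] += rng.gauss(0.0, scale)
        out.append(tuple(row))
    return out


# Hypothetical toy records with two numeric attributes each.
records = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
medoids, labels, total = pam(records, k=2)

# Treat attribute 0 as "sensitive" in the cluster containing record 0.
modified = perturb_column(records, labels, cluster=labels[0], column=0, scale=0.1)
medoids2, labels2, total2 = pam(modified, k=2)
```

Comparing `labels` with `labels2` gives a rough view of how much the modification disturbs the cluster structure, which is the kind of distortion check the abstract describes.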