Data Pre-processing for Knowledge Discovery
Tikrit Journal of Pure Science
Article 1, Volume 19, Issue 5, August 2014, Pages 143-148
Authors
Mortadha M. Hamad; Banaz A. Qader
Abstract
The data pre-processing stage, also known as the data preparation stage, is fundamental to data analysis and knowledge discovery. When a dataset contains much irrelevant or redundant information, or noisy and unreliable data, knowledge discovery during the analysis and mining phase becomes more difficult. We therefore regard pre-processing as an important step in the knowledge discovery process, one that has a significant impact on predictive accuracy. Because each customer attribute may require special treatment for each algorithm, the choice of data pre-processing (DPP) techniques depends on the individual dataset or database used. In this paper we select and explain two pre-processing techniques, consistency and reduction, chosen because our marketing data warehouse contains inconsistent attributes and duplicated records. We also propose two new algorithms, the Removing Duplication Algorithm for reduction and the Resolving Inconsistency Algorithm for consistency, to achieve the best performance on this dataset. We implemented both algorithms on our data warehouse using the C# programming language and a Microsoft Access file, and obtained a clean data warehouse with consistent attributes and no duplicated records. The result is quality data ready to serve as input to data mining algorithms or any other analysis method, which in turn influences the quality of the knowledge discovered during the data mining process.
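The abstract does not give the steps of the two proposed algorithms, so the following is only a minimal sketch of the general idea of consistency resolution followed by duplicate removal. It is written in Python rather than the authors' C#, and every name in it (`CANONICAL`, `resolve_inconsistency`, `remove_duplicates`, the `customer` and `gender` attributes, and the sample rows) is a hypothetical chosen for illustration, not taken from the paper.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical lookup table: variant spellings mapped to one canonical value.
CANONICAL = {"m": "male", "male": "male", "f": "female", "female": "female"}


def resolve_inconsistency(
    records: List[Record], attribute: str, canonical: Dict[str, str]
) -> List[Record]:
    """Rewrite one attribute so that equivalent values share a single spelling."""
    cleaned = []
    for rec in records:
        fixed = dict(rec)  # copy so the input records are left unchanged
        key = fixed[attribute].strip().lower()
        # Unknown values are kept (trimmed) rather than guessed at.
        fixed[attribute] = canonical.get(key, fixed[attribute].strip())
        cleaned.append(fixed)
    return cleaned


def remove_duplicates(records: List[Record]) -> List[Record]:
    """Keep the first occurrence of each record whose attributes all match."""
    seen = set()
    unique = []
    for rec in records:
        fingerprint = tuple(sorted(rec.items()))  # hashable view of the record
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(rec)
    return unique


# Illustrative rows only; not data from the paper's marketing warehouse.
rows = [
    {"customer": "Ali", "gender": "M"},
    {"customer": "Ali", "gender": "male"},
    {"customer": "Sara", "gender": "F"},
]
clean = remove_duplicates(resolve_inconsistency(rows, "gender", CANONICAL))
```

In this sketch, consistency resolution runs first because the two "Ali" rows only become exact duplicates once `M` and `male` are mapped to the same value; running the steps in the opposite order would leave both rows in place.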
Keywords
data pre-processing; data mining; knowledge discovery; data cleaning