PDF Web Documents Categorization Using Association Rules Mining

Hannoon Abbood, Fadhil

Iraqi Journal of Information Technology
Article 1, Volume 6, Issue 4, October 2014, Pages 125-139
Author
Fadhil Hannoon Abbood
Abstract
Document categorization aims to map text documents into one or more predefined classes based on their contents. This problem has recently attracted many scholars in the web mining and machine learning communities, since a large number of online documents hold useful information for decision makers. This paper investigates a method for classifying PDF Web documents using association rule mining. A number of PDF documents are collected and analyzed to detect vital and essential features, and rank values are suggested for these features. A Mutual Meaning Unify (MMU) technique is proposed to increase the accuracy of document representations. To reduce the document vector space, stop words are removed; to reduce the number of document terms, a stemming algorithm is used. Because a large number of rules is generated, a pruning process is proposed to keep only the highly distinguishing rules. The resulting rules, which constitute the classifier, are used for the categorization process. The resulting classifier achieves an accuracy of about 97% and an error rate of about 3%.
Keywords
Categorization; Web Documents; Association Rules
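
The pipeline outlined in the abstract (stop-word removal, stemming, rule generation, pruning, classification) can be illustrated with a minimal Python sketch. It is not the author's implementation: the stop-word list, the crude suffix-stripping `stem` function, the single-term rule form `term -> class`, the support and confidence thresholds, and the toy corpus are all assumptions made for illustration. The feature rank values and the MMU technique are not modeled, because the abstract does not specify them.

```python
from collections import Counter, defaultdict

# Hypothetical stop-word list; the paper's actual list is not given.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is", "for"}

# Hypothetical toy corpus of (text, class label) pairs.
TOY_DOCS = [
    ("mining association rules in databases", "data_mining"),
    ("association rules and frequent itemsets", "data_mining"),
    ("training neural networks", "neural_nets"),
    ("neural networks for vision", "neural_nets"),
]

def stem(word):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, drop stop words, stem, and return the set of terms."""
    return {stem(w) for w in text.lower().split() if w not in STOP_WORDS}

def mine_rules(docs, min_support=0.2, min_confidence=0.8):
    """Generate rules `term -> label` and prune those below the thresholds.

    Support is the fraction of all documents containing the term with the
    label; confidence is the fraction of documents containing the term
    that carry the label.
    """
    n = len(docs)
    term_count = Counter()
    pair_count = Counter()
    for text, label in docs:
        for term in preprocess(text):
            term_count[term] += 1
            pair_count[(term, label)] += 1
    rules = []
    for (term, label), count in pair_count.items():
        support = count / n
        confidence = count / term_count[term]
        if support >= min_support and confidence >= min_confidence:
            rules.append((term, label, confidence))
    return rules

def classify(rules, text):
    """Pick the label whose matching rules have the highest total confidence."""
    terms = preprocess(text)
    scores = defaultdict(float)
    for term, label, confidence in rules:
        if term in terms:
            scores[label] += confidence
    return max(scores, key=scores.get) if scores else None
```

Calling `mine_rules(TOY_DOCS, min_support=0.5)` keeps only terms that appear in at least half of the toy documents, and `classify` returns `None` when no surviving rule matches the input text. Raising either threshold prunes more aggressively, which is the trade-off the pruning step is meant to control.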
