Abstract:
The aim of data mining as a field of scientific research is to develop methods for analyzing large amounts of data in order to discover interesting regularities or exceptions. Typical problems that must be resolved when developing effective data mining algorithms arise from the large size of both the data sets used in the mining process and the resulting pattern sets (for example, rules) that form the discovered knowledge. Researchers seek the most effective solutions at every stage: data preparation, exploration, and finally post-processing of the results. In association rule mining, the main effort so far has gone into developing increasingly sophisticated algorithms that find interesting patterns in appropriately prepared data. One problem that still needs to be tackled is that of excessive database scans. Most association rule algorithms are extensions or derivatives of the Apriori algorithm, so nearly all of them scan the database many times in order to obtain the association rules, and these repeated scans are very time consuming. In this thesis we develop an optimization of the Apriori algorithm, namely Vertical Apriori, using the C++ bitset data structure (an optimized version of bit vectors). Performance improvements are demonstrated in the experiments section in Chapter 6.
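The idea of a vertical layout built on bitsets can be illustrated with a minimal sketch. This is not the thesis's implementation, only an illustration under stated assumptions: items are plain integers, the number of transactions is bounded by a hypothetical compile-time constant kMaxTransactions, and the helper names buildVerticalIndex and support are invented here. The database is read once to build one bitset per item, after which the support of any candidate itemset is obtained by AND-ing bitsets and counting the set bits, with no further scan.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Assumption: a small fixed upper bound on the number of transactions.
constexpr std::size_t kMaxTransactions = 64;
using TidSet = std::bitset<kMaxTransactions>;

// Single pass over the database: bit t of index[item] is set when
// transaction t contains that item (the "vertical" representation).
std::map<int, TidSet> buildVerticalIndex(
    const std::vector<std::vector<int>>& transactions) {
    assert(transactions.size() <= kMaxTransactions);
    std::map<int, TidSet> index;
    for (std::size_t t = 0; t < transactions.size(); ++t) {
        for (int item : transactions[t]) {
            index[item].set(t);
        }
    }
    return index;
}

// Support of an itemset: the number of transactions that contain every
// item, computed as the popcount of the AND of the items' bitsets.
// An empty itemset or an unknown item yields 0 in this sketch.
std::size_t support(const std::map<int, TidSet>& index,
                    const std::vector<int>& itemset) {
    if (itemset.empty()) return 0;
    TidSet acc;
    bool first = true;
    for (int item : itemset) {
        auto it = index.find(item);
        if (it == index.end()) return 0;
        if (first) {
            acc = it->second;
            first = false;
        } else {
            acc &= it->second;
        }
    }
    return acc.count();
}
```

Because candidate supports come from bitwise operations on the in-memory index, each Apriori level in such a scheme can be evaluated without rereading the transaction file; the trade-off is the memory needed to hold one bitset per item.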
Description:
M.S. -- Faculty of Natural and Applied Sciences, Notre Dame University, Louaize, 2004; "A thesis submitted in partial fulfillment of the requirements for the degree of Master of sciences in computer information system, Department of Computer Science, Faculty of Natural and Applied Sciences"; Includes bibliographical references (leaves 86-88).