Treffer: Distributed Mining of Maximal Frequent Itemsets on a Data Grid System.
Weitere Informationen
In this paper, we propose a new algorithm, named Grid-based Distributed Max-Miner (GridDMM), for mining maximal frequent itemsets from databases on a Data Grid. A frequent itemset is maximal if none of its supersets is frequent. GridDMM is specifically suitable for use in Grid environments due to low communication and synchronization overhead. GridDMM consists of a local mining phase and a global mining phase. During the local mining phase, each node mines the local database to discover the local maximal frequent itemsets, then they form a set of maximal candidate itemsets for the top-down search in the subsequent global mining phase. A new prefix-tree data structure is developed to facilitate the storage and counting of the global candidate itemsets of different sizes. We built a Data Grid system on a cluster of workstations using the open-source Globus Toolkit, and evaluated the GridDMM algorithm in terms of performance, scalability, and the overhead of communication and synchronization. GridDMM demonstrates better performance than other sequential and parallel algorithms, and its performance is scalable in terms of the database size and the number of nodes. [ABSTRACT FROM AUTHOR]
Copyright of Journal of Supercomputing is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)