GTK: A Hybrid-search Algorithm of Top-rank-k Frequent Patterns based on Greedy Strategy

No Thumbnail Available

Authors

Yuhang Long
Wensheng Tang
Bo Yang
Xinyu Wang
Hua Ma
Hang Shi
Xueyu Cheng

Issue Date

Type

Journal Article, Academic Journal

Language

Keywords

Research Projects

Organizational Units

Journal Issue

Alternative Title

Abstract

Currently, the top-rank-<i>k</i> has been widely applied to mine frequent patterns with a rank not exceeding <i>k</i>. In the existing algorithms, although a level-wise-search could fully mine the target patterns, it usually leads to the delay of high rank patterns generation, resulting in the slow growth of the support threshold and the mining efficiency. Aiming at this problem, a greedy-strategy-based top-rank-<i>k</i> frequent patterns hybrid mining algorithm (GTK) is proposed in this paper. In this algorithm, top-rank-<i>k</i> patterns are stored in a static doubly linked list called RSL, and the patterns are divided into short patterns and long patterns. The short patterns generated by a rank-first-search always joins the two patterns of the highest rank in RSL that have not yet been joined. On the basis of the short patterns satisfying specific conditions, the long patterns are extracted through level-wise-search. To reduce redundancy, GTK improves the generation method of subsume index and designs the new pruning strategies of candidates. This algorithm also takes the use of reasonable pruning strategies to reduce the amount of computation to improve the computational speed. Real datasets and synthetic datasets are adopted in experiments to evaluate the proposed algorithm. The experimental results show the obvious advantages in both time efficiency and space efficiency of GTK.

Description

Citation

Publisher

License

Journal

Volume

Issue

PubMed ID

DOI

ISSN

EISSN

Collections