FAST Sequence Mining Based on Sparse Id-Lists

Abstract

Sequential pattern mining is an important data mining task with applications in market basket analysis, the World Wide Web, medicine and telecommunications. The task is challenging because sequence databases are usually large, containing many long sequences, and the number of candidate sequential patterns can grow exponentially. We propose a new sequential pattern mining algorithm, FAST, which represents the dataset with indexed sparse id-lists in order to count the support of sequential patterns quickly. We also use a lexicographic tree to make candidate generation more efficient. FAST mines the complete set of sequential patterns while greatly reducing the effort required for support counting and candidate sequence generation. Experimental results on artificial and real data show that our method outperforms existing methods in the literature by up to one or two orders of magnitude on large datasets.
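The abstract does not spell out FAST's exact data layout, so the following is only a minimal Python sketch of the general idea, not the authors' implementation. It assumes a simplified sparse id-list that maps each sequence id to the sorted itemset positions where a pattern's last item occurs, omitting sequences that do not contain the pattern, so that support is just the number of entries. Patterns are enumerated depth-first over a lexicographic tree, and only sequence extensions are handled (itemset extensions are omitted). The names `build_idlists`, `s_extend`, `support` and `mine` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Assumed representation: sequence id -> sorted positions (itemset indices)
# at which the pattern's last item occurs in a valid embedding.
SparseIdList = Dict[int, List[int]]


def build_idlists(db: List[List[Set[str]]]) -> Dict[str, SparseIdList]:
    """Build one sparse id-list per item; sequences lacking the item are omitted."""
    idlists: Dict[str, SparseIdList] = defaultdict(dict)
    for sid, sequence in enumerate(db):
        for pos, itemset in enumerate(sequence):
            for item in itemset:
                idlists[item].setdefault(sid, []).append(pos)
    return dict(idlists)


def s_extend(prefix: SparseIdList, item: SparseIdList) -> SparseIdList:
    """Sequence extension: in sequences containing both, keep occurrences of
    `item` strictly after the earliest end position of the prefix."""
    result: SparseIdList = {}
    for sid, prefix_positions in prefix.items():
        item_positions = item.get(sid)
        if item_positions is None:
            continue  # sequence does not contain the item: skip it cheaply
        first = prefix_positions[0]  # positions are sorted ascending
        later = [p for p in item_positions if p > first]
        if later:
            result[sid] = later
    return result


def support(idlist: SparseIdList) -> int:
    # Support = number of sequences with a non-empty entry.
    return len(idlist)


def mine(db: List[List[Set[str]]], min_sup: int) -> List[Tuple[Tuple[str, ...], int]]:
    """Depth-first traversal of a lexicographic tree of patterns,
    restricted to sequence extensions."""
    idlists = build_idlists(db)
    frequent = {i: l for i, l in idlists.items() if support(l) >= min_sup}
    items = sorted(frequent)
    results: List[Tuple[Tuple[str, ...], int]] = []

    def dfs(pattern: Tuple[str, ...], idlist: SparseIdList) -> None:
        results.append((pattern, support(idlist)))
        for i in items:
            ext = s_extend(idlist, frequent[i])
            if support(ext) >= min_sup:  # prune infrequent children
                dfs(pattern + (i,), ext)

    for i in items:
        dfs((i,), frequent[i])
    return results


db = [
    [{"a"}, {"b", "c"}, {"a"}],
    [{"a", "b"}, {"c"}],
    [{"b"}, {"a"}],
]
print(mine(db, min_sup=2))
# [(('a',), 3), (('a', 'c'), 2), (('b',), 3), (('b', 'a'), 2), (('c',), 2)]
```

Storing only the sequences that contain a pattern lets each extension skip non-matching sequences without scanning them, which is the intuition behind counting support on sparse id-lists.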


Apulian author

All authors

  • MALERBA D.

Volume/Journal title

Not available


Year of publication

2011

ISSN

Not available

ISBN

978-3-642-21915-3


Number of citations (WoS)

No citations

Last citation update

Not available


Number of citations (Scopus)

Not available

Last citation update

Not available


ERC sectors

Not available

ASJC codes

Not available