Mining frequent patterns from Web logs is an important data mining task. Candidate-generation-and-test and pattern-growth are two representative approaches to frequent pattern mining. We conducted extensive experiments on real-world Web log data to analyse the characteristics of Web logs and the behaviour of these two approaches on them. To improve the performance of current algorithms on Web logs, we propose a new algorithm, Combined Frequent Pattern Mining (CFPM), designed specifically for Web log data. CFPM uses heuristics to prune the search space and reduce mining costs, which yields better efficiency. Experimental results show that CFPM improves on the performance of the pattern-growth approach by a factor of 1.2 to 7.8 when mining frequent patterns from Web logs.
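As a rough illustration of the candidate-generation-and-test approach named in the abstract, the following minimal sketch mines frequent page sets level by level. It is not CFPM and not the authors' implementation. The function name `frequent_itemsets`, the sample sessions, and the treatment of each session as an unordered set of pages are assumptions made only for this example.

```python
from itertools import combinations


def frequent_itemsets(sessions, min_support):
    """Level-wise candidate-generation-and-test over sets of pages.

    Hypothetical helper for illustration: each session is treated as an
    unordered set of page URLs, and support is an absolute session count.
    """
    sessions = [frozenset(s) for s in sessions]

    # Level 1: count how many sessions contain each single page.
    counts = {}
    for session in sessions:
        for page in session:
            key = frozenset([page])
            counts[key] = counts.get(key, 0) + 1
    frequent = {p: c for p, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Generate: join frequent (k-1)-sets into k-set candidates and keep
        # only those whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent
                for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Test: count the support of each surviving candidate.
        counts = {c: sum(1 for s in sessions if c <= s) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1

    return result


# Made-up sessions, not data from the paper.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
]
patterns = frequent_itemsets(sessions, min_support=2)
```

Each level rescans all sessions to count candidate support. Pattern-growth methods are generally described as avoiding this repeated scanning by working on a compressed representation of the data.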
ISBN
9783540213710 (urn:isbn:9783540213710)
Start page
533
End page
542
Total pages
10
Outlet
Advanced Web Technologies and Applications: Sixth Asia-Pacific Web Conference, APWeb 2004
Editors
J. Yu et al.
Name of conference
Asia-Pacific Web Conference on Advanced Web Technologies and Applications