Graph-Based Structures for the Market Baskets Analysis

Cavique, Luís

Investigação Operacional

Print version ISSN 0874-5161

Inv. Op. vol.24 no.2 Lisboa Dec. 2004

Luís Cavique †

† ESCS-IPL / IST-UTL Portugal

lcavique@escs.ipl.pt

Abstract:

A market basket is the itemset bought together by a customer on a single visit to a store. Market basket analysis is a powerful tool for implementing cross-selling strategies. Although existing algorithms can find the market basket, they can be computationally inefficient. The aim of this paper is to present a faster algorithm for market basket analysis using condensed data structures. In this approach, the condensed data is obtained by transforming the market basket problem into a maximum-weighted clique problem. First, the input data set is transformed into a graph-based structure; then the maximum-weighted clique problem is solved with a meta-heuristic approach to find the most frequent itemsets. The computational results show accurate solutions with reduced computational times.

Keywords: data mining, market basket, similarity measures, maximum clique problem.
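
The two steps named in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the full text is not available here, so the edge weighting (raw co-occurrence counts), the `min_count` threshold, and the function names are assumptions, and a simple greedy heuristic stands in for the meta-heuristic the paper uses.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(baskets, min_count=2):
    """Condense baskets into a weighted graph (assumed weighting scheme).

    Vertices are items; an edge weight is the number of baskets that
    contain both items. Edges seen fewer than min_count times are dropped.
    """
    pair_counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            pair_counts[(a, b)] += 1

    graph = {}
    for (a, b), weight in pair_counts.items():
        if weight >= min_count:
            graph.setdefault(a, {})[b] = weight
            graph.setdefault(b, {})[a] = weight
    return graph


def greedy_weighted_clique(graph):
    """Greedy stand-in for a meta-heuristic; the result may be suboptimal.

    Seeds the clique with the heaviest edge, then repeatedly adds the
    candidate vertex with the largest total edge weight to the clique.
    """
    if not graph:
        return [], 0

    weight, a, b = max(
        (w, u, v) for u, nbrs in graph.items() for v, w in nbrs.items() if u < v
    )
    clique, total = [a, b], weight
    # Candidates must be adjacent to every vertex already in the clique.
    candidates = set(graph[a]) & set(graph[b])

    while candidates:
        best = max(candidates, key=lambda v: (sum(graph[v][u] for u in clique), v))
        total += sum(graph[best][u] for u in clique)
        clique.append(best)
        candidates &= set(graph[best])

    return sorted(clique), total


# Hypothetical toy data, for illustration only.
baskets = [
    ["bread", "milk", "butter"],
    ["bread", "milk", "butter"],
    ["bread", "milk"],
    ["beer", "chips"],
    ["beer", "chips"],
    ["milk", "beer"],
]
graph = build_cooccurrence_graph(baskets)
clique, total = greedy_weighted_clique(graph)
print(clique, total)  # ['bread', 'butter', 'milk'] 7
```

On this toy data the heaviest clique corresponds to the itemset that co-occurs most often. A greedy pass like this gives no optimality guarantee, which is one reason a meta-heuristic would be preferred on real transaction data.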

Full text only in PDF.
