Efficiently Discovering Recent Frequent Items In Data Streams

Bibliographic citation: F. I. Tantono, N. Manerikar, T. Palpanas, "Efficiently Discovering Recent Frequent Items In Data Streams" in International Conference on Scientific and Statistical DataBase Management (SSDBM), Berlin: Springer, 2008, p. 222-239. - (LNCS; 5069). - ISBN: 9783540694762. Proceedings of: SSDBM 2008, Hong Kong, China, July 2008. - URL: http://disi.unitn.it/~themis/publications/ssdbm08.pdf . - DOI: 10.1007/978-3-540-69497-7_16

Detail

Abstract: The problem of frequent item discovery in streaming data has attracted a lot of attention lately. While this problem has been studied extensively, and several techniques have been proposed for its solution, these approaches treat all the values of the data stream equally. Nevertheless, not all values are of equal importance. In several situations, we are more interested in the new values that have appeared in the stream than in the older ones. In this paper, we address the problem of finding recent frequent items in a data stream given a small bounded memory, and present novel algorithms in this direction. We propose a basic algorithm that extends the functionality of existing approaches by monitoring item frequencies in recent windows. Subsequently, we present an improved version of the algorithm that achieves significantly better accuracy at no extra memory cost. Finally, we perform an extensive experimental evaluation, and show that the proposed algorithms can efficiently identify the frequent items in ad hoc recent windows of a data stream.