Remember Me
Or use your Academic/Social account:


Or use your Academic/Social account:


You have just completed your registration at OpenAire.

Before you can login to the site, you will need to activate your account. An e-mail will be sent to you with the proper instructions.


Please note that this site is currently undergoing Beta testing.
Any new content you create is not guaranteed to be present to the final version of the site upon release.

Thank you for your patience,
OpenAire Dev Team.

Close This Message


Verify Password:
Verify E-mail:
*All Fields Are Required.
Please Verify You Are Human:
fbtwitterlinkedinvimeoflicker grey 14rssslideshare1
Mirylenka, Katsiaryna; Cormode, Graham; Palpanas, Themis; Srivastava, Divesh (2015)
Publisher: Springer Berlin Heidelberg
Languages: English
Types: Article
Subjects: QA76
The notion of heavy hitters—items that make up a large fraction of the population—has been successfully used in a variety of applications across sensor and RFID monitoring, network data analysis, event mining, and more. Yet this notion often fails to capture the semantics we desire when we observe data in the form of correlated pairs. Here, we are interested in items that are conditionally frequent: when a particular item is frequent within the context of its parent item. In this work, we introduce and formalize the notion of conditional heavy hitters to identify such items, with applications in network monitoring and Markov chain modeling. We explore the relationship between conditional heavy hitters and other related notions in the literature, and show analytically and experimentally the usefulness of our approach. We introduce several algorithm variations that allow us to efficiently find conditional heavy hitters for input data with very different characteristics, and provide analytical results for their performance. Finally, we perform experimental evaluations with several synthetic and real datasets to demonstrate the efficacy of our methods and to study the behavior of the proposed algorithms for different types of data.
  • The results below are discovered through our pilot algorithms. Let us know how we are doing!

    • 1. Agrawal, R., Imielinski, T., Swami, A.N.: Mining association rules between sets of items in large databases. In: ACM SIGMOD International Conference on Management of Data, pp. 207-216 (1993)
    • 2. Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating the frequency moments. In: ACM Symposium on Theory of Computing, pp. 20-29 (1996)
    • 3. Arasu, A., Manku, G.S.: Approximate counts and quantiles over sliding windows. In: ACM Principles of Database Systems (2004)
    • 4. Baum, L.E., Petrie, T.: Statistical Inference for Probabilistic Functions of Finite State Markov Chains. The Annals of Mathematical Statistics 37(6), 1554-1563 (1966)
    • 5. Boyer, B., Moore, J.: A fast majority vote algorithm. Tech. Rep. ICSCA-CMP-32, Institute for Computer Science, University of Texas (1981)
    • 6. Broder, A., Mitzenmacher, M.: Network applications of bloom filters: A survey. Internet Mathematics 1(4), 485-509 (2005)
    • 7. Budak, C., Georgiou, T., Agrawal, D., El Abbadi, A.: Geoscope: Online detection of geo-correlated information trends in social networks. PVLDB 7(4), 229-240 (2013)
    • 8. Chang, J.H., Lee, W.S.: Finding recent frequent itemsets adaptively over online data streams. In: KDD, pp. 487-492 (2003)
    • 9. Charikar, M., Chen, K., Farach-Colton, M.: Finding frequent items in data streams. In: Procedings of the International Colloquium on Automata, Languages and Programming (ICALP) (2002)
    • 10. Cormode, G., Hadjieleftheriou, M.: Finding frequent items in data streams. In: International Conference on Very Large Data Bases (2008)
    • 11. Cormode, G., Korn, F., Muthukrishnan, S., Srivastava, D.: Finding hierarchical heavy hitters in data streams. In: International Conference on Very Large Data Bases, pp. 464-475 (2003)
    • 12. Cormode, G., Korn, F., Tirthapura, S.: Time decaying aggregates in out-of-order streams. In: ACM Principles of Database Systems (2008)
    • 13. Cormode, G., Muthukrishnan, S.: An improved data stream summary: The count-min sketch and its applications. Journal of Algorithms 55(1), 58-75 (2005)
    • 14. Dallachiesa, M., Nushi, B., Mirylenka, K., Palpanas, T.: Uncertain time-series similarity: Return to the basics. PVLDB 5(11), 1662- 1673 (2012)
    • 15. Dallachiesa, M., Palpanas, T.: Identifying streaming frequent items in ad hoc time windows. Data Knowl. Eng. 87, 66-90 (2013)
    • 16. Demaine, E., Lo´pez-Ortiz, A., Munro, J.I.: Frequency estimation of internet packet streams with limited space. In: European Symposium on Algorithms (ESA) (2002)
    • 17. Duong, T., Goud, B., Schauer, K.: Closed-form density-based framework for automatic detection of cellular morphology changes. Proceedings of the National Academy of Sciences 109(22), 8382-8387 (2012)
    • 18. Durme, B.V., Lall, A.: Streaming pointwise mutual information. In: NIPS, pp. 1892-1900 (2009)
    • 19. Gehrke, J., Korn, F., Srivastava, D.: On computing correlated aggregates over continual data streams. In: ACM SIGMOD International Conference on Management of Data, pp. 13-24 (2001)
    • 20. Giannella, C., Han, J., Pei, J., Yan, X., Yu, P.S.: Mining frequent patterns in data streams at multiple time granularities. In: Data Mining: Next Generation Challenges and Future Directions (2004)
    • 21. Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: SIGMOD Conference, pp. 1-12 (2000)
    • 22. Lahiri, B., Tirthapura, S.: Finding correlated heavy-hitters over data streams. In: IPCCC, pp. 307-314 (2009)
    • 23. Lee, L., Ting, H.: A simpler and more efficient deterministic scheme for finding frequent items over sliding windows. In: ACM Principles of Database Systems (2006)
    • 24. Lee, L., Ting, H.: A simpler and more efficient deterministic scheme for finding frequent items over sliding windows. In: ACM Principles of Database Systems (2006)
    • 25. Letchner, J., Re, C., Balazinska, M., Philipose, M.: Approximation trade-offs in markovian stream processing: An empirical study. In: ICDE, pp. 936-939 (2010)
    • 26. Manerikar, N., Palpanas, T.: Frequent items in streaming data: An experimental evaluation of the state-of-the-art. Data Knowl. Eng. 68(4), 415-430 (2009)
    • 27. Manku, G., Motwani, R.: Approximate frequency counts over data streams. In: International Conference on Very Large Data Bases, pp. 346-357 (2002)
    • 28. Metwally, A., Agrawal, D., Abbadi, A.E.: Efficient computation of frequent and top-k elements in data streams. In: International Conference on Database Theory (2005)
    • 29. Mirylenka, K., Cormode, G., Palpanas, T., Srivastava, D.: Finding interesting correlations with conditional heavy hitters. In: International Conference on Data Engineering (ICDE) (2013)
    • 30. Misra, J., Gries, D.: Finding repeated elements. Science of Computer Programming 2, 143-152 (1982)
    • 31. Pearl, J.: Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann Publishers Inc. (1988)
    • 32. Rabinovich, M., Spatschek, O.: Web caching and replication. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA (2002)
    • 33. Raftery, A.E.: A model of high-order markov chains. Journal of the Royal Statistical Society (Series B Methodological) 47(3), 528-539 (1985)
    • 34. Rubner, Y., Tomasi, C., Guibas, L.: The earth mover's distance as a metric for image retrieval. International Journal of Computer Vision 40(2), 99-121 (2000)
    • 35. Tantono, F.I., Manerikar, N., Palpanas, T.: Efficiently discovering recent frequent items in data streams. In: SSDBM, pp. 222-239 (2008)
    • 36. Venkataraman, S., Song, D.X., Gibbons, P.B., Blum, A.: New streaming algorithms for fast detection of superspreaders. In: Network and Distributed System Security Symposium NDSS (2005)
    • 37. Wang, P., Wang, H., Wang, W.: Finding semantics in time series. In: ACM SIGMOD International Conference on Management of Data, pp. 385-396 (2011)
    • 38. Welch, B.L.: The Generalization of 'Student's' Problem when Several Different Population Variances are Involved. Biometrika 34(1/2), 28-35 (1947)
    • 39. Yu, P.S., Chi, Y.: Association rule mining on streams. In: Encyclopedia of Database Systems, pp. 136-139. Springer-Verlag (2009)
  • No related research data.
  • No similar publications.

Share - Bookmark

Cite this article