Remember Me
Or use your Academic/Social account:


Or use your Academic/Social account:


You have just completed your registration at OpenAire.

Before you can login to the site, you will need to activate your account. An e-mail will be sent to you with the proper instructions.


Please note that this site is currently undergoing Beta testing.
Any new content you create is not guaranteed to be present to the final version of the site upon release.

Thank you for your patience,
OpenAire Dev Team.

Close This Message


Verify Password:
Verify E-mail:
*All Fields Are Required.
Please Verify You Are Human:
fbtwitterlinkedinvimeoflicker grey 14rssslideshare1
Canali , Davide; Cova , Marco; Vigna , Giovanni; Kruegel , Christopher (2011)
Publisher: ACM
Languages: English
Types: Conference object
Subjects: malicious web page analysis, [ SCCO.COMP ] Cognitive science/Computer science, Malicious web page analysis, drive-by download exploits, efficient web page filtering, drive-by download exploits, efficient web page filtering
International audience; Malicious web pages that host drive-by-download exploits have become a popular means for compromising hosts on the Internet and, subsequently, for creating large-scale botnets. In a drive-by-download exploit, an attacker embeds a malicious script (typically written in JavaScript) into a web page. When a victim visits this page, the script is executed and attempts to compromise the browser or one of its plugins. To detect drive-by-download exploits, researchers have developed a number of systems that analyze web pages for the presence of malicious code. Most of these systems use dynamic analysis. That is, they run the scripts associated with a web page either directly in a real browser (running in a virtualized environment) or in an emulated browser, and they monitor the scripts' executions for malicious activity. While the tools are quite precise, the analysis process is costly, often requiring in the order of tens of seconds for a single page. Therefore, performing this analysis on a large set of web pages containing hundreds of millions of samples can be prohibitive. One approach to reduce the resources required for performing large-scale analysis of malicious web pages is to develop a fast and reliable filter that can quickly discard pages that are benign, forwarding to the costly analysis tools only the pages that are likely to contain malicious code. In this paper, we describe the design and implementation of such a filter. Our filter, called Prophiler, uses static analysis techniques to quickly examine a web page for malicious content. This analysis takes into account features derived from the HTML contents of a page, from the associated JavaScript code, and from the corresponding URL. We automatically derive detection models that use these features using machine-learning techniques applied to labeled datasets. To demonstrate the effectiveness and efficiency of Prophiler, we crawled and collected millions of pages, which we analyzed for malicious behavior. Our results show that our filter is able to reduce the load on a more costly dynamic analysis tools by more than 85%, with a negligible amount of missed malicious pages.
  • The results below are discovered through our pilot algorithms. Let us know how we are doing!

    • [1] Alexa.com. Alexa Top Global Sites. http://www.alexa.com/topsites/.
    • [2] Clam AntiVirus. http://www.clamav.net/, 2010.
    • [3] A. Clark and M. Guillemot. CyberNeko HTML Parser. http://nekohtml.sourceforge.net/.
    • [4] M. Cova, C. Kruegel, and G. Vigna. Detection and Analysis of Drive-by-Download Attacks and Malicious JavaScript Code. In Proceedings of the International World Wide Web Conference (WWW), 2010.
    • [5] B. Feinstein and D. Peck. Caffeine Monkey: Automated Collection, Detection and Analysis of Malicious JavaScript. In Proceedings of the Black Hat Security Conference, 2007.
    • [6] S. Garera, N. Provos, M. Chew, and A. D. Rubin. A Framework for Detection and Measurement of Phishing Attacks. In Proceedings of the Workshop on Rapid Malcode (WORM), 2007.
    • [7] D. Goodin. SQL injection taints BusinessWeek.com. http://www.theregister.co.uk/2008/09/16/ businessweek_hacked/, September 2008.
    • [8] D. Goodin. Potent malware link infects almost 300,000 webpages. http://www.theregister.co.uk/ 2009/12/10/mass_web_attack/, December 2010.
    • [9] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The WEKA Data Mining Software: An Update. SIGKDD Explorations, 11(1):10-18.
    • [10] Heritrix. http://crawler.archive.org/.
    • [11] M. Hines. Malware SEO: Gaming Google Trends and Big Bird. http://securitywatch.eweek.com/seo/ malware_seo_gaming_google_trends_and_ big_bird.html, November 2009.
    • [12] W. Hobson. Cyber-criminals use SEO on topical trends. http://www.vertical-leap.co.uk/news/ cybercriminals-use-seo-on-topical-trends/, February 2010.
    • [13] HoneyClient Project Team. HoneyClient. http://www.honeyclient.org/, 2010.
    • [14] A. Ikinci, T. Holz, and F. Freiling. Monkey-Spider: Detecting Malicious Websites with Low-Interaction Honeyclients. In Proceedings of Sicherheit, Schutz und Zuverlässigkeit, 2008.
    • [15] JSUnpack. http://jsunpack.jeek.org, 2010.
    • [16] P. Likarish, E. Jung, and I. Jo. Obfuscated Malicious Javascript Detection using Classification Techniques. In Proceedings of the Conference on Malicious and Unwanted Software (Malware), 2009.
    • [17] J. Ma, L. Saul, S. Savage, and G. Voelker. Beyond Blacklists: Learning to Detect Malicious Web Sites from Suspicious URLs. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2009.
    • [18] A. Moshchuk, T. Bragin, S. Gribble, and H. Levy. A Crawler-based Study of Spyware in the Web. In Proceedings of the Symposium on Network and Distributed System Security (NDSS), 2006.
    • [19] Mozilla Foundation. Rhino: JavaScript for Java. http://www.mozilla.org/rhino/.
    • [20] J. Nazario. PhoneyC: A Virtual Client Honeypot. In Proceedings of the USENIX Workshop on Large-Scale Exploits and Emergent Threats (LEET), 2009.
    • [21] D. Oswald. HTMLParser. http://htmlparser.sourceforge.net/.
    • [22] N. Provos, P. Mavrommatis, M. A. Rajab, and F. Monrose. All Your iFrames Point to Us. In Proceedings of the USENIX Security Symposium, 2008.
    • [23] P. Ratanaworabhan, B. Livshits, B., and Zorn. Nozzle: a defense against heap-spraying code injection attacks. In Proceedings of the USENIX Security Symposium, 2009.
    • [24] K. Rieck, T. Krueger, and A. Dewald. CUJO: Efficient Detection and Prevention of Drive-by-Download Attacks. In Proceedings of the Annual Computer Security Applications Conference (ACSAC), 2010.
    • [25] C. Seifert and R. Steenson. Capture-HPC. https: //projects.honeynet.org/capture-hpc, 2008.
    • [26] C. Seifert, I. Welch, and P. Komisarczuk. Identification of Malicious Web Pages Through Analysis of Underlying DNS and Web Server Relationships. In Proceedings of the LCN Workshop on Network Security (WNS), 2008.
    • [27] C. Seifert, I. Welch, and P. Komisarczuk. Identification of Malicious Web Pages with Static Heuristics. In Proceedings of the Australasian Telecommunication Networks and Applications Conference (ATNAC), 2008.
    • [28] R. Sommer and V. Paxson. Outside the Closed World: On Using Machine Learning For Network Intrusion Detection. In Proceedings of the IEEE Symposium on Security and Privacy, 2010.
    • [29] B. Stone-Gross, M. Cova, L. Cavallaro, B. Gilbert, M. Szydlowski, R. Kemmerer, C. Kruegel, and G. Vigna. Your Botnet is My Botnet: Analysis of a Botnet Takeover. In Proceedings of the ACM Conference on Computer and Communications Security (CCS), 2009.
    • [30] Y.-M. Wang, D. Beck, X. Jiang, R. Roussev, C. Verbowski, S. Chen, and S. King. Automated Web Patrol with Strider HoneyMonkeys: Finding Web Sites that Exploit Browser Vulnerabilities. In Proceedings of the Symposium on Network and Distributed System Security (NDSS), 2006.
  • No related research data.
  • No similar publications.

Share - Bookmark

Funded by projects

  • NSF | CAREER: Toward eliminating ...
  • NSF | TC:Medium:Analyzing the Und...

Cite this article