Publisher: GITA
Languages: English
Types: Unknown
Subjects: media_dig_tech_and_creative_econ

Classified by OpenAIRE into

ACM Ref: ComputingMethodologies_PATTERNRECOGNITION
A ‘scam’ is a fraudulent message sent with criminal intent to the mailboxes of Internet users. Many approaches have been proposed to filter unsolicited messages, known as ‘spam’, from legitimate messages, known as ‘ham’. However, to date no suitable approach has been proposed to detect scams. Almost all spam filters that use Machine Learning approaches classify scams as ham when the scam messages are more similar to the average ham than to spam. Yet such fraudulent messages can be very harmful to users, as many people around the world lose money by relying on scam messages.
In this paper we use Data Mining techniques for scam detection. Bayesian Classifier, Naïve Bayes and K-Nearest Neighbor, which are widely used in spam detection, are tested experimentally and the results are reported. In addition, a new approach to scam detection is proposed. This approach uses the K-Nearest Neighbor algorithm with a modification to the Document Similarity equation. Additionally, classification is not binary (‘scam’ or ‘not scam’): a Fuzzy Decision is used instead of clear-cut classes. Scam messages are successfully detected by applying this approach.
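The abstract does not give the modified Document Similarity equation or the exact form of the Fuzzy Decision, so the following Python sketch only illustrates the general idea under stated assumptions: it uses plain cosine similarity over word counts as a stand-in for the paper's similarity measure, and it reports a similarity-weighted share of scam neighbours as a degree of membership in [0, 1] rather than a hard label. The names `scam_membership` and `TRAINING`, the toy messages, and the choice of k = 3 are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case the text and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters.

    Assumption: this is a stand-in; the paper's modified similarity
    equation is not stated in the abstract.
    """
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def scam_membership(message, training, k=3):
    """Return a degree of scam membership in [0, 1] instead of a binary label.

    The degree is the similarity-weighted share of 'scam' labels among
    the k most similar training messages (an illustrative fuzzy decision).
    """
    query = Counter(tokenize(message))
    scored = [
        (cosine_similarity(query, Counter(tokenize(text))), label)
        for text, label in training
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    nearest = scored[:k]
    total = sum(score for score, _ in nearest)
    if total == 0:
        return 0.0  # nothing in the training set resembles the message
    return sum(score for score, label in nearest if label == "scam") / total


# Hypothetical toy training set, for illustration only.
TRAINING = [
    ("you have won a lottery prize send your bank details to claim", "scam"),
    ("urgent transfer of funds needed please send bank account details", "scam"),
    ("meeting moved to monday please confirm your attendance", "ham"),
    ("attached is the report for the monday project meeting", "ham"),
]

if __name__ == "__main__":
    for msg in ("claim your lottery prize send bank details",
                "please confirm the monday meeting"):
        print(msg, "->", round(scam_membership(msg, TRAINING), 3))
```

A caller can then apply its own thresholds to the returned degree, for example treating mid-range values as ‘suspicious’ rather than forcing every message into ‘scam’ or ‘not scam’.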