Itani, Maher; Roast, Chris; Al-Khayatt, Samir (2017)
Publisher: IEEE
Languages: English
Types: Part of book or chapter of book

Natural Language Processing (NLP) applications such as text categorization and machine translation need annotated corpora to evaluate quality and performance. Similarly, sentiment analysis requires annotated corpora to test the performance of classifiers. Manual annotation performed by native speakers serves as the benchmark against which a classifier's accuracy is measured. In this paper we summarise currently available Arabic corpora and describe work in progress to build, annotate, and use Arabic corpora consisting of Facebook (FB) posts. The distinctive nature of these corpora is that they are based on posts written in Dialectal Arabic (DA), which does not follow specific grammatical or spelling standards. The corpora are annotated with five labels (positive, negative, dual, neutral, and spam). In addition to building the corpora, the paper illustrates how manual tagging can be used to extract opinionated words and phrases for use in a lexicon-based classifier.
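
The abstract does not spell out how the lexicon is extracted or applied, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes whitespace tokenisation, toy English tokens standing in for Dialectal Arabic words, and a hypothetical rule that a word enters the lexicon if it appears in at least `min_count` manually tagged posts of one polarity and in none of the other. The names `build_lexicon`, `classify`, `min_count`, and `spam_cues` are illustrative and do not come from the paper.

```python
from collections import Counter

# The five annotation labels named in the abstract.
LABELS = ("positive", "negative", "dual", "neutral", "spam")


def build_lexicon(tagged_posts, min_count=2):
    """Extract opinion words from manually tagged (text, label) pairs.

    Assumption: a word is kept if it occurs in at least `min_count`
    posts of one polarity and never in posts of the opposite polarity.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for text, label in tagged_posts:
        tokens = set(text.split())  # count each word once per post
        if label == "positive":
            pos_counts.update(tokens)
        elif label == "negative":
            neg_counts.update(tokens)
    positive = {w for w, c in pos_counts.items()
                if c >= min_count and w not in neg_counts}
    negative = {w for w, c in neg_counts.items()
                if c >= min_count and w not in pos_counts}
    return positive, negative


def classify(text, positive, negative, spam_cues=frozenset()):
    """Assign one of the five labels by simple lexicon lookup."""
    tokens = set(text.split())
    if tokens & spam_cues:          # hypothetical spam cue list
        return "spam"
    has_pos = bool(tokens & positive)
    has_neg = bool(tokens & negative)
    if has_pos and has_neg:
        return "dual"               # both polarities present
    if has_pos:
        return "positive"
    if has_neg:
        return "negative"
    return "neutral"                # no opinion word found


# Toy annotated posts (invented for illustration only).
tagged = [
    ("great show tonight", "positive"),
    ("great voice", "positive"),
    ("awful news", "negative"),
    ("awful coverage today", "negative"),
    ("news at nine", "neutral"),
]
positive_words, negative_words = build_lexicon(tagged)
```

With this toy data, `classify("great voice but awful song", positive_words, negative_words)` yields the `dual` label, since the post contains words from both polarity sets. A real system for Dialectal Arabic would also need to handle spelling variation, which this sketch ignores.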