Bennasar, Mohamed; Hicks, Yulia Alexandrovna; Setchi, Rossitza M. (2015)
Publisher: Elsevier
Journal: Expert Systems with Applications
Languages: English
Types: Article
Subjects: Engineering(all), Computer Science Applications, T1, Artificial Intelligence
Feature selection is used in many application areas relevant to expert and intelligent systems, such as data mining and machine learning, image processing, anomaly detection, bioinformatics and natural language processing. Feature selection based on information theory is a popular approach due to its computational efficiency, scalability in terms of the dataset dimensionality, and independence from the classifier. Common drawbacks of this approach are the lack of information about the interaction between the features and the classifier, and the selection of redundant and irrelevant features. The latter is due to the limitations of the employed goal functions, which lead to overestimation of the feature significance.

To address this problem, this article introduces two new nonlinear feature selection methods, namely Joint Mutual Information Maximisation (JMIM) and Normalised Joint Mutual Information Maximisation (NJMIM); both methods use mutual information and the ‘maximum of the minimum’ criterion, which alleviates the problem of overestimation of the feature significance, as demonstrated both theoretically and experimentally. The proposed methods are compared with five competing methods on eleven publicly available datasets. The results demonstrate that the JMIM method outperforms the other methods on most of the tested public datasets, reducing the relative average classification error by almost 6% in comparison with the next best performing method. The statistical significance of the results is confirmed by the ANOVA test. Moreover, this method produces the best trade-off between accuracy and stability.
