Soria, Daniele; Garibaldi, Jonathan M.; Ambrogi, Federico; Biganzoli, Elia M.; Ellis, Ian O. (2011)
Publisher: Elsevier
Languages: English
Types: Article

Classified by OpenAIRE into

ACM Ref: ComputingMethodologies_PATTERNRECOGNITION
Many algorithms have been proposed for the machine learning task of classification. One of the simplest methods, the naive Bayes classifier, has often been found to give good performance despite the fact that its underlying assumptions (independence and a Normal distribution of the variables) are perhaps violated. In previous work, we applied naive Bayes and other standard algorithms to a breast cancer database from Nottingham City Hospital in which the variables are highly non-Normal, and found that the algorithm performed well when predicting a class that had been derived from the same data. However, when we then applied naive Bayes to predict an alternative clinical variable, it performed much worse than other techniques. This motivated us to propose an alternative method, based on naive Bayes, which removes the requirement for the variables to be Normally distributed but retains the essential structure and other underlying assumptions of the method. We tested our novel algorithm on our breast cancer data and on three UCI datasets which also exhibited strong violations of Normality. We found that our algorithm outperformed naive Bayes in all four cases and outperformed multinomial logistic regression (MLR) in two cases. We conclude that our method offers a competitive alternative to MLR and naive Bayes when dealing with data sets in which non-Normal distributions are observed.
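The abstract does not spell out how the proposed method replaces the Normality assumption, so the following is only an illustrative sketch and not the authors' algorithm. It shows one well-known way to keep the naive Bayes structure (class prior times a product of per-variable densities) while dropping the Normal assumption: each per-class, per-variable density is estimated with a Gaussian kernel density estimate. The function names (`fit`, `predict`, `kde_pdf`, `silverman_bandwidth`), the rule-of-thumb bandwidth and the toy data are all assumptions made for this example.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def silverman_bandwidth(values):
    """Rule-of-thumb bandwidth (assumed choice); 1.0 for degenerate samples."""
    n = len(values)
    s = stdev(values) if n > 1 else 0.0
    return 1.06 * s * n ** (-0.2) if s > 0 else 1.0


def kde_pdf(x, values, h):
    """Gaussian kernel density estimate at x built from the training values."""
    return sum(normal_pdf(x, v, h) for v in values) / len(values)


def fit(X, y):
    """Store, per class, the prior and the training values of each variable."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        columns = [list(col) for col in zip(*rows)]
        model[label] = {
            "prior": len(rows) / len(X),
            "columns": columns,
            "bandwidths": [silverman_bandwidth(c) for c in columns],
        }
    return model


def predict(model, row, kernel=True):
    """Pick the class with the highest log-posterior.

    kernel=True uses the kernel density estimate for each variable;
    kernel=False uses the usual single Normal density per variable.
    """
    best_label, best_score = None, -math.inf
    for label, params in model.items():
        score = math.log(params["prior"])
        for x, col, h in zip(row, params["columns"], params["bandwidths"]):
            if kernel:
                density = kde_pdf(x, col, h)
            else:
                sigma = stdev(col) if len(col) > 1 else 0.0
                density = normal_pdf(x, mean(col), sigma or 1.0)
            # Floor the density so that log() never receives zero.
            score += math.log(max(density, 1e-300))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data (hypothetical): class "A" is bimodal, class "B" is concentrated near 0.
X = [[-3.2], [-3.0], [-2.8], [2.8], [3.0], [3.2],
     [-0.4], [-0.2], [0.0], [0.1], [0.2], [0.4]]
y = ["A"] * 6 + ["B"] * 6
model = fit(X, y)
print(predict(model, [3.1]), predict(model, [0.05]))
```

Both variants share the same independence assumption and the same decision rule. Only the per-variable density estimate changes, which is the part of naive Bayes that the abstract describes as problematic for strongly non-Normal data.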