Soria, Daniele; Garibaldi, Jonathan M. (2010)
Languages: English
Types: Unknown

Classified by OpenAIRE into

ACM Ref: ComputingMethodologies_PATTERNRECOGNITION
In this paper we present an original framework for extracting representative groups from a dataset, and we validate it on a novel case study. The framework applies several different clustering algorithms, characterises their results with a range of statistical and visualisation techniques, and defines core classes by consensus clustering. The classes may then be verified with supervised classification algorithms, yielding a set of rules that may be used to classify new data points. The framework is validated on a novel set of histone markers for breast cancer patients. From a technical perspective, the resulting classes are well separated and are characterised by low, medium and high levels of the biological markers. Clinically, the groups appear to distinguish patients with poor overall survival from those with a low grading score and better survival. Overall, the framework offers a promising methodology for elucidating core consensus groups from data.
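The consensus step can be illustrated with a minimal sketch. It assumes, as a simplification not taken from the paper, that each clustering algorithm returns hard labels 0..k-1 for the same k, that labels are first aligned to a reference partition, and that a point joins a core class only when every aligned algorithm agrees; otherwise it is left unclassified. The function names `align_labels` and `core_classes` and the toy labelings are hypothetical, and the brute-force alignment over all k! permutations is only practical for small k (a Hungarian-algorithm matching would scale better).

```python
from itertools import permutations
from typing import List, Optional, Sequence


def align_labels(reference: Sequence[int], labels: Sequence[int], k: int) -> List[int]:
    """Relabel `labels` (values 0..k-1) with the permutation that maximises
    agreement with `reference`. Brute force over k! permutations (assumption:
    k is small)."""
    best_perm, best_score = None, -1
    for perm in permutations(range(k)):
        score = sum(1 for r, lab in zip(reference, labels) if perm[lab] == r)
        if score > best_score:
            best_perm, best_score = perm, score
    return [best_perm[lab] for lab in labels]


def core_classes(labelings: List[Sequence[int]], k: int) -> List[Optional[int]]:
    """Hypothetical consensus rule: a point is assigned to a core class only
    if all aligned labelings agree; otherwise it is marked None (unclassified)."""
    reference = list(labelings[0])
    aligned = [reference] + [align_labels(reference, lab, k) for lab in labelings[1:]]
    result: List[Optional[int]] = []
    for votes in zip(*aligned):
        result.append(votes[0] if all(v == votes[0] for v in votes) else None)
    return result


if __name__ == "__main__":
    # Toy labels standing in for three different clustering algorithms.
    km = [0, 0, 1, 1, 2, 2]
    pam = [1, 1, 0, 0, 2, 2]   # same partition as km, different label names
    fcm = [2, 2, 0, 1, 1, 1]   # disagrees on the fourth point
    print(core_classes([km, pam, fcm], k=3))  # [0, 0, 1, None, 2, 2]
```

Requiring unanimity is the strictest possible consensus rule; a majority vote would classify more points at the cost of less homogeneous core groups.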