Rogers, S.; Girolami, M.; Campbell, C.; Breitling, R. (2005)
Publisher: Institute of Electrical and Electronics Engineers
Languages: English
Types: Article
Subjects: QA76

Classified by OpenAIRE into

ACM Ref: ComputingMethodologies_PATTERNRECOGNITION
We present a new computational technique (a software implementation, data sets, and supplementary information are available at http://www.enm.bris.ac.uk/lpd/) which enables the probabilistic analysis of cDNA microarray data, and we demonstrate its effectiveness in identifying features of biomedical importance. A hierarchical Bayesian model, called latent process decomposition (LPD), is introduced in which each sample in the data set is represented as a combinatorial mixture over a finite set of latent processes, which are expected to correspond to biological processes. Parameters in the model are estimated using efficient variational methods. This type of probabilistic model is most appropriate for the interpretation of measurement data generated by cDNA microarray technology. For determining informative substructure in such data sets, the proposed model has several important advantages over the standard use of dendrograms. First, it allows the optimal number of sample clusters to be assessed objectively. Second, it represents samples and gene expression levels using a common set of latent variables (dendrograms cluster samples and gene expression values separately, which amounts to two distinct reduced-space representations). Third, in contrast to standard cluster models, observations are not assigned to a single cluster; thus, for example, gene expression levels are modeled via combinations of the latent processes identified by the algorithm. We show that this new method compares favorably with alternative cluster analysis methods. To illustrate its potential, we apply the proposed technique to several microarray data sets for cancer. For these data sets, it successfully decomposes the data into known subtypes and indicates possible further taxonomic subdivision, in addition to highlighting, in a wholly unsupervised manner, the importance of certain genes which are known to be medically significant. To illustrate its wider applicability, we also evaluate its performance on a microarray data set for yeast.
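The idea of representing each sample as a mixture over latent processes can be made concrete with a small generative sketch. The code below is not the authors' implementation and does not perform the variational estimation mentioned above; it only draws synthetic data under assumptions that the abstract does not state: a Dirichlet prior over each sample's process weights and a Gaussian expression level per gene and process. The function name `sample_lpd` and all parameter values are hypothetical.

```python
import random


def sample_lpd(n_samples, n_genes, n_processes, alpha=1.0, seed=0):
    """Draw synthetic expression data from an LPD-style generative sketch.

    Assumptions (not taken from the abstract): Dirichlet(alpha) mixing
    weights per sample and Gaussian expression per (gene, process) pair.
    """
    rng = random.Random(seed)
    # Hypothetical per-gene, per-process Gaussian means and a fixed std.
    mu = [[rng.gauss(0.0, 2.0) for _ in range(n_processes)]
          for _ in range(n_genes)]
    sigma = 0.5

    data, thetas = [], []
    for _ in range(n_samples):
        # Dirichlet draw via normalized Gamma variates: this sample's
        # mixing weights over the latent processes.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(n_processes)]
        total = sum(gammas)
        theta = [g / total for g in gammas]

        row = []
        for gene in range(n_genes):
            # Each gene in this sample picks its own latent process, so a
            # sample is never tied to a single cluster.
            k = rng.choices(range(n_processes), weights=theta)[0]
            row.append(rng.gauss(mu[gene][k], sigma))
        data.append(row)
        thetas.append(theta)
    return data, thetas


if __name__ == "__main__":
    data, thetas = sample_lpd(n_samples=4, n_genes=6, n_processes=3)
    for theta in thetas:
        print([round(w, 3) for w in theta])
```

Each printed row is one sample's weight vector over the three hypothetical processes. The contrast with a standard cluster model is that the process is chosen per gene within a sample, not once per sample.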
Simon Rogers received a degree in electrical and electronic engineering from the University of Bristol in 2001, where he was awarded the Sander prize for the best final-year examination results, and the PhD degree in engineering mathematics from the University of Bristol in 2004. In June 2004, he was a visiting researcher in the Bioinformatics Research Centre at the University of Glasgow, where he now holds a postdoctoral research position.