Husmeier, D. (2000)
Publisher: MIT Press
Languages: English
Types: Article
Training probability-density estimating neural networks with the expectation-maximization (EM) algorithm aims to maximize the likelihood of the training set and therefore leads to overfitting for sparse data. In this article, a regularization method for mixture models with generalized linear kernel centers is proposed, which adopts the Bayesian evidence approach and optimizes the hyperparameters of the prior by type II maximum likelihood. This includes a marginalization over the parameters, which is done by Laplace approximation and requires the derivation of the Hessian of the log-likelihood function. The incorporation of this approach into the standard training scheme leads to a modified form of the EM algorithm, which includes a regularization term and adapts the hyperparameters on-line after each EM cycle. The article presents applications of this scheme to classification problems, the prediction of stochastic time series, and latent space models.
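The article's central point, that maximum-likelihood EM overfits sparse data and therefore needs a regularization term, can be illustrated with a much simpler relative of the full evidence scheme. The sketch below is a hand-rolled toy, not the article's method: a one-dimensional Gaussian mixture fitted by MAP-EM with a fixed inverse-gamma prior on each component variance (hyperparameters `alpha` and `beta` are chosen by hand here, rather than adapted online after each EM cycle as in the article), which keeps variances from collapsing to zero on sparse data.

```python
import numpy as np

def regularized_em_gmm(x, k=2, alpha=1.0, beta=0.1, n_iter=100):
    """Penalized EM for a 1-D Gaussian mixture.

    An inverse-gamma(alpha, beta) prior on each component variance
    turns the M-step into a MAP update, preventing the variance
    collapse that plain maximum likelihood permits on sparse data.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    pi = np.full(k, 1.0 / k)                        # mixing weights
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)   # spread initial means
    var = np.full(k, x.var())                       # initial variances
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = p(component j | x_i).
        log_p = (np.log(pi)
                 - 0.5 * np.log(2.0 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_p -= log_p.max(axis=1, keepdims=True)   # numerical stability
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: the prior adds 2*beta to the weighted sum of squares
        # and 2*(alpha + 1) to its divisor (MAP estimate of var).
        nk = r.sum(axis=0)
        pi = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        ss = (r * (x[:, None] - mu) ** 2).sum(axis=0)
        var = (ss + 2.0 * beta) / (nk + 2.0 * (alpha + 1.0))
    return pi, mu, var
```

With well-separated data the responsibilities quickly become crisp and the prior is nearly inert; it matters most when a component captures only a handful of points, where `nk` is small and the `2*(alpha + 1)` term in the divisor dominates, holding the variance away from zero.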