Tepper, JA; Shertil, MS; Powell, HM (2016)
Publisher: Elsevier
Languages: English
Types: Article
The vanishing gradients problem inherent in Simple Recurrent Networks (SRNs) trained with back-propagation has led to a significant shift towards Long Short-Term Memory (LSTM) networks and Echo State Networks (ESNs), which overcome the problem through second-order error-carousel schemes and different learning algorithms, respectively. This paper re-opens the case for SRN-based approaches by considering a variant, the Multi-recurrent Network (MRN). We show that memory units embedded within its architecture can ameliorate the vanishing gradient problem by providing variable sensitivity to recent and more historic information through layer- and self-recurrent links with varied weights, forming a so-called sluggish state-based memory. We demonstrate that an MRN, optimised with noise injection, is able to learn the long-term dependency within a complex grammar-induction task, significantly outperforming the SRN, NARX and ESN. Analysis of the networks' internal representations reveals that the sluggish state-based representations of the MRN are best able to latch on to critical temporal dependencies spanning variable time delays, and to maintain distinct and stable representations of all underlying grammar states. Surprisingly, the ESN was unable to fully learn the dependency problem, suggesting that the major shift towards this class of models may be premature.
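The sluggish state-based memory described in the abstract can be illustrated with a minimal sketch. Everything below is a hypothetical illustration, not the paper's implementation: layer sizes, the `alphas` values, and the weight initialisation are assumptions. The idea it demonstrates is that each memory bank keeps a running blend of the previous hidden state with its own self-link strength, so banks with stronger self-links change slowly and retain older context.

```python
# Hypothetical sketch of an MRN-style forward pass with "sluggish" memory
# banks (illustrative only; sizes, alphas, and scales are not from the paper).
import numpy as np

class MRNSketch:
    def __init__(self, n_in, n_hidden, alphas=(0.0, 0.25, 0.5, 0.75), seed=0):
        rng = np.random.default_rng(seed)
        self.alphas = np.asarray(alphas)                # self-link strength per bank
        n_ctx = len(alphas) * n_hidden                  # all banks feed the hidden layer
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.W_ctx = rng.normal(0.0, 0.1, (n_hidden, n_ctx))
        self.banks = np.zeros((len(alphas), n_hidden))  # sluggish state-based memory
        self.h = np.zeros(n_hidden)

    def step(self, x):
        # Each bank blends its previous value with the last hidden state:
        # alpha = 0 copies h(t-1) exactly, while alpha near 1 changes only
        # sluggishly, preserving sensitivity to more historic information.
        self.banks = (self.alphas[:, None] * self.banks
                      + (1.0 - self.alphas[:, None]) * self.h)
        self.h = np.tanh(self.W_in @ x + self.W_ctx @ self.banks.ravel())
        return self.h
```

Because each bank integrates the hidden state at a different rate, the concatenated context gives the hidden layer simultaneous access to recent and older information, which is the property the paper credits with easing the vanishing-gradient problem.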
  • The references below were discovered through OpenAIRE's pilot similarity algorithms.

    • J. L. Elman (1991). "Distributed representations, simple recurrent networks, and grammatical structure," Machine Learning, vol. 7, pp. 195-224.
    • D. Palmer-Brown, J. A. Tepper, and H. M. Powell (2002). "Connectionist natural language parsing," Trends Cogn. Sci., vol. 6, no. 10, pp. 437-442.
    • M. H. Christiansen and N. Chater (2001). "Connectionist psycholinguistics in perspective," in M. H. Christiansen and N. Chater (eds.), Connectionist Psycholinguistics, pp. 19-75. Ablex: Westport, CT.
    • T. Koskela, M. Varsta, J. Heikkonen, and K. Kaski (1998). "Temporal sequence processing using recurrent SOM," in Proc. Second Int. Conf. on Knowledge-Based Intelligent Electronic Systems (KES'98), vol. 1, pp. 290-297.
    • J. M. Binner, P. Tino, J. Tepper, R. Anderson, B. Jones, and G. Kendall (2010). "Does money matter in inflation forecasting?," Physica A: Statistical Mechanics and its Applications, vol. 389, no. 21, pp. 4793-4808.
    • J. F. Kolen and S. C. Kremer (2001). A Field Guide to Dynamical Recurrent Networks. John Wiley & Sons.
    • I. Sutskever and G. Hinton (2010). "Temporal-kernel recurrent neural networks," Neural Networks, vol. 23, no. 2, pp. 239-243.
    • B. Cartling (2008). "On the implicit acquisition of a context-free grammar by a simple recurrent neural network," Neurocomputing, vol. 71, no. 7-9, pp. 1527-1537.
    • J. L. Elman (1995). "Language as a dynamical system," in Mind as Motion: Explorations in the Dynamics of Cognition, pp. 195-223.
    • D. Wang, X. Liu, and S. C. Ahalt (1996). "On temporal generalization of simple recurrent networks," Neural Networks, vol. 9, no. 7, pp. 1099-1118.
    • L. Gupta, M. McAvoy, and J. Phegley (2000). "Classification of temporal sequences via prediction using the simple recurrent neural network," Pattern Recognit., vol. 33, no. 10, pp. 1759-1770.
    • A. Cleeremans, D. Servan-Schreiber, and J. L. McClelland (1989). "Finite State Automata and Simple Recurrent Networks," Neural Comput., vol. 1, no. 3, pp. 372-381.
    • S. L. Frank (2006). "Learn more by training less: systematicity in sentence processing by recurrent networks," Connection Sci., vol. 18, no. 3, pp. 287-302.
    • N. Sharkey, A. Sharkey, and S. Jackson (2000). "Are SRNs Sufficient for Modeling Language Acquisition?," Oxford University Press, pp. 33-54.
    • J. A. Tepper, H. M. Powell, and D. Palmer-Brown (2002). "A corpus-based connectionist architecture for large-scale natural language parsing," Conn. Sci., vol. 14, no. 2, pp. 93-114.
    • I. Farkaš and M. W. Crocker (2008). "Syntactic systematicity in sentence processing with a recurrent self-organizing network," Neurocomputing, vol. 71, no. 7, pp. 1172-1179.
    • Y. Bengio, P. Simard, and P. Frasconi (1994). "Learning long-term dependencies with gradient descent is difficult," IEEE Trans. Neural Networks, vol. 5, no. 2, pp. 157-166.
    • J. A. Pérez-Ortiz, F. A. Gers, D. Eck, and J. Schmidhuber (2003). "Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets," Neural Networks, vol. 16, no. 2, pp. 241-250.
    • C. Ulbricht (1994). "Multi-recurrent networks for traffic forecasting," in Proceedings of the National Conference on Artificial Intelligence, p. 883.
    • G. Dorffner (1996). "Neural networks for time series processing," Neural Network World.
    • F. Gers (2001). "Long Short-Term Memory in Recurrent Neural Networks," PhD thesis, EPFL, Lausanne.
    • M. H. Tong, A. D. Bickett, E. M. Christiansen, and G. W. Cottrell (2007). "Learning grammatical structure with Echo State Networks," Neural Netw., vol. 20, no. 3, pp. 424-432.
    • M. D. Skowronski and J. G. Harris (2006). "Minimum mean squared error time series classification using an echo state network prediction model," 2006 IEEE Int. Symp. Circuits Syst., pp. 3153-3156.
    • A. Rodan and P. Tino (2011). "Minimum complexity echo state network," IEEE Trans. Neural Netw., vol. 22, no. 1, pp. 131-144.
    • S. E. Fahlman (1991). "The recurrent cascade-correlation architecture," in Advances in Neural Information Processing Systems 3, R. P. Lippmann, J. E. Moody, and D. S. Touretzky (eds.). Morgan Kaufmann.
    • D. Servan-Schreiber, A. Cleeremans, and J. L. McClelland (1991). "Graded state machines: The representation of temporal contingencies in simple recurrent networks," Mach. Learn., vol. 7, no. 2-3, pp. 161-193.
    • M. I. Jordan (1986). "Attractor dynamics and parallelism in a sequential connectionist machine," in Proceedings of the 9th Annual Conference of the Cognitive Science Society, pp. 531-546.
    • H. Jaeger (2001). "The 'echo state' approach to analysing and training recurrent neural networks - with an erratum note," GMD Technical Report 148, German National Research Center for Information Technology, Bonn, Germany.
    • M. Craven and J. W. Shavlik (1994). "Using Sampling and Queries to Extract Rules from Trained Neural Networks," in ICML, pp. 37-45.
    • R. Setiono and H. Liu (1995). "Understanding neural networks via rule extraction," in IJCAI, vol. 1, pp. 480-485.
    • J. Bullinaria (1997). "Analyzing the internal representations of trained neural networks," in Neural Network Analysis, Architectures and Algorithms, pp. 3-26.
    • T. Lin, B. G. Horne, and C. L. Giles (1998). "How embedded memory in recurrent neural network architectures helps learning long-term temporal dependencies," Neural Networks, vol. 11, no. 5, pp. 861-868.
    • M. Bodén and J. Wiles (2000). "Context-free and context-sensitive dynamics in recurrent neural networks," Conn. Sci., vol. 12, no. 3-4, pp. 197-210.
    • G. F. Marcus (1998). "Can connectionism save constructivism?," Cognition, vol. 66, no. 2, pp. 153-182.
    • S. Hochreiter and J. Schmidhuber (1997). "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735-1780.
    • P. J. Werbos (1990). "Backpropagation through time: What it does and how to do it," Proceedings of the IEEE, vol. 78, no. 10, pp. 1550-1560.
    • N. Cowan (2001). "The magical number 4 in short-term memory: A reconsideration of mental storage capacity," Behavioral and Brain Sciences, vol. 24, pp. 87-185.
    • A. Clark (2013). "Whatever next? Predictive brains, situated agents, and the future of cognitive science," Behavioral and Brain Sciences, vol. 36, pp. 181-204. doi: 10.1017/S0140525X12000477.
    • P. F. Dominey, M. Arbib, and J. P. Joseph (1995). "A model of corticostriatal plasticity for learning oculomotor associations and sequences," Journal of Cognitive Neuroscience, vol. 7, no. 25.
    • P. F. Dominey (2013). "Recurrent temporal networks and language acquisition - from corticostriatal neurophysiology to reservoir computing," Frontiers in Psychology, vol. 4, art. 500. doi: 10.3389/fpsyg.2013.00500.
    • X. Hinaut and P. F. Dominey (2013). "Real-time parallel processing of grammatical structure in the frontostriatal system: a recurrent network simulation study using reservoir computing," PLoS ONE, 8(2), e52946. doi: 10.1371/journal.pone.0052946.
    • R. Pascanu and H. Jaeger (2011). "A neurodynamical model for working memory," Neural Networks, vol. 24, pp. 199-207. doi: 10.1016/j.neunet.2010.10.003.
    • A. D. Friederici (2012). "The cortical language circuit: from auditory perception to sentence comprehension," Trends Cogn. Sci., vol. 16, pp. 262-268. doi: 10.1016/j.tics.2012.04.001.
    • H. T. Siegelmann and E. D. Sontag (1991). "Turing computability with neural nets," Applied Mathematics Letters, vol. 4, pp. 77-80.
    • M. H. Christiansen and M. C. MacDonald (2009). "A usage-based approach to recursion in sentence processing," Language Learning, pp. 126-161. ISSN 0023-8333.
    • N. Evans and S. C. Levinson (2009). "The myth of language universals: Language diversity and its importance for cognitive science," Behavioral and Brain Sciences, vol. 32, pp. 429-492. doi: 10.1017/S0140525X0999094X.