de Kok, I.A.; Heylen, Dirk K.J. (2009)
Publisher: Association for Computing Machinery (ACM)
Types: Article, Conference object
One of the many skills required to engage properly in a conversation is knowing the appropriate use of the rules of engagement. A virtual human or robot should, for instance, be able to tell when it is being addressed or when the speaker is about to hand over the turn. The paper presents a multimodal approach to end-of-speaker-turn prediction that uses sequential probabilistic models (Conditional Random Fields) to learn a model from observations of real-life multi-party meetings. Although the results are not as good as expected, we provide insight, based on the literature and our own results, into which modalities are important when taking a multimodal approach to the problem.