Bermudez Contreras, Edgar Josue (2010)
Languages: English
Types: Doctoral thesis
Subjects: QC, TJ
Object recognition is arguably one of the main tasks carried out by the visual cortex. It has been studied for decades and remains one of the main topics investigated in the computer vision field. While vertebrates perform this task with exceptional reliability and in very short amounts of time, the visual processes involved are still not completely understood. Considering the desirable properties of the visual systems found in nature, many models have been proposed not only to match their performance in object recognition tasks, but also to study and understand the object recognition processes in the brain. One important point most of the classical models have failed to consider is the fact that all visual systems in nature are active. Active object recognition opens different perspectives compared with the classical isolated way of modelling neural processes, such as the exploitation of the body to aid the perceptual processes. Biologically inspired models are a good alternative for studying embodied object recognition, since animals are a working example that object recognition can be performed efficiently in an active manner.

In this thesis I study biologically inspired models for object recognition from an active perspective, and I demonstrate that by considering the problem from this perspective, the computational complexity present in some of the classical models of object recognition can be reduced. In particular, chapter 3 compares a simple V1-like model (the RBF model) with a complex hierarchical model (the HMAX model) under conditions in which, with a simple attentional mechanism, the RBF model performs comparably to the HMAX model. Additionally, I compare the RBF and HMAX models with other visual systems on well-known object libraries; this comparison shows that the performance of the implementations of the RBF and HMAX models employed in this thesis is similar to that of other state-of-the-art visual systems. In chapter 4, I study the role of sensors in the neural dynamics of controllers and the behaviour of simulated agents, and I show how an Evolutionary Robotics approach can be used to study autonomous mobile agents performing visually guided tasks. In chapter 5, I investigate whether the variation in visual information determined by simple movements of an agent can affect the performance of the RBF and HMAX models. In chapter 6, I investigate the impact of several movement strategies on the recognition performance of the models; in particular, I study the impact of the variation in visual information when different movement strategies are used to collect training views, and I show that temporal information can be exploited through these strategies to improve object recognition performance. In chapter 7, experiments on the exploitation of movement and temporal information are carried out in a real-world scenario using a robot; these experiments validate the results obtained in simulation in the previous chapters. Finally, in chapter 8, I show that by exploiting the regularities in the visual input imposed by movement during the selection of training views, the complexity of the RBF model can be reduced on a real robot.

The approach of this work is to gradually increase the complexity of the processes involved in active object recognition: from studying the role of moving the focus of attention while comparing object recognition models on static tasks, to analysing the exploitation of an active approach to the selection of training views for an object recognition task on a real-world robot.
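
The RBF model referred to above is a view-based classifier built from radial basis function units centred on stored training views. Purely as a hedged illustration of that idea, and not the implementation used in the thesis, the sketch below shows a minimal RBF view classifier in Python; the feature dimensionality, the Gaussian width sigma, and the random placeholder data are all assumptions.

    import numpy as np

    # Minimal sketch of a view-based RBF classifier. Illustrative only: the
    # feature extraction, the kernel width sigma, and the labels are assumed,
    # not taken from the thesis.
    class RBFViewClassifier:
        def __init__(self, sigma=1.0):
            self.sigma = sigma      # width of the Gaussian radial basis units
            self.views = None       # training views, one feature vector per view
            self.labels = None      # object label for each training view

        def fit(self, views, labels):
            # Each training view becomes the centre of one RBF unit.
            self.views = np.asarray(views, dtype=float)
            self.labels = np.asarray(labels)

        def predict(self, view):
            # Unit activation decays with Euclidean distance from its centre.
            dist = np.linalg.norm(self.views - np.asarray(view, dtype=float), axis=1)
            activations = np.exp(-(dist ** 2) / (2.0 * self.sigma ** 2))
            # Sum the activations per object and return the best-supported label.
            scores = {label: activations[self.labels == label].sum()
                      for label in np.unique(self.labels)}
            return max(scores, key=scores.get)

    # Hypothetical usage: 100 training views with 64-dimensional features, 4 objects.
    rng = np.random.default_rng(0)
    views = rng.normal(size=(100, 64))
    labels = rng.integers(0, 4, size=100)
    classifier = RBFViewClassifier(sigma=5.0)
    classifier.fit(views, labels)
    print(classifier.predict(views[0]))

In this picture, reducing the complexity of the model, as discussed for chapter 8, roughly amounts to keeping fewer or lower-dimensional view centres while trying to preserve classification performance.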

Table of contents

  • 1 Introduction
      1.1 Structure overview
  • 2 Active object recognition and autonomous mobile robots
      2.1 Object recognition
      2.2 Object recognition models
        2.2.1 Computer vision approaches
        2.2.2 Biologically inspired approaches
      2.3 Active perception, embodiment, and situatedness
        2.3.1 Active vision and object recognition
        2.3.2 Embodied and Situated visual systems
        2.3.3 Movement and object recognition
        2.3.4 Temporal information and object recognition
      2.4 Controllers for autonomous visually guided mobile robots
  • 3 A first comparison: HMAX and RBF models in realistic conditions
      3.1 Introduction
      3.2 Visual system
        3.2.1 The Analysis module
        3.2.2 Classifier module
        3.2.3 The attentional and foveation mechanisms
      3.3 Model evaluation
        3.3.1 State of the art comparison
        3.3.2 HMAX implementation validation
      3.4 Comparison of the models in more realistic conditions
        3.4.1 Methods
        3.4.2 Results
      3.5 Conclusion
  • 5 Active acquisition of visual information
      5.1 Introduction
      5.2 Methods
        5.2.1 Agent, arena and objects
        5.2.2 Training phase
        5.2.3 Testing phase
      5.3 Results
        5.3.1 Similarity maps
        5.3.2 Testing the models using movement trajectories
      5.4 Discussion
        5.4.1 Dimensionality
        5.4.2 The role of the BDM
      5.5 Conclusion
  • 6 Movement strategies during learning
      6.1 Introduction
      6.2 Methods
      6.3 Experiment 1: Movement strategies
      6.4 Experiment 2: Temporal information using the RBF model
      6.5 Experiment 3: Robustness of the RBF when using temporal information
        6.5.1 Changing the radius of strategy 3
        6.5.2 Moving the centre of strategy 3
        6.5.3 Using strategy 3 for training and the testing trajectory for testing
        6.5.4 Moving the centre of strategy 4
        6.5.5 Considering interval timing for strategy 4
      6.6 Experiment 4: Using more objects
      6.7 Discussion
      6.8 Conclusion
      7.4.1 Differences between the real world and the simulated case
      7.4.2 Exploitation of variation in the object views in the real world
      7.5 Conclusion
  • 8 Towards active selection of training views
      8.1 Introduction
      8.2 Methods
        8.2.1 RBF versions
        8.2.2 The classifier module
        8.2.3 Movement strategies
      8.3 Results
        8.3.1 Reducing the complexity of RBF
        8.3.2 Investigation of training views and model performance
        8.3.3 Exploiting regularities in the environment through movement
      8.4 Discussion
        8.4.1 How could the reduced versions of the RBF model fully regain performance?
        8.4.2 Towards active object recognition
      8.5 Conclusions
List of figures

  • 1.1 Brighton seafront on a Sunday morning.
  • 2.1 Ventral and dorsal pathways in the visual cortex. The activity of the ventral pathway is generally associated with the identification of objects, while the dorsal pathway is commonly associated with the localisation of, and actions related to, objects in space (image adapted from Wikipedia).
  • 2.2 Hierarchical structure of the HMAX model (adapted from Riesenhuber and Poggio, 1999b).
  • 8.1 Movement strategies used to collect the training views. A) Movement strategy T1: the agent approaches the object in a straight line. B) Movement strategy T2: the agent passes in front of the object in a straight line. C) Movement strategy T3: the agent circles the object. D) Movement strategy T4: the agent spirals around the object.
  • 8.2 Spiral strategy. 100 positions were generated spiralling around the object (circle in the centre); an illustrative sketch of this sampling follows this list.
  • 8.3 Performance (%) of the different reduced RBF implementations. The performance of RBF_A, RBF_B and RBF_C is represented by columns A, B and C, respectively, and corresponds to the average number of correct guesses over every position in the arena, averaged across the four training movement strategies. The error bars show the standard deviation over the movement strategies.
  • 8.4 Similarity map of the training views using RBF_A and RBF_C. The similarity between views v_i and v_j is defined as ||v_i − v_j||, where ||·|| is the Euclidean norm (see the sketch after this list). Red indicates the lowest similarity and blue the highest similarity between the training views. The areas that are red in the RBF_A similarity maps are yellowish and blueish in the RBF_C maps, showing the reduced specificity of the reduced versions of the RBF model. The similarity map for RBF_B (not shown in this figure) is intermediate between RBF_A and RBF_C.
  • 8.5 Difference of similarity maps. Every point represents the difference between the RBF_A map and the RBF_C map after normalising their values by the maximum view difference for each object and movement strategy (from figure 8.4). Blue represents small differences (minimum value = -0.15), white represents no difference (zero), and red represents larger differences (maximum value = 1.6).
  • 8.6 Total number of correct classifications by RBF_A when the training views were collected using movement strategies T1, T2, T3 and T4, across all objects.
  • 8.7 Total number of correct guesses for each object for the RBF_A model when using movement strategies T1, T2, T3 and T4. Objects 2 and 6 have the lowest performance. Note that the quantities represented by each colour are independent; the plot is not cumulative.
  • 8.8 Examples of training views of object 4 (squirrel) using movement strategies T1, T2, T3 and T4.
  • 8.9 Total number of correct guesses for each object and each movement strategy (T1, T2, T3 and T4) for the RBF_A model. Note that the quantities represented by each colour are independent; the plot is not cumulative.
  • 8.10 Total number of correct classifications by the RBF_A model using movement strategies T1, T2, T3 and T4, when the blobs were manually corrected.
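
Two computations referenced in the figure list above are concrete enough to sketch: the spiral collection of training positions (figure 8.2) and the view-similarity map based on the Euclidean norm ||v_i − v_j|| (figure 8.4). The code below is a hedged approximation; the spiral geometry (an Archimedean spiral with assumed radii and number of turns) and the placeholder feature vectors are illustrative assumptions, not values taken from the thesis.

    import numpy as np

    # Illustrative sketch only: the spiral parameters and the feature vectors
    # are assumptions, not the thesis implementation.
    def spiral_positions(n=100, turns=3.0, r_start=0.3, r_end=1.5):
        """Generate n positions spiralling around an object placed at the origin
        (cf. figure 8.2, which uses 100 positions)."""
        t = np.linspace(0.0, 1.0, n)
        radius = r_start + (r_end - r_start) * t    # radius grows along the path
        angle = 2.0 * np.pi * turns * t             # angle sweeps several turns
        return np.stack([radius * np.cos(angle), radius * np.sin(angle)], axis=1)

    def similarity_map(views):
        """Pairwise map of ||v_i - v_j|| between training views (cf. figure 8.4);
        low values mean two views are close in feature space."""
        v = np.asarray(views, dtype=float)
        diff = v[:, None, :] - v[None, :, :]
        return np.linalg.norm(diff, axis=2)

    # Hypothetical usage with random placeholder features for 100 views.
    positions = spiral_positions(n=100)
    features = np.random.default_rng(1).normal(size=(100, 64))
    d = similarity_map(features)
    print(positions.shape, d.shape)    # (100, 2) (100, 100)

Plotted as a matrix, such a map makes visible how much the stored training views differ from one another, which is what figures 8.4 and 8.5 compare across the reduced RBF versions.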