Child, C. H. T. (2012)
Publisher: City University London
Languages: English
Types: Doctoral thesis
Subjects: QA75, T1
Abstract:

This thesis presents an approximate dynamic programming (ADP) technique for environment modelling agents. The agent learns a set of parallel stochastic planning operators (P-SPOs) by evaluating changes in its environment in response to actions, using an association rule mining approach. An approximate policy is then derived by iteratively improving state value aggregation estimates attached to the operators, using the P-SPOs as a model in a Dyna-Q-like architecture.

Reinforcement learning and dynamic programming are powerful techniques for automated agent decision making in stochastic environments. Dynamic programming is effective when there is a known environment model, while reinforcement learning is effective when a model is not available. Both techniques derive a policy: a mapping from each environment state to an action which optimizes the long-term reward the agent receives.

The standard methods become less effective as the state space of the environment grows, because they require a value to be associated with each state, the storage and processing of which is exponential in the number of state variables. Resolving this “curse of dimensionality” is an important topic of research amongst all communities working on this problem. Two key methods are to: (i) derive an estimate of the value (approximate dynamic programming) using function approximation or state aggregation; or (ii) build a model of the environment from experience.

This thesis presents a method of combining these approaches by exploiting structure in the state transition and value functions, captured in a set of planning operators which are learnt through experience in the environment. Standard planning operators define the deterministic changes that occur in an environment in response to an action. This work presents Parallel Stochastic Planning Operators (P-SPOs), a novel form of planning operator providing a structured model of the state transition function in environments which are non-deterministic and in which changes can occur outside the influence of actions. Next, an automated method for extracting P-SPOs from observations in an environment is explored, using an adaptation of association rule mining. Finally, methods of relating the state transition structure encapsulated in the P-SPOs to state values, using the operators to store state value aggregation estimates, are evaluated.

The framework described provides a method by which approximate dynamic programming can be applied by designers of AI agents and AI planning systems in domains for which they have minimal prior knowledge. The framework and the P-SPO-based implementations are tested against standard techniques in two benchmark stochastic environments: a “slippery gripper” block-painting robot, and a “predator-prey” agent environment.

Experimental results show that an agent using a P-SPO-based approach is able to learn an accurate model of its environment if successor state variables exhibit conditional independence, and an approximate model in the non-independent case. Results also demonstrate that the agent’s ability to generalise to previously unseen states using the model allows it to form an improved policy over an agent employing a standard Dyna-Q-based technique. Finally, an approximate policy stored in state aggregation estimates attached to operators is shown to be optimal in experiments for which the P-SPO set contains sufficient information for effective aggregations to be formed.
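As a point of reference, the “long-term reward” criterion described in the abstract is the standard discounted-return objective of a Markov decision process. The Bellman optimality equations below are the textbook formulation in conventional notation (states s, actions a, transition probabilities P, reward R, discount factor γ); the notation is generic and not taken from the thesis itself.

    V^*(s) \;=\; \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[\, R(s, a, s') + \gamma\, V^*(s') \,\bigr]
    \pi^*(s) \;=\; \operatorname*{arg\,max}_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[\, R(s, a, s') + \gamma\, V^*(s') \,\bigr]

A tabular solution stores one value V^*(s) per state; that per-state storage is exactly what grows exponentially in the number of state variables and motivates the function-approximation, state-aggregation and model-learning strategies discussed in the abstract.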
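The abstract’s “adaptation of association rule mining” for extracting operators can be pictured as counting, over an observation log, how often a candidate effect follows a given precondition-and-action rule body, and treating the resulting rule confidence as an estimate of the effect’s probability. The Python sketch below illustrates that reading on a hypothetical log of boolean state variables loosely echoing the “slippery gripper” domain; the data, names and counting scheme are illustrative assumptions, not the P-SPO learning algorithm from the thesis.

    from collections import Counter

    # Hypothetical observation log: (state_before, action, state_after) triples over
    # boolean state variables. The data and variable names are illustrative only.
    log = [
        ({"holding": True,  "painted": False}, "paint",  {"holding": True,  "painted": True}),
        ({"holding": True,  "painted": False}, "paint",  {"holding": True,  "painted": True}),
        ({"holding": True,  "painted": False}, "paint",  {"holding": True,  "painted": False}),
        ({"holding": False, "painted": False}, "pickup", {"holding": True,  "painted": False}),
    ]

    def mine_effect_rules(log):
        """Estimate P(effect | precondition, action) by simple counting, in the
        spirit of association-rule confidence. A toy sketch, not the thesis's learner."""
        body_counts = Counter()   # occurrences of each (precondition, action) rule body
        rule_counts = Counter()   # co-occurrences of a body with a changed variable
        for before, action, after in log:
            body = (frozenset(before.items()), action)
            body_counts[body] += 1
            for var, new_val in after.items():
                if before.get(var) != new_val:          # only record actual changes
                    rule_counts[(body, (var, new_val))] += 1
        # Confidence of "body => effect" plays the role of the effect's probability.
        return {(body, effect): n / body_counts[body]
                for (body, effect), n in rule_counts.items()}

    for (body, effect), conf in mine_effect_rules(log).items():
        precondition, action = body
        print(f"{dict(precondition)} + {action} -> {effect}  (confidence {conf:.2f})")

On this made-up log the rule (holding, not painted) + paint => painted comes out with confidence about 0.67, i.e. a stochastic effect, which is the kind of non-determinism the P-SPO representation is designed to capture.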
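A “Dyna-Q-like architecture” interleaves three activities: updating values from real experience, updating a learned model of the environment, and replaying simulated transitions drawn from that model as extra planning updates. The Python sketch below shows that generic loop on a small, made-up stochastic chain environment; the environment, the hyperparameters and the simple last-outcome model are assumptions for illustration, whereas the thesis uses P-SPOs as the structured, stochastic model and attaches state value aggregation estimates to the operators.

    import random
    from collections import defaultdict

    # Minimal stochastic chain environment, used only to make the sketch runnable.
    # States 0..n-1; action 0 moves left, action 1 moves right; moves "slip" with
    # some probability; reward 1.0 on reaching the right end.
    class ChainEnv:
        def __init__(self, n=6, slip=0.2):
            self.n, self.slip, self.state = n, slip, 0

        def reset(self):
            self.state = 0
            return self.state

        def step(self, action):
            move = 1 if action == 1 else -1
            if random.random() < self.slip:        # stochastic slip in the other direction
                move = -move
            self.state = min(max(self.state + move, 0), self.n - 1)
            done = self.state == self.n - 1
            return self.state, (1.0 if done else 0.0), done

    def dyna_q(env, episodes=200, planning_steps=10, alpha=0.1, gamma=0.95, eps=0.1):
        q = defaultdict(float)                      # tabular Q(s, a)
        model = {}                                  # learned model: (s, a) -> (s', r)
        actions = (0, 1)

        def greedy(s):
            return max(actions, key=lambda a: q[(s, a)])

        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                a = random.choice(actions) if random.random() < eps else greedy(s)
                s2, r, done = env.step(a)
                # (1) Direct reinforcement learning update from real experience.
                q[(s, a)] += alpha * (r + gamma * q[(s2, greedy(s2))] - q[(s, a)])
                # (2) Model learning: remember only the last observed outcome of (s, a),
                #     a deterministic approximation of a stochastic environment.
                model[(s, a)] = (s2, r)
                # (3) Planning: replay simulated transitions drawn from the learned model.
                for _ in range(planning_steps):
                    ps, pa = random.choice(list(model))
                    ps2, pr = model[(ps, pa)]
                    q[(ps, pa)] += alpha * (pr + gamma * q[(ps2, greedy(ps2))] - q[(ps, pa)])
                s = s2
        return q

    if __name__ == "__main__":
        learned_q = dyna_q(ChainEnv())
        print(sorted(learned_q.items())[:4])

In the abstract’s terms, the flat model dictionary above can say nothing about a state-action pair it has never observed; a structured operator model is what allows the agent to generalise to previously unseen states and, in the reported experiments, to form an improved policy over a standard Dyna-Q-based agent.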