Devlin, Sam; Kudenko, Daniel (2011)
Publisher: ACM
Languages: English
Types: Other
Potential-based reward shaping has previously been proven both to be equivalent to Q-table initialisation and to guarantee policy invariance in single-agent reinforcement learning. The method has since been used in multi-agent reinforcement learning without consideration of whether the theoretical equivalence and guarantees hold. This paper extends the existing proofs to similar results in multi-agent systems, providing the theoretical background to explain the success of previous empirical studies. Specifically, it is proven that the equivalence to Q-table initialisation remains and the Nash Equilibria of the underlying stochastic game are not modified. Furthermore, we demonstrate empirically that potential-based reward shaping affects exploration and, consequently, can alter the joint policy converged upon.
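For context, potential-based reward shaping augments the environment reward with F(s, s') = γΦ(s') − Φ(s) for some potential function Φ over states, which is the form whose guarantees the abstract discusses. The single-agent sketch below is illustrative only and not from the paper: the 5-state chain environment, the distance-based potential, and the hyperparameters are all assumptions chosen for a minimal Q-learning demonstration.

```python
import random

GAMMA, ALPHA, EPSILON = 0.9, 0.2, 0.3
GOAL = 4  # rightmost state of a 5-state chain

def phi(state):
    # Illustrative potential: scaled negative distance to the goal.
    return -0.1 * (GOAL - state)

def shaping(s, s_next):
    # Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s).
    return GAMMA * phi(s_next) - phi(s)

def q_learning(episodes=2000, shaped=True, seed=0):
    rng = random.Random(seed)
    # Actions: 0 = left, 1 = right; reward 1.0 for reaching the goal.
    Q = {(s, a): 0.0 for s in range(GOAL + 1) for a in (0, 1)}
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            if rng.random() < EPSILON:
                a = rng.choice((0, 1))  # epsilon-greedy exploration
            else:
                a = max((0, 1), key=lambda b: Q[(s, b)])
            s_next = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
            r = 1.0 if s_next == GOAL else 0.0
            if shaped:
                r += shaping(s, s_next)  # add F(s, s') to the environment reward
            target = r + GAMMA * max(Q[(s_next, b)] for b in (0, 1))
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
            s = s_next
    return Q
```

Running both variants and comparing greedy policies illustrates the single-agent policy-invariance guarantee the paper builds on: the shaped and unshaped learners converge to the same greedy policy (always move right), with shaping changing only the intermediate rewards seen during learning.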