Reinforcement learning seminar, list of subjects for second presentation

Articles are grouped according to subject. There is one article selection per
participant, but team work is possible by having the members of the team select
articles from the same subject area. The maximal team size is three persons.
A team makes a joint presentation in which the results and conclusions of the
selected articles are compared against each other. Articles listed under
"Typical benchmark applications" are mainly suitable for individual work.

Direct use of continuous-valued state variables and actions

Doya, Kenji (2000).
Reinforcement Learning in Continuous Time and Space.
Neural Computation, Vol. 12. pp. 219-245.
Long article, but good results, explanations and illustrations of the learned
models, which help in general understanding.
Kimura, Hajime, Miyazaki, Kazuteru, Kobayashi, Shigenobu (1997).
Reinforcement Learning in POMDPs with Function Approximation.
Proc. of 14th Int. Conf. on Machine Learning. pp. 152-160.
On a technique called "policy gradient" (see the sketch below). Uses a
"crawling robot" that learns to crawl forward with one arm. Could also be used
in the "Simulated robot environments" subject.
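As a general reminder of what a policy-gradient method does, here is a minimal
REINFORCE-style sketch in Python. It is not the specific algorithm of Kimura et
al. (1997); the environment interface, parameter names and learning rates are
assumptions made only for illustration.

```python
# Minimal REINFORCE-style policy-gradient sketch (illustration only, not the
# algorithm of Kimura et al. 1997).  The environment is assumed to provide
# reset() -> state and step(action) -> (state, reward, done).
import numpy as np

def softmax(prefs):
    e = np.exp(prefs - prefs.max())
    return e / e.sum()

def reinforce_episode(theta, env, alpha=0.01, gamma=0.99):
    """Run one episode and do a gradient-ascent update of the policy
    parameters theta (array of shape (n_states, n_actions))."""
    states, actions, rewards = [], [], []
    s, done = env.reset(), False
    while not done:
        probs = softmax(theta[s])
        a = np.random.choice(len(probs), p=probs)
        s_next, r, done = env.step(a)
        states.append(s)
        actions.append(a)
        rewards.append(r)
        s = s_next
    G = 0.0
    for t in reversed(range(len(states))):
        G = rewards[t] + gamma * G       # Monte-Carlo return from step t
        s_t, a_t = states[t], actions[t]
        grad_log = -softmax(theta[s_t])
        grad_log[a_t] += 1.0             # gradient of log pi(a_t|s_t) for a softmax policy
        theta[s_t] += alpha * G * grad_log
    return theta
```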
Kimura, Hajime, Kobayashi, Shigenobu (1999).
Efficient Non-Linear Control by Combining Q-learning with Local Linear Controllers.
Proc. of 16th Int. Conf. on Machine Learning. pp. 210-219.
Very good results on the cart-pole task, where the agent learns
to swing the pole up. Best results I have seen on this task.
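As a reminder of the Q-learning component named in the title, the standard
tabular Q-learning update is sketched below. The local linear controllers of
Kimura & Kobayashi are not reproduced, and the Q-table representation is an
assumption made only for illustration.

```python
# Standard tabular Q-learning update (illustration only).  Q is assumed to be
# a dict mapping (state, action) pairs to value estimates.
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    best_next = max(Q[(s_next, a2)] for a2 in actions)   # off-policy max over actions
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q
```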
Simulated robot environments

Rummery, G. A., Niranjan, M. (1994).
On-Line Q-Learning Using Connectionist Systems.
Tech. Rep. Technical Report CUED/F-INFENG/TR 166,
Cambridge University Engineering Department. 20 p.
Presentation of the SARSA learning method. Used on an interesting
simulated robot navigation task.
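For comparison with the Q-learning update above, the tabular form of the SARSA
update is sketched below; the target uses the action actually taken in the next
state instead of the maximum. Rummery and Niranjan combine this rule with
connectionist (neural network) function approximation, which is not shown here,
and the Q-table representation is an assumption made only for illustration.

```python
# Tabular SARSA update (on-policy, illustration only).  Q is assumed to be a
# dict mapping (state, action) pairs to value estimates; a_next is the action
# the agent actually selects in s_next.
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])
    return Q
```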
Stone, Peter, Sutton, Richard S. (2001).
Scaling Reinforcement Learning toward RoboCup Soccer.
In: Proc. 18th International Conf. on Machine Learning,
Morgan Kaufmann, San Francisco, CA. pp. 537-544.
Robots learning to keep the ball in simulated RoboCup soccer.
Sun, Ron, Peterson, Todd (1998).
Autonomous Learning of Sequential Tasks: Experiments and Analyses.
IEEE Trans. on Neural Networks, Vol. 9, No. 6. pp. 1217-1234.
Quite a long article, but the "submarine navigation" task is
interesting. It also has good figures of the results.

Real robot environments
Maes, Pattie, Brooks, Rodney A. (1990). Learning to coordinate behaviors.
In: Proceedings of Eighth National Conference on Artificial Intelligence,
Morgan Kaufmann. pp. 796-802.
Six-legged robot learns to walk.
Mahadevan, Sridhar, Connell, Jonathan (1992).
Automatic Programming of Behavior-based Robots using Reinforcement Learning.
Artificial Intelligence, Vol. 55, Nos. 2-3. pp. 311-365.
This is a long article to read. However, it is quite easy to read, interesting
and educational.
Mataric, Maja J. (1994).
Reward Functions for Accelerated Learning.
In: Cohen, W. W., Hirsch, H. (eds.) Machine Learning:
Proceedings of the Eleventh International Conference. Morgan-Kaufmann, CA.
Multiple robots collecting pucks into a "home" area. This article is
also included in the "reward shaping" subject.
Mataric, Maja J. (1997).
Reinforcement Learning in the Multi-Robot Domain.
Autonomous Robots, Vol. 4, No. 1. pp. 73-83.
"Journal version" of previous. Longer explanations and new insights, but
longer and more theory to read. Relationship between Reinforcement Learning and the Brain
Doya, Kenji (2002).
Metalearning and neuromodulation.
Neural Networks, Vol. 15, Nos 4-6. pp. 495-506.
Much about the four main neurotransmitters in the brain and how they
could map to Reinforcement Learning parameters.
Kakade, Sham, Dayan, Peter (2002).
Dopamine: generalization and bonuses. Neural Networks, Vol. 15. pp. 549-559.
Theories and results about the relation between TD-learning and dopamine
levels in the brain, including the interest in novel situations and solutions.
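Both this article and the following one relate dopamine responses to
temporal-difference learning. The TD prediction error they build on can be
written, in standard notation (not quoted from either article), as

    \delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)

where V is the learned value estimate and \gamma the discount factor.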
Suri, Roland E. (2002).
TD models of reward predictive responses in dopamine neurons.
Neural Networks, Vol. 15. pp. 523-533.
On the connection between TD-learning, reward expectation, animal behavior, etc.

Reward Shaping
Mataric, Maja J. (1994).
Reward Functions for Accelerated Learning.
In: Cohen, W. W., Hirsch, H. (eds.) Machine Learning:
Proceedings of the Eleventh International Conference. Morgan-Kaufmann, CA.
Multiple robots collecting pucks into a "home" area. This article is
also included in the "real robot environments" subject.
Ng, Andrew Y., Harada, Daishi, Russell, Stuart (1999).
Policy invariance under reward transformations: Theory and application to reward shaping.
Proceedings of the Sixteenth International Conference on Machine Learning.
On how to make exploration faster by modifying the reward function so that
the agent is directly guided towards the goal.
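The shaping rewards that Ng et al. show to leave the optimal policy unchanged
are the potential-based ones, F(s, s') = gamma * Phi(s') - Phi(s). A minimal
sketch, assuming a user-supplied potential function phi (for example, minus the
distance to the goal):

```python
# Potential-based reward shaping in the form analysed by Ng et al. (1999).
# phi is a problem-specific potential function supplied by the user
# (assumption for illustration: higher phi means closer to the goal).
def shaped_reward(r, s, s_next, phi, gamma=0.99):
    return r + gamma * phi(s_next) - phi(s)
```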
Typical benchmark applications

Boyan, J. A., Moore, A. W. (1995).
Generalization in Reinforcement Learning: Safely Approximating the Value Function.
Advances in Neural Information Processing Systems 7. pp. 369-376.
About the use of function approximators in several benchmark tasks.
A short article, but it might require finding some extra
background information.
Moore, A. W., Atkeson, C. G. (1995).
The Parti-game Algorithm for Variable Resolution Reinforcement
Learning in Multidimensional State-spaces. Machine Learning, Vol. 21. pp. 1-36.
About automatic partitioning of continuous-valued state variables.
Many classical benchmark applications are used. Quite a long article, but it is
quite easy to read and contains many good illustrations that help understanding.
Randløv, J., Alstrøm, P. (1998).
Learning to Drive a Bicycle using Reinforcement Learning and Shaping.
ICML-98. pp. 463-471.
Tesauro, G.J. (1995). Temporal difference learning and TD-Gammon.
Communications of the ACM, Vol. 38, No. 3, 58-68.
Available on-line in HTML format.
On one of the biggest success stories
of Reinforcement Learning.

This page is maintained by Kary Främling, E-mail: Kary.Framling@hut.fi. Last updated on March 18th, 2004.