Abstract
Randløv presents one of the most accessible yet technically thorough introductions to reinforcement learning. By having an agent learn to balance on a bicycle, she illustrates how artificial intelligence can develop without supervision, solely through reward and punishment. The article is technically sound and exceptionally well communicated.
From volume 37 (2026 onward), published articles are licensed under Creative Commons Attribution-NonCommercial (CC BY-NC 4.0).
Articles in volumes 1-36 (1990-2025) are not licensed under Creative Commons. In these volumes, all rights are reserved to the respective authors.
