Authored publications
Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution
Reinforcement Learning algorithms require a large number of samples to solve complex tasks with sparse and delayed rewards. Complex tasks can often be hierarchically decomposed into sub-tasks. A step in the Q-function can be associated with solving a sub-task, where the expectation of the return increases. RUDDER has been introduced to identify the...
Vihang P. Patil, Markus Hofmarcher, Marius-Constantin Dinu, Matthias Dorfer, Patrick M. Blies, Johannes Brandstetter, Jose A. Arjona-Medina, Sepp Hochreiter
Machine Learning | arXiv:2009.14108 | 2022
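The core idea of reward redistribution from the original RUDDER can be shown with a small sketch. A return predictor produces a running estimate of the episode return. The reward for each step is the change in that estimate, so a step where the expected return jumps (for example, a solved sub-task) gets immediate credit. This is a simplified illustration, not Align-RUDDER's alignment-based method. The function name `redistribute_rewards`, the zero initial prediction, and the final-step correction that makes the rewards sum to the actual return are assumptions of this sketch.

```python
from typing import List


def redistribute_rewards(predicted_returns: List[float], actual_return: float) -> List[float]:
    """Turn a delayed episodic return into per-step rewards (illustrative sketch).

    predicted_returns[t] is a hypothetical return predictor's estimate of the
    episode return after seeing steps 0..t. Assumption: the estimate before
    any step is 0.0.
    """
    if not predicted_returns:
        return []
    rewards = []
    previous = 0.0
    for g in predicted_returns:
        # Reward for this step is the increase in the predicted return.
        rewards.append(g - previous)
        previous = g
    # Assumed correction: put any prediction error on the last step so the
    # redistributed rewards sum to the actual return of the episode.
    rewards[-1] += actual_return - predicted_returns[-1]
    return rewards


# Predictions jump at step 2 (a sub-task is solved) and again at the end.
print(redistribute_rewards([0.0, 0.0, 0.5, 0.5, 1.0], 1.0))
# -> [0.0, 0.0, 0.5, 0.0, 0.5]
```

Differencing keeps the total return unchanged, so the task being optimized is the same. Credit, however, now arrives at the steps where progress happens instead of only at the end of the episode.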