Our experiments illustrate that proper use of the reset policy can greatly reduce the number of manual resets required to learn a task, reduce the number of unsafe actions that lead to non-reversible states, and automatically induce a curriculum.

Nina Grgić-Hlača, Elissa Redmiles, Krishna P. Human perceptions of fairness in algorithmic decision making: A case study of criminal risk prediction. Abstract: With the widespread use of machine learning methods in numerous domains involving human subjects, several studies have raised questions about the potential for unfairness towards certain individuals or groups.

In this paper, we build on recent work advocating a distributional approach to reinforcement learning in which the distribution over returns is modeled explicitly instead of only estimating the mean.
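To make "modeling the distribution over returns" concrete, here is a minimal sketch of one common parameterization: a categorical distribution over a fixed, evenly spaced support of return values, where the Bellman target r + γZ is projected back onto that support by linear interpolation. The choice of this parameterization is ours for illustration; the text above does not say which representation the work uses, and the function name `project_categorical` is hypothetical.

```python
import math


def project_categorical(probs, support, reward, gamma):
    """Project the distribution of reward + gamma * Z onto a fixed support.

    probs   -- probability of each atom of Z (sums to 1)
    support -- evenly spaced return values, sorted ascending
    Returns the probabilities of the shifted distribution on the same support.
    """
    n = len(support)
    v_min, v_max = support[0], support[-1]
    delta = (v_max - v_min) / (n - 1)  # spacing between neighboring atoms
    projected = [0.0] * n
    for p, z in zip(probs, support):
        # Apply the Bellman update to this atom, clipped to the support range.
        tz = min(max(reward + gamma * z, v_min), v_max)
        # Fractional grid index of the updated atom (clamped against float error).
        b = min(max((tz - v_min) / delta, 0.0), n - 1)
        lower, upper = math.floor(b), math.ceil(b)
        if lower == upper:
            # Lands exactly on an atom: all mass goes there.
            projected[lower] += p
        else:
            # Split mass between the two neighbors in proportion to proximity.
            projected[lower] += p * (upper - b)
            projected[upper] += p * (b - lower)
    return projected


# Usage: all mass on return 2.0, reward 0.5, discount 0.5 -> target 1.5,
# which is split evenly between the atoms at 1.0 and 2.0.
print(project_categorical([0, 0, 1, 0, 0], [0.0, 1.0, 2.0, 3.0, 4.0], 0.5, 0.5))
```

Because the interpolation splits each atom's mass in proportion to distance, the projection preserves the mean whenever no atom is clipped, so the expected return still follows the usual scalar update while the rest of the distribution's shape is kept rather than discarded.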

In this work, we propose an autonomous method for safe and efficient reinforcement learning that simultaneously learns a forward and reset policy, with the reset policy resetting the environment for a subsequent attempt.
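The alternation between a forward attempt and a reset attempt can be sketched as a simple control loop. This is a toy under loudly stated assumptions: a hypothetical 1-D environment in which positions at or beyond `trap` are irreversible, fixed policies standing in for learned ones (learning is omitted entirely), and a hypothetical `reset_value` function standing in for the reset policy's value estimate. Aborting the forward attempt when that estimate falls below `threshold` is our assumption about how the reset policy might be used to avoid non-reversible states; the text here does not spell out that mechanism.

```python
from typing import Callable


def run_forward_reset(
    forward: Callable[[int], int],        # hypothetical forward policy: state -> step
    reset: Callable[[int], int],          # hypothetical reset policy: state -> step
    reset_value: Callable[[int], float],  # assumed estimate of "can we get back from here?"
    threshold: float,                     # assumed abort threshold on reset_value
    episodes: int,
    horizon: int,
    trap: int,                            # positions >= trap are irreversible (toy assumption)
) -> int:
    """Alternate forward and reset attempts; return the number of manual resets."""
    manual_resets = 0
    state = 0  # the start state the reset policy tries to return to
    for _ in range(episodes):
        # Forward attempt, aborted early if the next state looks unrecoverable.
        for _ in range(horizon):
            nxt = state + forward(state)
            if reset_value(nxt) < threshold:
                break  # skip the risky step and hand control to the reset policy
            state = nxt
        # Reset attempt: try to return to the start state.
        for _ in range(horizon):
            if state == 0 or state >= trap:
                break  # either done, or stuck in an irreversible region
            state += reset(state)
        if state != 0:
            manual_resets += 1  # reset policy failed; a human must intervene
            state = 0
    return manual_resets


# Usage: moving right is the task; positions >= 5 cannot be undone.
value = lambda s: 1.0 if s < 5 else 0.0
print(run_forward_reset(lambda s: 1, lambda s: -1, value, 0.5, 3, 10, 5))
print(run_forward_reset(lambda s: 1, lambda s: -1, value, -1.0, 3, 10, 5))
```

With the abort rule active, the forward attempt stops just before the irreversible region and the reset policy returns the agent to the start every time, so no manual resets are needed; with a threshold that never triggers, every episode ends in the trap and requires a manual reset.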