Our experiments illustrate that proper use of the reset policy can greatly reduce the number of manual resets required to learn a task, can reduce the number of unsafe actions that lead to non-reversible states, and can automatically induce a curriculum.
Comment: [Video]
Nina Grgić-Hlača, Elissa Redmiles, Krishna P. Human perceptions of fairness in algorithmic decision making: A case study of criminal risk prediction.
Abstract: With widespread usage of machine learning methods in numerous domains involving human subjects, several studies have raised questions about the potential for unfairness towards certain individuals or groups.
'You have a 33-year-old woman, healthy, active, in great shape, eats well, doesn't drink, smokes a few cigarettes a month.' 'We weren't getting answers from medical examiners and at that point we thought of everything possible.' You leave it up to 12 good people that have to make a very difficult decision. Defense attorneys convinced the court that Mrs Kaufman died of an undiagnosed heart ailment and a tragic fall, telling jurors: 'This American tragedy began on the morning of November 7, 2007, and it led to a flawed, bungled inept prosecution of an innocent man.' Dr Bruce Hyma, Miami-Dade’s Chief Medical Examiner, determined the case to be a homicide by mechanical asphyxia. Adam Kaufman, a real estate developer from Miami, was found not guilty of second-degree murder after he was accused of strangling his wife, Eleonora 'Lina' Kaufman, bruising her neck and bursting blood vessels in her eyes in the bathroom of their home.
In this paper, we build on recent work advocating a distributional approach to reinforcement learning in which the distribution over returns is modeled explicitly instead of only estimating the mean.
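To illustrate the difference between modeling the full distribution over returns and estimating only its mean, here is a minimal sketch. The `ReturnDistribution` class and the two example distributions are hypothetical and chosen purely for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ReturnDistribution:
    """Hypothetical discrete distribution over returns (illustration only)."""
    atoms: Sequence[float]   # possible return values
    probs: Sequence[float]   # probability of each return value

    def mean(self) -> float:
        # The quantity a mean-only value estimate would keep.
        return sum(a * p for a, p in zip(self.atoms, self.probs))

    def variance(self) -> float:
        # Information that is lost when only the mean is estimated.
        m = self.mean()
        return sum(p * (a - m) ** 2 for a, p in zip(self.atoms, self.probs))


# Two made-up actions with identical expected return but different risk.
safe = ReturnDistribution(atoms=[1.0], probs=[1.0])
risky = ReturnDistribution(atoms=[-9.0, 11.0], probs=[0.5, 0.5])

print(safe.mean(), risky.mean())          # same mean
print(safe.variance(), risky.variance())  # very different spread
```

A mean-only estimate cannot tell these two actions apart, whereas an explicit return distribution keeps the spread available to the learner.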
In this work, we propose an autonomous method for safe and efficient reinforcement learning that simultaneously learns a forward and reset policy, with the reset policy resetting the environment for a subsequent attempt.
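The alternation between a forward attempt and a reset attempt can be sketched on a toy problem. The example below is only an assumption-laden illustration: the one-dimensional chain, the hand-coded "policies" (fixed steps right or left), and the names `rollout` and `count_manual_resets` are all hypothetical stand-ins, and nothing here is learned as it would be in the proposed method.

```python
def rollout(state: int, step: int, horizon: int, target: int, goal: int) -> int:
    """Apply a fixed step for up to `horizon` steps, stopping early at `target`."""
    for _ in range(horizon):
        if state == target:
            break
        # Keep the state inside the chain [0, goal].
        state = max(0, min(goal, state + step))
    return state


def count_manual_resets(attempts: int, reset_horizon: int,
                        goal: int = 5, forward_horizon: int = 5) -> int:
    """Count how often a human would have to reset the toy environment."""
    manual_resets = 0
    state = 0
    for _ in range(attempts):
        # Forward attempt: a stand-in forward policy moves toward the goal.
        state = rollout(state, +1, forward_horizon, goal, goal)
        # Reset attempt: a stand-in reset policy moves back toward the start.
        state = rollout(state, -1, reset_horizon, 0, goal)
        if state != 0:
            # The reset attempt failed, so fall back to a manual reset.
            manual_resets += 1
            state = 0
    return manual_resets


print(count_manual_resets(attempts=10, reset_horizon=0))  # no reset policy
print(count_manual_resets(attempts=10, reset_horizon=5))  # capable reset policy
```

With a reset horizon of zero every attempt ends in a manual reset, while a reset policy that can reach the start state removes the need for them in this toy setting.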