Implicit Reinforcement without Interaction at Scale (IRIS)
Different Skills From Different Demos: Implicit Reinforcement without Interaction at Scale, Explained
Reinforcement learning trains models by trial and error. In batch reinforcement learning (BRL), models instead learn from a fixed set of recorded demonstrations by a variety of actors, without interacting with the environment themselves. But what if one doctor is handier with a scalpel while another excels at suturing?