One challenge to making online education available worldwide is evaluating an immense volume of student work. Especially difficult is evaluating interactive computer programming assignments such as coding a game. A deep learning system automated the process by finding mistakes in completed assignments.
What’s new: Evan Zheran Liu and colleagues at Stanford proposed DreamGrader, a system that integrates reinforcement and supervised learning to identify errors (undesirable behaviors) in interactive computer programs and provide detailed information about where the problems lie.
Key insight: A reinforcement learning (RL) model can play a game, randomly at first, and, if it receives the proper rewards, learn to take actions that bring about an error. A classifier, whose predictions are also random at first, can learn to recognize that the error occurred and reward the RL model when it triggers the error. In this scheme, training requires a small number of student submissions labeled with a particular, known error. The two models learn in alternating fashion: The RL model plays for a while and does or doesn’t bring about the error; the classifier classifies the resulting gameplay (that is, it decides whether the RL model’s actions triggered the error and, if so, dispenses a reward); then the RL model plays more, and so on. By repeating this cycle, the classifier learns to recognize the error reliably and the RL model learns to trigger it.
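The alternating scheme can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the authors' implementation: an epsilon-greedy bandit stands in for the RL model, a count-based estimator stands in for the neural classifier, and the `play` function, the four actions, and the single `TRIGGER` action that exposes the bug are all hypothetical.

```python
import random

ACTIONS = [0, 1, 2, 3]  # hypothetical paddle moves
TRIGGER = 2             # toy assumption: only this move exposes the bug


def play(has_error, action):
    """Toy stand-in for running a student program for one episode.

    Returns 1 if the misbehavior became visible, else 0.
    """
    return 1 if has_error and action == TRIGGER else 0


def train(labels, rounds=3000, epsilon=0.2, lr=0.1, seed=0):
    """Alternate player and classifier updates on labeled toy programs."""
    rng = random.Random(seed)
    q = {a: 0.0 for a in ACTIONS}  # player: estimated reward per action
    # Classifier: [errors + 1, total + 2] per observation (smoothed counts).
    seen = {0: [1, 2], 1: [1, 2]}

    def flags_error(obs):
        errors, total = seen[obs]
        return errors / total > 0.5

    for _ in range(rounds):
        has_error = rng.choice(labels)  # pick a labeled submission

        # Player phase: act randomly sometimes, otherwise greedily.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=q.get)
        obs = play(has_error, action)

        # Classifier phase: supervised update from the submission's label.
        seen[obs][0] += int(has_error)
        seen[obs][1] += 1

        # Reward the player only when the classifier flags a real error.
        reward = 1.0 if has_error and flags_error(obs) else 0.0
        q[action] += lr * (reward - q[action])

    return q, flags_error
```

Called with a hypothetical labeled set such as `train([True] * 2 + [False] * 6)`, the function returns the player's action values and the classifier's decision function. In this toy setting the player should come to prefer the trigger action, because that is the only move after which the classifier can tell buggy programs from clean ones.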
How it works: DreamGrader was trained on a subset of 3,500 anonymized student responses to an assignment from the online educational platform Code.org. Students were asked to code Bounce, a game in which a single player moves a paddle along a horizontal axis to send a ball into a goal. The authors identified eight possible errors (such as the ball bouncing out of the goal after entering and no new ball being launched after a goal was scored) and labeled the examples accordingly. The system comprised two components for each type of error: (i) a player that played the game (a double dueling deep Q-network) and (ii) a classifier (an LSTM and vanilla neural network) that decided whether the error occurred.
- The player played the game for 100 steps, each comprising a video frame and associated paddle motion, or until the score exceeded 30. The model moved the paddle based on the gameplay’s “trajectory”: (i) current x and y coordinates of the paddle and ball, (ii) x and y velocities of the ball, and (iii) previous paddle movements, coordinates, ball velocities, and rewards.
- The player received a reward for bringing about an error, and it was trained to maximize its reward. To compute rewards, the system calculated the difference between the classification (error or no error) of the trajectory at the current and previous steps. In this way, the player received a reward only at the step in which the error occurred.
- The classifier learned in a supervised manner, using the labeled submissions to predict from a trajectory whether the error occurred.
- The authors repeated this process many times for each program to cover a wide variety of gameplay situations.
- At inference, DreamGrader ran each player-and-classifier pair on a program and output a list of errors it found.
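The reward rule and the inference step in the list above can be sketched as follows. This is a minimal illustration, not the authors' code: `step_rewards` and `grade` are hypothetical names, the classifier's per-step decisions are assumed to be 0 or 1, and each entry in `detectors` is assumed to wrap one trained player-and-classifier pair.

```python
def step_rewards(error_flags):
    """Turn per-step classifier decisions into per-step rewards.

    error_flags[t] is the classifier's 0/1 decision on the trajectory
    up to step t. The reward is the change from the previous step.
    """
    rewards = []
    previous = 0  # assumption: no error is flagged before play begins
    for current in error_flags:
        rewards.append(current - previous)
        previous = current
    return rewards


def grade(program, detectors):
    """Run every error detector on a program and list the errors found.

    detectors maps an error name to a callable (hypothetical) that plays
    the program and returns True if its classifier flags that error.
    """
    return [name for name, detect in detectors.items() if detect(program)]


print(step_rewards([0, 0, 1, 1, 1]))  # [0, 0, 1, 0, 0]
```

Because the reward is a difference, it is nonzero only at the step where the classification flips, which matches the description of rewarding the player only at the step in which the error occurred. If the classifier later retracted a flag, this rule would yield a negative reward at that step.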
Results: The authors evaluated DreamGrader on a test set of Code.org student submissions. For comparison, they modified the earlier Play to Grade system, which had been designed to identify error-free submissions, to predict the presence of a specific error. DreamGrader achieved 94.3 percent accuracy, 1.5 percent short of human-level performance, while Play to Grade achieved 75.5 percent accuracy. DreamGrader evaluated student submissions in around 1 second each, 180 times faster than human graders.
Yes, but: DreamGrader finds only known errors. It can’t catch bugs that instructors haven’t already seen.
Why it matters: Each student submission can be considered a different but related task. The approach known as meta-RL aims to train an agent that can learn new tasks based on experience with related tasks. Connecting these two ideas, the authors trained their model following the learning techniques expressed in the meta-RL algorithm DREAM. Sometimes it’s not about reinventing the wheel, but reframing the problem as one we already know how to solve.
We’re thinking: Teaching people how to code empowers them to lead more fulfilling lives in the digital age, just as teaching them to read has opened doors to wisdom and skill since the invention of the printing press. Accomplishing this on a global scale requires automated systems for education (like Coursera!). It’s great to see AI research that could make these systems more effective.