My name is Yuchen Lu. I am a second-year Ph.D. student supervised by Prof. Aaron Courville. Before that, I received my undergraduate degree from UIUC, where I worked with Prof. Jian Peng. I was also an undergraduate at Shanghai Jiao Tong University.
My email is luyuchen [DOT] paul [AT] gmail [DOT] com.
BSc in CS, 2015-2017
University of Illinois, Urbana-Champaign
BSc in ECE, 2013-2015
Shanghai Jiao Tong University
Supervised learning methods excel at capturing statistical properties of language when trained over large text corpora. Yet these models often produce inconsistent outputs in goal-oriented language settings, as they are not trained to complete the underlying task. Moreover, as soon as the agents are finetuned to maximize task completion, they suffer from the so-called language drift phenomenon: they slowly lose syntactic and semantic properties of language as they focus only on solving the task. In this paper, we propose a generic approach to counter language drift by using iterated learning. We iterate between finetuning agents with interactive training steps and periodically replacing them with new agents that are seeded from the last iteration and trained to imitate the latest finetuned models. Iterated learning requires neither external syntactic constraints nor semantic knowledge, making it a valuable task-agnostic finetuning protocol. We first explore iterated learning in the Lewis Game. We then scale up the approach in the translation game. In both settings, our results show that iterated learning drastically counters language drift while also improving the task completion metric.
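The alternation described in the abstract can be sketched as a simple loop. This is a minimal illustration, not the paper's implementation: the `finetune` and `imitate` callables are hypothetical placeholders for the task-specific interactive and imitation training phases.

```python
import copy

def iterated_learning(agent, n_iterations, finetune, imitate):
    """Generic iterated learning loop: alternate interactive finetuning
    with an imitation phase that re-seeds a fresh agent each iteration.

    `finetune` and `imitate` are task-specific callables (hypothetical
    placeholders here, standing in for actual training procedures).
    """
    for _ in range(n_iterations):
        # Interactive phase: finetune the current agent on task completion.
        teacher = finetune(agent)
        # Imitation phase: a new student, seeded from the previous-iteration
        # agent, imitates the finetuned teacher.
        student = copy.deepcopy(agent)
        agent = imitate(student, teacher)
    return agent

# Toy demo with numbers standing in for model parameters: finetuning
# adds task reward; imitation pulls the student toward the teacher.
finetune = lambda a: a + 10
imitate = lambda student, teacher: (student + teacher) / 2
final = iterated_learning(0, 3, finetune, imitate)  # -> 15.0
```

The key point the sketch captures is that each iteration's student starts from the previous agent rather than from the fully finetuned teacher, which is what limits drift accumulation.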
Diplomacy is a seven-player, non-stochastic, non-cooperative game in which agents acquire resources through a mix of teamwork and betrayal. Its reliance on trust and coordination makes Diplomacy the first non-cooperative multi-agent benchmark for complex sequential social dilemmas in a rich environment. In this work, we focus on training an agent that learns to play the No Press version of Diplomacy, in which there is no dedicated communication channel between players. We present DipNet, a neural-network-based policy model for No Press Diplomacy, trained on a new dataset of more than 150,000 human games. The model is first trained by supervised learning (SL) from expert trajectories and is then used to initialize a reinforcement learning (RL) agent trained through self-play. Both the SL and RL agents demonstrate state-of-the-art No Press performance by beating popular rule-based bots.
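The two-stage pipeline (SL from expert trajectories, then self-play RL initialized from the SL agent) can be outlined as below. This is a hedged toy sketch, not DipNet itself: the dict-based "policy", the `supervised_pretrain` and `self_play_rl` helpers, and the update rules are all illustrative assumptions.

```python
import copy

def supervised_pretrain(policy, expert_trajectories, lr=0.1):
    """Hypothetical SL phase: nudge the policy toward expert actions.
    A 'policy' here is just an action -> weight dict for illustration."""
    for state, action in expert_trajectories:
        policy[action] = policy.get(action, 0.0) + lr
    return policy

def self_play_rl(policy, n_games, play_game, lr=0.01):
    """Hypothetical RL phase: seed the RL agent from the SL weights,
    then improve via self-play; `play_game` yields (action, reward) pairs."""
    policy = copy.deepcopy(policy)  # initialize RL agent from SL agent
    for _ in range(n_games):
        for action, reward in play_game(policy):
            policy[action] = policy.get(action, 0.0) + lr * reward
    return policy

# Toy usage: pretrain on two expert (state, action) pairs, then self-play.
sl_policy = supervised_pretrain({}, [("s0", "hold"), ("s0", "move")])
rl_policy = self_play_rl(sl_policy, n_games=10,
                         play_game=lambda p: [("move", 1.0)])
```

The design choice the sketch highlights is that RL does not start from scratch: self-play refines a policy already shaped by human expert play.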
A report on ongoing work applying ordered neurons to reinforcement learning
Introducing the problem of redundant evaluation data and a game-theoretic approach to addressing it
A tutorial on control as inference in probabilistic graphical models