My name is Yuchen Lu. I am currently a Ph.D. candidate at Mila, University of Montreal, supervised by Prof. Aaron Courville. Before that, I received my undergraduate degree from UIUC, where I worked with Prof. Jian Peng. I also spent my earlier undergraduate years at Shanghai Jiao Tong University.
My fundamental research interest is language learning as systematic generalization. Humans can generate novel, unseen utterances from a limited sample of data, while current machine learning approaches fall short. Building an intelligent agent that acquires language as efficiently as humans do is an important next step, especially as we see diminishing returns from simply increasing the size of language models. I believe there are two main missing pieces of the puzzle:
Humans learn language in an embodied environment and acquire it as a tool to influence the world around them. We should model situated language learning rather than learning from a static corpus.
Language evolves and adapts through an iterated transmission process, becoming structured and easy to acquire for later generations. We should model this cultural-evolution aspect of language in our approaches to language learning.
I enjoy seeing the impact of my research. Recently, our research team, partnered with WebDip, successfully developed an AI player for the board game Diplomacy, and it was covered on one of the most popular podcasts in the community.
I also co-founded Tuninsight, an award-winning Montreal-based start-up.
My email is luyuchen [DOT] paul [AT] gmail [DOT] com.
BSc in CS, 2015-2017
University of Illinois, Urbana-Champaign
BSc in ECE, 2013-2015
Shanghai Jiao Tong University
[05/17/2021] I will join Facebook as a research intern this summer, working on multi-modal pretraining
[01/12/2021] Our paper on iterated learning for emergent systematicity in VQA is accepted at ICLR 2021 (Oral)
[01/12/2021] Our new paper on unsupervised task decomposition is accepted at ICLR 2021
Although neural module networks have an architectural bias towards compositionality, they require gold standard layouts to generalize systematically in practice. When instead learning layouts and modules jointly, compositionality does not arise automatically and an explicit pressure is necessary for the emergence of layouts exhibiting the right structure. We propose to address this problem using iterated learning, a cognitive science theory of the emergence of compositional languages in nature that has primarily been applied to simple referential games in machine learning. Considering the layouts of module networks as samples from an emergent language, we use iterated learning to encourage the development of structure within this language. We show that the resulting layouts support systematic generalization in neural agents solving the more complex task of visual question-answering. Our regularized iterated learning method can outperform baselines without iterated learning on SHAPES-SyGeT (SHAPES Systematic Generalization Test), a new split of the SHAPES dataset we introduce to evaluate systematic generalization, and on CLOSURE, an extension of CLEVR also designed to test systematic generalization. We demonstrate superior performance in recovering ground-truth compositional program structure with limited supervision on both SHAPES-SyGeT and CLEVR.
Many complex real-world tasks are composed of several levels of sub-tasks. Humans leverage these hierarchical structures to accelerate the learning process and achieve better generalization. In this work, we study the inductive bias and propose Ordered Memory Policy Network (OMPN) to discover subtask hierarchy by learning from demonstration. The discovered subtask hierarchy could be used to perform task decomposition, recovering the subtask boundaries in an unstructured demonstration. Experiments on Craft and Dial demonstrate that our model can achieve higher task decomposition performance under both unsupervised and weakly supervised settings, compared with strong baselines. OMPN can also be directly applied to partially observable environments and still achieve higher task decomposition performance. Our visualization further confirms that the subtask hierarchy can emerge in our model.
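To make the idea of task decomposition concrete, here is a minimal, hypothetical sketch (not the OMPN architecture itself): assuming a model already emits a per-timestep boundary score for a demonstration, the trajectory can be cut into sub-task segments wherever that score peaks. All names below are illustrative placeholders.

```python
import numpy as np

def decompose_trajectory(boundary_scores, threshold=0.5):
    """Cut a demonstration into sub-task segments.

    boundary_scores: array of shape (T,) with values in [0, 1]; a hypothetical
    per-timestep signal indicating how likely step t ends a sub-task (in a
    hierarchical model, this could be read off its memory-update behavior).
    Returns a list of (start, end) index pairs covering all T steps.
    """
    T = len(boundary_scores)
    cuts = [t for t in range(T - 1) if boundary_scores[t] > threshold]
    starts = [0] + [c + 1 for c in cuts]
    ends = cuts + [T - 1]
    return list(zip(starts, ends))

# A 10-step demonstration with boundary peaks after steps 3 and 6:
scores = np.array([0.1, 0.2, 0.1, 0.9, 0.1, 0.2, 0.8, 0.1, 0.1, 0.3])
print(decompose_trajectory(scores))  # [(0, 3), (4, 6), (7, 9)]
```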
Supervised learning methods excel at capturing statistical properties of language when trained over large text corpora. Yet, these models often produce inconsistent outputs in goal-oriented language settings, as they are not trained to complete the underlying task. Moreover, as soon as the agents are finetuned to maximize task completion, they suffer from the so-called language drift phenomenon: they slowly lose syntactic and semantic properties of language as they only focus on solving the task. In this paper, we propose a generic approach to counter language drift by using iterated learning. We iterate between finetuning agents with interactive training steps and periodically replacing them with new agents that are seeded from the last iteration and trained to imitate the latest finetuned models. Iterated learning requires neither external syntactic constraints nor semantic knowledge, making it a valuable task-agnostic finetuning protocol. We first explore iterated learning in the Lewis Game. We then scale up the approach in the translation game. In both settings, our results show that iterated learning drastically counters language drift while improving the task completion metric.
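The core loop is simple enough to sketch. The snippet below is an illustrative outline of the iterated-learning protocol described above, not the paper's implementation; `interactive_finetune` and `imitate` are hypothetical placeholders for the task-specific training steps.

```python
import copy

def iterated_learning(agent, interactive_finetune, imitate,
                      n_generations=10, n_interactive_steps=1000,
                      n_imitation_steps=500):
    """Alternate interactive finetuning with imitation by a fresh generation."""
    for _ in range(n_generations):
        seed = copy.deepcopy(agent)       # checkpoint from the last iteration
        # 1) Interactive finetuning on the task (e.g. with a task-completion
        #    reward); this improves task metrics but can cause language drift.
        for _ in range(n_interactive_steps):
            interactive_finetune(agent)
        # 2) A new agent seeded from the last iteration imitates the latest
        #    finetuned model; imperfect imitation acts as a transmission
        #    bottleneck that filters out drifted, unsystematic language.
        student = copy.deepcopy(seed)
        for _ in range(n_imitation_steps):
            imitate(student, teacher=agent)
        agent = student                   # the student becomes the next generation
    return agent
```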
Diplomacy is a seven-player non-stochastic, non-cooperative game, where agents acquire resources through a mix of teamwork and betrayal. Reliance on trust and coordination makes Diplomacy the first non-cooperative multi-agent benchmark for complex sequential social dilemmas in a rich environment. In this work, we focus on training an agent that learns to play the No Press version of Diplomacy, where there is no dedicated communication channel between players. We present DipNet, a neural-network-based policy model for No Press Diplomacy, trained on a new dataset of more than 150,000 human games. The model is first trained by supervised learning (SL) on expert trajectories and then used to initialize a reinforcement learning (RL) agent trained through self-play. Both the SL and RL agents demonstrate state-of-the-art No Press performance by beating popular rule-based bots.
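As a rough illustration of the two-stage pipeline described in the abstract (supervised imitation of expert games followed by self-play reinforcement learning from that initialization), here is a hedged sketch; `policy`, `expert_dataset`, `env`, `supervised_step`, and `self_play_update` are hypothetical stand-ins, not the released DipNet code.

```python
def train_sl_then_rl(policy, expert_dataset, env,
                     supervised_step, self_play_update,
                     n_sl_epochs=10, n_rl_iterations=1000):
    """Two-stage pipeline: imitate experts, then improve via self-play RL."""
    # Stage 1: supervised learning from expert trajectories (human games).
    for _ in range(n_sl_epochs):
        for states, expert_orders in expert_dataset:
            supervised_step(policy, states, expert_orders)

    # Stage 2: self-play RL initialized from the supervised policy, so the
    # agent starts exploring from plausible, human-like play.
    for _ in range(n_rl_iterations):
        trajectories = env.self_play(policy)   # all seven powers use `policy`
        self_play_update(policy, trajectories)

    return policy
```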