In many two-agent games (poker, Go, StarCraft), it is common to evaluate an agent against a pool of opponents. But which agents in the pool matter most, and which are redundant? How should one handle rock-paper-scissors situations, where agents beat each other in a cycle? Is simply averaging an agent's performance against all opponents enough? To answer these questions, the paper first studies the algebraic properties of the evaluation matrix. It then proposes Nash Averaging, which automatically adapts to redundancy in the evaluation data and thereby mitigates the bias introduced when the agent pool is set up.
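To make the redundancy problem concrete, here is a minimal sketch on a made-up pool: rock, paper, scissors, plus a duplicate copy of rock. The payoff matrix, the agent names, and the helper `weighted_scores` are illustrative assumptions, not taken from the paper. The Nash distribution is written down by hand for this toy game (the mass on rock is split evenly between the two copies) rather than computed by a solver, and the code only checks that no agent gains against it.

```python
# Antisymmetric payoff matrix for a hypothetical pool:
# A[i][j] = +1 if agent i beats agent j, -1 if it loses, 0 for a tie.
# "rock_copy" is a redundant duplicate of "rock".
agents = ["rock", "paper", "scissors", "rock_copy"]
A = [
    [0, -1, 1, 0],    # rock
    [1, 0, -1, 1],    # paper
    [-1, 1, 0, -1],   # scissors
    [0, -1, 1, 0],    # rock_copy
]


def weighted_scores(payoffs, weights):
    """Expected payoff of each agent against an opponent drawn from `weights`."""
    return [sum(a * w for a, w in zip(row, weights)) for row in payoffs]


# Plain averaging: every opponent in the pool gets equal weight.
uniform = [1 / len(agents)] * len(agents)
uniform_scores = weighted_scores(A, uniform)

# Assumed maximum-entropy Nash equilibrium for this toy game, written by hand:
# the two rock copies share the weight a single rock would receive.
nash = [1 / 6, 1 / 3, 1 / 3, 1 / 6]
nash_scores = weighted_scores(A, nash)

# Sanity check: in a symmetric zero-sum game, no pure strategy should have
# positive expected payoff against a Nash mixture.
assert all(s <= 1e-12 for s in nash_scores)

for name, u, n in zip(agents, uniform_scores, nash_scores):
    print(f"{name:10s} uniform={u:+.2f}  nash={n:+.2f}")
```

Under uniform averaging, paper scores 0.25 and scissors scores -0.25, purely because rock appears twice in the pool. Under the Nash weighting all four agents score 0, which matches the intuition that none of them is better than the others in a pure cycle.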