Paper, Tags: #social-sciences, #reinforcement-learning
Humans intuitively and flexibly infer the underlying relationships between people. Towards the goal of building machine-learning algorithms with human-like social intelligence, we develop a generative model of multi-agent action understanding based on a novel representation for these latent relationships called Composable Team Hierarchies (CTH).
This representation is grounded in the formalism of stochastic games and multi-agent RL. We use CTH as a target for Bayesian inference, yielding a new algorithm for understanding behavior in groups that can both infer hidden relationships and predict future actions for multiple agents interacting together.
Our algorithm rapidly recovers an underlying causal model of how agents relate in spatial stochastic games from just a few observations.
Relationships are compositional and dynamic, and the information we get from them is sparse and indirect.
There is increasing evidence that this early-arising ability to perform social evaluation and inference relies on "theory of mind" (ToM), a generative model of other agents with mental states of their own.
Inspired by this ability, we aim to develop a new algorithm that applies these human-like inductive biases towards understanding groups of agents in mixed-incentive (cooperative/competitive) contexts.
We build up a representation for multi-agent interaction that can be used to compute policies for agents in novel situations, but is also sufficiently constrained that it can be used for rapid inference.
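As a rough illustration of what such a representation could look like (a hedged sketch, not the paper's exact formalism), a CTH can be thought of as a small recursive structure: leaves are individual agents, and internal nodes compose sub-hierarchies either cooperatively or competitively. The names `Agent`, `Team`, and `Versus` below are illustrative assumptions, not the paper's notation:

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Agent:
    """Leaf node: a single agent."""
    name: str

@dataclass(frozen=True)
class Team:
    """Cooperative composition: members plan jointly as one unit."""
    members: tuple

@dataclass(frozen=True)
class Versus:
    """Competitive composition: `actor` best-responds to `opponent`."""
    actor: "CTH"
    opponent: "CTH"

CTH = Union[Agent, Team, Versus]

# Example hypothesis: agents A and B cooperate as a team against agent C.
hypothesis = Versus(actor=Team(members=(Agent("A"), Agent("B"))),
                    opponent=Agent("C"))
```

Because the structure is compositional, the same building blocks cover pure cooperation, pure competition, and mixed team-vs-team settings, which is what makes the hypothesis space both expressive and enumerable for inference.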
Observers can use CTH to probabilistically infer the various stances that each agent takes towards the others. An observer represents its uncertainty over the CTH for agent i as P(CTH_i), its prior beliefs before seeing any behavior. These beliefs can be updated in response to new behavioral data using Bayes' rule.
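A minimal sketch of that update, assuming a small enumerable hypothesis space and an action likelihood P(a_i | s, CTH_i) obtained by planning under each hypothesis (the callable `action_likelihood` is a hypothetical placeholder, e.g. a softmax over Q-values computed under that CTH):

```python
def posterior_over_cth(hypotheses, prior, observations, action_likelihood):
    """Bayes-rule update: P(CTH_i | s, a_i) is proportional to
    P(a_i | s, CTH_i) * P(CTH_i).

    hypotheses: list of candidate CTHs for agent i
    prior: dict mapping each candidate CTH to its prior P(CTH_i)
    observations: list of (state, action) pairs observed for agent i
    action_likelihood: callable (state, action, cth) -> P(a_i | s, CTH_i)
    """
    posterior = dict(prior)
    for state, action in observations:
        for cth in hypotheses:
            posterior[cth] *= action_likelihood(state, action, cth)
    z = sum(posterior.values())  # normalizing constant
    return {cth: p / z for cth, p in posterior.items()}
```

With frozen (hashable) CTH structures like the ones sketched above, the hypotheses can serve directly as dictionary keys, and a few (state, action) observations are often enough to concentrate the posterior on one hierarchy.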
Since we have posed this IRL problem as probabilistic inference, Markov Chain Monte Carlo (MCMC) algorithms and other sampling approaches might enable the computation of P(CTH_i | s, a_i) even when the number of hypothetical CTHs is large.
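For intuition, here is a Metropolis-Hastings sketch over the CTH space; the note does not specify a proposal distribution, so `propose` (a local perturbation of the hierarchy, assumed symmetric) and `log_post` are placeholders supplied by the caller:

```python
import math
import random

def mh_sample_cth(init_cth, log_post, propose, n_steps=1000):
    """Metropolis-Hastings over CTHs: approximate P(CTH_i | s, a_i) by sampling.

    init_cth: starting hypothesis
    log_post: callable cth -> unnormalized log posterior
              (log-likelihood of observed actions plus log prior)
    propose:  callable cth -> a nearby candidate CTH (symmetric proposal)
    """
    samples, current, current_lp = [], init_cth, log_post(init_cth)
    for _ in range(n_steps):
        candidate = propose(current)
        candidate_lp = log_post(candidate)
        # Accept with probability min(1, p(candidate) / p(current)).
        accept = (candidate_lp >= current_lp or
                  random.random() < math.exp(candidate_lp - current_lp))
        if accept:
            current, current_lp = candidate, candidate_lp
        samples.append(current)
    return samples  # empirical frequencies approximate the posterior
```

The appeal of sampling here is that the posterior only ever needs to be evaluated up to a normalizing constant, so the full hypothesis space never has to be enumerated.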
For instance, when dealing with a large number of agents, people seem to use group-membership cues, some of which are directly observable (such as style of dress) or easily inferable (such as language spoken). These cues could rapidly prune the number of CTHs considered, but could also introduce biases. Another possible route to scaling these methods is through sophisticated state abstractions, such as those in deep multi-agent reinforcement learning where agents are trained for cooperation and competition.