Cooperation and security in multi-agent systems
Innovation involves exploration: asking questions to test limits and to discover new possibilities and insights. In a world where artificial intelligence is increasingly present in our lives, getting ahead of the curve and focusing on high-impact topics not only gives us a competitive advantage, but also helps us discover new ways to add value. It is important to ground these explorations rigorously in training that reflects the state of the art in the industry: at Izertis we explore technological lines and propose spikes that allow us to stay ahead of the curve, and part of this is training that we then adapt to our projects. Over the last few months I have had the opportunity to train on one of the most fascinating current challenges in artificial intelligence: the problem of AI alignment and the safety of the systems built around it.
Today we will look at multi-agent systems in reinforcement learning and the safety questions that arise in complex environments known as mixed-motive games, where competition and cooperation coexist. We will delve into the tragedy of the commons dilemma and into simulations built with Google DeepMind's Melting Pot framework.
Figure 1. Animation of a simulation in DeepMind's Melting Pot framework illustrating the tragedy of the commons dilemma. The simulation was trained with a population of agents, each learning independently. Agents must harvest apples while ensuring the sustainability of the apple field: if the last apple disappears, the field is exhausted. Agents can also electrocute one another, which allows punishment to emerge as a social norm. The result is a dynamic in which competitive and cooperative pressures must be balanced: each agent wants to harvest as many apples as possible (competition), but the group must also allow the field to grow back (cooperation).
Why multi-agent systems? Why the tragedy of the commons as a context?
We can use artificial intelligence to study different solutions
Scenarios in which several autonomous actors coexist, and the dynamics that emerge among them, pose significant challenges from a safety and control point of view. This is true not only of the language models we see every day, but also of industrial use cases such as autonomous vehicles, collaborative robotics and air traffic management. It also matters for the IoT, where multiple autonomous devices and sensors communicate and collaborate on tasks such as environmental monitoring and resource management.
The tragedy of the commons is an economic concept describing a situation in which individuals, acting independently and rationally in their own self-interest, deplete a limited shared resource, even when it is clear that doing so is not in the long-term interest of the group as a whole. Multi-agent systems have a role to play here in solving coordination, cooperation and the implementation of sustainable management strategies, so we can talk about management and sustainability at different levels, abstracting away from the purely economic concept. Beyond Hardin's paper and the fascinating book Managing the Commons, this scenario gives us a setting in which we can use artificial intelligence to study the different solutions available for this challenge.
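The dynamic is easy to reproduce numerically. The toy simulation below (not part of Melting Pot; all parameters are illustrative assumptions) models a shared resource with logistic regrowth and a group of agents that each extract a fixed amount per step:

```python
def simulate(n_agents, take_per_agent, steps=50,
             stock=100.0, growth_rate=0.25, capacity=100.0):
    """Toy commons: logistic regrowth versus fixed per-agent extraction."""
    total_harvested = 0.0
    for _ in range(steps):
        for _ in range(n_agents):
            take = min(take_per_agent, stock)  # can't take what isn't there
            stock -= take
            total_harvested += take
        # Logistic regrowth: a depleted stock regrows slowly, a dead one never.
        stock += growth_rate * stock * (1.0 - stock / capacity)
    return stock, total_harvested

# Restraint (1 apple/agent/step) versus greed (4 apples/agent/step).
for take in (1.0, 4.0):
    stock, total = simulate(n_agents=4, take_per_agent=take)
    print(f"take {take}: final stock {stock:5.1f}, group total {total:6.1f}")
```

With these numbers, the greedy group exhausts the resource within a dozen steps and ends up with a smaller cumulative harvest than the restrained group, whose stock settles at a sustainable level: the tragedy of the commons in miniature.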
What has the exploration consisted of and what were the results?
A change in policy and environment is proposed
When choosing a technical framework to investigate, we settled on Melting Pot because of its focus on evaluating generalisation in multi-agent systems. Melting Pot proposes an evaluation protocol in which the trained population is confronted, at test time, with agents it has never seen before, to observe how it responds to unfamiliar dynamics. We therefore distinguish between what the authors call the focal population (the one that has been trained) and the background population (the one used during evaluation, different from the training population). The exploration was organised along several axes: changes to both policies and the environment are proposed.
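A rough sketch of what such an evaluation loop might look like follows. Melting Pot scenarios expose a dm_env-style interface with one observation and one reward per focal player, while the background bots act inside the environment; note that `build_scenario` and `load_focal_policies` are hypothetical placeholders, since the exact construction API has changed between Melting Pot releases:

```python
import numpy as np

def evaluate(env, focal_policies, episodes=10):
    """Average collective return of the focal population in a held-out
    scenario; the background population is controlled by the env itself."""
    returns = []
    for _ in range(episodes):
        timestep = env.reset()
        episode_return = 0.0
        while not timestep.last():
            # One action per focal player, computed from that player's view.
            actions = [policy.step(obs) for policy, obs
                       in zip(focal_policies, timestep.observation)]
            timestep = env.step(actions)
            episode_return += float(np.sum(timestep.reward))
        returns.append(episode_return)
    return float(np.mean(returns))

# Usage (hypothetical helpers -- the real construction API differs):
#   env = build_scenario("commons_harvest__open_0")       # unseen background bots
#   focal_policies = load_focal_policies("trained_population")
#   print("mean focal return:", evaluate(env, focal_policies))
```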
Technical details and results of key experiments
The experiments focused on various configurations and settings within the simulation environment, with the aim of observing how different policies affect the behaviour of the agents. Some highlights follow (a configuration sketch comes after the list):
- Resource regeneration rates: regeneration rates were adjusted to observe how agents modify their harvesting strategies in the face of abundance or scarcity. Experiments showed that:
- Lower regeneration rates encouraged agents to adopt more conservative strategies.
- Higher rates encouraged more aggressive competition, increasing the risk of resource depletion.
- Ability to penalise other agents: the agents' ability to impose sanctions on one another was manipulated to assess its impact on cooperation and competition.
- Removing the ability to punish resulted, in some cases, in an increase in cooperative behaviour.
- The presence of penalties encouraged regulated competition, where agents balanced their behaviour to avoid sanctions.
- Reward signals during training: we experimented with modifying reward signals to align or misalign incentives with sustainability goals.
- Agents trained with aligned incentives showed a higher propensity for behaviours that favoured resource regeneration.
- Agents with misaligned incentives tended to behave more selfishly, prioritising short-term resource accumulation over long-term sustainability.
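As a sketch of the kind of knobs these experiments turn (the field names below are illustrative assumptions, not the actual Melting Pot configuration keys), each run boils down to an environment configuration plus a reward-shaping term that pays agents for leaving the resource standing:

```python
import dataclasses

@dataclasses.dataclass
class ExperimentConfig:
    """Hypothetical knobs mirroring the three experiment axes above."""
    regrowth_probability: float = 0.15  # resource regeneration rate
    punishment_enabled: bool = True     # can agents sanction each other?
    sustainability_bonus: float = 0.0   # >0 aligns, <0 misaligns incentives

def shaped_reward(raw_reward, resource_remaining, capacity, cfg):
    """Add a bonus proportional to the fraction of the commons left alive."""
    return raw_reward + cfg.sustainability_bonus * (resource_remaining / capacity)

# One possible sweep over the three axes discussed above.
sweeps = [
    ExperimentConfig(regrowth_probability=0.05),  # scarcity
    ExperimentConfig(regrowth_probability=0.30),  # abundance
    ExperimentConfig(punishment_enabled=False),   # no sanctions
    ExperimentConfig(sustainability_bonus=0.5),   # aligned incentives
    ExperimentConfig(sustainability_bonus=-0.5),  # misaligned incentives
]

# Example: harvesting one apple while 40% of the field survives.
cfg = ExperimentConfig(sustainability_bonus=0.5)
print(shaped_reward(1.0, resource_remaining=40, capacity=100, cfg=cfg))  # 1.2
```

The design choice here is that the shaping term depends only on the shared state of the resource, not on any individual agent's harvest, which is what makes the incentive collective rather than selfish.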
Implications for security and cooperation in AI
These experiments underline the importance of designing multi-agent systems with well-thought-out mechanisms that not only promote the individual efficiency of each agent, but also ensure the effective and sustainable management of shared resources. The implications go beyond theory, offering practical guidelines for policy development in AI systems interacting in shared environments.
Conclusion: moving towards collaborative artificial intelligence
Research in multi-agent systems and reinforcement learning is crucial for moving towards an era of artificial intelligence that is not only capable of performing complex tasks independently, but can also collaborate effectively in shared environments to solve problems on a global scale. This technical exploration provides a solid basis for future innovations in AI, where cooperation and sustainability are just as important as autonomy and efficiency.