Our objective is to develop a framework that allows agents to trade off predictability against progress toward the goal. Our motivation is that by accounting for predictability, a group of agents can reduce uncertainty in the environment and implicitly induce decentralized coordination. Additionally, accounting for predictability can serve as a means to ‘robustify’ a prediction model a posteriori: agents modify their behavior to match the prediction distribution, widening the space of suitable prediction models for a given problem.
Incorporating predictability in this manner parallels the principle of free-energy minimization found in active inference and control systems. Here, agents do not solely pursue reward maximization; they also seek to minimize the discrepancy between their internal prediction models and actual observations. Within multi-agent interactions, each agent maintains probabilistic beliefs about the actions of others, and the accuracy of these beliefs is directly influenced by its own behavior. By minimizing free energy, an agent can select actions that both reduce environmental uncertainty and validate its internal model, while simultaneously working toward reward maximization. This ensures that agents act in a manner that is not only goal-oriented but also supportive of maintaining coherent, accurate beliefs about the behavior of surrounding agents. Given the interdependence of trajectories in multi-agent interactions, a prediction distribution over surrounding agents is conditioned on a trajectory distribution for the ego agent. If the ego agent deviates significantly from this distribution, the predictions for surrounding agents are compromised.
Accordingly, we introduce a free-energy term in our derivation and establish a set of transformations that enable its use as an objective function in planning. This formulation provides a structured approach for balancing predictability with progress in multi-agent systems.
Trajectory Planning Formulation
The trajectory planning problem for a single agent is formulated as the following optimization problem:
\[
\min_{\mathbf{u} \in U, \mathbf{x} \in X} \sum_{k=0}^{K-1} J_k(\mathbf{x}_k, \mathbf{u}_k) + J_K(\mathbf{x}_K)
\]
subject to:
\[
\mathbf{x}_0 = \mathbf{x}_{\text{init}},
\]
\[
\mathbf{x}_{k+1} = f(\mathbf{x}_k, \mathbf{u}_k), \quad k = 0, \dots, K-1,
\]
\[
P \left[ C(\mathbf{x}_k, \delta_{o_k}), \forall o \right] \geq 1 - \epsilon_k, \quad \forall k.
\]
where:
- \( \mathbf{u} = \{\mathbf{u}_0, \dots, \mathbf{u}_{K-1}\} \) represents the control inputs,
- \( \mathbf{x}_k \) denotes the robot state at timestep \( k \),
- \( J_k(\mathbf{x}_k, \mathbf{u}_k) \) is the stage cost function encoding performance metrics,
- \( f(\cdot) \) is the state transition dynamics function,
- \( C(\cdot) \) denotes the collision avoidance constraints, and \( \delta_{o_k} \) accounts for the uncertainty in other agents' positions.
This formulation ensures, through chance constraints, that the probability of collision remains below a specified threshold \( \epsilon_k \) at each timestep.
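As a concrete illustration, the sketch below sets up a small instance of this problem with single-integrator dynamics, quadratic stage and terminal costs, and the chance constraint approximated by an inflated safety radius around a predicted obstacle path. The dynamics, costs, horizon, and margin values are illustrative assumptions, not part of the formulation above.

```python
import numpy as np
from scipy.optimize import minimize

# Minimal sketch of the chance-constrained planning problem above, assuming
# single-integrator dynamics x_{k+1} = x_k + dt * u_k and a single predicted
# obstacle whose positional uncertainty delta_o is folded into an inflated
# safety radius (a common deterministic surrogate for the chance constraint
# P[C(x_k, delta_ok)] >= 1 - eps_k). All numerical values are illustrative.

K, dt = 10, 0.2
x_init = np.array([0.0, 0.0])
x_goal = np.array([2.0, 2.0])
obst_path = np.tile(np.array([1.0, 1.0]), (K + 1, 1))  # predicted obstacle positions
safety_radius = 0.3 + 0.2                              # nominal radius + uncertainty margin

def rollout(u_flat):
    """Integrate the dynamics to obtain the state trajectory x_0, ..., x_K."""
    u = u_flat.reshape(K, 2)
    x = np.zeros((K + 1, 2))
    x[0] = x_init
    for k in range(K):
        x[k + 1] = x[k] + dt * u[k]
    return x

def cost(u_flat):
    """sum_k J_k(x_k, u_k) + J_K(x_K) with quadratic goal and control terms."""
    u = u_flat.reshape(K, 2)
    x = rollout(u_flat)
    stage = sum(np.sum((x[k] - x_goal) ** 2) + 0.1 * np.sum(u[k] ** 2) for k in range(K))
    return stage + 10.0 * np.sum((x[K] - x_goal) ** 2)

def clearance(u_flat):
    """Distance to the inflated obstacle at every timestep; >= 0 means safe."""
    x = rollout(u_flat)
    return np.linalg.norm(x - obst_path, axis=1) - safety_radius

res = minimize(cost, x0=np.zeros(K * 2), method="SLSQP",
               constraints=[{"type": "ineq", "fun": clearance}])
print("planned controls:\n", res.x.reshape(K, 2))
```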
Free Energy as a Predictability Surrogate
Drawing inspiration from previous works, we begin by defining the free energy of a trajectory prediction distribution \( \mathcal{F}(S, P, \lambda) \) as follows:
\[
\mathcal{F}(S, P, \lambda) = -\lambda \log\left( \mathbb{E}_{\mathbf{x} \sim P} \left[ \exp\left(-\frac{1}{\lambda} S(\mathbf{x})\right) \right] \right)
\]
where:
- \( S \) is a state cost function representing the trajectory objective,
- \( P \) is a trajectory prediction distribution,
- \( \lambda \) is an inverse temperature parameter.
The free energy function can be interpreted as measuring how effectively a prediction distribution \( P \) minimizes cost. Under this interpretation, to minimize the free energy an agent would behave so as to push the prediction distribution toward the optimal trajectory, thereby minimizing the expected cost \( \mathbb{E}_{\mathbf{x} \sim P} [S(\mathbf{x})] \) of trajectories sampled from the distribution.
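As a small numerical illustration, the free energy can be estimated by Monte-Carlo sampling from the prediction distribution. The sampler, cost function, and trajectory dimensions below are illustrative assumptions, not quantities defined above.

```python
import numpy as np

def free_energy(S, sample_P, lam, n_samples=1000, rng=None):
    """Monte-Carlo estimate of F(S, P, lambda) = -lambda * log E_{x~P}[exp(-S(x)/lambda)].

    S        : callable mapping a sampled trajectory to its state cost
    sample_P : callable returning n trajectories drawn from the prediction distribution P
    lam      : the temperature parameter lambda
    """
    rng = np.random.default_rng() if rng is None else rng
    trajs = sample_P(n_samples, rng)
    costs = np.array([S(x) for x in trajs])
    # log-sum-exp for a numerically stable log E[exp(-S / lam)]
    log_mean = np.logaddexp.reduce(-costs / lam) - np.log(n_samples)
    return -lam * log_mean

# Toy usage: Gaussian predicted trajectories and a quadratic cost around a goal.
goal = np.array([1.0, 1.0])
S = lambda x: float(np.sum((x - goal) ** 2))
sample_P = lambda n, rng: rng.normal(loc=0.5, scale=0.2, size=(n, 5, 2))  # n trajs, 5 steps, 2D
print(free_energy(S, sample_P, lam=1.0))
```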
By performing an expectation switch to include the agent's plan distribution \( q(\mathbf{x}) \) and applying Jensen's inequality, this free energy function can be bounded and simplified as follows:
\[
\mathcal{F}(S, P, \lambda) \leq \mathbb{E}_{\mathbf{x} \sim q}[S(\mathbf{x})] + \lambda \, \text{KL}(q(\mathbf{x}) \| p(\mathbf{x})),
\]
where \( \text{KL} \) denotes the Kullback-Leibler divergence between the agent’s plan distribution \( q(\mathbf{x}) \) and the predicted distribution \( p(\mathbf{x}) \). This formulation provides a cost function with two main components:
- Performance Cost: Encourages goal-oriented behavior.
- Predictability Cost: Penalizes deviations from expected behavior.
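For completeness, the bound above follows by rewriting the expectation over \( P \) as an expectation over the plan distribution \( q(\mathbf{x}) \) through importance weights and applying Jensen's inequality to the convex negative logarithm:
\[
\mathcal{F}(S, P, \lambda)
= -\lambda \log\left( \mathbb{E}_{\mathbf{x} \sim q}\left[ \frac{p(\mathbf{x})}{q(\mathbf{x})} \exp\left(-\tfrac{1}{\lambda} S(\mathbf{x})\right) \right] \right)
\leq \mathbb{E}_{\mathbf{x} \sim q}\left[ S(\mathbf{x}) + \lambda \log \frac{q(\mathbf{x})}{p(\mathbf{x})} \right]
= \mathbb{E}_{\mathbf{x} \sim q}[S(\mathbf{x})] + \lambda \, \text{KL}(q(\mathbf{x}) \| p(\mathbf{x})).
\]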
The stage cost function for the planning problem then becomes:
\[
J(\tau_{0:K}) = \sum_{k=0}^{K-1} \left[ J_k(\mathbf{x}_k, \mathbf{u}_k) + \lambda \, \text{KL}(q(\mathbf{x}_k) \| p(\mathbf{x}_k)) \right] + J_K(\mathbf{x}_K),
\]
where \( J_k(\mathbf{x}_k, \mathbf{u}_k) \) represents the performance cost together with a control cost, and \( \lambda \) adjusts the weight assigned to predictability.
This cost function serves as an upper bound on the free energy; thus, minimizing the cost function also minimizes the free energy.
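As an illustration of how the predictability term can be evaluated in practice, the sketch below assumes that the per-timestep plan and prediction marginals are Gaussian (an assumption not made above), for which the KL divergence has a closed form; the quadratic performance term and all numerical values are likewise illustrative.

```python
import numpy as np

def kl_gaussian(mu_q, cov_q, mu_p, cov_p):
    """Closed-form KL( N(mu_q, cov_q) || N(mu_p, cov_p) ) for the per-step penalty.

    Illustrative helper: the formulation above does not fix a distribution
    family; Gaussians are assumed here so the KL term is available in closed form.
    """
    d = mu_q.shape[0]
    cov_p_inv = np.linalg.inv(cov_p)
    diff = mu_p - mu_q
    return 0.5 * (
        np.trace(cov_p_inv @ cov_q)
        + diff @ cov_p_inv @ diff
        - d
        + np.log(np.linalg.det(cov_p) / np.linalg.det(cov_q))
    )

def stage_cost(x_k, u_k, mu_q, cov_q, mu_p, cov_p, lam=1.0, goal=None):
    """J_k(x_k, u_k) + lambda * KL(q(x_k) || p(x_k)) with a quadratic performance term."""
    goal = np.zeros_like(x_k) if goal is None else goal
    perf = np.sum((x_k - goal) ** 2) + 0.1 * np.sum(u_k ** 2)
    return perf + lam * kl_gaussian(mu_q, cov_q, mu_p, cov_p)

# Toy usage: plan and prediction marginals at one timestep.
x_k, u_k = np.array([0.4, 0.1]), np.array([0.5, 0.0])
print(stage_cost(x_k, u_k,
                 mu_q=np.array([0.4, 0.1]), cov_q=0.05 * np.eye(2),
                 mu_p=np.array([0.5, 0.2]), cov_p=0.10 * np.eye(2),
                 lam=1.0, goal=np.array([2.0, 2.0])))
```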