One Shot Deviation Principle

Posted on April 15, 2016
Tags: game theory

The content here is reproduced from Fudenberg and Tirole’s 1991 text.

In infinite horizon games, the one-shot deviation principle is a useful tool to check whether a strategy profile $\sigma$ is subgame perfect. Informally, it states that we only need to check deviations $(\sigma_i', \sigma_{-i})$ where $\sigma_{i}'$ differs from $\sigma_i$ only at a single history.

We begin with the theorem for finite horizon games.

Theorem 1 (One-Stage-Deviation Principle for Finite Horizon Games). In a finite multi-stage agme with observed actions, strategy profile $s$ is subgame perfect if and only if it satisfies the one-stage-deviation condition:

there is no player $i$ and a history $h^t$ such that player $i$ has a profitable deviation $s_i'$ where $s_i = s_i'$ everywhere except at $h^t$

Proof. The “only if” condition is immediate from the definition of subgame perfection. For sufficiency, suppose that a strategy profile $s$ satisfies the one-stage-deviation condiiton, but is not subgame perfect. Therefore, there is some history $h^t$ and player $i$ such that

\begin{equation} \label{eq:1} u_i(s_i', s_{-i}; h^t) > u_i(s; h^t) \quad \text{for some strategy $s_i'$.} \end{equation}

Let $T$ be the largest $\tau$ for the set

\begin{equation} \label{eq:2} D(\tau) := \{ h^{\tau} : s_i'(h^{\tau}) \neq s_i(h^{\tau}) \} \end{equation}

is non-empty (i.e. $T$ is the last stage at where $s_i'$ and $s_i$ differs). Since $s_i = s_i'$ for all $\tau \geq T+1$, the one-stage-deviation condition implies that

\begin{equation} \label{eq:3} u_i(s; h^T) \geq u_i(s_i', s_{-i}; h^T). \end{equation}

Now define $s_i''$ to be the strategy that is equal to $s_i'$ in stages $t \leq \tau \leq T-1$ and equals $s_i$ at stage $T$. By \eqref{eq:1} and \eqref{eq:3}

\begin{equation} \label{eq:4} u_i(s_i'', s_{-i}; h^t) \geq u_i(s_i', s_{-i}; h^t) > u_i(s; h^t). \end{equation}

But $s_i''$ differs from $s_i$ only up to $T-1$. Repeating the process, we obtain $s_i'''$ satisfying $u_i(s_i''',s_{-i};h^t)>u_i(s;h^t)$ that differs from $s_i$ only up to $T-2$. Thus, this process eventually yields a strategy $S_i$ with

\begin{equation} \label{eq:5} u_i(S_i, s_{-i}; h^t) > u_i(s_i, s_{-i};h^t) \end{equation}

that only differs from $s_i$ at time $t$. This contradicts the one-stage-deviation principle, and complets the proof. $\square$

The insight of Theorem 1 carries over for infinite horizon games: it shows that if $s_i'$ is a profitable deviation over $s_i$, it must differ from $s_i$ infinitely often. That is, for every $T$ there exist $\tau \geq T$ and a history $h^{\tau}$ such that $s_i'(h^{\tau}) \neq s_i(h^{\tau})$. This is made precise below.

Definition. A multi-stage game with observed actions is continuous at infinity if for each player $i$

\begin{equation} \label{eq:6} \sup_{h, \eta} |u_i(h) - u_i(\eta)| \to 0 \quad \text{as $t \to \infty$} \end{equation}

where the supremum is taken over all histories $h$ and $\eta$ where $h^t = \eta^t$, and $h^t$ is the restriction of the infinite history $h$ to the first $t$ periods.

In English, this says that for two histories that agree up to the first $t$ periods, the difference in the utilities for those two histories becomes arbitrarily small as $t \to \infty$.

Remark. A multi-stage game in which payoffs in each stage are uniformly bounded and players discount at $\delta < 1$ is continuous at infinity.

Theorem 2 (One-Stage Deviation Principle for Infinite-Horizon Games). In an infinite-horizon multi-stage game with observed actions that is continuous at infinity, a profile $s$ is subgame perfect if and only if it satisfies the one-stage-deviation condition in Theorem 1.

Proof. Suppose that $s$ satisfies the one-stage-deviation condition but is not subgame perfect. Then there exists a history $h^t$ and a strategy $s_i'$ for which

\begin{equation} \label{eq:7} \epsilon := u_i(s_i', s_{-i}; h^t) - u_i(s; h^t) > 0. \end{equation}

By continuity at infinity, there exists $T$ large enough such that

\begin{equation} \label{eq:8} |u_i(s_i', s_{-i}; h^t) - u_i(S_i, s_{-i}; h^t)| < \epsilon/2 \end{equation}

where $S_i$ is strategy which agrees with $s_i'$ up to period $T$ and follows strategy $s_i$ thereafter. Thus, $S_i$ is a profitable deviation for player $i$ at $h^t$ that differs from $s_i$ in only finitely many periods. This contradicts Theorem 1. $\square$