Game Theory and The Prisoner’s Dilemma
In this post, I’m going to explore and layout a central thought experiment in statistics and economics in the sub-domain of Game Theory: Prisoner’s dilemma.
First, what is the Prisoner’s dilemma???
The prisoner’s dilemma is a classic game first explored by Merrill Flood and Dresher in 1950.
The basic idea is that there are two prisoners who have been detained and had been brought in for interrogation. The prisoners cannot contact each other.
The only options they have are:
· To defect: Betray the other prisoner
· To cooperate: Maintain silence.
The rules are:
· If they Both defect: Imprisonment for both for 4 years.
· If one of them defects:
Defector goes free
The cooperator gets 5 years of imprisonment.
· If they both cooperate:
Both receive 2 years in prison.
The reason this idea becomes very intriguing is the question of what a completely rational person would be advised to proceed and how different strategies would pan out when playing the game.
To determine the best game playing strategy, let’s play the game over multiple rounds and in tournaments.
Lets define a function called create_player_class:
player.play()
: A method that represents a move. It returnsTrue
if the player defects andFalse
if they cooperate.player.history
: an array of the previous moves of the player instance.
For eg,
defector_play(self, opponent):
return TrueDefector = create_player_class(“Defector”, defector_play)
def cooperator_play(self, opponent):
return FalseCooperator = create_player_class(“Cooperator”, cooperator_play)
Let’s define a function payoff that returns a tuple of 2 elements, the first corresponding to the payoff for player 1 and the second to the payoff for player 2.
payoff(Defector(), Cooperator())
this should return (0,5).
Now, let’s pit these players against each other an examine outcomes for each of them over multiple rounds.
def run_match(p1, p2, n=5, winner=True):
p1.reset_history()
p2.reset_history()
p1_years = make_array()
p2_years = make_array()
for i in range(n):
p1_score, p2_score = payoff(p1, p2)
p1_years = np.append(p1_years, p1_score)
p2_years = np.append(p2_years, p2_score)
if winner:
p1_mean = np.mean(p1_years)
p2_mean = np.mean(p2_years)if p1_mean < p2_mean:
return p1
else:
return p2
else:
return p1_years, p2_years
The function run_match will run a match between the players over n rounds and return the winner if winner=True, if it’s false, it returns the history of each of the players in a list.
We also reset the history of the players in the first two lines since we do not want past results to affect the future rounds.
To understand the run_match function better, I’m going to run a match between a Defector and a Cooperator with 10 turns each.
run_match(Defector(), Cooperator(), n=10)
This returns the Defector as a winner as we would expect (The cooperator, as seen from the payoff matrix).
Now that we can run matches between different players, we can experiment with different playing strategies.
Let’s define an Alternator
player that alternates between Defecting and Cooperating.
def alternator_play(self, opponent):
if len(self.history) > 0:
return not self.history.item(-1)
else:
return False
Alternator = create_player_class("Alternator", alternator_play)run_match(Alternator(), Defector())
Let’s define a TitForTat
player that repeats the opponent’s move (and starts by cooperating).
def tit_for_tat_play(self, opponent):
if len(opponent.history) > 0:
return opponent.history.item(-1)
else:
return TrueTitForTat = create_player_class("TitForTat", tit_for_tat_play)
Let’s define a ForgivingTitForTat
player which defects only when the opponent defects to more than 10%. The objective is to lower the defection below 10% (the percentage should be calculated at every turn).
def forgiving_tit_for_tat_play(self, opponent):
if len(self.history) == 0:
return False
return np.count_nonzero(opponent.history)/len(opponent.history)>0.1 and opponent.history.item(-1)=="D"
ForgivingTitForTat = create_player_class("ForgivingTitForTat",forgiving_tit_for_tat_play)
Recreating Axelrod’s tournament, but on a smaller scale.
Let’s define a function to run a tournament between a list of players and also return the player-wise means of the payoff after each randomized tournament.
def run_tournament(players):
p1 = make_array() #This stores an array of all player 1's.
p2 = make_array() #This stores an array of all player 2's.
p1_means = make_array() #This stores an array of each player 1's average jail time for a particular game.
p2_means = make_array() #This stores an array of each player 2's average jail time for a particular game.for i in range(len(players)):
for j in range(i+1, len(players)):
p1_years, p2_years = run_match(players.item(i), players.item(j), n=200, winner=False)
p1 = np.append(p1, players.item(i))
p2 = np.append(p2, players.item(j))
p1_means = np.append(p1_means, np.mean(p1_years))
p2_means = np.append(p2_means, np.mean(p2_years))
results = Table().with_columns(
"p1", p1,
"p1_mean", p1_means,
"p2", p2,
"p2_mean", p2_means
)return result
Various game playing strategies have been described in the table.
all_strategies = make_array(Alternator(), Backstabber(),Bully(), Desperate(),FoolMeOnce(), Forgiver(),ForgivingTitForTat(), Grudger(), OnceBitten(), TitForTat())
tournament_results = run_tournament(all_strategies)
tournament_results
To interpret the results better, we can plot a heatmap using seaborn where player 1 is on the horizontal axis and player-2 is on the vertical axis. The cells have a mean score of player 1.
df1 = tournament_results.to_df()
df2 = tournament_results.to_df()
df2 = df2.rename({"p1": "p2", "p2": "p1", "p1_mean": "p2_mean", "p2_mean": "p1_mean"}, axis=1)
df = df1.append(df2)
df["p1"], df["p2"] = df["p1"].astype(str), df["p2"].astype(str)
df = df.pivot("p1", "p2", "p1_mean")plt.figure(figsize=[15,8])
sns.heatmap(df.T, annot=True, fmt=".3f")
plt.xlabel("Player 1")
plt.ylabel("Player 2")
plt.title("Player 1's Mean Scores by Opponent")
plt.xticks(rotation=90);
Now, lets try to make sense of this;
One of the interesting results is that player 1 would have the same score if he choses bully or desperate and if player 2 chose one of the strategies out of: Grudger, ForgivingTitForTat, FoolMeOnce, Backstabber. The scores here are the number of years in prison (which we’re trying to lower, of-course).
If I were playing a tournament, I would chose FoolMeOnce strategy since it has a relatively low scores for any strategies that player 2 chooses, assuming that the strategy chosen by the opponent is at random.
Comment if you observe other interesting things from the Heatmap.
However, surprisingly, the winning deterministic strategy was TitForTat.
In axelrod’s findings, he studied that the best strategies eventually came up with four traits of the biggest winners: niceness, retaliation, forgiveness, and lack of envy.
This is how the dramatic global crisis at the time made axelrod wonder whether being “good” might or might not be the best strategy.
Wanna learn more about this? there’s a really interesting episode of radiolab which goes in-depth of the origins of axelrod’s tournaments, the prisoner’s dilemma and the winning deterministic strategy-TitForTat!
Enjoyed reading this? Do consider following:))