Game over¶
Topic: desicion making, Monte Carlo tree search, reinforcement learning
- Task A:
Implement a method for exploring the decision tree.
Play a game of tic-tac-toe against the computer. It is such a simple game that a working AI should never misplay.
- Task B:
Play some other game by changing which module is loaded as game. You can choose from 4-in-a-row, Reversi/Othello and Ultimate tic-tac-toe.
Study how changing the AI thinking time affects performance. At what point does the AI beat you? (This will depend on game complexity.)
Try having an AI vs. AI match.
- Template:
game_ai.py¶
- class game_ai.AI(player, thinking_time=5000)[source]¶
A Monte Carlo tree search AI.
The AI chooses its actions using a MC tree built out of MC_node objects.
- Parameters:
player (int) – the player number for this AI
thinking_time (int) – the number of iterations the AI is allowed when building the MC tree
- class game_ai.MC_node(player, state, layer, play, explore=1.0)[source]¶
A node in the Monte Carlo tree.
Each node records the win and play statistics and links to the nodes that come before and after it in the tree.
- Parameters:
player (int) – player number for the player whose move lead to this node
state – current state of the game
layer (int) – counts how deep in the tree this node is
play – the play action (“move”) that leads to this node
explore (float) – for large values the algorithm is more eager to try non-optimal plays
- backpropagate(winner)[source]¶
Recursively records play statistics according to who won the simulated game.
The number of plays from this node is increased by one regardless of the outcome of the game.
If the player whose turn it is at this node won the game, the win count is also increased by one. If the game ended in a draw, the win count is increased by one half.
Once the stats have been recorded for one node, the algorithm moves on to the parent node. The process is repeated until the root node (the node with no parent) is reached.
- Parameters:
winner (int) – number of the winning player or 0 if the result was a draw
- bias_factor()[source]¶
Calculates the factor for choosing the next node for a node where all the following nodes have at least one recorded play. The higher this factor is, the more likely it is that this node is chosen.
The factor is calculated as
\[\frac{w}{n} + c \sqrt{ \frac{1}{n} \ln(n_\text{parent}) }\]where
\(w\) = the recorded wins in all the playouts from this node
\(n\) = the number of recorded playouts from this node
\(c\) = the exploration factor
\(n_\text{parent}\) = the number of recorded playouts from the previous node
The first term \(w / n\) is the win ratio for the playouts from this node. If this node is likely to lead to victory, this ratio is close to 1 and gets chosen often.
The second term is large, if there are only a few plays from this node and many plays from the other nodes that branch off the same parent node (the competing play options). If this factor was not included, any node whose initial play lead to a loss would never be considered again by the algorithm. Since this may happen simply due to bad luck, it is important to give these nodes some additional plays.
- Returns:
the bias factor for this node
- Return type:
float
- create_branches()[source]¶
Creates new MC tree nodes corresponding to all the play options available at the game state represented by this node. The new nodes are created as
MC_nodeobjects and saved in the list self.branches of this node.
- explore_branches()[source]¶
Travels through the MC tree, starting from this node, looking for a leaf node.
At each node, the algorithm checks if the current node is a leaf node using
MC_node.is_leaf(). If it is, the current node is returned.If the currently examined node is not a leaf, the algorithm calculates bias factors for all the nodes following the current node and then picks the node with the highest factor usin
MC_node.pick_played_branch(). This node then becomes the current node and the process is repeated.Eventually, the process will end up at a leaf node At that point, the exploration stops and the found leaf is returned.
Note
This function is incomplete!
- Returns:
the found leaf node
- Return type:
- has_branches()[source]¶
Checks if more gameplay branches follow from this node.
- Returns:
True if more nodes follow this node in the MC tree.
- Return type:
bool
- is_leaf()[source]¶
Checks if this is a leaf node.
A node is said to be a leaf node if
the game ends at this node (there are no following nodes) or
at least one of the following nodes has 0 recorded plays
I.e., a leaf is a node at the outer edge of the tree.
If this node is not terminal but the following nodes have not yet been created, the function calls
MC_node.create_branches()to create nodes for all possible plays.- Returns:
True if this is a leaf node.
- Return type:
bool
- pick_best_play()[source]¶
Chooses the play action (i.e., move) recorded in the node that
follows this node and
has the most recorded plays
- Returns:
the most tried out play
- Return type:
object
- pick_played_branch()[source]¶
Chooses the node that
follows this node in the MC tree and
has the highest play factor as given by
MC_node.bias_factor().
- Returns:
the chosen node
- Return type:
- pick_unplayed_branch()[source]¶
Randomly chooses one node out of the nodes that
follow this node in the MC tree and
have no recorded plays
- Returns:
the chosen node
- Return type:
- simulate()[source]¶
Simulates the outcome of the game starting from the current node.
The algorithm alternates between players making each player randomly choose one of their available play options until the game ends.
- Returns:
the player number of the winner or 0 if the game ended in a draw
- Return type:
int
tictac.py¶
- tictac.all_plays(grid, player=None)[source]¶
Lists the coordinates of all empty slots.
- Parameters:
grid (array) – game state
player (int) – player whose turn it is
- Returns:
list of possible play actions
- Return type:
list
- tictac.all_reasonable_plays(grid, player=None)[source]¶
Same as all_plays().
- Parameters:
grid (array) – game state
player (int) – player whose turn it is
- Returns:
list of possible play actions
- Return type:
list
- tictac.all_reasonable_plays_alternative(grid, player=None)[source]¶
Lists the coordinates of all empty slots that have an occupied neighbor.
- Parameters:
grid (array) – game state
player (int) – player whose turn it is
- Returns:
list of possible play actions
- Return type:
list
- tictac.ask_for_move(player, grid)[source]¶
Asks a human player for a play action (move).
This is done by visualizing the game and then asking for the index of the column and row where the player wants to play.
- Parameters:
player (int) – number of the player whose turn it is
grid (array) – the play area (current game state)
- Returns:
the coordinates of the chosen slot as tuple (row, column)
- Return type:
int, int
- tictac.can_continue(grid)[source]¶
Checks if there are available spaces in the play area.
Does not check if the game is done due to someone winning.
- Parameters:
grid (array) – game state
- Returns:
True if there are empty spaces, False if not.
- Return type:
boolean
- tictac.can_take_slot(i, j, grid)[source]¶
Checks if one can play in the given slot.
- Parameters:
i (int) – the row to check
j (int) – the column to check
grid (array) – the play area (current game state)
- Returns:
True if there are empty slots in column i, False otherwise
- Return type:
boolean
- tictac.check_winner(grid)[source]¶
Checks if someone has won the game.
- Parameters:
grid (array) – game state
- Returns:
the number of the winning player if there is one, 0 otherwise
- Return type:
int
- tictac.copy_game(grid)[source]¶
Returns a copy of the given game state.
- Parameters:
grid (array) – the play area (current game state)
- Returns:
a copy of grid
- Return type:
array
- tictac.count_consecutives(i, j, grid)[source]¶
Starting from the slot at grid[i,j], counts the highest number of consecutive squares with tokens of the same player.
If the starting position has no token, returns 0.
For ay given square, there are 8 possible directions for forming a line: up, down, left, rigth and the 4 diagonals. This function only checks half of them: down, right, down-right and down-left.
- Parameters:
i (int) – vertical position (y coordinate) of the initial slot
j (int) – horizontal position (x coordinate) of the initial slot
grid (array) – the play area (current game state)
- Returns:
the highest number of similar consecutive tokens
- Return type:
int
- tictac.declare_winner(winner)[source]¶
Celebrates the victory of a player or declares a draw.
- Parameters:
winner (int) – number of the winning player or 0 for a draw
- tictac.draw(grid)[source]¶
Draw the play area with text graphics.
- Parameters:
grid (array) – the play area (current game state)
- tictac.make_move(play, player, grid)[source]¶
Let a player have a turn.
The function modifies the given game state (grid) to match the situation after the move has been carried out.
- Parameters:
play (array) – the play action (move) to take
player (int) – number of the player who makes the move
grid (array) – the play area (current game state)
- tictac.new_game()[source]¶
Creates an empty play area.
- Returns:
the empty play area (the initial game state)
- Return type:
array
n_in_a_row.py¶
- n_in_a_row.all_plays(grid, player=None)[source]¶
Lists the indices of all columns that have empty slots.
- Parameters:
grid (array) – game state
player (int) – player whose turn it is
- Returns:
list of possible play actions
- Return type:
list
- n_in_a_row.all_reasonable_plays(grid, player=None)[source]¶
Same as all_plays().
- Parameters:
grid (array) – game state
player (int) – player whose turn it is
- Returns:
list of possible play actions
- Return type:
list
- n_in_a_row.ask_for_move(player, grid)[source]¶
Asks a human player for a play action (move).
This is done by visualizing the game and then asking for the index of the column and row where the player wants to play.
- Parameters:
player (int) – number of the player whose turn it is
grid (array) – the play area (current game state)
- Returns:
the coordinates of the chosen slot as tuple (row, column)
- Return type:
int, int
- n_in_a_row.can_continue(grid)[source]¶
Checks if there are available spaces in the play area.
Does not check if the game is done due to someone winning.
- Parameters:
grid (array) – game state
- Returns:
True if there are empty spaces, False if not.
- Return type:
boolean
- n_in_a_row.can_take_slot(i, grid)[source]¶
Checks if one can play in the given slot.
- Parameters:
i (int) – the row to check
j (int) – the column to check
grid (array) – the play area (current game state)
- Returns:
True if there are empty slots in column i, False otherwise
- Return type:
boolean
- n_in_a_row.check_winner(grid)[source]¶
Checks if someone has won the game.
- Parameters:
grid (array) – game state
- Returns:
the number of the winning player if there is one, 0 otherwise
- Return type:
int
- n_in_a_row.copy_game(grid)[source]¶
Returns a copy of the given game state.
- Parameters:
grid (array) – the play area (current game state)
- Returns:
a copy of grid
- Return type:
array
- n_in_a_row.count_consecutives(i, j, grid)[source]¶
Starting from the slot at grid[i,j], counts the highest number of consecutive squares with tokens of the same player.
If the starting position has no token, returns 0.
For ay given square, there are 8 possible directions for forming a line: up, down, left, rigth and the 4 diagonals. This function only checks half of them: down, right, down-right and down-left.
- Parameters:
i (int) – vertical position (y coordinate) of the initial slot
j (int) – horizontal position (x coordinate) of the initial slot
grid (array) – the play area (current game state)
- Returns:
the highest number of similar consecutive tokens
- Return type:
int
- n_in_a_row.declare_winner(winner)[source]¶
Celebrates the victory of a player or declares a draw.
- Parameters:
winner (int) – number of the winning player or 0 for a draw
- n_in_a_row.draw(grid)[source]¶
Draw the play area with text graphics.
- Parameters:
grid (array) – the play area (current game state)
- n_in_a_row.make_move(play, player, grid)[source]¶
Let a player have a turn.
The function modifies the given game state (grid) to match the situation after the move has been carried out.
- Parameters:
play (array) – the play action (move) to take
player (int) – number of the player who makes the move
grid (array) – the play area (current game state)
- n_in_a_row.new_game()[source]¶
Creates an empty play area.
- Returns:
the empty play area (the initial game state)
- Return type:
array
othello.py¶
- othello.all_reasonable_plays(state, player)[source]¶
Lists all valid moves for the current player. In Othello all valid moves are reasonable.
If the current player has no valid moves, returns [PASS_MOVE] so the game loop can still make a move (which does nothing).
- Parameters:
state (list) – [grid, player]
- Returns:
(row, col) moves, or [PASS_MOVE] if none exist
- Return type:
list
- othello.ask_for_move(player, state)[source]¶
Asks a human player for a move. If the player has no valid moves, automatically returns PASS_MOVE.
- Parameters:
player (int) – the player whose turn it is
state (list) – [grid, player]
- Returns:
(row, col), or PASS_MOVE
- Return type:
int, int
- othello.can_continue(state)[source]¶
Checks if the game can continue.
The game ends when BOTH players have no valid moves. A single player having no moves does not end the game since they can pass their turn.
- Parameters:
state (list) – [grid, player]
- Returns:
True if at least one player can still move, False otherwise
- Return type:
boolean
- othello.check_winner(state)[source]¶
Determines the winner when the game is over. Returns 0 if the game is still in progress.
The game is over when neither player has valid moves. The player with the most tokens then wins.
- Parameters:
state (list) – [grid, player]
- Returns:
the winning player number, or 0 for a draw or unfinished game
- Return type:
int
- othello.copy_game(state)[source]¶
Returns a deep copy of the game state.
- Parameters:
state (list) – [grid, player]
- Returns:
a copy of state
- Return type:
list
- othello.declare_winner(winner)[source]¶
Announces the winner or a draw.
- Parameters:
winner (int) – winning player number, or 0 for a draw
- othello.draw(state)[source]¶
Draw the board with text graphics, showing token counts and valid moves for the current player.
- Parameters:
state (list) – [grid, player]
- othello.make_move(play, player, state)[source]¶
Let a player place a token and flip the appropriate opponent tokens. Modifies state in place.
Also updates state[1] to the next player so that state-only functions always know whose turn it is.
If play is PASS_MOVE, the board is unchanged and the turn passes.
- Parameters:
play (array) – (row, col), or PASS_MOVE
player (int) – number of the player who makes the move
state (list) – [grid, player]
- othello.new_game()[source]¶
Creates the standard Othello starting position. The four central squares are pre-filled.
The state is a list [grid, player] where player is the current player. This allows state-only functions to always know whose turn it is.
- Returns:
[grid, player] as the initial game state
- Return type:
list
ultimate.py¶
- ultimate.all_plays(state)[source]¶
Lists all valid moves for the current player.
If next_board points to a playable local board, moves are restricted to that board. Otherwise the player may play in any cell of any unfinished local board.
- Parameters:
state (list) – [grid, next_board]
- Returns:
valid (row, col) positions in the full 9x9 grid
- Return type:
list
- ultimate.all_reasonable_plays(state, player=None)[source]¶
Lists all valid moves. In Ultimate TTT all valid moves are reasonable.
- Parameters:
state (list) – [grid, next_board]
player (int) – ignored (included for compatibility with play_game)
- Returns:
valid (row, col) positions
- Return type:
list
- ultimate.ask_for_move(player, state)[source]¶
Asks a human player for a move.
Displays the board and valid moves, then reads row and column.
- Parameters:
player (int) – the player whose turn it is
state (list) – [grid, next_board]
- Returns:
(row, col) in the full 9x9 grid
- Return type:
int, int
- ultimate.can_continue(state)[source]¶
Checks if the game can continue.
The game ends if someone has won the meta-board or if there are no valid moves left.
- Parameters:
state (list) – [grid, next_board]
- Returns:
True if the game is still going, False otherwise
- Return type:
boolean
- ultimate.check_winner(state)[source]¶
Checks if someone has won the game.
Returns 0 if the game is still in progress or ends in a draw.
- Parameters:
state (list) – [grid, next_board]
- Returns:
winning player number, or 0
- Return type:
int
- ultimate.copy_game(state)[source]¶
Returns a deep copy of the game state.
- Parameters:
state (list) – [grid, next_board]
- Returns:
a copy of state
- Return type:
list
- ultimate.declare_winner(winner)[source]¶
Announces the winner or a draw.
- Parameters:
winner (int) – winning player number, or 0 for a draw
- ultimate.draw(state)[source]¶
Draws the full Ultimate Tic-Tac-Toe board with text graphics.
Each local 3x3 board is drawn with its own grid lines. Local boards are separated by empty space. Active boards are highlighted with * instead of | and - separators. Row and column numbers are shown for easy move entry.
- Parameters:
state (list) – [grid, next_board]
- ultimate.make_move(play, player, state)[source]¶
Places a token and updates which local board is active next.
The local board for the next move is determined by the cell (r,c) within the local board where the current move was made. If that local board is already finished, next_board is set to None (free choice).
- Parameters:
play (tuple) – (row, col) in the full 9x9 grid
player (int) – the player making the move
state (list) – [grid, next_board] - modified in place
- ultimate.new_game()[source]¶
Creates an empty Ultimate Tic-Tac-Toe board.
- State is [grid, next_board] where:
grid is a 9x9 numpy array of player tokens (0 = empty) next_board is a (row, col) tuple indicating which local board the current player must play in, or None if anywhere is allowed.
Player 1 starts and may play anywhere on the first move.
- Returns:
the initial game state [grid, next_board]
- Return type:
list