
Implement Monte Carlo Tree Search (MCTS) for a Simple Game

This challenge asks you to implement the Monte Carlo Tree Search (MCTS) algorithm in Python. MCTS is a powerful heuristic search algorithm used for decision-making in games and other domains. It balances exploration and exploitation to find optimal moves by simulating games from the current state.

Problem Description

You will implement an MCTS algorithm that can play a simple turn-based game. The game will be defined by a GameState class, which you will also need to design or be provided with. The GameState class should encapsulate the current state of the game, define valid moves, and determine when the game ends and who wins.

Your MCTS implementation should consist of the following core components:

  1. Node Representation: A class to represent a node in the search tree. Each node should store:

    • The game state it represents.
    • A reference to its parent node.
    • A list of child nodes, representing states reachable by one move.
    • Statistics: the number of times this node has been visited, and the total reward (or win count) accumulated from simulations passing through this node.
    • The move that led to this state from its parent (optional but helpful).
  2. MCTS Algorithm: A class or function that orchestrates the MCTS process. This will involve the following four steps, repeated for a specified number of iterations:

    • Selection: Starting from the root node, traverse down the tree by selecting the most promising child node at each step. A common strategy is UCB1 (Upper Confidence Bound 1).
    • Expansion: If the selected node is not a terminal game state and still has untried moves, create one or more child nodes representing the resulting new game states.
    • Simulation (Rollout): From the expanded node (or the selected node if no expansion occurred), simulate a game to completion using a random (or simple heuristic) policy.
    • Backpropagation: Propagate the result of the simulation (e.g., win/loss/draw) back up the tree to update the statistics of the visited nodes.
  3. Game State Interface: A clear interface for the GameState class. At a minimum, it should support:

    • get_legal_moves(): Returns a list of valid moves from the current state.
    • apply_move(move): Returns a new GameState object after applying the given move.
    • is_terminal(): Returns True if the game has ended, False otherwise.
    • get_winner(): Returns the winner of the game (e.g., 0 for draw, 1 for player 1, -1 for player 2) if is_terminal() is True.

Expected Behavior: Given a GameState and a number of simulations, the MCTS algorithm should return the "best" move from the root state, typically the move leading to the child node with the highest visit count or win rate.
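The components and loop above can be sketched as follows. This is a minimal sketch, not a reference solution: it assumes the GameState interface described above plus a current_player attribute (as in the Take-Away example later in this problem), and the names Node and mcts_search are placeholders of my own choosing.

```python
import math
import random

class Node:
    """One node of the MCTS search tree."""
    def __init__(self, state, parent=None, move=None):
        self.state = state                  # game state this node represents
        self.parent = parent                # None for the root
        self.move = move                    # move that led here from the parent
        self.children = []
        self.untried_moves = state.get_legal_moves()
        self.visits = 0
        self.wins = 0.0                     # reward for the player who moved INTO this node

    def ucb1(self, c):
        return (self.wins / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def mcts_search(root_state, num_simulations, c=math.sqrt(2)):
    """Run MCTS from root_state; return the most-visited move, or None."""
    if root_state.is_terminal():
        return None                         # edge case: nothing to search
    root = Node(root_state)
    for _ in range(num_simulations):
        node = root
        # 1. Selection: descend through fully expanded nodes via UCB1.
        while not node.untried_moves and node.children:
            node = max(node.children, key=lambda ch: ch.ucb1(c))
        # 2. Expansion: add one child for a move not yet tried.
        if node.untried_moves:
            move = node.untried_moves.pop()
            child = Node(node.state.apply_move(move), parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: play random moves to a terminal state.
        state = node.state
        while not state.is_terminal():
            state = state.apply_move(random.choice(state.get_legal_moves()))
        winner = state.get_winner()
        # 4. Backpropagation: a node is credited when the player who
        #    moved into it (-current_player of its state) won the rollout.
        while node is not None:
            node.visits += 1
            if winner == -node.state.current_player:
                node.wins += 1.0
            elif winner == 0:               # draw counts half
                node.wins += 0.5
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move
```

With enough simulations on a small game such as Take-Away, the most-visited child of the root should usually coincide with the optimal move.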

Edge Cases to Consider:

  • Games with no legal moves (should be handled by is_terminal or game logic).
  • Games that end immediately.
  • Games where all moves lead to losing states.

Examples

Let's define a very simple game: "Take-Away". Two players start with a pile of N items. Players take turns removing 1, 2, or 3 items. The player who takes the last item wins.

Example GameState Implementation:

class TakeAwayGameState:
    def __init__(self, num_items, current_player):
        self.num_items = num_items
        self.current_player = current_player # 1 or -1

    def get_legal_moves(self):
        moves = []
        for i in range(1, min(self.num_items, 3) + 1):
            moves.append(i)
        return moves

    def apply_move(self, move):
        if move not in self.get_legal_moves():
            raise ValueError("Invalid move")
        new_num_items = self.num_items - move
        next_player = -self.current_player
        return TakeAwayGameState(new_num_items, next_player)

    def is_terminal(self):
        return self.num_items == 0

    def get_winner(self):
        if not self.is_terminal():
            return None # Game not over
        # The player who removed the last item is the one who just moved;
        # apply_move flipped the turn, so the winner is -self.current_player.
        return -self.current_player

    def __str__(self):
        return f"Items: {self.num_items}, Player: {self.current_player}"

Example 1: Basic Play

Input:
Initial GameState: TakeAwayGameState(num_items=7, current_player=1)
Number of Simulations: 1000

Output:
Best Move: 3

Explanation: With 7 items, player 1 can take 1, 2, or 3, leaving 6, 5, or 4 items respectively. In Take-Away, a pile that is a multiple of 4 is a loss for the player to move: whatever that player takes (1, 2, or 3), the opponent can restore the pile to a multiple of 4 and eventually take the last item. Taking 3 items therefore leaves the opponent with 4 items, a losing position, and after 1000 simulations MCTS should converge on this move.
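The claim that piles which are multiples of 4 are losing positions can be checked with a short dynamic program over pile sizes (the helper name winning_move is my own):

```python
def winning_move(n):
    """Return a winning move (1-3) from a pile of n under perfect play,
    or None if every move leaves the opponent a winning position."""
    best = [None] * (n + 1)          # best[0] = None: empty pile, mover already lost
    for k in range(1, n + 1):
        for m in range(1, min(k, 3) + 1):
            if best[k - m] is None:  # opponent would have no winning reply
                best[k] = m
                break
    return best[n]

print(winning_move(7))  # 3: leave the opponent 4 items
```

Running this for n = 1..100 shows that winning_move(n) is None exactly when n is a multiple of 4.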

Example 2: Near End Game

Input:
Initial GameState: TakeAwayGameState(num_items=3, current_player=1)
Number of Simulations: 1000

Output:
Best Move: 3

Explanation: With 3 items remaining, the current player can take 1, 2, or 3. Taking 3 items wins the game immediately. MCTS will quickly discover this winning move through simulations.

Example 3: Edge Case - Already Terminal State

Input:
Initial GameState: TakeAwayGameState(num_items=0, current_player=1)
Number of Simulations: 1000

Output:
Best Move: None (or an indication that no moves are possible)

Explanation: The game is already over. There are no legal moves to make. The MCTS should handle this gracefully, likely returning None or raising an error indicating an already terminal state.

Constraints

  • The GameState class will adhere to the described interface.
  • The number of simulations will be an integer between 100 and 100,000.
  • The initial num_items for the GameState will be between 1 and 100.
  • Your MCTS implementation should be reasonably efficient to complete within a few seconds for the given number of simulations.

Notes

  • Consider using the UCB1 formula for the Selection phase: UCB1 = win_rate + C * sqrt(ln(parent_visits) / node_visits), where C is an exploration parameter (e.g., sqrt(2)).
  • For the Simulation phase, a simple random move selection is sufficient for this challenge.
  • You will need to implement the MCTS class and its methods, as well as the Node class.
  • The MCTS algorithm should ideally be implemented as a class with methods like search(game_state, num_simulations) and best_move(game_state).
  • Ensure your GameState and Node objects are properly immutable or handle state copying correctly to avoid unintended side effects during simulations.
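As a quick illustration of the UCB1 note above, the formula can be written as a standalone helper (the function name and the infinity convention for unvisited nodes are my own choices):

```python
import math

def ucb1(wins, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 = win_rate + C * sqrt(ln(parent_visits) / node_visits)."""
    if visits == 0:
        return float("inf")  # ensure every child is tried at least once
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)
```

A larger C favors exploration over exploitation; C = sqrt(2) is a common default when rewards lie in [0, 1].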