Training a bad tic-tac-toe AI

Community article, published June 18, 2024

This project demonstrates how to build and train a neural network to play tic-tac-toe with PyTorch. The model learns moves from a dataset containing every possible game state paired with its corresponding best move.

0. Setting up the device

import itertools

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Training on {device}.")

1. Board representation and conversion

We represent the tic-tac-toe board as a 3x3 nested list in which every cell is either "x", "o", or None. To feed it into our neural network, we convert it to a tensor.

def board_to_tensor(board):
    mapping = {'x': 1, 'o': -1, None: 0}
    return torch.tensor([[mapping[cell] for cell in row] for row in board], dtype=torch.float32).flatten()

This function maps "x" to 1, "o" to -1, and empty cells to 0, then flattens the board into a 1D tensor.
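
For example, a quick check of the mapping (`sample_board` below is just a made-up position):

sample_board = [["x", None, "o"],
                [None, "x", None],
                ["o", None, None]]

print(board_to_tensor(sample_board))
# prints a flat float tensor: [ 1.,  0., -1.,  0.,  1.,  0., -1.,  0.,  0.]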

2. Dataset creation

We create a custom PyTorch Dataset to hold our game states and their corresponding best moves.

class TicTacToeDataset(Dataset):
    def __init__(self, boards, moves):
        self.boards = boards
        self.moves = moves

    def __len__(self):
        return len(self.boards)

    def __getitem__(self, idx):
        board = self.boards[idx]
        move = self.moves[idx]
        return board, move
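
A tiny, hypothetical usage example (the `toy_*` names are invented for illustration; the real tensors are built in step 5):

toy_boards = [torch.zeros(9), torch.ones(9)]
toy_moves = torch.tensor([4, 0])

toy_dataset = TicTacToeDataset(toy_boards, toy_moves)
print(len(toy_dataset))   # 2
print(toy_dataset[1])     # (tensor([1., 1., ..., 1.]), tensor(0))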

3. Neural network architecture

Our tic-tac-toe network is a simple feed-forward network.

class TicTacToeNN(nn.Module):
    def __init__(self):
        super(TicTacToeNN, self).__init__()
        self.fc1 = nn.Linear(9, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 9)
        self.softmax = nn.Softmax(dim=1)  # note: nn.CrossEntropyLoss used for training already applies log-softmax internally

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return self.softmax(x)

It takes a flattened board (9 inputs) and outputs probabilities over the 9 possible moves.
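
A quick shape check on an untrained instance (`dummy_batch` is random noise standing in for a batch of four boards):

untrained = TicTacToeNN()
dummy_batch = torch.randn(4, 9)

out = untrained(dummy_batch)
print(out.shape)        # torch.Size([4, 9])
print(out.sum(dim=1))   # every row sums to 1 because of the softmax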

4. Data generation

We generate every possible valid tic-tac-toe board and its corresponding best move using a mix of brute force and heuristics. The `find_best_move` function implements the game logic that picks the best move for any given board state: take a winning move if one exists, otherwise block the opponent's winning move, then create or block a fork, and finally prefer the centre, the corners, and the edges, in that order.

possible_items = ["x", "o", None]

# every 3x3 arrangement of x / o / empty, as flat 9-element lists
all_boards = list(list(tup) for tup in itertools.product(possible_items, repeat=9))

# keep only boards that still have at least one empty cell to play in
valid_boards = [board for board in all_boards if None in board]

# reshape each flat board into three rows of three
boards = []
for flat_board in valid_boards:
    board = [flat_board[i:i+3] for i in range(0, 9, 3)]
    boards.append(board)

boards[:9]

def find_best_move(board):
    def check_win(player):
        for row in board:
            if all(cell == player for cell in row):
                return True

        for col in range(3):
            if all(board[row][col] == player for row in range(3)):
                return True

        if all(board[i][i] == player for i in range(3)) or \
           all(board[i][2-i] == player for i in range(3)):
            return True
        return False
    
    def count_forks(player):
        forks = 0

        for row in board:
            if row.count(player) == 1 and row.count(None) == 2:
                forks += 1

        for col in range(3):
            if [board[row][col] for row in range(3)].count(player) == 1 and \
               [board[row][col] for row in range(3)].count(None) == 2:
                forks += 1
        
        if board[1][1] == player:
            if (board[0][0] == player and board[2][2] == None) or \
               (board[0][0] == None and board[2][2] == player):
                forks += 1
            if (board[0][2] == player and board[2][0] == None) or \
               (board[0][2] == None and board[2][0] == player):
                forks += 1
        return forks
    
    def board_full():
        return all(cell is not None for row in board for cell in row)
    
    def board_empty():
        return all(cell is None for row in board for cell in row)
    
    def is_valid_move(row, col):
        return 0 <= row < 3 and 0 <= col < 3 and board[row][col] is None
    
    def get_best_move():
        # 1. if x can win immediately, take that move
        for row in range(3):
            for col in range(3):
                if board[row][col] is None:
                    board[row][col] = 'x'
                    if check_win('x'):
                        board[row][col] = None
                        return (row, col)
                    board[row][col] = None
        
        # 2. if o is about to win, block it
        for row in range(3):
            for col in range(3):
                if board[row][col] is None:
                    board[row][col] = 'o'
                    if check_win('o'):
                        board[row][col] = None
                        return (row, col)
                    board[row][col] = None
        
        # 3. try to create a fork for x (two threats at once)
        for row in range(3):
            for col in range(3):
                if board[row][col] is None:
                    board[row][col] = 'x'
                    if count_forks('x') > 1:
                        board[row][col] = None
                        return (row, col)
                    board[row][col] = None
        
        # 4. block a potential fork for o
        for row in range(3):
            for col in range(3):
                if board[row][col] is None:
                    board[row][col] = 'o'
                    if count_forks('o') > 1:
                        board[row][col] = None
                        return (row, col)
                    board[row][col] = None
        
        # 5. otherwise prefer the centre, then corners, then edges
        if board[1][1] is None:
            return (1, 1)
        
        for (r, c) in [(0, 0), (0, 2), (2, 0), (2, 2)]:
            if board[r][c] is None:
                return (r, c)
        
        for (r, c) in [(0, 1), (1, 0), (1, 2), (2, 1)]:
            if board[r][c] is None:
                return (r, c)
        
        return None
    
    return get_best_move()

# the heuristic best move for every board, stored as [row, col]
moves = []

for board in boards:
    moves.append(list(find_best_move(board)))

moves[:9]
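
A couple of spot checks on hand-picked positions (purely illustrative) show the priority order in action: a winning move is preferred over a block, and a block over everything else.

# 'x' can complete the top row, so the winning move is chosen
print(find_best_move([["x", "x", None],
                      [None, "o", None],
                      ["o", None, None]]))   # (0, 2)

# no immediate win for 'x', but 'o' threatens the middle column, so it gets blocked
print(find_best_move([[None, "o", None],
                      ["x", "o", "x"],
                      [None, None, None]]))  # (2, 1)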

5. Setting up the data loader

Make sure everything is sent to the `device`.

tensor_boards = [board_to_tensor(board).to(device) for board in boards]
# encode each [row, col] move as a single class index in 0..8
tensor_moves = torch.tensor([move[0] * 3 + move[1] for move in moves], device=device)

dataset = TicTacToeDataset(tensor_boards, tensor_moves)
# the DataLoader shuffles indices on the CPU, so its generator has to be a CPU generator
g = torch.Generator()
dataloader = DataLoader(dataset, batch_size=32, shuffle=True, generator=g)
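
Pulling a single batch is a cheap way to confirm that shapes and device placement look right:

batch_boards, batch_moves = next(iter(dataloader))
print(batch_boards.shape, batch_moves.shape)   # torch.Size([32, 9]) torch.Size([32])
print(batch_boards.device, batch_moves.device)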

6. Training loop

We train the model with the Adam optimizer and cross-entropy loss.

model = TicTacToeNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

epochs = 100
for epoch in range(epochs):
    for boards, moves in dataloader:
        boards = boards.to(device)
        moves = moves.to(device)
        optimizer.zero_grad()
        outputs = model(boards)
        loss = criterion(outputs, moves)
        loss.backward()
        optimizer.step()

    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}")

7. Model evaluation

After training, we can evaluate the model on a test board.

test_board = [[None, "o", "o"],
              [None, "o", None],
              [None, "x", "x"]]
test_tensor = board_to_tensor(test_board).unsqueeze(0).to(device)
model.eval()
with torch.no_grad():
    prediction = model(test_tensor)
    best_move_index = torch.argmax(prediction).item()
    best_move = [best_move_index // 3, best_move_index % 3]  # flat index back to [row, col]
    print(f"Best move for the test board: {best_move}")

This is a silly project, and it doesn't work

This setup has plenty of problems. Most notably, it regularly produces illegal moves: nothing restricts the prediction to empty cells, so the model simply outputs whatever move it considers most likely overall.
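
For what it's worth, here is a minimal sketch of how the prediction step could be restricted to empty cells; the `predict_legal_move` helper is hypothetical and not part of the project:

def predict_legal_move(model, board):
    tensor = board_to_tensor(board).unsqueeze(0).to(device)
    with torch.no_grad():
        scores = model(tensor).squeeze(0)
    # mask out occupied cells so the argmax can only land on a legal move
    legal = torch.tensor([cell is None for row in board for cell in row], device=device)
    scores[~legal] = float('-inf')
    idx = torch.argmax(scores).item()
    return [idx // 3, idx % 3]

print(predict_legal_move(model, test_board))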
