I Built My First AI, and You Can Too! A Complete Zero-to-One Guide

Community Article · Published July 11, 2025

Artificial intelligence (AI) used to feel like a "black box" to me, a field reserved for elite researchers. But I recently discovered that its core principles are surprisingly easy to understand. What if I told you that you could build a fully functional conversational AI from scratch, right now?

In this comprehensive guide, I'll walk you through every step of my journey. This post contains everything you need: a step-by-step environment setup guide for Ubuntu, the complete, unabridged source code with detailed explanations, and even a real bug we'll fix together. By the end, you won't just have run some code; you'll understand why it works.

Our goal is simple but powerful:

Build an AI model that learns to reply "hello to you too" whenever it hears "hello".

Let's get started.

Prerequisites: Setting Up Your Development Environment (on Ubuntu)

Before we build the AI, we need to prepare our workshop. This section walks you through setting up a clean, isolated Python environment, a best practice that prevents conflicts with other projects.

Part A: CPU-Only Installation (Recommended for Everyone)

This setup works on any machine, whether or not you have a dedicated graphics card (GPU).

Step 1: Install Python and essential tools. Open your terminal and run the commands below to make sure you have Python, its package manager (pip), and its virtual environment tool (venv).

# First, update your package list
sudo apt update

# Then, install pip and venv for Python 3
sudo apt install python3-pip python3-venv -y

Step 2: Create a project directory and a virtual environment. Let's create a dedicated folder for the project.

# Create a folder for your project
mkdir my_ai_project
cd my_ai_project

# Create a virtual environment named 'ai_env'
python3 -m venv ai_env

Step 3: Activate the virtual environment. You must activate the environment every time you work on this project.

source ai_env/bin/activate

You'll know it's active when your terminal prompt changes to show (ai_env), for example: (ai_env) user@machine:~/my_ai_project$.

Step 4: Install PyTorch. With the environment activated, install PyTorch with pip. The standard command below works on any machine; a lighter CPU-only build is also available (see the comment in the snippet).

# The default wheels below work on any machine. If you specifically want the lighter,
# CPU-only build, append: --index-url https://download.pytorch.org/whl/cpu
pip install torch torchvision torchaudio

Part B: GPU Installation (for NVIDIA Users)

If you have an NVIDIA GPU and the appropriate drivers installed, you can install a version of PyTorch that uses your GPU for faster training.

Follow Steps 1-3 from Part A. When you reach Step 4, use this command instead:

# Make sure your NVIDIA drivers are installed first!
# This command installs PyTorch with CUDA 12.1 support.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Verify Your Installation

You can quickly check that everything installed correctly by running this one-liner in your activated terminal:

python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}')"

If you installed the CPU version, it will report CUDA available: False. If you installed the GPU version and your drivers are set up correctly, it will report CUDA available: True.
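
For example, on a CPU-only install you would see something like this (the exact version number will differ on your machine):

PyTorch version: 2.3.1
CUDA available: False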

With the environment ready, you can now create our project's three Python files inside the my_ai_project directory.
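
For reference, once you have worked through the next three steps (and run the training script once), the project folder should look roughly like this:

my_ai_project/
├── ai_env/                 # the virtual environment from the setup steps
├── model.py                # Step 1: vocabulary + encoder/decoder networks
├── train.py                # Step 2: the training loop and checkpointing
├── inference.py            # Step 3: interactive chat with the trained model
└── model_checkpoint.pth    # created automatically by train.py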

Step 1: model.py - Building the AI's Brain and Dictionary

This file defines the core structure of our AI. It contains the Vocab class for managing words, plus the EncoderRNN and DecoderRNN neural networks that make up its "brain".

# Create this file as: my_ai_project/model.py

import torch
import torch.nn as nn

# --- Vocabulary Class ---
# Define special tokens that will be part of our vocabulary.
PAD_token = 0  # Used to pad shorter sentences to a standard length.
SOS_token = 1  # "Start Of Sentence" token. Marks the beginning of an input or output.
EOS_token = 2  # "End Of Sentence" token. Marks the end.

class Vocab:
    """
    A class to manage the vocabulary and the mapping between words and numerical indices.
    This is crucial because neural networks work with numbers, not text.
    """
    def __init__(self):
        # Initialize mappings and word counts. Start with special tokens.
        self.word2index = {"<PAD>": PAD_token, "<SOS>": SOS_token, "<EOS>": EOS_token}
        self.index2word = {PAD_token: "<PAD>", SOS_token: "<SOS>", EOS_token: "<EOS>"}
        self.n_words = 3  # Start counting from 3 to account for the special tokens.

    def add_sentence(self, sentence):
        """Splits a sentence into words and adds them to the vocabulary."""
        for word in sentence.split(' '):
            self.add_word(word)

    def add_word(self, word):
        """Adds a new word to the vocabulary if it's not already there."""
        if word not in self.word2index:
            # Assign a new index to the word and update the mappings.
            self.word2index[word] = self.n_words
            self.index2word[self.n_words] = word
            self.n_words += 1

# --- Neural Network Model Definition ---
class EncoderRNN(nn.Module):
    """The Encoder part of the Seq2Seq model. It reads and encodes the input sentence."""
    def __init__(self, input_size, hidden_size, device):
        super(EncoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.device = device

        # Embedding layer: Turns word indices into dense vectors of a specified size.
        self.embedding = nn.Embedding(input_size, hidden_size)
        # GRU (Gated Recurrent Unit): A type of recurrent neural network that processes sequences.
        self.gru = nn.GRU(hidden_size, hidden_size)

    def forward(self, input, hidden):
        """Defines the forward pass of the encoder."""
        # The input is a word index. The embedding layer turns it into a vector.
        embedded = self.embedding(input).view(1, 1, -1)
        # The GRU processes the embedded vector and the previous hidden state.
        output, hidden = self.gru(embedded, hidden)
        return output, hidden

    def initHidden(self):
        """Initializes the hidden state with zeros."""
        return torch.zeros(1, 1, self.hidden_size, device=self.device)

class DecoderRNN(nn.Module):
    """The Decoder part of the Seq2Seq model. It generates the output sentence."""
    def __init__(self, hidden_size, output_size, device):
        super(DecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.device = device

        self.embedding = nn.Embedding(output_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)
        # Linear layer: Maps the GRU's output to the size of our vocabulary, giving a score for each word.
        self.out = nn.Linear(hidden_size, output_size)
        # LogSoftmax: Converts scores into log probabilities, which is suitable for the NLLLoss function.
        self.log_softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        """Defines the forward pass of the decoder."""
        output = self.embedding(input).view(1, 1, -1)
        output = torch.relu(output) # Apply a ReLU activation function.
        output, hidden = self.gru(output, hidden)
        output = self.log_softmax(self.out(output[0]))
        return output, hidden

    def initHidden(self):
        """Initializes the hidden state with zeros."""
        return torch.zeros(1, 1, self.hidden_size, device=self.device)
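
Before moving on, here is an optional, minimal sanity check you can run from the my_ai_project directory (with ai_env active) to confirm the pieces fit together. It is purely illustrative and not one of the three project files; the file name quick_check.py and the hidden size of 8 are arbitrary choices:

# quick_check.py (optional, illustrative only -- not part of the project)
import torch
from model import Vocab, EncoderRNN

# Build a vocabulary from our target sentence and inspect the word-to-index mapping.
vocab = Vocab()
vocab.add_sentence("hello to you too")
print(vocab.word2index)
# -> {'<PAD>': 0, '<SOS>': 1, '<EOS>': 2, 'hello': 3, 'to': 4, 'you': 5, 'too': 6}

# Push a single word through the encoder and check the tensor shapes.
device = torch.device("cpu")
encoder = EncoderRNN(vocab.n_words, hidden_size=8, device=device)
hidden = encoder.initHidden()
word = torch.tensor([vocab.word2index["hello"]])
output, hidden = encoder(word, hidden)
print(output.shape, hidden.shape)  # torch.Size([1, 1, 8]) torch.Size([1, 1, 8])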

Step 2: train.py - Teaching the AI to Learn

This script handles the entire training process. It loads the data, initializes the models, and runs the training loop. On each pass, it feeds our "hello" example to the model, measures how "wrong" the model's prediction is (the loss), and then adjusts the model's internal weights so it does a little better next time.

# Create this file as: my_ai_project/train.py

import torch
import torch.optim as optim
import torch.nn as nn
import os
# Import our custom classes and tokens from model.py
from model import Vocab, EncoderRNN, DecoderRNN, SOS_token, EOS_token

def train_step(input_tensor, target_tensor, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion, device):
    """Performs a single training step."""
    # Initialize the encoder's hidden state.
    encoder_hidden = encoder.initHidden()

    # Clear the gradients from the previous step.
    encoder_optimizer.zero_grad()
    decoder_optimizer.zero_grad()

    loss = 0

    # --- Encoding phase ---
    # Iterate through each word of the input sentence.
    for ei in range(input_tensor.size(0)):
        _, encoder_hidden = encoder(input_tensor[ei], encoder_hidden)

    # --- Decoding phase ---
    # The decoder starts with the <SOS> token.
    decoder_input = torch.tensor([[SOS_token]], device=device)
    # The encoder's final hidden state is used as the decoder's initial hidden state.
    decoder_hidden = encoder_hidden

    # "Teacher Forcing": We feed the actual correct word from the target sentence
    # as the input to the decoder at each step. This helps the model learn faster.
    for di in range(target_tensor.size(0)):
        decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)
        # Calculate the loss between the decoder's prediction and the true word.
        loss += criterion(decoder_output, target_tensor[di])
        # Set the next decoder input to the current correct word.
        decoder_input = target_tensor[di]
        # Stop if we've reached the end of the sentence.
        if decoder_input.item() == EOS_token:
            break

    # Backpropagation: Calculate the gradients of the loss with respect to model parameters.
    loss.backward()

    # Update the model's weights using the optimizers.
    encoder_optimizer.step()
    decoder_optimizer.step()

    # Return the average loss for this step.
    return loss.item() / target_tensor.size(0)

def main():
    # --- Hyperparameters and Settings ---
    hidden_size = 256
    learning_rate = 0.01
    n_epochs = 1000
    checkpoint_path = 'model_checkpoint.pth'
    # Automatically select a device (GPU if available, otherwise CPU).
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Using device: {device}")

    # --- Data Preparation ---
    input_sentence = "hello"
    output_sentence = "hello to you too"
    vocab = Vocab()
    vocab.add_sentence(input_sentence)
    vocab.add_sentence(output_sentence)

    # Convert sentences to tensors of numerical indices.
    input_tensor = torch.tensor([vocab.word2index[word] for word in input_sentence.split(' ')] + [EOS_token], dtype=torch.long).view(-1, 1).to(device)
    target_tensor = torch.tensor([vocab.word2index[word] for word in output_sentence.split(' ')] + [EOS_token], dtype=torch.long).view(-1, 1).to(device)
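
    # With our single training pair, the vocabulary built above ends up as:
    #   <PAD>=0, <SOS>=1, <EOS>=2, hello=3, to=4, you=5, too=6
    # so input_tensor holds the indices [[3], [2]] ("hello" + <EOS>, shape (2, 1))
    # and target_tensor holds [[3], [4], [5], [6], [2]] (shape (5, 1)).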

    # --- Initialize Models and Optimizers ---
    encoder = EncoderRNN(vocab.n_words, hidden_size, device).to(device)
    decoder = DecoderRNN(hidden_size, vocab.n_words, device).to(device)
    encoder_optimizer = optim.SGD(encoder.parameters(), lr=learning_rate)
    decoder_optimizer = optim.SGD(decoder.parameters(), lr=learning_rate)
    criterion = nn.NLLLoss() # Negative Log Likelihood Loss is suitable for this type of classification task.

    start_epoch = 0

    # --- Load checkpoint if it exists to resume training ---
    if os.path.exists(checkpoint_path):
        print(f"Found checkpoint '{checkpoint_path}', loading...")
        # weights_only=False is needed because the checkpoint also stores our custom Vocab
        # object (the same fix we apply in inference.py). This is safe: we created the file ourselves.
        checkpoint = torch.load(checkpoint_path, weights_only=False)
        encoder.load_state_dict(checkpoint['encoder_state_dict'])
        decoder.load_state_dict(checkpoint['decoder_state_dict'])
        encoder_optimizer.load_state_dict(checkpoint['encoder_optimizer_state_dict'])
        decoder_optimizer.load_state_dict(checkpoint['decoder_optimizer_state_dict'])
        start_epoch = checkpoint['epoch'] + 1
        vocab = checkpoint['vocab']  # It's crucial to restore the vocabulary!
        print(f"Load successful! Resuming training from epoch {start_epoch}.")
    else:
        print("No checkpoint found, starting training from scratch.")

    # --- Training Loop ---
    print("\n--- Starting Training ---")
    for epoch in range(start_epoch, n_epochs):
        loss = train_step(input_tensor, target_tensor, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion, device)

        # Periodically print the loss and save a checkpoint.
        if (epoch + 1) % 100 == 0:
            print(f"Epoch {epoch+1}/{n_epochs}, Loss: {loss:.4f}")
            print("Saving checkpoint...")
            torch.save({
                'epoch': epoch,
                'encoder_state_dict': encoder.state_dict(),
                'decoder_state_dict': decoder.state_dict(),
                'encoder_optimizer_state_dict': encoder_optimizer.state_dict(),
                'decoder_optimizer_state_dict': decoder_optimizer.state_dict(),
                'vocab': vocab, # Save the vocabulary along with the model.
            }, checkpoint_path)

    print("--- Training Complete ---")

if __name__ == '__main__':
    main()

Step 3: inference.py - Graduation Day and a Real Job

Once training is finished, our AI is ready! This script loads the saved model and lets us interact with it. This is where we see the fruits of our labor. It also contains the fix for the real bug I promised in the introduction: recent versions of PyTorch (2.6 and later) default torch.load to weights_only=True, which refuses to unpickle custom Python objects like the Vocab instance stored in our checkpoint. Passing weights_only=False restores the old behavior, which is fine here because we created the checkpoint ourselves.

# Create this file as: my_ai_project/inference.py

import torch
# Import the necessary classes and tokens from model.py
from model import EncoderRNN, DecoderRNN, Vocab, SOS_token, EOS_token

def evaluate(encoder, decoder, sentence, vocab, device, max_length=10):
    """Generates a response from the model for a given input sentence."""
    # `torch.no_grad()` tells PyTorch we are not training, so it doesn't need to calculate gradients.
    with torch.no_grad():
        try:
            # Convert the input sentence into a tensor of indices.
            input_tensor = torch.tensor([vocab.word2index[word] for word in sentence.split(' ')] + [EOS_token], dtype=torch.long, device=device).view(-1, 1)
        except KeyError as e:
            # Handle cases where the input contains a word not in our vocabulary.
            return f"Error: The word {e} is not in the vocabulary."

        # --- Encoding ---
        encoder_hidden = encoder.initHidden()
        for ei in range(input_tensor.size(0)):
            _, encoder_hidden = encoder(input_tensor[ei], encoder_hidden)

        # --- Decoding ---
        decoder_input = torch.tensor([[SOS_token]], device=device)
        decoder_hidden = encoder_hidden
        decoded_words = []

        # Generate the response word by word.
        for _ in range(max_length):
            decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)
            # Get the word with the highest probability from the decoder's output.
            topv, topi = decoder_output.data.topk(1)
            
            if topi.item() == EOS_token:
                # If the model outputs the <EOS> token, stop generating.
                break
            else:
                decoded_words.append(vocab.index2word[topi.item()])
            
            # Use the predicted word as the next input to the decoder.
            decoder_input = topi.squeeze().detach()

        return ' '.join(decoded_words)

def main():
    # --- Settings ---
    hidden_size = 256
    checkpoint_path = 'model_checkpoint.pth'
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # --- Load Checkpoint ---
    print(f"Loading model from '{checkpoint_path}'...")
    try:
        # Load the saved checkpoint file.
        # This is where we fixed the bug!
        # We must set `weights_only=False` because our checkpoint contains a custom Python object (the Vocab class).
        # This is safe because we trust the source of this file (we created it ourselves).
        checkpoint = torch.load(checkpoint_path, map_location=device, weights_only=False)
    except FileNotFoundError:
        print("Error: Model file not found! Please run train.py first to train and save a model.")
        return
    except Exception as e:
        print(f"An error occurred while loading the checkpoint: {e}")
        print("This might be due to a PyTorch version mismatch or a corrupted file.")
        return
    
    # Restore the vocabulary and model weights from the checkpoint.
    vocab = checkpoint['vocab']
    encoder_state_dict = checkpoint['encoder_state_dict']
    decoder_state_dict = checkpoint['decoder_state_dict']

    # --- Initialize models and load the saved parameters ---
    encoder = EncoderRNN(vocab.n_words, hidden_size, device).to(device)
    decoder = DecoderRNN(hidden_size, vocab.n_words, device).to(device)
    
    encoder.load_state_dict(encoder_state_dict)
    decoder.load_state_dict(decoder_state_dict)

    # Set the models to evaluation mode. This disables layers like Dropout that are only used during training.
    encoder.eval()
    decoder.eval()

    print("Model loaded successfully! You can start chatting now (type 'quit' to exit).\n")

    # --- Interactive Chat Loop ---
    while True:
        input_sentence = input('> ')
        if input_sentence.lower() == 'quit':
            break
        
        # Get the model's response and print it.
        response = evaluate(encoder, decoder, input_sentence, vocab, device)
        print('<', response)

if __name__ == '__main__':
    main()

The Final Result

After creating all three files, run the training script from your activated terminal:

# Make sure you are in the my_ai_project directory and your environment is active
# (ai_env) $
python train.py
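
While train.py runs, the log should look roughly like the excerpt below. The loss values shown here are illustrative; your exact numbers will differ, but they should keep shrinking:

Using device: cpu
No checkpoint found, starting training from scratch.

--- Starting Training ---
Epoch 100/1000, Loss: 0.4507
Saving checkpoint...
Epoch 200/1000, Loss: 0.0523
Saving checkpoint...
...
Epoch 1000/1000, Loss: 0.0035
Saving checkpoint...
--- Training Complete ---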

You will see the loss value shrinking steadily, which means the AI is learning. Once training finishes, run the inference script to chat with your new AI:

# (ai_env) $
python inference.py

Here's the expected interaction:

Loading model from 'model_checkpoint.pth'...
Model loaded successfully! You can start chatting now (type 'quit' to exit).

> hello
< hello to you too
> how are you
< Error: The word 'how' is not in the vocabulary.
> quit

It works perfectly! Our AI responds correctly to "hello" and sensibly tells us when it runs into a word it doesn't recognize.

Conclusion: You've Built an AI

If you followed this guide, you have just completed the entire lifecycle of a machine learning project, from setting up an environment to deploying a working application. The core concepts you just implemented (data processing, model architecture, loss calculation, optimization, the training loop, and inference) are the fundamental building blocks of today's most advanced AI systems.

AI is not magic. It is an engineering discipline built on understandable principles. You have already taken the most important step by building something yourself. The world of AI is now open to you. What will you build next?
