Streaming Output with Transformers

Community Article · Published March 15, 2025

Introduction

As AI-powered chatbots advance, interactive learning has become much more engaging. In this post, we will walk through how to build **streaming output** with Python, Gradio, and a **Qwen-based language model**.

Prerequisites

Before getting started, make sure the following packages are installed. Note that the model used below is a pre-quantized 4-bit checkpoint, so you will also need `bitsandbytes`, and `device_map="auto"` relies on `accelerate`:

```bash
pip install gradio transformers torch accelerate bitsandbytes
```

Code Implementation

```python
import gradio as gr  # Import the Gradio library for creating user interfaces
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer  # Import necessary classes from the transformers library
from threading import Thread  # Import Thread for concurrent execution
import time  # Import time for adding delays

model_name = "unsloth/DeepSeek-R1-Distill-Qwen-1.5B-unsloth-bnb-4bit"  # Define the model name or path

# Load the pre-trained model with automatic data type and device mapping
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Load the tokenizer associated with the model
tokenizer = AutoTokenizer.from_pretrained(model_name)

def QwenChat(message, history):  # Stream a chat response for the user's message and prior history
    # Construct the messages list with system, history, and user message
    messages = [
        {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    ]
    messages.extend(history)  # Add chat history to the messages list
    messages.append({"role": "user", "content": message})  # Append the user's message

    # Apply chat template to format the messages for the model
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    # Set up the streamer for token generation
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

    # Prepare model inputs by tokenizing the text and moving it to the model's device
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

    # Set up generation arguments including max tokens and streamer
    generation_args = {
        "max_new_tokens": 512,
        "streamer": streamer,
        **model_inputs
    }

    # Start a separate thread for model generation to allow streaming output
    thread = Thread(
        target=model.generate,
        kwargs=generation_args,
    )
    thread.start()

    # Accumulate and yield text tokens as they are generated
    acc_text = ""
    for text_token in streamer:
        time.sleep(0.01)  # Simulate real-time output with a short delay
        acc_text += text_token  # Append the generated token to the accumulated text
        yield acc_text  # Yield the accumulated text

    # Ensure the generation thread completes
    thread.join()

# Create a Gradio chat interface with the QwenChat function
demo = gr.ChatInterface(fn=QwenChat, type="messages")

# Launch the Gradio interface on all available network interfaces
demo.launch(server_name="0.0.0.0")
```
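To try it, run the script (for example, `python app.py`). Gradio listens on port 7860 by default, and because `server_name="0.0.0.0"` binds all network interfaces, the chat UI is also reachable from other machines on the network at `http://<host-ip>:7860`.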

Features of This AI Tutor

  • Real-time responses: text is generated dynamically, token by token, as the model processes the input.
  • Interactive learning: users can practice conversation with the AI tutor.
  • Customizable: modify the system prompt to adjust the teaching style (see the sketch below).
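For instance, the system prompt can be exposed as a UI control instead of being hardcoded. The sketch below is one possible way to do this: it reuses the `model` and `tokenizer` loaded above and relies on Gradio's `additional_inputs` parameter, which passes the textbox value to the chat function as an extra argument. The function name `tutor_chat` and the example prompt text are illustrative, not part of the original code.

```python
from threading import Thread

import gradio as gr
from transformers import TextIteratorStreamer

def tutor_chat(message, history, system_prompt):
    # Same flow as QwenChat above, but the system prompt comes from the UI
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history)
    messages.append({"role": "user", "content": message})
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
    # Run generation on a background thread so we can consume the streamer here
    Thread(target=model.generate, kwargs={**model_inputs, "streamer": streamer, "max_new_tokens": 512}).start()
    acc_text = ""
    for chunk in streamer:
        acc_text += chunk
        yield acc_text

demo = gr.ChatInterface(
    fn=tutor_chat,
    type="messages",
    additional_inputs=[
        gr.Textbox(value="You are a patient English tutor. Correct mistakes gently.", label="System prompt")
    ],
)
```

With this variant, changing the teaching style is as simple as editing the textbox before sending a message; no code changes are needed.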

How It Works

  1. The user enters a message.
  2. The system builds a **chat template** that includes the previous conversation.
  3. The AI model processes the input and generates the response **token by token** in real time.
  4. The response appears gradually, mimicking a natural conversation.
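To see steps 2 through 4 in isolation, here is a minimal console-only sketch of the same pattern, assuming the `model` and `tokenizer` loaded earlier (the prompt string is just an example). Generation runs on a background thread while the main thread prints each chunk as the streamer yields it:

```python
from threading import Thread

from transformers import TextIteratorStreamer

# Build the formatted prompt from a minimal message list (step 2)
messages = [{"role": "user", "content": "Explain streaming generation in one sentence."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# The streamer yields decoded text chunks as generate() produces tokens (step 3)
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
thread = Thread(target=model.generate, kwargs={**inputs, "streamer": streamer, "max_new_tokens": 64})
thread.start()

# Print each chunk as soon as it arrives (step 4)
for chunk in streamer:
    print(chunk, end="", flush=True)
thread.join()
```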

Conclusion

This approach offers an **engaging way to learn with AI**. By integrating **streaming output**, students experience dynamic, **lifelike interactions** rather than static responses.

Try it out today and start your AI learning journey! 🚀
