Text Generation on the Intel® Gaudi® 2 AI Accelerator

Published February 29, 2024

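With the generative AI (GenAI) revolution in full swing, text generation with open-source transformer models such as Llama 2 has become the talk of the town. AI enthusiasts and developers alike want to use the generative abilities of these models for their own use cases and applications. This article shows how easy it is to generate text with the Llama 2 family of models (7b, 13b, and 70b) using Optimum Habana and a custom pipeline class: you can run the models with just a few lines of code.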

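This custom pipeline class is designed for flexibility and ease of use. It provides a high level of abstraction and performs end-to-end text generation, including pre-processing and post-processing. There are several ways to use the pipeline: you can run the run_pipeline.py script from the Optimum Habana repository, add the pipeline class to your own Python scripts, or use it to initialize LangChain classes.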

Prerequisites

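Because the Llama 2 models are part of a gated repository, you need to request access if you have not done so already. First, visit the Meta website and accept the terms and conditions. After Meta grants you access (this can take a day or two), request access on Hugging Face using the same email address that you provided in the Meta form.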

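Once you have access, log in to your Hugging Face account by running the following command (you will need an access token, which you can get from your user profile page):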

huggingface-cli login
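
For non-interactive environments such as batch jobs, the login step can also be scripted. The sketch below is an illustration under stated assumptions, not part of the original workflow: it assumes the token is exported in the `HF_TOKEN` environment variable (with the older `HUGGING_FACE_HUB_TOKEN` as a fallback) and hands it to `huggingface_hub.login`. The helper names `find_hf_token` and `login_from_env` are hypothetical.

```python
import os


def find_hf_token(env=None):
    """Return a Hugging Face token from the environment, or None if unset."""
    env = os.environ if env is None else env
    # HF_TOKEN is the current variable name; the second entry is the older alias.
    for name in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        token = env.get(name, "").strip()
        if token:
            return token
    return None


def login_from_env():
    """Log in with a token taken from the environment (needs huggingface_hub)."""
    # Imported lazily so that the token lookup above works without the package.
    from huggingface_hub import login

    token = find_hf_token()
    if token is None:
        raise RuntimeError("Set HF_TOKEN or run `huggingface-cli login` first.")
    login(token=token)


print("Token found in environment:", find_hf_token() is not None)
```

Calling `login_from_env()` before loading a gated model should have the same effect as the interactive command, provided the token has read access to the Llama 2 repositories.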

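You also need to install the latest version of Optimum Habana (1.10.4 at the time of writing) and clone the repository to get the pipeline scripts. Here are the commands to do so: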

pip install optimum-habana==1.10.4
git clone -b v1.10-release https://github.com/huggingface/optimum-habana.git

If you plan to run distributed inference, install the DeepSpeed build that matches your SynapseAI version. This example uses SynapseAI 1.14.0.

pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.14.0
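
The version suffix in the command above is what ties DeepSpeed to SynapseAI. As a minimal sketch, the helper below builds the pip requirement from a SynapseAI version string. It assumes that other releases are tagged in the HabanaAI fork the same way as 1.14.0, which the article only confirms for that one version; `deepspeed_pip_spec` is a hypothetical name.

```python
import re

HABANA_DEEPSPEED_REPO = "https://github.com/HabanaAI/DeepSpeed.git"


def deepspeed_pip_spec(synapse_version):
    """Build the pip requirement for the DeepSpeed fork at a SynapseAI version tag."""
    # Accept only plain major.minor.patch strings such as "1.14.0".
    if not re.fullmatch(r"\d+\.\d+\.\d+", synapse_version):
        raise ValueError(f"Unexpected SynapseAI version: {synapse_version!r}")
    # Assumption: the fork's tag name equals the SynapseAI version string.
    return f"git+{HABANA_DEEPSPEED_REPO}@{synapse_version}"


print("pip install " + deepspeed_pip_spec("1.14.0"))
```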

You are now ready to run text generation with the pipeline!

Using the Pipeline

First, go to the directory in your optimum-habana checkout that contains the pipeline scripts, and update your PYTHONPATH as described in the README.

cd optimum-habana/examples/text-generation
pip install -r requirements.txt
cd text-generation-pipeline
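
The README remains the authority on the exact PYTHONPATH value. If you prefer to adjust the module search path from inside Python, a rough equivalent is sketched below. It assumes that the modules to expose live in the parent examples/text-generation directory, which is where the `run_generation` module imported by the scripts in this article would sit; the checkout path and the helper name `add_text_generation_dir` are hypothetical.

```python
import sys
from pathlib import Path


def add_text_generation_dir(pipeline_dir, path_list=None):
    """Prepend the parent of the pipeline directory to the module search path."""
    path_list = sys.path if path_list is None else path_list
    parent = str(Path(pipeline_dir).resolve().parent)
    # Avoid adding the same directory twice on repeated calls.
    if parent not in path_list:
        path_list.insert(0, parent)
    return parent


# Hypothetical checkout location; adjust to where you cloned the repository.
added = add_text_generation_dir(
    "optimum-habana/examples/text-generation/text-generation-pipeline"
)
print("Added to sys.path:", added)
```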

To generate text from a prompt of your choice, run a command like the following:

python run_pipeline.py --model_name_or_path meta-llama/Llama-2-7b-hf --use_hpu_graphs --use_kv_cache --max_new_tokens 100 --do_sample --prompt "Here is my prompt"

You can also pass multiple prompts as input and change the temperature and top_p values used for generation, as follows.

python run_pipeline.py --model_name_or_path meta-llama/Llama-2-13b-hf --use_hpu_graphs --use_kv_cache --max_new_tokens 100 --do_sample --temperature 0.5 --top_p 0.95 --prompt "Hello world" "How are you?"
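
To build intuition for what these two sampling knobs do, here is a small pure-Python illustration on a toy four-token vocabulary. It is a conceptual sketch, not the code used by the pipeline or by the underlying libraries: temperature rescales the scores before the softmax, and top_p keeps only the most likely tokens whose cumulative probability reaches the threshold. The function names are hypothetical.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits divided by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Zero out the dropped tokens and renormalize the survivors.
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


logits = [2.0, 1.0, 0.5, -1.0]  # toy scores for a four-token vocabulary
for temperature in (1.0, 0.5):
    probs = apply_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in top_p_filter(probs, 0.95)])
```

A lower temperature concentrates probability on the highest-scoring token, and a lower top_p discards more of the unlikely tail, so both settings make the sampled text less varied.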

To generate text with large models such as Llama-2-70b, here is a sample command that launches the pipeline with DeepSpeed.

python ../../gaudi_spawn.py --use_deepspeed --world_size 8 run_pipeline.py --model_name_or_path meta-llama/Llama-2-70b-hf --max_new_tokens 100 --bf16 --use_hpu_graphs --use_kv_cache --do_sample --temperature 0.5 --top_p 0.95 --prompt "Hello world" "How are you?" "Here is my prompt" "Once upon a time"
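
If you launch these commands from a driver script, assembling the argument list in Python avoids shell-quoting mistakes with multi-word prompts. The sketch below only uses flags that appear in the commands above and mirrors them, including passing --bf16 only in the DeepSpeed case; `build_pipeline_command` is a hypothetical helper, not part of Optimum Habana.

```python
import shlex


def build_pipeline_command(model, prompts, world_size=1, max_new_tokens=100,
                           temperature=None, top_p=None):
    """Assemble the run_pipeline.py command line as a list of arguments."""
    cmd = ["python"]
    if world_size > 1:
        # Multi-card runs go through gaudi_spawn.py with DeepSpeed.
        cmd += ["../../gaudi_spawn.py", "--use_deepspeed",
                "--world_size", str(world_size)]
    cmd += ["run_pipeline.py", "--model_name_or_path", model,
            "--max_new_tokens", str(max_new_tokens),
            "--use_hpu_graphs", "--use_kv_cache", "--do_sample"]
    if world_size > 1:
        # Mirrors the example commands, where only the 70b run passes --bf16.
        cmd.append("--bf16")
    if temperature is not None:
        cmd += ["--temperature", str(temperature)]
    if top_p is not None:
        cmd += ["--top_p", str(top_p)]
    cmd += ["--prompt", *prompts]
    return cmd


cmd = build_pipeline_command(
    "meta-llama/Llama-2-70b-hf", ["Hello world", "How are you?"],
    world_size=8, temperature=0.5, top_p=0.95,
)
# shlex.join gives a copy-pasteable string; pass `cmd` itself to subprocess.run.
print(shlex.join(cmd))
```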

Usage in Python Scripts

You can use the pipeline class in your own scripts, as in the example below. Run the following sample script from optimum-habana/examples/text-generation/text-generation-pipeline.

import argparse
import logging

from pipeline import GaudiTextGenerationPipeline
from run_generation import setup_parser

# Define a logger
logging.basicConfig(
    format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
    datefmt="%m/%d/%Y %H:%M:%S",
    level=logging.INFO,
)
logger = logging.getLogger(__name__)

# Set up an argument parser
parser = argparse.ArgumentParser()
args = setup_parser(parser)

# Define some pipeline arguments. Note that --model_name_or_path is a required argument for this script
args.num_return_sequences = 1
args.model_name_or_path = "meta-llama/Llama-2-7b-hf"
args.max_new_tokens = 100
args.use_hpu_graphs = True
args.use_kv_cache = True
args.do_sample = True

# Initialize the pipeline
pipe = GaudiTextGenerationPipeline(args, logger)

# You can provide input prompts as strings
prompts = ["He is working on", "Once upon a time", "Far far away"]

# Generate text with pipeline
for prompt in prompts:
    print(f"Prompt: {prompt}")
    output = pipe(prompt)
    print(f"Generated Text: {repr(output)}")

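Run the above script with python <name_of_script>.py --model_name_or_path a_model_name, because --model_name_or_path is a required argument. The model name can still be changed programmatically, as shown in the Python snippet.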

This shows that the pipeline class takes a string as input and handles data pre-processing and post-processing for us.
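
Because the pipeline object is called like a function with a string prompt, the prompt loop is easy to wrap in a small helper that collects the results. The sketch below is a hypothetical convenience wrapper, not part of Optimum Habana; it uses a stand-in callable so that it runs without an accelerator, and in a real script you would pass the `pipe` object instead.

```python
def generate_for_prompts(pipe, prompts):
    """Call a text-generation callable once per prompt and map prompt -> output."""
    results = {}
    for prompt in prompts:
        # Reject empty or non-string prompts early, before a costly model call.
        if not isinstance(prompt, str) or not prompt.strip():
            raise ValueError(f"Expected a non-empty string prompt, got {prompt!r}")
        results[prompt] = pipe(prompt)
    return results


def echo_pipe(prompt):
    """Stand-in for the real pipeline so the sketch runs without an accelerator."""
    return prompt + " ..."


outputs = generate_for_prompts(echo_pipe, ["He is working on", "Once upon a time"])
for prompt, text in outputs.items():
    print(f"{prompt!r} -> {text!r}")
```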

LangChain Compatibility

The text-generation pipeline can be passed as input to LangChain classes by setting the use_with_langchain constructor argument. You can install LangChain as follows.

pip install langchain==0.0.191

Here is a sample script that shows how to use the pipeline class with LangChain.

import argparse
import logging

from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

from pipeline import GaudiTextGenerationPipeline
from run_generation import setup_parser

# Define a logger
logging.basicConfig(
    format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
    datefmt="%m/%d/%Y %H:%M:%S",
    level=logging.INFO,
)
logger = logging.getLogger(__name__)

# Set up an argument parser
parser = argparse.ArgumentParser()
args = setup_parser(parser)

# Define some pipeline arguments. Note that --model_name_or_path is a required argument for this script
args.num_return_sequences = 1
args.model_name_or_path = "meta-llama/Llama-2-13b-chat-hf"
args.max_input_tokens = 2048
args.max_new_tokens = 1000
args.use_hpu_graphs = True
args.use_kv_cache = True
args.do_sample = True
args.temperature = 0.2
args.top_p = 0.95

# Initialize the pipeline
pipe = GaudiTextGenerationPipeline(args, logger, use_with_langchain=True)

# Create LangChain object
llm = HuggingFacePipeline(pipeline=pipe)

template = """Use the following pieces of context to answer the question at the end. If you don't know the answer,\
just say that you don't know, don't try to make up an answer.

Context: Large Language Models (LLMs) are the latest models used in NLP.
Their superior performance over smaller models has made them incredibly
useful for developers building NLP enabled applications. These models
can be accessed via Hugging Face's `transformers` library, via OpenAI
using the `openai` library, and via Cohere using the `cohere` library.

Question: {question}
Answer: """

prompt = PromptTemplate(input_variables=["question"], template=template)
llm_chain = LLMChain(prompt=prompt, llm=llm)

# Use LangChain object
question = "Which libraries and model providers offer LLMs?"
# The chain fills the template itself, so pass only the question
response = llm_chain({"question": question})
print(f"Question 1: {question}")
print(f"Response 1: {response['text']}")

question = "What is the provided context about?"
# The chain fills the template itself, so pass only the question
response = llm_chain({"question": question})
print(f"\nQuestion 2: {question}")
print(f"Response 2: {response['text']}")

The pipeline class has been validated with LangChain version 0.0.191 and may not work with other versions of the package.
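
Since only one LangChain version is validated, it can help to check the installed version before building a chain. The sketch below is one possible guard using the standard library's `importlib.metadata`; the helper names are hypothetical, and the strict equality check is an assumption about how cautious you want to be.

```python
from importlib import metadata

VALIDATED_LANGCHAIN = "0.0.191"  # version the article reports as validated


def langchain_version_ok(installed, validated=VALIDATED_LANGCHAIN):
    """Return True only when the installed version string matches the validated one."""
    return installed is not None and installed.strip() == validated


def installed_langchain_version():
    """Return the installed langchain version, or None if the package is absent."""
    try:
        return metadata.version("langchain")
    except metadata.PackageNotFoundError:
        return None


version = installed_langchain_version()
if not langchain_version_ok(version):
    print(f"Warning: langchain {version} found; "
          f"the pipeline was validated with {VALIDATED_LANGCHAIN}.")
```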

Conclusion

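We presented a custom text-generation pipeline on the Intel® Gaudi® 2 AI accelerator that accepts single or multiple prompts as input. The pipeline offers great flexibility in terms of model size as well as the parameters that affect text-generation quality. It is also easy to use, plugs into your own scripts, and is compatible with LangChain.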

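Use of the pretrained model is subject to compliance with third-party licenses, including the "Llama 2 Community License Agreement" (LLAMAV2). For guidance on the intended use of the LLAMA2 model, what is considered misuse and out-of-scope use, who the intended users are, and additional terms, please review and read the instructions at this link: https://ai.meta.com/llama/license/. Users bear sole liability and responsibility for complying with any third-party licenses, and Habana Labs bears no liability with respect to users' use of or compliance with third-party licenses.

To run gated models such as Llama-2-70b-hf, you need the following:

Have a Hugging Face account

Agree to the terms of use of the model in its model card

Set a read token

Log in to your account with the HF CLI before launching your script: run huggingface-cli login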
