Evaluating a model on a server or container

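An alternative to launching the evaluation locally is to serve the model on a TGI-compatible server or container, then run the evaluation by sending requests to that server. The command is the same as before, except that you pass the path to a YAML configuration file (described below):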

lighteval endpoint {tgi,inference-endpoint} \
    "/path/to/config/file" \
    <task parameters>

Two types of configuration file can be used to run on a server:

Hugging Face Inference Endpoints

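To launch a model with Hugging Face Inference Endpoints, provide a configuration file such as endpoint_model.yaml. Lighteval then deploys the endpoint automatically, runs the evaluation, and deletes the endpoint when it finishes. If you point it at an endpoint that is already running, that endpoint is not deleted afterwards.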

Example configuration file:

model:
  base_params:
    # Pass either model_name, or endpoint_name and true reuse_existing
    # endpoint_name: "llama-2-7b-lighteval" # needs to be lower case without special characters
    # reuse_existing: true # defaults to false; if true, ignore all params in instance, and don't delete the endpoint after evaluation
    model_name: "meta-llama/Llama-2-7b-hf"
    # revision: "main" # defaults to "main"
    dtype: "float16" # can be any of "awq", "eetq", "gptq", "4bit" or "8bit" (will use bitsandbytes), "bfloat16" or "float16"
  instance:
    accelerator: "gpu"
    region: "eu-west-1"
    vendor: "aws"
    instance_type: "nvidia-a10g"
    instance_size: "x1"
    framework: "pytorch"
    endpoint_type: "protected"
    namespace: null # The namespace under which to launch the endpoint. Defaults to the current user's namespace
    image_url: null # Optionally specify the docker image to use when launching the endpoint model. E.g., launching models with later releases of the TGI container with support for newer models.
    env_vars:
      null # Optional environment variables to include when launching the endpoint. e.g., `MAX_INPUT_LENGTH: 2048`
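
The comments on base_params describe two ways to fill it in: give a model_name so that a new endpoint is deployed, or give an endpoint_name together with reuse_existing: true so that a running endpoint is reused. The sketch below is one reading of that rule as a small pre-flight check. It is not part of Lighteval; the function name resolve_endpoint_mode and the exact pattern used for "lower case without special characters" (lowercase letters, digits, and hyphens) are assumptions made for illustration.

```python
import re

# Assumption: "lower case without special characters" is interpreted here as
# lowercase letters and digits, optionally separated by single hyphens.
_ENDPOINT_NAME = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def resolve_endpoint_mode(base_params: dict) -> str:
    """Return "reuse" or "deploy" for a base_params mapping, or raise ValueError.

    Hypothetical helper: it mirrors the comments in the example config,
    not Lighteval's own validation logic.
    """
    endpoint_name = base_params.get("endpoint_name")
    model_name = base_params.get("model_name")
    # reuse_existing defaults to false, as stated in the config comment.
    reuse_existing = bool(base_params.get("reuse_existing", False))

    if endpoint_name is not None and not _ENDPOINT_NAME.match(endpoint_name):
        raise ValueError(
            f"endpoint_name {endpoint_name!r} must be lower case "
            "without special characters"
        )
    if reuse_existing:
        if endpoint_name is None:
            raise ValueError("reuse_existing: true requires endpoint_name")
        return "reuse"
    if model_name is None:
        raise ValueError(
            "pass either model_name, or endpoint_name with reuse_existing: true"
        )
    return "deploy"
```

Running a check like this before calling lighteval catches a misnamed endpoint early, before any instance is provisioned.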

Text Generation Inference (TGI)

Use this configuration to evaluate a model that is already deployed on a TGI server, for example on Hugging Face's serverless inference.

Example configuration file:

model:
  instance:
    inference_server_address: ""
    inference_server_auth: null
    model_id: null # Optional, only required if the TGI container was launched with model_id pointing to a local directory
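
Before filling in inference_server_address, it can help to confirm that the server answers and to see which model_id it reports. The sketch below queries the /info route that TGI servers expose, using only the Python standard library. The helper names, the address http://localhost:8080 in the usage note, and the choice to send inference_server_auth as a bearer token are assumptions for illustration, not something this page specifies.

```python
import json
import urllib.request
from typing import Optional


def info_request(address: str, auth: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for the TGI /info route (hypothetical helper)."""
    url = address.rstrip("/") + "/info"
    headers = {}
    if auth is not None:
        # Assumption: the auth value is sent as a bearer token.
        headers["Authorization"] = f"Bearer {auth}"
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_model_id(
    address: str, auth: Optional[str] = None, timeout: float = 10.0
) -> Optional[str]:
    """Query the server and return the model_id it reports, if any."""
    with urllib.request.urlopen(info_request(address, auth), timeout=timeout) as resp:
        return json.load(resp).get("model_id")
```

For a server assumed to listen on http://localhost:8080, fetch_model_id("http://localhost:8080") returns the reported identifier. If that value is a local directory path, the model_id field in the configuration above is the one to set.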

OpenAI API

Lighteval also supports evaluating models through the OpenAI API. To do so, set your OpenAI API key as an environment variable:

export OPENAI_API_KEY={your_key}

Then run the following command:

lighteval endpoint openai \
    {model-name} \
    <task parameters>
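
When the evaluation is launched from a script, the two steps above (key in the environment, then the command) can be combined so that a missing key fails fast. This is a minimal sketch: build_openai_eval_command is a hypothetical wrapper, and the task arguments are passed through unchanged because their format is not described in this section.

```python
import os
from typing import List, Mapping, Sequence


def build_openai_eval_command(
    model_name: str,
    task_args: Sequence[str],
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the argv for `lighteval endpoint openai`, checking the API key first.

    Hypothetical helper: it only assembles the command shown above.
    """
    if not env.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is not set in the environment")
    # Task arguments are forwarded as-is; this sketch does not interpret them.
    return ["lighteval", "endpoint", "openai", model_name, *task_args]
```

The returned list can be handed to subprocess.run(cmd, check=True), which avoids shell quoting issues with the task arguments.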
