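Dataset formats and types

This guide provides an overview of the dataset formats and types supported by each trainer in TRL.

Overview of the dataset formats and types

The format of a dataset refers to how the data is structured, and is typically either standard or conversational.
The type is tied to the specific task the dataset is designed for, such as prompt-only or preference. Each type is characterized by its columns, which vary by task, as shown in the table below.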

Type \ Format	Standard	Conversational
Language modeling	`{"text": "The sky is blue."}`	`{"messages": [{"role": "user", "content": "What color is the sky?"}, {"role": "assistant", "content": "It is blue."}]}`
Prompt-only	`{"prompt": "The sky is"}`	`{"prompt": [{"role": "user", "content": "What color is the sky?"}]}`
Prompt-completion	`{"prompt": "The sky is", "completion": " blue."}`	`{"prompt": [{"role": "user", "content": "What color is the sky?"}], "completion": [{"role": "assistant", "content": "It is blue."}]}`
Preference	`{"prompt": "The sky is", "chosen": " blue.", "rejected": " green."}` or, with implicit prompt, `{"chosen": "The sky is blue.", "rejected": "The sky is green."}`	`{"prompt": [{"role": "user", "content": "What color is the sky?"}], "chosen": [{"role": "assistant", "content": "It is blue."}], "rejected": [{"role": "assistant", "content": "It is green."}]}` or, with implicit prompt, `{"chosen": [{"role": "user", "content": "What color is the sky?"}, {"role": "assistant", "content": "It is blue."}], "rejected": [{"role": "user", "content": "What color is the sky?"}, {"role": "assistant", "content": "It is green."}]}`
Unpaired preference	`{"prompt": "The sky is", "completion": " blue.", "label": True}`	`{"prompt": [{"role": "user", "content": "What color is the sky?"}], "completion": [{"role": "assistant", "content": "It is green."}], "label": False}`
Stepwise supervision	`{"prompt": "Which number is larger, 9.8 or 9.11?", "completions": ["The fractional part of 9.8 is 0.8.", "The fractional part of 9.11 is 0.11.", "0.11 is greater than 0.8.", "Hence, 9.11 > 9.8."], "labels": [True, True, False, False]}`
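
Because each type is characterized by its columns, the type of an example can often be guessed from its keys alone. The following is a minimal sketch of that idea; infer_dataset_type is a hypothetical helper written for this guide, not a TRL function, and it assumes the column names listed in the table above.

```python
def infer_dataset_type(example: dict) -> str:
    """Guess the dataset type of one example from its column names (hypothetical helper)."""
    keys = set(example)
    # Check the most specific column combinations first, since several types share "prompt".
    if {"completions", "labels"} <= keys:
        return "stepwise supervision"
    if {"prompt", "completion", "label"} <= keys:
        return "unpaired preference"
    if {"chosen", "rejected"} <= keys:
        return "preference" if "prompt" in keys else "preference (implicit prompt)"
    if {"prompt", "completion"} <= keys:
        return "prompt-completion"
    if "prompt" in keys:
        return "prompt-only"
    if "text" in keys or "messages" in keys:
        return "language modeling"
    raise ValueError(f"Unrecognized columns: {sorted(keys)}")


print(infer_dataset_type({"prompt": "The sky is", "chosen": " blue.", "rejected": " green."}))
```

The checks run from the most specific combination to the least specific one, so a stepwise supervision example is not mistaken for a prompt-only example.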

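Formats

Standard

The standard dataset format typically consists of plain text strings. The columns in the dataset vary depending on the task. This is the format expected by TRL trainers. Below are examples of standard dataset formats for different tasks: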

# Language modeling
language_modeling_example = {"text": "The sky is blue."}
# Preference
preference_example = {"prompt": "The sky is", "chosen": " blue.", "rejected": " green."}
# Unpaired preference
unpaired_preference_example = {"prompt": "The sky is", "completion": " blue.", "label": True}

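Conversational

Conversational datasets are used for tasks that involve dialogue or chat interactions between a user and an assistant. Unlike standard datasets, they contain sequences of messages, where each message has a role (for example, "user" or "assistant") and a content (the message text).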

messages = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

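As with standard datasets, the columns in conversational datasets vary depending on the task. Below are examples of conversational dataset formats for different tasks: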

# Prompt-completion
prompt_completion_example = {"prompt": [{"role": "user", "content": "What color is the sky?"}],
                             "completion": [{"role": "assistant", "content": "It is blue."}]}
# Preference
preference_example = {
    "prompt": [{"role": "user", "content": "What color is the sky?"}],
    "chosen": [{"role": "assistant", "content": "It is blue."}],
    "rejected": [{"role": "assistant", "content": "It is green."}],
}
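
The difference between the two formats can be checked mechanically: in a conversational example, the task columns hold lists of role/content messages rather than plain strings. Here is a minimal sketch of such a check. looks_conversational is a hypothetical helper, and the list of column names it inspects is an assumption based on the examples in this guide.

```python
def looks_conversational(example: dict) -> bool:
    """Return True if any task column holds a list of {"role", "content"} messages."""
    for key in ("messages", "prompt", "completion", "chosen", "rejected"):
        value = example.get(key)
        if isinstance(value, list) and value:
            first = value[0]
            if isinstance(first, dict) and "role" in first and "content" in first:
                return True
    return False


standard = {"prompt": "The sky is", "completion": " blue."}
conversational = {
    "prompt": [{"role": "user", "content": "What color is the sky?"}],
    "completion": [{"role": "assistant", "content": "It is blue."}],
}
print(looks_conversational(standard), looks_conversational(conversational))
```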

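Conversational datasets are useful for training chat models, but they must be converted into the standard format before being used with TRL trainers. This is typically done with a chat template specific to the model being used. For more information, see the Working with conversational datasets in TRL section.

Types

Language modeling

A language modeling dataset consists of a "text" column (or a "messages" column for conversational datasets) containing a full sequence of text.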

# Standard format
language_modeling_example = {"text": "The sky is blue."}
# Conversational format
language_modeling_example = {"messages": [
    {"role": "user", "content": "What color is the sky?"},
    {"role": "assistant", "content": "It is blue."}
]}

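Prompt-only

In a prompt-only dataset, only the initial prompt (a question or a partial sentence) is provided, under the key "prompt". Training typically involves generating a completion based on this prompt, so that the model learns to continue or complete the given input.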

# Standard format
prompt_only_example = {"prompt": "The sky is"}
# Conversational format
prompt_only_example = {"prompt": [{"role": "user", "content": "What color is the sky?"}]}

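For examples of prompt-only datasets, see the Prompt-only datasets collection.

Although the prompt-only and language modeling types are similar, they differ in how the input is handled. In the prompt-only type, the prompt is a partial input that the model is expected to complete or continue, whereas in the language modeling type, the input is treated as a complete sentence or sequence. TRL processes these two types differently. The example below shows how the output of the apply_chat_template function differs for each type: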

from transformers import AutoTokenizer
from trl import apply_chat_template

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

# Example for prompt-only type
prompt_only_example = {"prompt": [{"role": "user", "content": "What color is the sky?"}]}
apply_chat_template(prompt_only_example, tokenizer)
# Output: {'prompt': '<|user|>\nWhat color is the sky?<|end|>\n<|assistant|>\n'}

# Example for language modeling type
lm_example = {"messages": [{"role": "user", "content": "What color is the sky?"}]}
apply_chat_template(lm_example, tokenizer)
# Output: {'text': '<|user|>\nWhat color is the sky?<|end|>\n<|endoftext|>'}

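The prompt-only output ends with '<|assistant|>\n', which marks the start of the assistant's turn and signals that the model is expected to generate a completion.
In contrast, the language modeling output treats the input as a complete sequence and terminates it with '<|endoftext|>', which marks the end of the text; nothing further is expected.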
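
To make the difference concrete without downloading a tokenizer, the two outputs above can be reproduced with a toy renderer. This is only a sketch: render_phi3_style is a hypothetical, hand-written imitation of the strings shown above, not the real Phi-3 chat template, and its add_generation_prompt parameter is named by analogy with the option of the same name in transformers.

```python
def render_phi3_style(messages: list, add_generation_prompt: bool) -> str:
    """Toy renderer imitating the '<|role|>' markup of the outputs shown above."""
    text = "".join(f"<|{m['role']}|>\n{m['content']}<|end|>\n" for m in messages)
    # Prompt-only: open the assistant turn so that the model continues from here.
    # Language modeling: close the sequence instead.
    return text + ("<|assistant|>\n" if add_generation_prompt else "<|endoftext|>")


messages = [{"role": "user", "content": "What color is the sky?"}]
prompt_text = render_phi3_style(messages, add_generation_prompt=True)
lm_text = render_phi3_style(messages, add_generation_prompt=False)
print(repr(prompt_text))
print(repr(lm_text))
```

The only difference between the two strings is the suffix, which is what decides whether the sequence is left open for generation or closed.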

Prompt-completion

A prompt-completion dataset includes a "prompt" and a "completion".

# Standard format
prompt_completion_example = {"prompt": "The sky is", "completion": " blue."}
# Conversational format
prompt_completion_example = {"prompt": [{"role": "user", "content": "What color is the sky?"}],
                             "completion": [{"role": "assistant", "content": "It is blue."}]}

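For examples of prompt-completion datasets, see the Prompt-completion datasets collection.

Preference

A preference dataset is used for tasks where the model is trained to choose between two or more possible completions of the same prompt. This dataset includes a "prompt", a "chosen" completion, and a "rejected" completion. The model is trained to select the "chosen" response over the "rejected" response. Some datasets may not include the "prompt" column, in which case the prompt is implicit and directly included in the "chosen" and "rejected" completions. We recommend using an explicit prompt whenever possible.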

# Standard format
## Explicit prompt (recommended)
preference_example = {"prompt": "The sky is", "chosen": " blue.", "rejected": " green."}
## Implicit prompt
preference_example = {"chosen": "The sky is blue.", "rejected": "The sky is green."}

# Conversational format
## Explicit prompt (recommended)
preference_example = {"prompt": [{"role": "user", "content": "What color is the sky?"}],
                      "chosen": [{"role": "assistant", "content": "It is blue."}],
                      "rejected": [{"role": "assistant", "content": "It is green."}]}
## Implicit prompt
preference_example = {"chosen": [{"role": "user", "content": "What color is the sky?"},
                                 {"role": "assistant", "content": "It is blue."}],
                      "rejected": [{"role": "user", "content": "What color is the sky?"},
                                   {"role": "assistant", "content": "It is green."}]}

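For examples of preference datasets, see the Preference datasets collection.

Some preference datasets can be found on the Hugging Face Hub with the dpo tag. You can also explore the librarian-bots DPO Collections to identify preference datasets.

Unpaired preference

An unpaired preference dataset is similar to a preference dataset, but instead of providing "chosen" and "rejected" completions for the same prompt, it includes a single "completion" and a "label" indicating whether the completion is preferred.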

# Standard format
unpaired_preference_example = {"prompt": "The sky is", "completion": " blue.", "label": True}
# Conversational format
unpaired_preference_example = {"prompt": [{"role": "user", "content": "What color is the sky?"}],
                               "completion": [{"role": "assistant", "content": "It is blue."}],
                               "label": True}

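For examples of unpaired preference datasets, see the Unpaired preference datasets collection.

Stepwise supervision

A stepwise (or process) supervision dataset is similar to an unpaired preference dataset, but its completion is split into multiple steps, each with its own label. This structure is useful for tasks that need detailed, step-by-step labeling, such as reasoning tasks. Evaluating each step separately pinpoints where the reasoning is correct and where errors occur, which allows targeted feedback on each part of the reasoning process.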

stepwise_example = {
    "prompt": "Which number is larger, 9.8 or 9.11?",
    "completions": ["The fractional part of 9.8 is 0.8, while the fractional part of 9.11 is 0.11.", "Since 0.11 is greater than 0.8, the number 9.11 is larger than 9.8."],
    "labels": [True, False]
}
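
Since every step carries its own label, locating the point where the reasoning goes wrong is a simple scan over the labels. The sketch below illustrates this; first_incorrect_step is a hypothetical helper, not part of TRL.

```python
from typing import Optional


def first_incorrect_step(example: dict) -> Optional[int]:
    """Return the index of the first step labeled False, or None if every step is correct."""
    completions, labels = example["completions"], example["labels"]
    if len(completions) != len(labels):
        raise ValueError("each completion step needs exactly one label")
    for index, is_correct in enumerate(labels):
        if not is_correct:
            return index
    return None


stepwise_example = {
    "prompt": "Which number is larger, 9.8 or 9.11?",
    "completions": ["The fractional part of 9.8 is 0.8, while the fractional part of 9.11 is 0.11.",
                    "Since 0.11 is greater than 0.8, the number 9.11 is larger than 9.8."],
    "labels": [True, False],
}
print(first_incorrect_step(stepwise_example))  # 1
```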

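For examples of stepwise supervision datasets, see the Stepwise supervision datasets collection.

Which dataset type to use?

Choosing the right dataset type depends on the task you are working on and the specific requirements of the TRL trainer you are using. Below is a brief overview of the dataset types supported by each TRL trainer.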

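Trainer	Expected dataset type
BCOTrainer	Unpaired preference
CPOTrainer	Preference (explicit prompt recommended)
DPOTrainer	Preference (explicit prompt recommended)
GKDTrainer	Prompt-completion
GRPOTrainer	Prompt-only
IterativeSFTTrainer	Unpaired preference
KTOTrainer	Unpaired preference or Preference (explicit prompt recommended)
NashMDTrainer	Prompt-only
OnlineDPOTrainer	Prompt-only
ORPOTrainer	Preference (explicit prompt recommended)
PPOTrainer	Tokenized language modeling
PRMTrainer	Stepwise supervision
RewardTrainer	Preference (implicit prompt recommended)
SFTTrainer	Language modeling
XPOTrainer	Prompt-only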
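
If you prepare data for several trainers, it can help to encode this table once and check a dataset type against it before training. The following is a sketch under that assumption: EXPECTED_TYPES is transcribed from the table above, and trainer_accepts is a hypothetical helper, not a TRL API.

```python
# Expected dataset types per trainer, transcribed from the table above.
EXPECTED_TYPES = {
    "BCOTrainer": {"unpaired preference"},
    "CPOTrainer": {"preference"},
    "DPOTrainer": {"preference"},
    "GKDTrainer": {"prompt-completion"},
    "GRPOTrainer": {"prompt-only"},
    "IterativeSFTTrainer": {"unpaired preference"},
    "KTOTrainer": {"unpaired preference", "preference"},
    "NashMDTrainer": {"prompt-only"},
    "OnlineDPOTrainer": {"prompt-only"},
    "ORPOTrainer": {"preference"},
    "PPOTrainer": {"tokenized language modeling"},
    "PRMTrainer": {"stepwise supervision"},
    "RewardTrainer": {"preference"},
    "SFTTrainer": {"language modeling"},
    "XPOTrainer": {"prompt-only"},
}


def trainer_accepts(trainer_name: str, dataset_type: str) -> bool:
    """Return True if the table lists dataset_type for trainer_name."""
    if trainer_name not in EXPECTED_TYPES:
        raise KeyError(f"unknown trainer: {trainer_name}")
    return dataset_type in EXPECTED_TYPES[trainer_name]


print(trainer_accepts("KTOTrainer", "preference"))   # True
print(trainer_accepts("GRPOTrainer", "preference"))  # False
```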

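TRL trainers only support standard dataset formats for now. If you have a conversational dataset, you must first convert it into a standard format. For more information on how to work with conversational datasets, see the Working with conversational datasets in TRL section.

Working with conversational datasets in TRL

Conversational datasets are increasingly common, especially for training chat models. However, some TRL trainers don't support conversational datasets in their raw format. (For more information, see issue #2071.) These datasets must first be converted into a standard format. Fortunately, TRL offers tools to handle this conversion easily, as detailed below.

Converting a conversational dataset into a standard dataset

To convert a conversational dataset into a standard dataset, you need to apply a chat template to the dataset. A chat template is a predefined structure that typically includes placeholders for user and assistant messages. This template is provided by the tokenizer of the model you use.

For detailed instructions on using chat templates, see the Chat templating section in the transformers documentation.

In TRL, the method you apply to convert the dataset varies depending on the task. Fortunately, TRL provides a helper function called apply_chat_template() to simplify this process. Here's an example of how to use it: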

from transformers import AutoTokenizer
from trl import apply_chat_template

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

example = {
    "prompt": [{"role": "user", "content": "What color is the sky?"}],
    "completion": [{"role": "assistant", "content": "It is blue."}]
}

apply_chat_template(example, tokenizer)
# Output:
# {'prompt': '<|user|>\nWhat color is the sky?<|end|>\n<|assistant|>\n', 'completion': 'It is blue.<|end|>\n<|endoftext|>'}

Alternatively, you can use the map method to apply the template across an entire dataset:

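from datasets import Dataset
from transformers import AutoTokenizer
from trl import apply_chat_template

# Same tokenizer as in the previous example
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")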

dataset_dict = {
    "prompt": [[{"role": "user", "content": "What color is the sky?"}],
               [{"role": "user", "content": "Where is the sun?"}]],
    "completion": [[{"role": "assistant", "content": "It is blue."}],
                   [{"role": "assistant", "content": "In the sky."}]]
}

dataset = Dataset.from_dict(dataset_dict)
dataset = dataset.map(apply_chat_template, fn_kwargs={"tokenizer": tokenizer})
# Output:
# {'prompt': ['<|user|>\nWhat color is the sky?<|end|>\n<|assistant|>\n',
#             '<|user|>\nWhere is the sun?<|end|>\n<|assistant|>\n'],
#  'completion': ['It is blue.<|end|>\n<|endoftext|>', 'In the sky.<|end|>\n<|endoftext|>']}

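We recommend using the apply_chat_template() function instead of calling tokenizer.apply_chat_template directly. Handling chat templates for non-language modeling datasets can be tricky and may result in errors, such as mistakenly placing a system prompt in the middle of a conversation. For additional examples, see #1930 (comment). apply_chat_template() is designed to handle these intricacies and to ensure that chat templates are applied correctly for various tasks.

It's important to note that chat templates are model-specific. For example, if you use the chat template from meta-llama/Meta-Llama-3.1-8B-Instruct with the above example, you get a different output: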

apply_chat_template(example, AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct"))
# Output:
# {'prompt': '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhat color is the sky?<|im_end|>\n<|im_start|>assistant\n',
#  'completion': 'It is blue.<|im_end|>\n'}

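Always use the chat template associated with the model you're working with. Using the wrong template can lead to inaccurate or unexpected results.

Using any dataset with TRL: preprocessing and conversion

Many datasets come in formats tailored to specific tasks, which might not be directly compatible with TRL. To use such datasets with TRL, you may need to preprocess and convert them into the required format.

To make this easier, we provide a set of example scripts that cover common dataset conversions.

Example: UltraFeedback dataset

Let's take the UltraFeedback dataset as an example. Here's a preview of the dataset:

As shown above, the dataset format does not match the expected structure. It's not in a conversational format, the column names differ, and the results pertain to different models (e.g., Bard, GPT-4) and aspects (e.g., "helpful", "honest").

By using the provided conversion script examples/datasets/ultrafeedback.py, you can transform this dataset into an unpaired preference type and push it to the Hub: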

python examples/datasets/ultrafeedback.py --push_to_hub --repo_id trl-lib/ultrafeedback-gpt-3.5-turbo-helpfulness

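Once converted, the dataset will look like this:

Now, you can use this dataset with TRL!

By adapting the provided scripts or creating your own, you can convert any dataset into a format compatible with TRL.

Utilities for converting dataset types

This section provides example code to help you convert between different dataset types. While some conversions can be performed after applying the chat template (i.e., in the standard format), we recommend performing the conversion before applying the chat template to ensure it works consistently.

For simplicity, some of the examples below do not follow this recommendation and use the standard format. However, the conversions can be applied directly to the conversational format without modification.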

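From \ To	Language modeling	Prompt-completion	Prompt-only	Preference with implicit prompt	Preference	Unpaired preference	Stepwise supervision
Language modeling	N/A	N/A	N/A	N/A	N/A	N/A	N/A
Prompt-completion	🔗	N/A	🔗	N/A	N/A	N/A	N/A
Prompt-only	N/A	N/A	N/A	N/A	N/A	N/A	N/A
Preference with implicit prompt	🔗	🔗	🔗	N/A	🔗	🔗	N/A
Preference	🔗	🔗	🔗	🔗	N/A	🔗	N/A
Unpaired preference	🔗	🔗	🔗	N/A	N/A	N/A	N/A
Stepwise supervision	🔗	🔗	🔗	N/A	N/A	🔗	N/A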

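From prompt-completion to language modeling dataset

To convert a prompt-completion dataset into a language modeling dataset, concatenate the prompt and the completion.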

from datasets import Dataset

dataset = Dataset.from_dict({
    "prompt": ["The sky is", "The sun is"],
    "completion": [" blue.", " in the sky."],
})

def concat_prompt_completion(example):
    return {"text": example["prompt"] + example["completion"]}

dataset = dataset.map(concat_prompt_completion, remove_columns=["prompt", "completion"])

>>> dataset[0]
{'text': 'The sky is blue.'}

From prompt-completion to prompt-only dataset

To convert a prompt-completion dataset into a prompt-only dataset, remove the completion.

from datasets import Dataset

dataset = Dataset.from_dict({
    "prompt": ["The sky is", "The sun is"],
    "completion": [" blue.", " in the sky."],
})

dataset = dataset.remove_columns("completion")

>>> dataset[0]
{'prompt': 'The sky is'}

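From preference with implicit prompt to language modeling dataset

To convert a preference dataset with implicit prompt into a language modeling dataset, remove the "rejected" column and rename the "chosen" column to "text".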

from datasets import Dataset

dataset = Dataset.from_dict({
    "chosen": ["The sky is blue.", "The sun is in the sky."],
    "rejected": ["The sky is green.", "The sun is in the sea."],
})

dataset = dataset.rename_column("chosen", "text").remove_columns("rejected")

>>> dataset[0]
{'text': 'The sky is blue.'}

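From preference with implicit prompt to prompt-completion dataset

To convert a preference dataset with implicit prompt into a prompt-completion dataset, extract the prompt with extract_prompt(), remove the "rejected" column, and rename the "chosen" column to "completion".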

from datasets import Dataset
from trl import extract_prompt

dataset = Dataset.from_dict({
    "chosen": [
        [{"role": "user", "content": "What color is the sky?"}, {"role": "assistant", "content": "It is blue."}],
        [{"role": "user", "content": "Where is the sun?"}, {"role": "assistant", "content": "In the sky."}],
    ],
    "rejected": [
        [{"role": "user", "content": "What color is the sky?"}, {"role": "assistant", "content": "It is green."}],
        [{"role": "user", "content": "Where is the sun?"}, {"role": "assistant", "content": "In the sea."}],
    ],
})
dataset = dataset.map(extract_prompt).remove_columns("rejected").rename_column("chosen", "completion")

>>> dataset[0]
{'prompt': [{'role': 'user', 'content': 'What color is the sky?'}], 'completion': [{'role': 'assistant', 'content': 'It is blue.'}]}

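From preference with implicit prompt to prompt-only dataset

To convert a preference dataset with implicit prompt into a prompt-only dataset, extract the prompt with extract_prompt(), then remove the "rejected" and "chosen" columns.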

from datasets import Dataset
from trl import extract_prompt

dataset = Dataset.from_dict({
    "chosen": [
        [{"role": "user", "content": "What color is the sky?"}, {"role": "assistant", "content": "It is blue."}],
        [{"role": "user", "content": "Where is the sun?"}, {"role": "assistant", "content": "In the sky."}],
    ],
    "rejected": [
        [{"role": "user", "content": "What color is the sky?"}, {"role": "assistant", "content": "It is green."}],
        [{"role": "user", "content": "Where is the sun?"}, {"role": "assistant", "content": "In the sea."}],
    ],
})
dataset = dataset.map(extract_prompt).remove_columns(["chosen", "rejected"])

>>> dataset[0]
{'prompt': [{'role': 'user', 'content': 'What color is the sky?'}]}

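From implicit to explicit prompt preference dataset

To convert a preference dataset with implicit prompt into a preference dataset with explicit prompt, extract the prompt with extract_prompt().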

from datasets import Dataset
from trl import extract_prompt

dataset = Dataset.from_dict({
    "chosen": [
        [{"role": "user", "content": "What color is the sky?"}, {"role": "assistant", "content": "It is blue."}],
        [{"role": "user", "content": "Where is the sun?"}, {"role": "assistant", "content": "In the sky."}],
    ],
    "rejected": [
        [{"role": "user", "content": "What color is the sky?"}, {"role": "assistant", "content": "It is green."}],
        [{"role": "user", "content": "Where is the sun?"}, {"role": "assistant", "content": "In the sea."}],
    ],
})

dataset = dataset.map(extract_prompt)

>>> dataset[0]
{'prompt': [{'role': 'user', 'content': 'What color is the sky?'}],
 'chosen': [{'role': 'assistant', 'content': 'It is blue.'}],
 'rejected': [{'role': 'assistant', 'content': 'It is green.'}]}
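
Conceptually, the implicit prompt is the part that the "chosen" and "rejected" conversations have in common. The sketch below shows one way to recover it in plain Python by splitting off the longest shared prefix of the two message lists. split_shared_prompt is a hypothetical helper, and this is not necessarily how extract_prompt() is implemented.

```python
def split_shared_prompt(chosen: list, rejected: list) -> dict:
    """Split off the longest common prefix of two message lists as the prompt."""
    n = 0
    limit = min(len(chosen), len(rejected))
    # Advance while both conversations contain the same message at position n.
    while n < limit and chosen[n] == rejected[n]:
        n += 1
    return {"prompt": chosen[:n], "chosen": chosen[n:], "rejected": rejected[n:]}


chosen = [{"role": "user", "content": "What color is the sky?"},
          {"role": "assistant", "content": "It is blue."}]
rejected = [{"role": "user", "content": "What color is the sky?"},
            {"role": "assistant", "content": "It is green."}]
explicit = split_shared_prompt(chosen, rejected)
print(explicit["prompt"])
```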

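From preference with implicit prompt to unpaired preference dataset

To convert a preference dataset with implicit prompt into an unpaired preference dataset, extract the prompt with extract_prompt(), then unpair the dataset with unpair_preference_dataset().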

from datasets import Dataset
from trl import extract_prompt, unpair_preference_dataset

dataset = Dataset.from_dict({
    "chosen": [
        [{"role": "user", "content": "What color is the sky?"}, {"role": "assistant", "content": "It is blue."}],
        [{"role": "user", "content": "Where is the sun?"}, {"role": "assistant", "content": "In the sky."}],
    ],
    "rejected": [
        [{"role": "user", "content": "What color is the sky?"}, {"role": "assistant", "content": "It is green."}],
        [{"role": "user", "content": "Where is the sun?"}, {"role": "assistant", "content": "In the sea."}],
    ],
})

dataset = dataset.map(extract_prompt)
dataset = unpair_preference_dataset(dataset)

>>> dataset[0]
{'prompt': [{'role': 'user', 'content': 'What color is the sky?'}],
 'completion': [{'role': 'assistant', 'content': 'It is blue.'}],
 'label': True}

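Keep in mind that the "chosen" and "rejected" completions in a preference dataset can both be good or both be bad. Before applying unpair_preference_dataset(), make sure that every "chosen" completion can be labeled as good and every "rejected" completion as bad. You can verify this by checking an absolute rating of each completion, for example from a reward model.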
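
The check described above can be sketched in plain Python. Everything in this example is an assumption made for illustration: the "chosen_score" and "rejected_score" columns, the threshold, and the scores themselves are made up, and unpair is a hypothetical stand-in that does not claim to reproduce the row order of unpair_preference_dataset().

```python
def labels_are_safe(example: dict, threshold: float = 0.5) -> bool:
    """True if the chosen completion is good and the rejected one is bad in absolute terms."""
    return example["chosen_score"] >= threshold > example["rejected_score"]


def unpair(examples: list) -> list:
    """Turn each preference pair into two unpaired rows, labeled True and False."""
    rows = []
    for ex in examples:
        rows.append({"prompt": ex["prompt"], "completion": ex["chosen"], "label": True})
        rows.append({"prompt": ex["prompt"], "completion": ex["rejected"], "label": False})
    return rows


# Made-up absolute scores; in practice they could come from a reward model.
pairs = [
    {"prompt": "The sky is", "chosen": " blue.", "rejected": " green.",
     "chosen_score": 0.9, "rejected_score": 0.1},
    # Both completions are acceptable here, so labeling the rejected one as bad would be wrong.
    {"prompt": "The sun is", "chosen": " a star.", "rejected": " in the sky.",
     "chosen_score": 0.8, "rejected_score": 0.7},
]
rows = unpair([p for p in pairs if labels_are_safe(p)])
print(rows)
```

Only the first pair survives the filter, so the result contains one good row and one bad row.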

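From preference to language modeling dataset

To convert a preference dataset into a language modeling dataset, remove the "rejected" column and concatenate the prompt and the chosen completion into the "text" column.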

from datasets import Dataset

dataset = Dataset.from_dict({
    "prompt": ["The sky is", "The sun is"],
    "chosen": [" blue.", " in the sky."],
    "rejected": [" green.", " in the sea."],
})

def concat_prompt_chosen(example):
    return {"text": example["prompt"] + example["chosen"]}

dataset = dataset.map(concat_prompt_chosen, remove_columns=["prompt", "chosen", "rejected"])

>>> dataset[0]
{'text': 'The sky is blue.'}

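From preference to prompt-completion dataset

To convert a preference dataset into a prompt-completion dataset, remove the "rejected" column and rename the "chosen" column to "completion".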

from datasets import Dataset

dataset = Dataset.from_dict({
    "prompt": ["The sky is", "The sun is"],
    "chosen": [" blue.", " in the sky."],
    "rejected": [" green.", " in the sea."],
})

dataset = dataset.remove_columns("rejected").rename_column("chosen", "completion")

>>> dataset[0]
{'prompt': 'The sky is', 'completion': ' blue.'}

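From preference to prompt-only dataset

To convert a preference dataset into a prompt-only dataset, remove the "rejected" and "chosen" columns.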

from datasets import Dataset

dataset = Dataset.from_dict({
    "prompt": ["The sky is", "The sun is"],
    "chosen": [" blue.", " in the sky."],
    "rejected": [" green.", " in the sea."],
})

dataset = dataset.remove_columns(["chosen", "rejected"])

>>> dataset[0]
{'prompt': 'The sky is'}

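From explicit to implicit prompt preference dataset

To convert a preference dataset with explicit prompt into a preference dataset with implicit prompt, prepend the prompt to both the chosen and the rejected completions, then remove the "prompt" column.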

from datasets import Dataset

dataset = Dataset.from_dict({
    "prompt": [
        [{"role": "user", "content": "What color is the sky?"}],
        [{"role": "user", "content": "Where is the sun?"}],
    ],
    "chosen": [
        [{"role": "assistant", "content": "It is blue."}],
        [{"role": "assistant", "content": "In the sky."}],
    ],
    "rejected": [
        [{"role": "assistant", "content": "It is green."}],
        [{"role": "assistant", "content": "In the sea."}],
    ],
})

def concat_prompt_to_completions(example):
    return {"chosen": example["prompt"] + example["chosen"], "rejected": example["prompt"] + example["rejected"]}

dataset = dataset.map(concat_prompt_to_completions, remove_columns="prompt")

>>> dataset[0]
{'chosen': [{'role': 'user', 'content': 'What color is the sky?'}, {'role': 'assistant', 'content': 'It is blue.'}],
 'rejected': [{'role': 'user', 'content': 'What color is the sky?'}, {'role': 'assistant', 'content': 'It is green.'}]}

From preference to unpaired preference dataset

To convert a preference dataset into an unpaired preference dataset, unpair the dataset with unpair_preference_dataset().

from datasets import Dataset
from trl import unpair_preference_dataset

dataset = Dataset.from_dict({
    "prompt": [
        [{"role": "user", "content": "What color is the sky?"}],
        [{"role": "user", "content": "Where is the sun?"}],
    ],
    "chosen": [
        [{"role": "assistant", "content": "It is blue."}],
        [{"role": "assistant", "content": "In the sky."}],
    ],
    "rejected": [
        [{"role": "assistant", "content": "It is green."}],
        [{"role": "assistant", "content": "In the sea."}],
    ],
})

dataset = unpair_preference_dataset(dataset)

>>> dataset[0]
{'prompt': [{'role': 'user', 'content': 'What color is the sky?'}],
 'completion': [{'role': 'assistant', 'content': 'It is blue.'}],
 'label': True}

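From unpaired preference to language modeling dataset

To convert an unpaired preference dataset into a language modeling dataset, keep only the examples labeled as good, concatenate the prompt and the completion into the "text" column, and remove the "prompt", "completion", and "label" columns.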

from datasets import Dataset

dataset = Dataset.from_dict({
    "prompt": ["The sky is", "The sun is", "The sky is", "The sun is"],
    "completion": [" blue.", " in the sky.", " green.", " in the sea."],
    "label": [True, True, False, False],
})

def concatenate_prompt_completion(example):
    return {"text": example["prompt"] + example["completion"]}

dataset = dataset.filter(lambda x: x["label"]).map(concatenate_prompt_completion).remove_columns(["prompt", "completion", "label"])

>>> dataset[0]
{'text': 'The sky is blue.'}

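From unpaired preference to prompt-completion dataset

To convert an unpaired preference dataset into a prompt-completion dataset, keep only the examples labeled as good, then remove the "label" column.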

from datasets import Dataset

dataset = Dataset.from_dict({
    "prompt": ["The sky is", "The sun is", "The sky is", "The sun is"],
    "completion": [" blue.", " in the sky.", " green.", " in the sea."],
    "label": [True, True, False, False],
})

dataset = dataset.filter(lambda x: x["label"]).remove_columns(["label"])

>>> dataset[0]
{'prompt': 'The sky is', 'completion': ' blue.'}

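From unpaired preference to prompt-only dataset

To convert an unpaired preference dataset into a prompt-only dataset, remove the "completion" and "label" columns.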

from datasets import Dataset

dataset = Dataset.from_dict({
    "prompt": ["The sky is", "The sun is", "The sky is", "The sun is"],
    "completion": [" blue.", " in the sky.", " green.", " in the sea."],
    "label": [True, True, False, False],
})

dataset = dataset.remove_columns(["completion", "label"])

>>> dataset[0]
{'prompt': 'The sky is'}

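From stepwise supervision to language modeling dataset

To convert a stepwise supervision dataset into a language modeling dataset, keep only the examples in which every step is labeled as good, then concatenate the prompt and the completions into the "text" column.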

from datasets import Dataset

dataset = Dataset.from_dict({
    "prompt": ["Blue light", "Water"],
    "completions": [[" scatters more in the atmosphere,", " so the sky is green."],
                   [" forms a less dense structure in ice,", " which causes it to expand when it freezes."]],
    "labels": [[True, False], [True, True]],
})

def concatenate_prompt_completions(example):
    completion = "".join(example["completions"])
    return {"text": example["prompt"] + completion}

dataset = dataset.filter(lambda x: all(x["labels"])).map(concatenate_prompt_completions, remove_columns=["prompt", "completions", "labels"])

>>> dataset[0]
{'text': 'Water forms a less dense structure in ice, which causes it to expand when it freezes.'}

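From stepwise supervision to prompt-completion dataset

To convert a stepwise supervision dataset into a prompt-completion dataset, keep only the examples in which every step is labeled as good, join the completions, and remove the labels.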

from datasets import Dataset

dataset = Dataset.from_dict({
    "prompt": ["Blue light", "Water"],
    "completions": [[" scatters more in the atmosphere,", " so the sky is green."],
                   [" forms a less dense structure in ice,", " which causes it to expand when it freezes."]],
    "labels": [[True, False], [True, True]],
})

def join_completions(example):
    completion = "".join(example["completions"])
    return {"completion": completion}

dataset = dataset.filter(lambda x: all(x["labels"])).map(join_completions, remove_columns=["completions", "labels"])

>>> dataset[0]
{'prompt': 'Water', 'completion': ' forms a less dense structure in ice, which causes it to expand when it freezes.'}

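From stepwise supervision to prompt-only dataset

To convert a stepwise supervision dataset into a prompt-only dataset, remove the completions and the labels.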

from datasets import Dataset

dataset = Dataset.from_dict({
    "prompt": ["Blue light", "Water"],
    "completions": [[" scatters more in the atmosphere,", " so the sky is green."],
                   [" forms a less dense structure in ice,", " which causes it to expand when it freezes."]],
    "labels": [[True, False], [True, True]],
})

dataset = dataset.remove_columns(["completions", "labels"])

>>> dataset[0]
{'prompt': 'Blue light'}

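From stepwise supervision to unpaired preference dataset

To convert a stepwise supervision dataset into an unpaired preference dataset, join the completions and merge the labels.

The method for merging the labels depends on the specific task. In this example, we use a logical AND. This means that if the step labels indicate the correctness of individual steps, the resulting label reflects the correctness of the entire sequence.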

from datasets import Dataset

dataset = Dataset.from_dict({
    "prompt": ["Blue light", "Water"],
    "completions": [[" scatters more in the atmosphere,", " so the sky is green."],
                   [" forms a less dense structure in ice,", " which causes it to expand when it freezes."]],
    "labels": [[True, False], [True, True]],
})

def merge_completions_and_labels(example):
    return {"prompt": example["prompt"], "completion": "".join(example["completions"]), "label": all(example["labels"])}

dataset = dataset.map(merge_completions_and_labels, remove_columns=["completions", "labels"])

>>> dataset[0]
{'prompt': 'Blue light', 'completion': ' scatters more in the atmosphere, so the sky is green.', 'label': False}
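
Because the merging rule is task-dependent, it can be useful to isolate it in a small function that is easy to swap. The sketch below contrasts the logical AND used above with a more lenient rule. merge_with_min_fraction and its threshold are hypothetical alternatives shown for illustration only, not something TRL provides or recommends.

```python
def merge_with_and(labels: list) -> bool:
    """Strict rule used above: the sequence is good only if every step is good."""
    return all(labels)


def merge_with_min_fraction(labels: list, min_fraction: float) -> bool:
    """Hypothetical lenient rule: good if at least min_fraction of the steps are good."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(labels) / len(labels) >= min_fraction


labels = [True, False]
strict_label = merge_with_and(labels)
lenient_label = merge_with_min_fraction(labels, min_fraction=0.5)
print(strict_label, lenient_label)  # False True
```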

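Vision datasets

Some trainers also support fine-tuning vision-language models (VLMs) using image-text pairs. In this scenario, it's recommended to use a conversational format, as each model handles image placeholders in text differently.

A conversational vision dataset differs from a standard conversational dataset in two key ways:

The dataset must contain the key images with the image data.
The "content" field in messages must be a list of dictionaries, where each dictionary specifies the type of data: "image" or "text".

Example: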

# Textual dataset:
"content": "What color is the sky?"

# Vision dataset:
"content": [
    {"type": "image"}, 
    {"type": "text", "text": "What color is the sky in the image?"}
]
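
If you start from a textual conversational dataset, the second requirement amounts to wrapping each message's text in a list of typed parts. The following is a minimal sketch of that rewrite; to_vision_message is a hypothetical helper, and placing the image placeholders before the text simply mirrors the example above.

```python
def to_vision_message(message: dict, num_images: int = 1) -> dict:
    """Rewrite a textual message so that its content is a list of typed parts."""
    parts = [{"type": "image"} for _ in range(num_images)]
    parts.append({"type": "text", "text": message["content"]})
    return {"role": message["role"], "content": parts}


textual = {"role": "user", "content": "What color is the sky in the image?"}
vision = to_vision_message(textual)
print(vision)
```

The image data itself is not stored in the message; it still has to be provided under the images key of the example.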

An example of a conversational vision dataset is openbmb/RLAIF-V-Dataset.
