Lighteval documentation
Saving and reading results
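Saving results locally
Lighteval automatically saves results and evaluation details in the directory set with the --output-dir option. The results are saved in {output_dir}/results/{model_name}/results_{timestamp}.json. An example result file is shown at the end of this page. The output path can be any fsspec-compliant path (local, s3, hf hub, gdrive, ftp, etc.).
To save the details of the evaluation, use the --save-details option. The details are saved in a parquet file at {output_dir}/details/{model_name}/{timestamp}/details_{task}_{timestamp}.parquet.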
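As a minimal sketch of reading results back, the helper below finds the most recent results file for a model under the local layout described above and parses it. The function name load_latest_results is hypothetical (it is not part of Lighteval), and the sketch assumes a local filesystem path and timestamps that sort chronologically as plain strings.

```python
import glob
import json
import os


def load_latest_results(output_dir: str, model_name: str) -> dict:
    """Return the parsed JSON of the most recent results file for a model.

    Assumes the layout {output_dir}/results/{model_name}/results_{timestamp}.json
    and that timestamps sort chronologically when compared as strings.
    """
    pattern = os.path.join(output_dir, "results", model_name, "results_*.json")
    candidates = sorted(glob.glob(pattern))
    if not candidates:
        raise FileNotFoundError(f"No results file matches {pattern}")
    # The last entry after sorting is taken to be the latest run.
    with open(candidates[-1], encoding="utf-8") as f:
        return json.load(f)
```

Calling load_latest_results("evals_doc", "HuggingFaceH4/zephyr-7b-beta") would then return a dictionary with the same top-level keys as the example result file. For remote output paths, the same idea would need an fsspec filesystem instead of glob and open.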
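Pushing results to the HuggingFace Hub
You can push the results and evaluation details to the HuggingFace Hub. To do so, set the --push-to-hub option together with the --results-org option. The results are saved in a dataset named {results_org}/{model_org}/{model_name}. To push the details as well, set the --save-details option. The dataset is created as private by default; set the --public-run option to make it public.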
Pushing results to TensorBoard
You can push the results to TensorBoard by setting --push-to-tensorboard. This creates a TensorBoard dashboard in the HF organization set with the --results-org option.
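To make the relationship between these options concrete, here is a small sketch that assembles the output-related flags into an argument list. The helper build_output_flags is hypothetical and only uses the flag names mentioned above; the rest of the Lighteval command line (subcommand, model, and task arguments) is deliberately left out, and the value-after-flag form is an assumption.

```python
from typing import List, Optional


def build_output_flags(
    output_dir: str,
    save_details: bool = False,
    push_to_hub: bool = False,
    results_org: Optional[str] = None,
    public_run: bool = False,
    push_to_tensorboard: bool = False,
) -> List[str]:
    """Assemble the output-related flags described in this page (hypothetical helper)."""
    flags = ["--output-dir", output_dir]
    if save_details:
        flags.append("--save-details")
    if push_to_hub or push_to_tensorboard:
        # Both Hub and TensorBoard pushes target the organization given by --results-org.
        if results_org is None:
            raise ValueError("results_org is required when pushing results")
        flags += ["--results-org", results_org]
    if push_to_hub:
        flags.append("--push-to-hub")
    if public_run:
        flags.append("--public-run")
    if push_to_tensorboard:
        flags.append("--push-to-tensorboard")
    return flags
```

The resulting list can be appended to whatever command you use to launch the evaluation, for example through subprocess.run.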
How to load and investigate details
Loading from local detail files
from datasets import load_dataset
import glob

output_dir = "evals_doc"
model_name = "HuggingFaceH4/zephyr-7b-beta"
timestamp = "latest"
task = "lighteval|gsm8k|0"

if timestamp == "latest":
    # Each run is stored in its own timestamped directory; pick the most recent one.
    # model_name already includes the organization prefix.
    path = f"{output_dir}/details/{model_name}/*/"
    timestamps = glob.glob(path)
    timestamp = sorted(timestamps)[-1].split("/")[-2]
    print(f"Latest timestamp: {timestamp}")

details_path = f"{output_dir}/details/{model_name}/{timestamp}/details_{task}_{timestamp}.parquet"

# Load the details
details = load_dataset("parquet", data_files=details_path, split="train")

for detail in details:
    print(detail)
Loading from the HuggingFace Hub
from datasets import load_dataset

results_org = "SaylorTwift"
model_name = "HuggingFaceH4/zephyr-7b-beta"
# Dataset names cannot contain "/", so the model name is sanitized.
sanitized_model_name = model_name.replace("/", "__")
task = "lighteval|gsm8k|0"
public_run = False

# Private runs get a "_private" suffix on the dataset name.
dataset_path = f"{results_org}/details_{sanitized_model_name}{'_private' if not public_run else ''}"
# The task name, with "|" replaced by "_", is the dataset configuration.
details = load_dataset(dataset_path, task.replace("|", "_"), split="latest")

for detail in details:
    print(detail)
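The detail file contains the following columns:
choices: The choices presented to the model in multiple-choice tasks.
gold: The gold (correct) answer.
gold_index: The index of the gold answer in the choices list.
cont_tokens: The continuation tokens.
example: The input in text form.
full_prompt: The full prompt that is fed to the model.
input_tokens: The tokens of the full prompt.
instruction: The instruction given to the model.
metrics: The metrics computed for the example.
num_asked_few_shots: The number of few-shot examples requested for the model.
num_effective_few_shots: The number of few-shot examples effectively used.
padded: Whether the input was padded.
pred_logits: The logits of the model.
predictions: The predictions of the model.
specifics: The specifics of the task.
truncated: Whether the input was truncated.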
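When investigating details, printing whole rows is often too verbose. The sketch below keeps only a few of the columns listed above and shortens long values. The helper preview_details is hypothetical, and it only assumes that each row behaves like a dictionary keyed by column name, which holds for rows yielded by a loaded dataset as well as for plain dictionaries.

```python
from itertools import islice


def preview_details(details, columns=("example", "gold", "predictions"), limit=3, width=80):
    """Return up to `limit` rows, keeping only `columns` and shortening long values."""
    previews = []
    for row in islice(details, limit):
        short = {}
        for name in columns:
            text = str(row[name])
            # Cut values longer than `width` characters and mark the cut with "...".
            short[name] = text if len(text) <= width else text[: width - 3] + "..."
        previews.append(short)
    return previews
```

For example, preview_details(details, limit=5) gives a compact view of the first five examples with their gold answers and model predictions.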
Example of a result file
{
  "config_general": {
    "lighteval_sha": "203045a8431bc9b77245c9998e05fc54509ea07f",
    "num_fewshot_seeds": 1,
    "override_batch_size": 1,
    "max_samples": 1,
    "job_id": "",
    "start_time": 620979.879320166,
    "end_time": 621004.632108041,
    "total_evaluation_time_secondes": "24.752787875011563",
    "model_name": "gpt2",
    "model_sha": "607a30d783dfa663caf39e06633721c8d4cfcd7e",
    "model_dtype": null,
    "model_size": "476.2 MB"
  },
  "results": {
    "lighteval|gsm8k|0": {
      "qem": 0.0,
      "qem_stderr": 0.0,
      "maj@8": 0.0,
      "maj@8_stderr": 0.0
    },
    "all": {
      "qem": 0.0,
      "qem_stderr": 0.0,
      "maj@8": 0.0,
      "maj@8_stderr": 0.0
    }
  },
  "versions": {
    "lighteval|gsm8k|0": 0
  },
  "config_tasks": {
    "lighteval|gsm8k": {
      "name": "gsm8k",
      "prompt_function": "gsm8k",
      "hf_repo": "gsm8k",
      "hf_subset": "main",
      "metric": [
        {
          "metric_name": "qem",
          "higher_is_better": true,
          "category": "3",
          "use_case": "5",
          "sample_level_fn": "compute",
          "corpus_level_fn": "mean"
        },
        {
          "metric_name": "maj@8",
          "higher_is_better": true,
          "category": "5",
          "use_case": "5",
          "sample_level_fn": "compute",
          "corpus_level_fn": "mean"
        }
      ],
      "hf_avail_splits": [
        "train",
        "test"
      ],
      "evaluation_splits": [
        "test"
      ],
      "few_shots_split": null,
      "few_shots_select": "random_sampling_from_train",
      "generation_size": 256,
      "generation_grammar": null,
      "stop_sequence": [
        "Question="
      ],
      "num_samples": null,
      "suite": [
        "lighteval"
      ],
      "original_num_docs": 1319,
      "effective_num_docs": 1,
      "trust_dataset": true,
      "must_remove_duplicate_docs": null,
      "version": 0
    }
  },
  "summary_tasks": {
    "lighteval|gsm8k|0": {
      "hashes": {
        "hash_examples": "8517d5bf7e880086",
        "hash_full_prompts": "8517d5bf7e880086",
        "hash_input_tokens": "29916e7afe5cb51d",
        "hash_cont_tokens": "37f91ce23ef6d435"
      },
      "truncated": 2,
      "non_truncated": 0,
      "padded": 0,
      "non_padded": 2,
      "effective_few_shots": 0.0,
      "num_truncated_few_shots": 0
    }
  },
  "summary_general": {
    "hashes": {
      "hash_examples": "5f383c395f01096e",
      "hash_full_prompts": "5f383c395f01096e",
      "hash_input_tokens": "ac933feb14f96d7b",
      "hash_cont_tokens": "9d03fb26f8da7277"
    },
    "truncated": 2,
    "non_truncated": 0,
    "padded": 0,
    "non_padded": 2,
    "num_truncated_few_shots": 0
  }
}
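The "results" section of the file above pairs each metric with a matching "_stderr" entry. As a sketch of how one might flatten it for reporting, the hypothetical helper metric_rows below walks that section and returns one row per task and metric. It assumes the naming pattern seen in the example (a metric name plus an optional "<name>_stderr" key) and is not part of Lighteval.

```python
from typing import List, Optional, Tuple


def metric_rows(result_file: dict) -> List[Tuple[str, str, float, Optional[float]]]:
    """Flatten the "results" section into (task, metric, value, stderr) rows."""
    rows = []
    for task, metrics in result_file["results"].items():
        for name, value in metrics.items():
            if name.endswith("_stderr"):
                continue  # reported alongside its metric below
            rows.append((task, name, value, metrics.get(f"{name}_stderr")))
    return rows


# A trimmed copy of the "results" section from the example file.
example = {
    "results": {
        "lighteval|gsm8k|0": {"qem": 0.0, "qem_stderr": 0.0, "maj@8": 0.0, "maj@8_stderr": 0.0},
        "all": {"qem": 0.0, "qem_stderr": 0.0, "maj@8": 0.0, "maj@8_stderr": 0.0},
    }
}

for task, name, value, stderr in metric_rows(example):
    print(f"{task:20s} {name:8s} {value:.4f} (stderr: {stderr})")
```

The "all" entry aggregates over tasks, so it can be filtered out when only per-task rows are wanted.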