Lighteval documentation

Contributing to multilingual evaluations


Contributing a little bit of translation

We define 19 literals: basic keywords or punctuation signs used when evaluation prompts are created automatically, such as "yes", "no" and "because".

We welcome translations into your language!

To contribute, you'll need to:

  1. Open the translation_literals file.
  2. Edit the file to add or expand the literals for the language you are interested in.
    Language.ENGLISH: TranslationLiterals(
        language=Language.ENGLISH,
        question_word="question", # Usage: "Question: How are you?"
        answer="answer", # Usage: "Answer: I am fine"
        confirmation_word="right", # Usage: "He is smart, right?"
        yes="yes", # Usage: "Yes, he is"
        no="no", # Usage: "No, he is not"
        also="also", # Usage: "Also, she is smart."
        cause_word="because", # Usage: "She is smart, because she is tall"
        effect_word="therefore", # Usage: "He is tall therefore he is smart"
        or_word="or", # Usage: "He is tall or small"
        true="true", # Usage: "He is smart, true, false or neither?"
        false="false", # Usage: "He is smart, true, false or neither?"
        neither="neither", # Usage: "He is smart, true, false or neither?"
        # Punctuation and spacing: only adjust if your language uses something different than in English
        full_stop=".",
        comma=",",
        question_mark="?",
        exclamation_mark="!",
        word_space=" ",
        sentence_space=" ",
        colon=":",
        # The first characters of your alphabet used in enumerations, if different from English
        indices=["A", "B", "C", ...]
    )
  3. Open a PR with your modifications! That's it!
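
To show why every literal, including punctuation and spacing, matters, here is a minimal sketch of how a template might assemble a prompt from them. Literals and yes_no_prompt are hypothetical stand-ins written for this illustration, not lighteval's API; the real TranslationLiterals class and templates are more elaborate.

```python
from dataclasses import dataclass


# Hypothetical, trimmed-down stand-in for TranslationLiterals: only the
# fields this sketch uses, with English defaults for punctuation/spacing.
@dataclass(frozen=True)
class Literals:
    question_word: str
    answer: str
    yes: str
    no: str
    colon: str = ":"
    word_space: str = " "
    sentence_space: str = " "


def yes_no_prompt(lit: Literals, question: str) -> str:
    """Hypothetical helper: build a yes/no prompt from language literals."""
    # Anchors are built from literals, so punctuation and spacing follow the language
    question_anchor = f"{lit.question_word.capitalize()}{lit.colon}{lit.word_space}"
    answer_anchor = f"{lit.answer.capitalize()}{lit.colon}"
    options = f"({lit.yes}/{lit.no})"
    return f"{question_anchor}{question}{lit.sentence_space}{options}\n{answer_anchor}"


ENGLISH = Literals(question_word="question", answer="answer", yes="yes", no="no")
# Chinese uses full-width punctuation and no spaces between words or sentences
CHINESE = Literals(
    question_word="问题", answer="答案", yes="是", no="否",
    colon="：", word_space="", sentence_space="",
)

print(yes_no_prompt(ENGLISH, "Is he smart?"))
print(yes_no_prompt(CHINESE, "他聪明吗？"))
```

With English literals this produces "Question: Is he smart? (yes/no)" followed by "Answer:"; with the Chinese literals the same code yields full-width colons and no stray spaces, without any language-specific branching.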

Contributing a new multilingual task

You should first read our guide on adding a custom task, to better understand the different parameters we use.

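Then, take a look at the current multilingual tasks file to see how those tasks are defined. For multilingual evaluations, the prompt_function should be implemented with language-adapted templates. The template takes care of correct formatting and of the correct, consistent use of language-adjusted prompt anchors (e.g. Question/Answer) and punctuation.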
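
Templates also take an evaluation formulation (MCFFormulation, CFFormulation or HybridFormulation), which changes both what the prompt shows and what the model is scored on. As we understand them: MCF lists the options with index labels and scores the labels; CF (cloze) shows no options and scores the answer texts; Hybrid lists the options but scores the answer texts. The toy render function below is a hypothetical illustration of that difference, not lighteval's template code, and it hard-codes English anchors for brevity where real templates use the translation literals.

```python
from typing import Literal

# Hypothetical labels for the three formulations
Formulation = Literal["MCF", "CF", "Hybrid"]


def render(
    question: str,
    choices: list[str],
    gold_idx: int,
    formulation: Formulation,
    indices: list[str],
) -> tuple[str, list[str], str]:
    """Hypothetical sketch: return (prompt, scored continuations, gold continuation)."""
    # English anchors hard-coded for brevity; real templates use the language literals
    prompt = f"Question: {question}\n"
    if formulation in ("MCF", "Hybrid"):
        # Options are shown in the prompt, labelled with the language's indices
        for label, choice in zip(indices, choices):
            prompt += f"{label}. {choice}\n"
    prompt += "Answer:"
    if formulation == "MCF":
        # MCF: the model is scored on the option labels
        targets = [f" {label}" for label in indices[: len(choices)]]
    else:
        # CF and Hybrid: the model is scored on the answer texts themselves
        targets = [f" {choice}" for choice in choices]
    return prompt, targets, targets[gold_idx]


for formulation in ("MCF", "CF", "Hybrid"):
    prompt, targets, gold = render("2 + 2 = ?", ["3", "4"], 1, formulation, ["A", "B", "C"])
    print(formulation, repr(prompt), targets, gold)
```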

Browse the list of all templates here to see which ones best fit your own task.

Then, when you are ready to define your own task, you should:

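  1. Create a Python file as indicated in the guide above.
  2. Import the relevant templates for your task type (XNLI, Copa, multiple choice, question answering, etc.).
  3. Define one task, or a list of tasks, for each relevant language and evaluation formulation (for multiple choice) using our parametrizable LightevalTaskConfig class.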
your_tasks = [
    LightevalTaskConfig(
        # Name of your evaluation
        name=f"evalname_{language.value}_{formulation.name.lower()}",
        # The evaluation is community contributed
        suite=["community"],
        # This will automatically get the correct metrics for your chosen formulation
        metric=get_metrics_for_formulation(
            formulation,
            [
                loglikelihood_acc_metric(normalization=None),
                loglikelihood_acc_metric(normalization=LogProbTokenNorm()),
                loglikelihood_acc_metric(normalization=LogProbCharNorm()),
            ],
        ),
        # In this function, you choose which template to follow and for which language and formulation
        prompt_function=get_template_prompt_function(
            language=language,
            # then use the adapter to define the mapping between the
            # keys of the template (left), and the keys of your dataset
            # (right)
            # To know which template keys are required and available,
            # consult the appropriate adapter type and doc-string.
            adapter=lambda line: {
                "key": line["relevant_key"],
                ...
            },
            formulation=formulation,
        ),
        # You can also add specific filters to remove irrelevant samples
        hf_filter=lambda line: line["label"] in <condition>,
        # You then select your huggingface dataset as well as
        # the splits available for evaluation
        hf_repo=<dataset>,
        hf_subset=<subset>,
        evaluation_splits=["train"],
        hf_avail_splits=["train"],
    )
    for language in [
        Language.YOUR_LANGUAGE, ...
    ]
    for formulation in [MCFFormulation(), CFFormulation(), HybridFormulation()]
]
  4. Then, go back to the guide to test whether your task is correctly implemented!
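
Because the list comprehension in step 3 loops over every language and, within it, over every formulation, it produces one task per (language, formulation) pair, each with its own name. The sketch below reproduces just that naming pattern; LanguageStub and FormulationStub are hypothetical stand-ins, and their values ("fra", "swa") are assumptions rather than lighteval's actual enum values.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


# Hypothetical stand-ins: the real Language enum and formulation classes live
# in lighteval, and their exact values may differ from these.
class LanguageStub(Enum):
    FRENCH = "fra"
    SWAHILI = "swa"


@dataclass(frozen=True)
class FormulationStub:
    name: str


def task_names(languages, formulations, prefix="evalname"):
    # Mirrors the comprehension in step 3: outer loop over languages,
    # inner loop over formulations, one task name per pair
    return [
        f"{prefix}_{language.value}_{formulation.name.lower()}"
        for language, formulation in product(languages, formulations)
    ]


names = task_names(
    [LanguageStub.FRENCH, LanguageStub.SWAHILI],
    [FormulationStub("MCF"), FormulationStub("CF"), FormulationStub("Hybrid")],
)
print(len(names), names[:3])
```

Two languages and three formulations give six tasks. Encoding both the language and the formulation in the name keeps every variant distinct, so each one can be selected and reported separately.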

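All LightevalTaskConfig parameters are strongly typed, including the inputs to the template function. Make sure to take advantage of your IDE's features to fill these parameters in correctly.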
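
For example, annotating your adapter with the template's input type lets an IDE or type checker flag a missing or misspelled key before anything runs. Below is a minimal sketch, assuming a hypothetical McqInputSketch TypedDict (not lighteval's actual input type) and made-up dataset columns premise, option_1, option_2 and label. It also shows a filter of the kind you would pass as hf_filter.

```python
from typing import TypedDict


# Hypothetical stand-in for a template's typed input. Check the adapter type
# and doc-string of the template you actually use for the real keys.
class McqInputSketch(TypedDict):
    question: str
    choices: list[str]
    gold_idx: int


def adapter(line: dict) -> McqInputSketch:
    # Left: template keys. Right: made-up column names of a dataset row.
    return {
        "question": line["premise"],
        "choices": [line["option_1"], line["option_2"]],
        "gold_idx": line["label"],
    }


def keep(line: dict) -> bool:
    # Filter in the spirit of hf_filter: drop rows labelled -1 (unlabelled)
    return line["label"] in (0, 1)


rows = [
    {"premise": "Which one is a fruit?", "option_1": "apple", "option_2": "chair", "label": 0},
    {"premise": "Which one is a tool?", "option_1": "hammer", "option_2": "cloud", "label": -1},
]
samples = [adapter(row) for row in rows if keep(row)]
print(samples)
```

With the return annotation in place, a type checker such as mypy or Pyright reports a typo like "gold_index" in the returned dict, instead of the mistake surfacing only at evaluation time.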

Once everything is ready, open a PR, and we'll be happy to review it!
