Introducing AI Sheets: a tool for working with datasets using open AI models!

Published August 8, 2025

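🧭 TL;DR

Hugging Face AI Sheets is a new open-source tool for building, enriching, and transforming datasets with AI models, no code required. You can deploy it locally or on the Hub. It lets you use thousands of open models from the Hugging Face Hub, via Inference Providers or local models, including gpt-oss from OpenAI!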

Useful links

Try the tool for free (no installation required): https://huggingface.co/spaces/aisheets/sheets
Install and run locally: https://github.com/huggingface/sheets

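What is AI Sheets

AI Sheets is a no-code tool for building, transforming, and enriching datasets using (open) AI models. It is tightly integrated with the Hub and the open-source AI ecosystem.

AI Sheets has an easy-to-learn, spreadsheet-like user interface. The tool is built around quick experimentation: you start with small datasets before running long or costly data generation pipelines.

In AI Sheets, you create new columns by writing prompts. You can iterate as many times as you need and edit or validate cells to teach the model what you want. More on this later!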

What can I use it for

You can use AI Sheets to:

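Compare and vibe test models. Imagine you want to test the latest models on your data. You can import a dataset with prompts/questions and create several columns (one per model) with a prompt like this: "Answer the following: {{prompt}}", where prompt is a column in your dataset. You can validate the results manually or create a new column with an LLM-as-a-judge prompt such as: "Evaluate the responses to the following question: {{prompt}}. Response 1: {{model1}}. Response 2: {{model2}}", where model1 and model2 are columns in your dataset containing the different models' responses.

Improve prompts for your data and specific models. Imagine you want to build an application that processes customer requests and gives automated responses. You can load a sample dataset of customer requests and start experimenting and iterating with different prompts and models to generate responses. A cool feature of AI Sheets is that you can give feedback by editing or validating cells. These example cells are added to your prompt automatically. Think of it as a tool for fine-tuning prompts and adding few-shot examples to them very efficiently, while looking at your data in real time!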

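Transform a dataset. Imagine you want to clean up a column of your dataset. You can add a new column with a prompt like this: "Remove extra punctuation marks from the following text: {{text}}", where text is the column in your dataset containing the text you want to clean.

Classify a dataset. Imagine you want to classify some content in your dataset. You can add a new column with a prompt like this: "Categorize the following text: {{text}}", where text is the column in your dataset containing the text you want to classify.

Analyze a dataset. Imagine you want to extract the main ideas in your dataset. You can add a new column with a prompt like this: "Extract the most important ideas from the following: {{text}}", where text is the column in your dataset containing the text you want to analyze.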

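Enrich a dataset. Imagine you have a dataset with addresses that are missing zip codes. You can add a new column with a prompt like this: "Find the zip code of the following address: {{address}}" (in this case, you must enable the "Search the web" option to ensure accurate results).

Generate a synthetic dataset. Imagine you need a dataset of realistic emails, but the real data is not available for data privacy reasons. You can create a dataset with a prompt like this: "Write a short description of a professional in the pharma industry", and name the column person_bio. Then you can create another column with a prompt like this: "Write a realistic professional email as if written by the following person: {{person_bio}}".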
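
The prompts above all reference dataset columns through {{column}} placeholders. As a rough mental model only (this is not AI Sheets' actual implementation, which this post does not show), filling a prompt for a single row could look like the sketch below. The helper name render_prompt and the sample row are hypothetical.

```python
import re

# Hypothetical helper, for illustration only: replace each {{column}}
# placeholder in a prompt with the value of that column in one row.
PLACEHOLDER = re.compile(r"\{\{\s*([\w-]+)\s*\}\}")


def render_prompt(template: str, row: dict) -> str:
    def replace(match: re.Match) -> str:
        column = match.group(1)
        if column not in row:
            raise KeyError(f"prompt references unknown column: {column}")
        return str(row[column])

    return PLACEHOLDER.sub(replace, template)


# Made-up row standing in for one line of a dataset.
row = {"person_bio": "A regulatory affairs manager at a mid-size pharma company."}
print(
    render_prompt(
        "Write a realistic professional email as if written by the following person: {{person_bio}}",
        row,
    )
)
```

Raising an error for an unknown column is a design choice of this sketch: it surfaces a typo in a placeholder instead of silently sending the literal {{...}} text to a model.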

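Now let's dive into how to use it!

How to use it

AI Sheets gives you two ways to start: import existing data or generate a dataset from scratch. Once your data is loaded, you refine it by adding columns, editing cells, and regenerating content.

Getting started

To get started, you either create a dataset from scratch using a natural language description or import an existing dataset.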

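Generate a dataset from scratch

Best for: getting familiar with AI Sheets, brainstorming, rapid experiments, and creating test datasets.

Think of this as an auto-dataset or prompt-to-dataset feature: you describe what you want, and AI Sheets creates the entire dataset structure and content for you.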

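When to use this

You are exploring AI Sheets for the first time
You need synthetic data for testing or prototyping
Data accuracy and diversity are not critical (e.g., brainstorming use cases, quick research, generating test datasets)
You want to experiment with ideas quickly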

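How it works

Describe the dataset you want in the prompt area
- For example: "A list of fictional startups with name, industry, and slogan"
AI Sheets generates the schema and creates 5 sample rows
Extend up to 1000 rows, or modify the prompt to change the structure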

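Example

If you enter this prompt: "cities of the world, alongside the countries they belong to and a Ghibli-style landmark image for each"

AI Sheets automatically generates a dataset with three columns (for the city, the country, and the landmark image).

This dataset contains only five rows, but you can add more cells by dragging down each column, including the image column! You can also write items in any cell and complete the other cells by dragging.

The following sections show you how to iterate on and expand the dataset.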

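Import your dataset (recommended)

Best for: most use cases where you want to transform, classify, enrich, and analyze real-world data.

This is the recommended approach for most use cases, since importing real data gives you more control and flexibility than starting from scratch.

When to use this

You have existing data that you want to transform or enrich with AI models
You want to generate synthetic data, and accuracy and diversity matter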

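How it works

Upload your data in XLS, TSV, CSV, or Parquet format
Make sure your file includes at least one column name and one row of data
Upload up to 1000 rows (unlimited columns)
Your data appears in a familiar spreadsheet format

Pro tip: if your file contains minimal data, you can manually add more entries by typing directly into the spreadsheet.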
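
If you prepare files programmatically, it can help to check them against the import requirements listed above before uploading. The following is a minimal sketch for CSV text only, using the Python standard library. The function name check_csv_for_upload is hypothetical, and the checks simply restate the rules above (at least one column name, at least one data row, at most 1000 rows). AI Sheets itself may validate files differently.

```python
import csv
import io

MAX_ROWS = 1000  # upload limit stated in this post


def check_csv_for_upload(text: str) -> list[str]:
    """Return a list of problems; an empty list means the CSV looks uploadable."""
    rows = list(csv.reader(io.StringIO(text)))
    problems = []
    # The first row must contain at least one non-empty column name.
    if not rows or not any(name.strip() for name in rows[0]):
        problems.append("no column names found")
        return problems
    # Ignore blank lines when counting data rows.
    data_rows = [r for r in rows[1:] if any(cell.strip() for cell in r)]
    if not data_rows:
        problems.append("no data rows found")
    if len(data_rows) > MAX_ROWS:
        problems.append(f"{len(data_rows)} rows exceeds the {MAX_ROWS}-row limit")
    return problems


print(check_csv_for_upload("question\nWhat is the capital of France?\n"))
```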

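Working with your dataset

Once your data is loaded (regardless of how you started), you'll see it in an editable spreadsheet interface. Here's what you need to know:

Understanding AI Sheets

Imported cells: manually editable, but cannot be modified with AI prompts
AI-generated cells: can be regenerated and refined using prompts and your feedback (edits + likes)
New columns: always AI-powered and fully customizable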

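Getting started with AI columns

Click the "+" button to add a new column
Choose from the recommended actions:
- Extract specific information
- Summarize long text
- Translate content
- Or write a custom prompt such as "Do something with {{column}}"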

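Refining and expanding the dataset

Now that you have AI columns, you can improve their results and expand your data. You can improve results by giving feedback through manual edits and likes, or by adjusting the column configuration. Both require regeneration to take effect.

1. Adding more cells

Drag down: drag down from the last cell in a column to generate additional rows right away
No regeneration needed: new cells are generated instantly
You can also use this to regenerate cells with errors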

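2. Manual editing and feedback

Edit cells: click any cell to edit its content directly. This gives the model examples of your preferred output
Like results: use the thumbs-up icon to mark good output examples
Regenerate to apply the feedback to the other cells in the column

Under the hood, these manually edited and liked cells are used as few-shot examples for generating cells when you regenerate or add more cells to the column!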
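
To make the few-shot idea concrete, here is a minimal sketch of how edited and liked cells could be folded into a prompt. The section layout (Examples, Input, Output, User instruction, Your response) loosely follows the prompt field of the classification config shown later in this post. The function name build_few_shot_prompt and the dictionary keys "input" and "output" are hypothetical, and the real prompt also includes a system-style preamble that is omitted here.

```python
def build_few_shot_prompt(instruction: str, examples: list[dict]) -> str:
    """Assemble a prompt that lists validated cells as examples before the instruction."""
    parts = []
    if examples:
        parts.append("# Examples")
        for example in examples:
            # Each edited or liked cell contributes one input/output pair.
            parts.append("## Example")
            parts.append("### Input")
            parts.append(example["input"])
            parts.append("### Output")
            parts.append(example["output"])
    parts.append("# User instruction")
    parts.append(instruction)
    parts.append("# Your response")
    return "\n\n".join(parts)


# Made-up feedback: one cell the user edited or liked.
liked_cells = [
    {"input": "question: Find the base of a parallelogram.", "output": "Mathematics, Geometry"},
]
print(build_few_shot_prompt("Categorize the main topics of the following question:\n\n{{question}}", liked_cells))
```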

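3. Adjust column configuration

Change the prompt, switch models or providers, or modify settings, then regenerate to get better results.

Rewrite the prompt

Each column has its own generation prompt
Edit it at any time to change or improve the output
The column regenerates with new results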

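Switch models/providers

Try different models to get different performance or to compare them.
For a given task, some models are more accurate, more creative, or better structured than others.
Some providers offer faster inference and different context lengths; test different providers for your chosen model.

Toggle search

Enabled: the model pulls up-to-date information from the web
Disabled: offline, model-only generation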

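Export your final dataset to the Hub

Once you're happy with your new dataset, you can export it to the Hub! This has the added benefit of generating a config file that you can reuse to (1) generate more data with HF Jobs using the data generation script shown in the next section, and (2) reuse the prompts in downstream applications, including the few-shot examples from your edited and liked cells.

Here is an example dataset created with AI Sheets, which produces this config.

Running the data generation script with HF Jobs

If you want to generate a larger dataset, you can use the config and script described above, like this: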

# script.py is the script that runs the pipeline
# --config points to the config with the prompts
# --num-rows 100 limits the run to 100 rows; leave it out for the full dataset
hf jobs uv run \
-s HF_TOKEN=$HF_TOKEN \
https://huggingface.co/datasets/aisheets/uv-scripts/raw/main/extend_dataset/script.py \
--config https://huggingface.co/datasets/dvilasuero/nemotron-personas-kimi-questions/raw/main/config.yml \
--num-rows 100 \
nvidia/Nemotron-Personas dvilasuero/nemotron-kimi-qa-distilled

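Examples

This section provides examples of datasets you can build with AI Sheets to inspire your next project.

Vibe testing and comparing models

AI Sheets is a great companion if you want to test the latest models on different prompts and on data you care about.

You just need to import a dataset (or create one from scratch) and then add a column for each model you want to test.

Then you can inspect the results manually or add a column that uses an LLM to judge the quality of each model.

Below is an example comparing open frontier models on mini web apps. AI Sheets lets you see the interactive results and play with each app. The dataset also includes several columns that use LLMs to judge and compare the quality of the apps.

Example dataset exported from the session just described: https://huggingface.co/datasets/dvilasuero/jsvibes-qwen-gpt-oss-judged

Config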

columns:
  gpt-oss:
    modelName: openai/gpt-oss-120b
    modelProvider: groq
    userPrompt: Create a complete, runnable HTML+JS file implementing {{description}}
    searchEnabled: false
    columnsReferences:
      - description
  eval-qwen-coder:
    modelName: Qwen/Qwen3-Coder-480B-A35B-Instruct
    modelProvider: cerebras
    userPrompt: "Please compare the two apps and tell me which one is better and why:\n\nApp description:\n\n{{description}}\n\nmodel 1:\n\n{{qwen3-coder}}\n\nmodel 2:\n\n{{gpt-oss}}\n\nKeep it very short and focus on whether they work well for the purpose, make sure they work and are not incomplete, and the code quality, not on visual appeal and unrequested features. Assume the models might provide non working solutions, so be careful to assess that\n\nRespond with:\n\nchosen: {model 1, model 2}\n\nreason: ..."
    searchEnabled: false
    columnsReferences:
      - gpt-oss
      - description
      - qwen3-coder
  eval-gpt-oss:
    modelName: openai/gpt-oss-120b
    modelProvider: groq
    userPrompt: "Please compare the two apps and tell me which one is better and why:\n\nApp description:\n\n{{description}}\n\nmodel 1:\n\n{{qwen3-coder}}\n\nmodel 2:\n\n{{gpt-oss}}\n\nKeep it very short and focus on whether they work well for the purpose, make sure they work and are not incomplete, and the code quality, not on visual appeal and unrequested features. Assume the models might provide non working solutions, so be careful to assess that\n\nRespond with:\n\nchosen: {model 1, model 2}\n\nreason: ..."
    searchEnabled: false
    columnsReferences:
      - gpt-oss
      - description
      - qwen3-coder
  eval-kimi:
    modelName: moonshotai/Kimi-K2-Instruct
    modelProvider: groq
    userPrompt: "Please compare the two apps and tell me which one is better and why:\n\nApp description:\n\n{{description}}\n\nmodel 1:\n\n{{qwen3-coder}}\n\nmodel 2:\n\n{{gpt-oss}}\n\nKeep it very short and focus on whether they work well for the purpose, make sure they work and are not incomplete, and the code quality, not on visual appeal and unrequested features. Assume the models might provide non working solutions, so be careful to assess that\n\nRespond with:\n\nchosen: {model 1, model 2}\n\nreason: ..."
    searchEnabled: false
    columnsReferences:
      - gpt-oss
      - description
      - qwen3-coder
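
Each column in this config lists the columns its prompt depends on under columnsReferences. One way to read the config is as a dependency graph: a column can only be generated once the columns it references exist. The sketch below computes such an order with Python's standard graphlib module. The post does not say whether AI Sheets or the generation script orders columns this way, so treat this purely as an illustration; generation_order is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# columnsReferences copied by hand from the config above. Note that
# `description` and `qwen3-coder` are referenced but not defined in the
# excerpt shown, so they appear here only as dependencies.
references = {
    "gpt-oss": ["description"],
    "eval-qwen-coder": ["gpt-oss", "description", "qwen3-coder"],
    "eval-gpt-oss": ["gpt-oss", "description", "qwen3-coder"],
    "eval-kimi": ["gpt-oss", "description", "qwen3-coder"],
}


def generation_order(references: dict) -> list:
    """Return column names so that each column comes after the columns it references."""
    return list(TopologicalSorter(references).static_order())


print(generation_order(references))
```

Several valid orders exist (the three judge columns are independent of each other), so the exact output may vary; the only guarantee is that referenced columns come first.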

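Adding categories to a Hub dataset

AI Sheets can also augment existing datasets and help you move quickly on data analysis and data science projects that involve analyzing text datasets.

Here is an example of adding categories to an existing Hub dataset.

A cool feature is that you can manually validate or edit the initial categorization outputs and then regenerate the full column to improve the results. The validated cells show up as few-shot examples in the prompt field of the config below.

Config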

columns:
  category:
    modelName: moonshotai/Kimi-K2-Instruct
    modelProvider: groq
    userPrompt: |-
      Categorize the main topics of the following question:

      {{question}}
    prompt: "

      You are a rigorous, intelligent data-processing engine. Generate only the
      requested response format, with no explanations following the user
      instruction. You might be provided with positive, accurate examples of how
      the user instruction must be completed.

      # Examples

      The following are correct, accurate example outputs with respect to the
      user instruction:

      ## Example

      ### Input

      question: Given the area of a parallelogram is 420 square centimeters and
      its height is 35 cm, find the corresponding base. Show all work and label
      your answer.

      ### Output

      Mathematics – Geometry

      ## Example

      ### Input

      question: What is the minimum number of red squares required to ensure
      that each of $n$ green axis-parallel squares intersects 4 red squares,
      assuming the green squares can be scaled and translated arbitrarily
      without intersecting each other?

      ### Output

      Geometry, Combinatorics
      # User instruction

      Categorize the main topics of the following question:

      {{question}}

      # Your response
      "
    searchEnabled: false
    columnsReferences:
      - question

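Evaluating models with LLM-as-a-judge

Another use case is evaluating model outputs with an LLM-as-a-judge approach. This is useful for comparing models or for assessing the quality of an existing dataset, for example before fine-tuning a model on an existing dataset from the Hugging Face Hub.

In the first example, we combined vibe testing with a judge LLM column. The judge prompt is the userPrompt of the eval-* columns in that example's config.

Example dataset: https://huggingface.co/datasets/dvilasuero/jsvibes-qwen-gpt-oss-judged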
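
The judge prompt in the first example asks the model to reply with a "chosen: ..." line followed by a "reason: ..." line. If you export the dataset and want to count verdicts, a minimal sketch could look like the following. parse_choice and tally are hypothetical helpers, the sample outputs are made up, and real judge outputs may not always follow the requested format, which is why unparseable replies are counted under None.

```python
import re
from collections import Counter

# Matches the "chosen: model 1" / "chosen: model 2" line requested by the judge prompt.
CHOSEN = re.compile(r"^\s*chosen:\s*model\s*([12])\b", re.IGNORECASE | re.MULTILINE)


def parse_choice(judge_output: str):
    """Return 'model 1', 'model 2', or None when no verdict line is found."""
    match = CHOSEN.search(judge_output)
    if match is None:
        return None
    return f"model {match.group(1)}"


def tally(judge_outputs):
    """Count how often each model was chosen across a column of judge outputs."""
    return Counter(parse_choice(text) for text in judge_outputs)


# Made-up judge outputs, for illustration only.
outputs = [
    "chosen: model 2\n\nreason: complete and runnable",
    "Chosen: Model 1\n\nreason: the other app is missing its event handlers",
    "Both apps look fine to me.",
]
print(tally(outputs))
```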

Config

columns:
  object_name:
    modelName: meta-llama/Llama-3.3-70B-Instruct
    modelProvider: groq
    userPrompt: Generate the name of a common day to day object
    searchEnabled: false
    columnsReferences: []
  object_description:
    modelName: meta-llama/Llama-3.3-70B-Instruct
    modelProvider: groq
    userPrompt: Describe a {{object_name}} with adjectives and short word groups separated by commas. No more than 10 words
    searchEnabled: false
    columnsReferences:
      - object_name
  object_image_with_desc:
    modelName: multimodalart/isometric-skeumorphic-3d-bnb
    modelProvider: fal-ai
    userPrompt: RBNBICN, icon, white background, isometric perspective, {{object_name}} , {{object_description}}
    searchEnabled: false
    columnsReferences:
      - object_description
      - object_name
  object_image_without_desc:
    modelName: multimodalart/isometric-skeumorphic-3d-bnb
    modelProvider: fal-ai
    userPrompt: "RBNBICN, icon, white background, isometric perspective, {{object_name}} "
    searchEnabled: false
    columnsReferences:
      - object_name
  glowing_colors:
    modelName: multimodalart/isometric-skeumorphic-3d-bnb
    modelProvider: fal-ai
    userPrompt: "RBNBICN, icon, white background, isometric perspective, {{object_name}}, glowing colors "
    searchEnabled: false
    columnsReferences:
      - object_name
  flux:
    modelName: black-forest-labs/FLUX.1-dev
    modelProvider: fal-ai
    userPrompt: Create an isometric icon for the object {{object_name}} based on {{object_description}}
    searchEnabled: false
    columnsReferences:
      - object_description
      - object_name