数据集查看器文档

预览数据集

Hugging Face's logo
加入 Hugging Face 社区

并获得增强的文档体验

开始使用

预览数据集

数据集查看器提供了一个 ` /first-rows ` 端点,用于可视化数据集的前 100 行。这将让您很好地了解数据集中包含的数据类型和示例数据。

本指南向您展示如何使用数据集查看器的 ` /first-rows ` 端点来预览数据集。也欢迎使用 PostmanRapidAPIReDoc 进行尝试。

` /first-rows ` 端点接受三个查询参数

  • dataset:数据集名称,例如 nyu-mll/gluemozilla-foundation/common_voice_10_0
  • config:子集名称,例如 cola
  • split:分片名称,例如 train
Python
JavaScript
cURL
import requests
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://datasets-server.huggingface.co/first-rows?dataset=ibm/duorc&config=SelfRC&split=train"
def query():
    response = requests.get(API_URL, headers=headers)
    return response.json()
data = query()

端点响应是一个 JSON,包含两个键

  • 数据集的 features,包括列名和数据类型。
  • 数据集的前 100 行 ` rows ` 以及特定行中每列包含的内容。

例如,以下是 ` ibm/duorc ` / ` SelfRC ` 训练拆分的 ` features ` 和前 100 行 ` rows `

{
  "dataset": "ibm/duorc",
  "config": "SelfRC",
  "split": "train",
  "features": [
    {
      "feature_idx": 0,
      "name": "plot_id",
      "type": { "dtype": "string", "_type": "Value" }
    },
    {
      "feature_idx": 1,
      "name": "plot",
      "type": { "dtype": "string", "_type": "Value" }
    },
    {
      "feature_idx": 2,
      "name": "title",
      "type": { "dtype": "string", "_type": "Value" }
    },
    {
      "feature_idx": 3,
      "name": "question_id",
      "type": { "dtype": "string", "_type": "Value" }
    },
    {
      "feature_idx": 4,
      "name": "question",
      "type": { "dtype": "string", "_type": "Value" }
    },
    {
      "feature_idx": 5,
      "name": "answers",
      "type": {
        "feature": { "dtype": "string", "_type": "Value" },
        "_type": "List"
      }
    },
    {
      "feature_idx": 6,
      "name": "no_answer",
      "type": { "dtype": "bool", "_type": "Value" }
    }
  ],
  "rows": [
    {
      "row_idx": 0,
      "row": {
        "plot_id": "/m/03vyhn",
        "plot": "200 years in the future, Mars has been colonized by a high-tech company.\nMelanie Ballard (Natasha Henstridge) arrives by train to a Mars mining camp which has cut all communication links with the company headquarters. She's not alone, as she is with a group of fellow police officers. They find the mining camp deserted except for a person in the prison, Desolation Williams (Ice Cube), who seems to laugh about them because they are all going to die. They were supposed to take Desolation to headquarters, but decide to explore first to find out what happened.They find a man inside an encapsulated mining car, who tells them not to open it. However, they do and he tries to kill them. One of the cops witnesses strange men with deep scarred and heavily tattooed faces killing the remaining survivors. The cops realise they need to leave the place fast.Desolation explains that the miners opened a kind of Martian construction in the soil which unleashed red dust. Those who breathed that dust became violent psychopaths who started to build weapons and kill the uninfected. They changed genetically, becoming distorted but much stronger.The cops and Desolation leave the prison with difficulty, and devise a plan to kill all the genetically modified ex-miners on the way out. However, the plan goes awry, and only Melanie and Desolation reach headquarters alive. Melanie realises that her bosses won't ever believe her. However, the red dust eventually arrives to headquarters, and Melanie and Desolation need to fight once again.",
        "title": "Ghosts of Mars",
        "question_id": "b440de7d-9c3f-841c-eaec-a14bdff950d1",
        "question": "How did the police arrive at the Mars mining camp?",
        "answers": ["They arrived by train."],
        "no_answer": false
      },
      "truncated_cells": []
    },
    {
      "row_idx": 1,
      "row": {
        "plot_id": "/m/03vyhn",
        "plot": "200 years in the future, Mars has been colonized by a high-tech company.\nMelanie Ballard (Natasha Henstridge) arrives by train to a Mars mining camp which has cut all communication links with the company headquarters. She's not alone, as she is with a group of fellow police officers. They find the mining camp deserted except for a person in the prison, Desolation Williams (Ice Cube), who seems to laugh about them because they are all going to die. They were supposed to take Desolation to headquarters, but decide to explore first to find out what happened.They find a man inside an encapsulated mining car, who tells them not to open it. However, they do and he tries to kill them. One of the cops witnesses strange men with deep scarred and heavily tattooed faces killing the remaining survivors. The cops realise they need to leave the place fast.Desolation explains that the miners opened a kind of Martian construction in the soil which unleashed red dust. Those who breathed that dust became violent psychopaths who started to build weapons and kill the uninfected. They changed genetically, becoming distorted but much stronger.The cops and Desolation leave the prison with difficulty, and devise a plan to kill all the genetically modified ex-miners on the way out. However, the plan goes awry, and only Melanie and Desolation reach headquarters alive. Melanie realises that her bosses won't ever believe her. However, the red dust eventually arrives to headquarters, and Melanie and Desolation need to fight once again.",
        "title": "Ghosts of Mars",
        "question_id": "a9f95c0d-121f-3ca9-1595-d497dc8bc56c",
        "question": "Who has colonized Mars 200 years in the future?",
        "answers": [
          "A high-tech company has colonized Mars 200 years in the future."
        ],
        "no_answer": false
      },
      "truncated_cells": []
    }
    ...
  ],
  "truncated": false
}

截断响应

对于某些数据集,来自 ` /first-rows ` 的响应大小可能超过 1MB,在这种情况下,响应将被截断,直到大小低于 1MB。这意味着您可能无法在响应中获得 100 行,因为行已被截断,在这种情况下,` truncated ` 字段将为 ` true ` 。

在某些情况下,即使前几行生成了超过 1MB 的响应,一些列也会被截断并转换为字符串。您将在 ` truncated_cells ` 字段中看到这些列表。

例如, ` GEM/SciDuet ` 数据集只返回 10 行,并且 ` paper_abstract ` 、 ` paper_content ` 、 ` paper_headers ` 、 ` slide_content_text ` 和 ` target ` 列被截断

  ...
  "rows": [
    {
      {
         "row_idx":8,
         "row":{
            "gem_id":"GEM-SciDuet-train-1#paper-954#slide-8",
            "paper_id":"954",
            "paper_title":"Incremental Syntactic Language Models for Phrase-based Translation",
            "paper_abstract":"\"This paper describes a novel technique for incorporating syntactic knowledge into phrasebased machi",
            "paper_content":"{\"paper_content_id\":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29",
            "paper_headers":"{\"paper_header_number\":[\"1\",\"2\",\"3\",\"3.1\",\"3.3\",\"4\",\"4.1\",\"6\",\"7\"],\"paper_header_content\":[\"Introduc",
            "slide_id":"GEM-SciDuet-train-1#paper-954#slide-8",
            "slide_title":"Does an Incremental Syntactic LM Help Translation",
            "slide_content_text":"\"but will it make my BLEU score go up?\\nMotivation Syntactic LM Decoder Integration Questions?\\nMose",
            "target":"\"but will it make my BLEU score go up?\\nMotivation Syntactic LM Decoder Integration Questions?\\nMose",
            "references":[]
         },
         "truncated_cells":[
            "paper_abstract",
            "paper_content",
            "paper_headers",
            "slide_content_text",
            "target"
         ]
      },
      {
         "row_idx":9,
         "row":{
            "gem_id":"GEM-SciDuet-train-1#paper-954#slide-9",
            "paper_id":"954",
            "paper_title":"Incremental Syntactic Language Models for Phrase-based Translation",
            "paper_abstract":"\"This paper describes a novel technique for incorporating syntactic knowledge into phrasebased machi",
            "paper_content":"{\"paper_content_id\":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29",
            "paper_headers":"{\"paper_header_number\":[\"1\",\"2\",\"3\",\"3.1\",\"3.3\",\"4\",\"4.1\",\"6\",\"7\"],\"paper_header_content\":[\"Introduc",
            "slide_id":"GEM-SciDuet-train-1#paper-954#slide-9",
            "slide_title":"Perplexity Results",
            "slide_content_text":"\"Language models trained on WSJ Treebank corpus\\nMotivation Syntactic LM Decoder Integration Questio",
            "target":"\"Language models trained on WSJ Treebank corpus\\nMotivation Syntactic LM Decoder Integration Questio",
            "references":[
               
            ]
         },
         "truncated_cells":[
            "paper_abstract",
            "paper_content",
            "paper_headers",
            "slide_content_text",
            "target"
         ]
      }
      "truncated_cells": ["target", "feat_dynamic_real"]
    },
  ...
  ],
  truncated: true
< > 在 GitHub 上更新