Preview a dataset

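The dataset viewer provides a /first-rows endpoint for visualizing the first 100 rows of a dataset. This gives you a good idea of the data types and example data contained in the dataset.

This guide shows you how to use the dataset viewer's /first-rows endpoint to preview a dataset. Feel free to also try it out with Postman, RapidAPI, or ReDoc.

The /first-rows endpoint accepts three query parameters:

  • dataset: the dataset name, for example nyu-mll/glue or mozilla-foundation/common_voice_10_0
  • config: the subset name, for example cola
  • split: the split name, for example train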
Python
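import os
import requests

# Access token read from the HF_TOKEN environment variable (variable name is an assumption).
API_TOKEN = os.environ.get("HF_TOKEN")
headers = {"Authorization": f"Bearer {API_TOKEN}"} if API_TOKEN else {}
API_URL = "https://datasets-server.huggingface.co/first-rows?dataset=ibm/duorc&config=SelfRC&split=train"

def query():
    # Send a GET request to /first-rows and decode the JSON body.
    response = requests.get(API_URL, headers=headers)
    return response.json()

data = query()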
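
The request above hard-codes the query string. As a minimal sketch, the same URL can be assembled from the three query parameters. The helper names build_first_rows_url and fetch_first_rows are hypothetical, and the sketch uses only the Python standard library instead of requests so that it stays self-contained.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://datasets-server.huggingface.co/first-rows"


def build_first_rows_url(dataset: str, config: str, split: str) -> str:
    """Return the /first-rows URL for a dataset, subset and split."""
    # safe="/" leaves the slash in names such as "ibm/duorc" unescaped,
    # which matches the URL form used in this guide.
    query = urlencode({"dataset": dataset, "config": config, "split": split}, safe="/")
    return f"{BASE_URL}?{query}"


def fetch_first_rows(dataset: str, config: str, split: str, token=None, timeout=30) -> dict:
    """Fetch and decode the /first-rows response (performs a network call)."""
    # The Authorization header is only sent when a token is supplied.
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    request = Request(build_first_rows_url(dataset, config, split), headers=headers)
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


url = build_first_rows_url("ibm/duorc", "SelfRC", "train")
```

Calling fetch_first_rows("ibm/duorc", "SelfRC", "train") would then return the decoded JSON as a dictionary. With requests, passing the same dictionary through the params argument of requests.get achieves the same result.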

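The endpoint response is a JSON object containing two keys:

  • The features of a dataset, including each column's name and data type.
  • The first 100 rows of a dataset, with the content of each column for a specific row.

For example, here are the features and the first 100 rows of the ibm/duorc/SelfRC train split: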

{
  "dataset": "ibm/duorc",
  "config": "SelfRC",
  "split": "train",
  "features": [
    {
      "feature_idx": 0,
      "name": "plot_id",
      "type": { "dtype": "string", "_type": "Value" }
    },
    {
      "feature_idx": 1,
      "name": "plot",
      "type": { "dtype": "string", "_type": "Value" }
    },
    {
      "feature_idx": 2,
      "name": "title",
      "type": { "dtype": "string", "_type": "Value" }
    },
    {
      "feature_idx": 3,
      "name": "question_id",
      "type": { "dtype": "string", "_type": "Value" }
    },
    {
      "feature_idx": 4,
      "name": "question",
      "type": { "dtype": "string", "_type": "Value" }
    },
    {
      "feature_idx": 5,
      "name": "answers",
      "type": {
        "feature": { "dtype": "string", "_type": "Value" },
        "_type": "Sequence"
      }
    },
    {
      "feature_idx": 6,
      "name": "no_answer",
      "type": { "dtype": "bool", "_type": "Value" }
    }
  ],
  "rows": [
    {
      "row_idx": 0,
      "row": {
        "plot_id": "/m/03vyhn",
        "plot": "200 years in the future, Mars has been colonized by a high-tech company.\nMelanie Ballard (Natasha Henstridge) arrives by train to a Mars mining camp which has cut all communication links with the company headquarters. She's not alone, as she is with a group of fellow police officers. They find the mining camp deserted except for a person in the prison, Desolation Williams (Ice Cube), who seems to laugh about them because they are all going to die. They were supposed to take Desolation to headquarters, but decide to explore first to find out what happened.They find a man inside an encapsulated mining car, who tells them not to open it. However, they do and he tries to kill them. One of the cops witnesses strange men with deep scarred and heavily tattooed faces killing the remaining survivors. The cops realise they need to leave the place fast.Desolation explains that the miners opened a kind of Martian construction in the soil which unleashed red dust. Those who breathed that dust became violent psychopaths who started to build weapons and kill the uninfected. They changed genetically, becoming distorted but much stronger.The cops and Desolation leave the prison with difficulty, and devise a plan to kill all the genetically modified ex-miners on the way out. However, the plan goes awry, and only Melanie and Desolation reach headquarters alive. Melanie realises that her bosses won't ever believe her. However, the red dust eventually arrives to headquarters, and Melanie and Desolation need to fight once again.",
        "title": "Ghosts of Mars",
        "question_id": "b440de7d-9c3f-841c-eaec-a14bdff950d1",
        "question": "How did the police arrive at the Mars mining camp?",
        "answers": ["They arrived by train."],
        "no_answer": false
      },
      "truncated_cells": []
    },
    {
      "row_idx": 1,
      "row": {
        "plot_id": "/m/03vyhn",
        "plot": "200 years in the future, Mars has been colonized by a high-tech company.\nMelanie Ballard (Natasha Henstridge) arrives by train to a Mars mining camp which has cut all communication links with the company headquarters. She's not alone, as she is with a group of fellow police officers. They find the mining camp deserted except for a person in the prison, Desolation Williams (Ice Cube), who seems to laugh about them because they are all going to die. They were supposed to take Desolation to headquarters, but decide to explore first to find out what happened.They find a man inside an encapsulated mining car, who tells them not to open it. However, they do and he tries to kill them. One of the cops witnesses strange men with deep scarred and heavily tattooed faces killing the remaining survivors. The cops realise they need to leave the place fast.Desolation explains that the miners opened a kind of Martian construction in the soil which unleashed red dust. Those who breathed that dust became violent psychopaths who started to build weapons and kill the uninfected. They changed genetically, becoming distorted but much stronger.The cops and Desolation leave the prison with difficulty, and devise a plan to kill all the genetically modified ex-miners on the way out. However, the plan goes awry, and only Melanie and Desolation reach headquarters alive. Melanie realises that her bosses won't ever believe her. However, the red dust eventually arrives to headquarters, and Melanie and Desolation need to fight once again.",
        "title": "Ghosts of Mars",
        "question_id": "a9f95c0d-121f-3ca9-1595-d497dc8bc56c",
        "question": "Who has colonized Mars 200 years in the future?",
        "answers": [
          "A high-tech company has colonized Mars 200 years in the future."
        ],
        "no_answer": false
      },
      "truncated_cells": []
    }
    ...
  ],
  "truncated": false
}
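
A response of this shape can be walked with a few lines of Python. The sketch below is illustrative only: column_types and iter_rows are hypothetical helper names, and sample is a trimmed copy of the response above rather than a live API result.

```python
def column_types(response: dict) -> dict:
    """Map each column name to a short type label taken from `features`."""
    types = {}
    for feature in response["features"]:
        ftype = feature["type"]
        # Value columns carry a dtype; other kinds (such as Sequence)
        # are labeled by their _type instead.
        types[feature["name"]] = ftype.get("dtype", ftype.get("_type"))
    return types


def iter_rows(response: dict):
    """Yield (row_idx, row) pairs from `rows`."""
    for item in response["rows"]:
        yield item["row_idx"], item["row"]


# Trimmed copy of the example response (three columns, one row).
sample = {
    "features": [
        {"feature_idx": 2, "name": "title",
         "type": {"dtype": "string", "_type": "Value"}},
        {"feature_idx": 5, "name": "answers",
         "type": {"feature": {"dtype": "string", "_type": "Value"}, "_type": "Sequence"}},
        {"feature_idx": 6, "name": "no_answer",
         "type": {"dtype": "bool", "_type": "Value"}},
    ],
    "rows": [
        {"row_idx": 0,
         "row": {"title": "Ghosts of Mars",
                 "answers": ["They arrived by train."],
                 "no_answer": False},
         "truncated_cells": []},
    ],
    "truncated": False,
}

types = column_types(sample)
rows = list(iter_rows(sample))
```

Here types maps title to string, answers to Sequence, and no_answer to bool, and rows holds a single (row_idx, row) pair.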

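Truncated responses

For some datasets, the response from /first-rows can exceed 1MB, in which case the response is truncated until it is under 1MB. You may therefore receive fewer than 100 rows, and the truncated field is then true.

In some cases, if even the first few rows generate a response that exceeds 1MB, some of the columns are truncated and converted to a string. These columns are listed in the truncated_cells field.

For example, the GEM/SciDuet dataset only returns 10 rows, and the paper_abstract, paper_content, paper_headers, slide_content_text and target columns are truncated: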

  ...
  "rows": [
      {
         "row_idx":8,
         "row":{
            "gem_id":"GEM-SciDuet-train-1#paper-954#slide-8",
            "paper_id":"954",
            "paper_title":"Incremental Syntactic Language Models for Phrase-based Translation",
            "paper_abstract":"\"This paper describes a novel technique for incorporating syntactic knowledge into phrasebased machi",
            "paper_content":"{\"paper_content_id\":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29",
            "paper_headers":"{\"paper_header_number\":[\"1\",\"2\",\"3\",\"3.1\",\"3.3\",\"4\",\"4.1\",\"6\",\"7\"],\"paper_header_content\":[\"Introduc",
            "slide_id":"GEM-SciDuet-train-1#paper-954#slide-8",
            "slide_title":"Does an Incremental Syntactic LM Help Translation",
            "slide_content_text":"\"but will it make my BLEU score go up?\\nMotivation Syntactic LM Decoder Integration Questions?\\nMose",
            "target":"\"but will it make my BLEU score go up?\\nMotivation Syntactic LM Decoder Integration Questions?\\nMose",
            "references":[]
         },
         "truncated_cells":[
            "paper_abstract",
            "paper_content",
            "paper_headers",
            "slide_content_text",
            "target"
         ]
      },
      {
         "row_idx":9,
         "row":{
            "gem_id":"GEM-SciDuet-train-1#paper-954#slide-9",
            "paper_id":"954",
            "paper_title":"Incremental Syntactic Language Models for Phrase-based Translation",
            "paper_abstract":"\"This paper describes a novel technique for incorporating syntactic knowledge into phrasebased machi",
            "paper_content":"{\"paper_content_id\":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29",
            "paper_headers":"{\"paper_header_number\":[\"1\",\"2\",\"3\",\"3.1\",\"3.3\",\"4\",\"4.1\",\"6\",\"7\"],\"paper_header_content\":[\"Introduc",
            "slide_id":"GEM-SciDuet-train-1#paper-954#slide-9",
            "slide_title":"Perplexity Results",
            "slide_content_text":"\"Language models trained on WSJ Treebank corpus\\nMotivation Syntactic LM Decoder Integration Questio",
            "target":"\"Language models trained on WSJ Treebank corpus\\nMotivation Syntactic LM Decoder Integration Questio",
            "references":[]
         },
         "truncated_cells":[
            "paper_abstract",
            "paper_content",
            "paper_headers",
            "slide_content_text",
            "target"
         ]
      }
  ...
  ],
  "truncated": true
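
Both kinds of truncation can be detected programmatically from the two fields described in this section. The following is a minimal sketch: truncation_report is a hypothetical helper name, and sample is a heavily trimmed copy of the GEM/SciDuet rows shown above, keeping only the fields the function reads.

```python
def truncation_report(response: dict) -> dict:
    """Summarize how a /first-rows response was truncated."""
    truncated_columns = set()
    for item in response.get("rows", []):
        # Each row lists the columns whose cells were cut and stringified.
        truncated_columns.update(item.get("truncated_cells", []))
    return {
        # True when rows were dropped to keep the response under the size limit.
        "rows_truncated": bool(response.get("truncated", False)),
        "row_count": len(response.get("rows", [])),
        "truncated_columns": sorted(truncated_columns),
    }


# Trimmed copy of the example above (row contents reduced to gem_id).
sample = {
    "rows": [
        {"row_idx": 8,
         "row": {"gem_id": "GEM-SciDuet-train-1#paper-954#slide-8"},
         "truncated_cells": ["paper_abstract", "paper_content", "paper_headers",
                             "slide_content_text", "target"]},
        {"row_idx": 9,
         "row": {"gem_id": "GEM-SciDuet-train-1#paper-954#slide-9"},
         "truncated_cells": ["paper_abstract", "paper_content", "paper_headers",
                             "slide_content_text", "target"]},
    ],
    "truncated": True,
}

report = truncation_report(sample)
```

A caller could use rows_truncated to decide whether fewer than 100 rows should be expected, and truncated_columns to avoid parsing cells that were cut short.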