Training Custom Models on Hugging Face Spaces with AutoTrain SpaceRunner

Community article, published May 9, 2024

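Did you know you can train your own models on Hugging Face Spaces? Yes, it's possible, and AutoTrain SpaceRunner makes it super easy 💥 All you need is a Hugging Face account (which you probably already have) with a payment method attached (only if you want to use GPUs; CPU training is free!). So stop spending time setting up environments on other cloud providers and train your models with AutoTrain SpaceRunner instead: the training environment is already set up for you, and you can install or uninstall whatever dependencies your project needs! Sounds exciting? Let's see how it's done!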

The first step is to create a project folder. The folder can contain anything you like, but it must include a script.py file. This script is the entry point.

-- my_project
---- some_module
---- some_other_module
---- script.py
---- requirements.txt

requirements.txt is optional; you only need it if you want to add or remove dependencies. For example, the following requirements.txt removes the preinstalled xgboost and then installs catboost.

-xgboost
catboost

A - in front of a package name means uninstall.
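To make the convention concrete, here is a minimal sketch of how such a file could be split into packages to remove and packages to install. It is not AutoTrain's actual implementation; the function name split_requirements is hypothetical, and the sketch assumes every line starting with - is an uninstall request (so it does not handle pip options such as -r or -e).

```python
def split_requirements(text):
    """Split requirements text into (to_uninstall, to_install) lists.

    Assumption: a leading "-" marks a package to uninstall, as described
    in the article. This is an illustrative sketch, not AutoTrain's code.
    """
    to_uninstall, to_install = [], []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        if line.startswith("-"):
            to_uninstall.append(line[1:].strip())
        else:
            to_install.append(line)
    return to_uninstall, to_install


example = "-xgboost\ncatboost\n"
remove, install = split_requirements(example)
print("uninstall:", remove)  # uninstall: ['xgboost']
print("install:", install)   # install: ['catboost']
```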

What should script.py look like?

Well, you can write it however you like. Here is an example:

for _ in range(10):
    print("Hello World!")

You can do anything you want in script.py. You can also import local modules, as long as they are present in the project directory.

The final step is to run the code on Spaces. Here is how.

If you haven't installed AutoTrain yet, install it first: pip install -U autotrain-advanced. Then run autotrain spacerunner --help to see all the available arguments.

❯ autotrain spacerunner --help
usage: autotrain <command> [<args>] spacerunner [-h] --project-name PROJECT_NAME --script-path SCRIPT_PATH --username USERNAME --token TOKEN
                                                --backend {spaces-a10gl,spaces-a10gs,spaces-a100,spaces-t4m,spaces-t4s,spaces-cpu,spaces-cpuf}
                                                [--env ENV] [--args ARGS]

✨ Run AutoTrain SpaceRunner

options:
  -h, --help            show this help message and exit
  --project-name PROJECT_NAME
                        Name of the project. Must be unique.
  --script-path SCRIPT_PATH
                        Path to the script
  --username USERNAME   Hugging Face Username, can also be an organization name
  --token TOKEN         Hugging Face API Token
  --backend {spaces-a10gl,spaces-a10gs,spaces-a100,spaces-t4m,spaces-t4s,spaces-cpu,spaces-cpuf}
                        Hugging Face backend to use
  --env ENV             Environment variables, e.g. --env FOO=bar;FOO2=bar2;FOO3=bar3
  --args ARGS           Arguments to pass to the script, e.g. --args foo=bar;foo2=bar2;foo3=bar3;store_true_arg

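--project-name is the unique name used to create the Space and the dataset (which holds your project files) on the Hugging Face Hub. Everything is stored privately, and you can delete it once your script has finished running.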

--script-path is the local path to the directory that contains script.py.

Need to pass environment variables? Use --env. If you need to pass arguments to script.py, use --args.
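The help text gives the format of both options as semicolon-separated items. The sketch below shows one plausible way such strings could be turned into command-line arguments and an environment mapping. The function names are hypothetical, and the mapping (key=value becomes --key value, a bare name becomes a flag) is an assumption for illustration rather than AutoTrain's confirmed behavior.

```python
def expand_script_args(arg_string):
    """Turn "foo=bar;flag" into ["--foo", "bar", "--flag"] (assumed mapping)."""
    argv = []
    for item in arg_string.split(";"):
        item = item.strip()
        if not item:
            continue
        if "=" in item:
            key, value = item.split("=", 1)
            argv.extend([f"--{key}", value])
        else:
            # A bare name is treated as a store_true style flag.
            argv.append(f"--{item}")
    return argv


def parse_env(env_string):
    """Turn "FOO=bar;FOO2=bar2" into {"FOO": "bar", "FOO2": "bar2"}."""
    env = {}
    for item in env_string.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        env[key] = value
    return env


print(expand_script_args("foo=bar;foo2=bar2;foo3=bar3;store_true_arg"))
print(parse_env("FOO=bar;FOO2=bar2;FOO3=bar3"))
```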

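You can pick any spaces-* backend to run your code. Once the job finishes, the Space pauses itself automatically (which saves you money) 🚀

Here is an example command: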

$ autotrain spacerunner \
    --project-name custom_llama_training \
    --script-path /path/to/script/py/ \
    --username abhishek \
    --token $HF_WRITE_TOKEN \
    --backend spaces-a10g-large \
    --args "padding=right;push_to_hub" \
    --env "TOKENIZERS_PARALLELISM=false;TRANSFORMERS_VERBOSITY=error"

Locally, the equivalent invocation of this script would be:

$ TOKENIZERS_PARALLELISM=false TRANSFORMERS_VERBOSITY=error python script.py --padding right --push_to_hub
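For the invocation above to work, script.py has to accept those arguments. The following is a minimal sketch of such a script using argparse; the argument names come from the example command, while the default value, the allowed choices, and the printed summary are assumptions made for illustration.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the arguments used in the example command."""
    parser = argparse.ArgumentParser(description="Hypothetical training script")
    # Default and choices are assumptions made for this sketch.
    parser.add_argument("--padding", default="right", choices=["left", "right"])
    parser.add_argument("--push_to_hub", action="store_true")
    # parse_known_args ignores unrecognized arguments instead of exiting.
    args, _unknown = parser.parse_known_args(argv)
    return args


if __name__ == "__main__":
    args = parse_args()
    print("padding:", args.padding)
    print("push_to_hub:", args.push_to_hub)
    # Environment variables are read the usual way.
    print("TOKENIZERS_PARALLELISM:", os.environ.get("TOKENIZERS_PARALLELISM"))
```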

Available backends

"spaces-a10g-large": "a10g-large",
"spaces-a10g-small": "a10g-small",
"spaces-a100-large": "a100-large",
"spaces-t4-medium": "t4-medium",
"spaces-t4-small": "t4-small",
"spaces-cpu-upgrade": "cpu-upgrade",
"spaces-cpu-basic": "cpu-basic",
"spaces-l4x1": "l4x1",
"spaces-l4x4": "l4x4",
"spaces-a10g-largex2": "a10g-largex2",
"spaces-a10g-largex4": "a10g-largex4",
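If you launch jobs from Python rather than from a shell, a small helper can validate the backend name and build the argument list, which also avoids quoting problems with the ; separators. This is only a sketch: build_spacerunner_command is a hypothetical helper, the backend table is copied from the list above and may differ between AutoTrain versions, and the project, user, and token values are placeholders.

```python
import shlex

# Copied from the list in the article; may differ between AutoTrain versions.
BACKENDS = {
    "spaces-a10g-large": "a10g-large",
    "spaces-a10g-small": "a10g-small",
    "spaces-a100-large": "a100-large",
    "spaces-t4-medium": "t4-medium",
    "spaces-t4-small": "t4-small",
    "spaces-cpu-upgrade": "cpu-upgrade",
    "spaces-cpu-basic": "cpu-basic",
    "spaces-l4x1": "l4x1",
    "spaces-l4x4": "l4x4",
    "spaces-a10g-largex2": "a10g-largex2",
    "spaces-a10g-largex4": "a10g-largex4",
}


def build_spacerunner_command(project_name, script_path, username, token,
                              backend, args=None, env=None):
    """Return the autotrain spacerunner command as a list of arguments."""
    if backend not in BACKENDS:
        raise ValueError(f"Unknown backend: {backend!r}")
    cmd = [
        "autotrain", "spacerunner",
        "--project-name", project_name,
        "--script-path", script_path,
        "--username", username,
        "--token", token,
        "--backend", backend,
    ]
    if args:
        cmd += ["--args", args]
    if env:
        cmd += ["--env", env]
    return cmd


cmd = build_spacerunner_command(
    project_name="my_project",        # placeholder
    script_path="/path/to/project/",  # placeholder
    username="my_username",           # placeholder
    token="hf_xxx",                   # placeholder, never hard-code a real token
    backend="spaces-cpu-basic",
    args="padding=right;push_to_hub",
)
# shlex.join quotes the items that need it, such as the one containing ";".
print(shlex.join(cmd))
```

The resulting list can be passed directly to subprocess.run(cmd, check=True), which bypasses the shell entirely.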

After running the spacerunner command, you get a link to the Space, where you can monitor your training run. It's that simple!

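Note: autotrain spacerunner does not save artifacts automatically, so you must add code to script.py that saves your artifacts/outputs. It is also advisable to save them in a Hugging Face dataset repository ;)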
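One way to follow this advice is sketched below: write the outputs to a local folder, then push that folder to a private dataset repository with huggingface_hub. The helper names, the HF_TOKEN and OUTPUT_REPO_ID environment variables (which you would have to pass yourself, for example via --env), and the placeholder metrics are all assumptions for illustration.

```python
import json
import os
from pathlib import Path


def save_outputs(output_dir, metrics):
    """Write metrics to <output_dir>/metrics.json and return the file path."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / "metrics.json"
    path.write_text(json.dumps(metrics, indent=2))
    return path


def upload_outputs(output_dir, repo_id, token):
    """Upload the output folder to a private Hugging Face dataset repo."""
    # Imported lazily so the rest of the script runs without huggingface_hub.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    api.create_repo(repo_id=repo_id, repo_type="dataset",
                    private=True, exist_ok=True)
    api.upload_folder(folder_path=str(output_dir), repo_id=repo_id,
                      repo_type="dataset")


if __name__ == "__main__":
    # Placeholder content; a real script would save its own results here.
    save_outputs("outputs", {"status": "finished"})
    # Hypothetical variable names; nothing is uploaded unless both are set.
    token = os.environ.get("HF_TOKEN")
    repo_id = os.environ.get("OUTPUT_REPO_ID")
    if token and repo_id:
        upload_outputs("outputs", repo_id, token)
```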

Questions, comments, feature requests, or issues? Use the GitHub issues page of AutoTrain Advanced: https://github.com/huggingface/autotrain-advanced ⭐️
