Main Classes

SetFitModel

class setfit.SetFitModel

( model_body: typing.Optional[sentence_transformers.SentenceTransformer.SentenceTransformer] = None model_head: typing.Union[setfit.modeling.SetFitHead, sklearn.linear_model._logistic.LogisticRegression, NoneType] = None multi_target_strategy: typing.Optional[str] = None normalize_embeddings: bool = False labels: typing.Optional[typing.List[str]] = None model_card_data: typing.Optional[setfit.model_card.SetFitModelCardData] = None sentence_transformers_kwargs: typing.Optional[typing.Dict] = None **kwargs )

A SetFit model with Hugging Face Hub integration.

Example

>>> from setfit import SetFitModel
>>> model = SetFitModel.from_pretrained("tomaarsen/setfit-bge-small-v1.5-sst2-8-shot")
>>> model.predict([
...     "It's a charming and often affecting journey.",
...     "It's slow -- very, very slow.",
...     "A sometimes tedious film.",
... ])
['positive', 'negative', 'negative']

from_pretrained

( force_download: bool = False resume_download: typing.Optional[bool] = None proxies: typing.Optional[typing.Dict] = None token: typing.Union[bool, str, NoneType] = None cache_dir: typing.Union[str, pathlib.Path, NoneType] = None local_files_only: bool = False revision: typing.Optional[str] = None **model_kwargs )

Parameters

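pretrained_model_name_or_path (str, Path) —
- Either the model_id (a string) of a model hosted on the Hub, e.g. bigscience/bloom.
- Or a path to a directory containing model weights saved using ~transformers.PreTrainedModel.save_pretrained, e.g. ../path/to/my_model_directory/.
revision (str, optional) — Revision of the model on the Hub. Can be a branch name, a git tag, or any commit ID. Defaults to the latest commit on the main branch.
force_download (bool, optional, defaults to False) — Whether to force (re-)downloading the model weights and configuration files from the Hub, overriding the existing cache.
proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g. {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on every request.
token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. By default, it uses the token cached when running hf auth login.
cache_dir (str, Path, optional) — Path to the folder where cached files are stored.
local_files_only (bool, optional, defaults to False) — If True, avoid downloading files and return the path to the locally cached file if it exists.
labels (List[str], optional) — If the labels are integers ranging from 0 to num_classes-1, these labels give the corresponding label names.
model_card_data (SetFitModelCardData, optional) — A SetFitModelCardData instance storing data such as the model language, license, and dataset name, used in the automatically generated model card.
multi_target_strategy (str, optional) — The strategy to use with multi-label classification. One of "one-vs-rest", "multi-output", or "classifier-chain".
use_differentiable_head (bool, optional) — Whether to load SetFit with a differentiable (i.e. Torch) head instead of Logistic Regression.
normalize_embeddings (bool, optional) — Whether to apply normalization to the embeddings produced by the Sentence Transformer body.
device (Union[torch.device, str], optional) — The device on which to load the SetFit model, e.g. "cuda:0", "mps" or torch.device("cuda").
trust_remote_code (bool, defaults to False) — Whether to allow custom Sentence Transformers models defined on the Hub in their own modeling files. Only set this to True for repositories you trust and whose code you have read, as it executes code from the Hub on your local machine.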

Download a model from the Hugging Face Hub and instantiate it.

Example

>>> from setfit import SetFitModel
>>> model = SetFitModel.from_pretrained(
...     "sentence-transformers/paraphrase-mpnet-base-v2",
...     labels=["positive", "negative"],
... )
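The normalize_embeddings option can be illustrated with a small pure-Python sketch. It assumes, as with Sentence Transformers' encode(normalize_embeddings=True), that normalization means scaling each embedding to unit L2 length; l2_normalize is a hypothetical helper, not part of SetFit.

```python
import math
from typing import List


def l2_normalize(embeddings: List[List[float]]) -> List[List[float]]:
    """Scale each embedding vector to unit Euclidean (L2) length.

    Hypothetical helper: sketches what normalize_embeddings=True is
    assumed to do to the Sentence Transformer outputs.
    """
    normalized = []
    for vector in embeddings:
        norm = math.sqrt(sum(x * x for x in vector))
        # Leave all-zero vectors unchanged to avoid division by zero.
        if norm == 0.0:
            normalized.append(list(vector))
        else:
            normalized.append([x / norm for x in vector])
    return normalized


vectors = l2_normalize([[3.0, 4.0], [0.0, 0.0]])
print(vectors)  # [[0.6, 0.8], [0.0, 0.0]]
```

With unit-length embeddings, the dot product of two embeddings equals their cosine similarity.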

save_pretrained

( save_directory: typing.Union[str, pathlib.Path] config: typing.Union[dict, huggingface_hub.hub_mixin.DataclassInstance, NoneType] = None repo_id: typing.Optional[str] = None push_to_hub: bool = False model_card_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None **push_to_hub_kwargs ) → str 或 None

Parameters

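save_directory (str or Path) — Path to the directory in which the model weights and configuration are saved.
config (dict or DataclassInstance, optional) — Model configuration, specified as a key/value dictionary or a dataclass instance.
push_to_hub (bool, optional, defaults to False) — Whether to push the model to the Hugging Face Hub after saving it.
repo_id (str, optional) — ID of your repository on the Hub. Used only if push_to_hub=True. Defaults to the folder name if not provided.
model_card_kwargs (Dict[str, Any], optional) — Additional arguments passed to the model card template to customize the model card.
push_to_hub_kwargs — Additional keyword arguments passed along to the ~ModelHubMixin.push_to_hub method.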

Returns

str or None

The URL of the commit on the Hub if push_to_hub=True, otherwise None.

Save the weights to a local directory.

push_to_hub

( repo_id: str config: typing.Union[dict, huggingface_hub.hub_mixin.DataclassInstance, NoneType] = None commit_message: str = 'Push model using huggingface_hub.' private: typing.Optional[bool] = None token: typing.Optional[str] = None branch: typing.Optional[str] = None create_pr: typing.Optional[bool] = None allow_patterns: typing.Union[str, typing.List[str], NoneType] = None ignore_patterns: typing.Union[str, typing.List[str], NoneType] = None delete_patterns: typing.Union[str, typing.List[str], NoneType] = None model_card_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None )

Parameters

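repo_id (str) — ID of the repository to push to (example: "username/my-model").
config (dict or DataclassInstance, optional) — Model configuration, specified as a key/value dictionary or a dataclass instance.
commit_message (str, optional) — Message to commit while pushing.
private (bool, optional) — Whether the created repository should be private. If None (the default), the repository is public unless the organization's default is private.
token (str, optional) — The token to use as HTTP bearer authorization for remote files. By default, it uses the token cached when running hf auth login.
branch (str, optional) — The git branch to push the model to. Defaults to "main".
create_pr (boolean, optional) — Whether to create a Pull Request from branch with that commit. Defaults to False.
allow_patterns (List[str] or str, optional) — If provided, only files matching at least one pattern are pushed.
ignore_patterns (List[str] or str, optional) — If provided, files matching any of the patterns are not pushed.
delete_patterns (List[str] or str, optional) — If provided, remote files matching any of the patterns are deleted from the repository.
model_card_kwargs (Dict[str, Any], optional) — Additional arguments passed to the model card template to customize the model card.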

Upload the model checkpoint to the Hub.

Use allow_patterns and ignore_patterns to precisely filter which files are pushed to the Hub. Use delete_patterns to delete existing remote files in the same commit. See the upload_folder reference for more details.
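The combined effect of allow_patterns and ignore_patterns can be sketched as follows. This assumes Unix shell-style wildcards as matched by Python's fnmatch module; select_files is a hypothetical helper that only mimics the filtering rule and uploads nothing. delete_patterns is not modeled.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Optional, Union

Patterns = Optional[Union[str, List[str]]]


def _as_list(patterns: Patterns) -> Optional[List[str]]:
    # A single string is treated as a one-element pattern list.
    if patterns is None:
        return None
    return [patterns] if isinstance(patterns, str) else list(patterns)


def select_files(
    paths: Iterable[str],
    allow_patterns: Patterns = None,
    ignore_patterns: Patterns = None,
) -> List[str]:
    """Hypothetical helper: keep paths that match at least one allow
    pattern (when any are given) and none of the ignore patterns."""
    allow = _as_list(allow_patterns)
    ignore = _as_list(ignore_patterns)
    selected = []
    for path in paths:
        if allow is not None and not any(fnmatch(path, p) for p in allow):
            continue  # not allowed by any pattern
        if ignore is not None and any(fnmatch(path, p) for p in ignore):
            continue  # explicitly ignored
        selected.append(path)
    return selected


files = ["model_head.pkl", "config.json", "README.md", "logs/train.log"]
print(select_files(files, allow_patterns=["*.json", "*.pkl"]))
# ['model_head.pkl', 'config.json']
print(select_files(files, ignore_patterns="logs/*"))
# ['model_head.pkl', 'config.json', 'README.md']
```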

__call__

( inputs: typing.Union[str, typing.List[str]] batch_size: int = 32 as_numpy: bool = False use_labels: bool = True show_progress_bar: typing.Optional[bool] = None ) → Union[torch.Tensor, np.ndarray, List[str], int, str]

Parameters

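inputs (Union[str, List[str]]) — The input sentence, or list of sentences, to predict classes for.
batch_size (int, defaults to 32) — The batch size used when encoding the sentences into embeddings. A larger value usually means faster processing but higher memory usage.
as_numpy (bool, defaults to False) — Whether to output a NumPy array.
use_labels (bool, defaults to True) — Whether to try to return elements of SetFitModel.labels.
show_progress_bar (Optional[bool], defaults to None) — Whether to show a progress bar while encoding.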
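The batch_size parameter controls how many sentences are encoded together. The chunking itself can be sketched as below; iter_batches is a hypothetical helper, and in practice the batching is assumed to happen inside the Sentence Transformer's encoding step rather than in user code.

```python
from typing import Iterator, List, Sequence


def iter_batches(sentences: Sequence[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size sentences."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(sentences), batch_size):
        yield list(sentences[start:start + batch_size])


sentences = [f"sentence {i}" for i in range(70)]
print([len(batch) for batch in iter_batches(sentences)])  # [32, 32, 6]
```

Larger batches mean fewer encoder calls, at the cost of holding more sentences in memory at once.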

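Returns

Union[torch.Tensor, np.ndarray, List[str], int, str]

If use_labels is True and SetFitModel.labels is defined, a list of string labels with the same length as the inputs. Otherwise, a vector with the same length as the inputs, giving the predicted class of each input. If the input is a single string, the output is a single label as well.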

Predict the various classes.
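The return rules listed above can be sketched as a small function. format_predictions is hypothetical and takes already-computed class indices; for simplicity it returns plain Python ints where SetFit returns a torch.Tensor or np.ndarray.

```python
from typing import List, Optional, Sequence, Union


def format_predictions(
    inputs: Union[str, Sequence[str]],
    class_ids: Sequence[int],
    labels: Optional[List[str]] = None,
    use_labels: bool = True,
) -> Union[List[int], List[str], int, str]:
    """Hypothetical helper mirroring the documented return rules.

    class_ids holds one predicted class index per input sentence.
    """
    if use_labels and labels is not None:
        # Map each class index to its string label.
        outputs = [labels[i] for i in class_ids]
    else:
        outputs = list(class_ids)
    # A single string input yields a single prediction instead of a list.
    if isinstance(inputs, str):
        return outputs[0]
    return outputs


labels = ["negative", "positive"]
print(format_predictions(["What a boring display", "I'm wowed!"], [0, 1], labels))
# ['negative', 'positive']
print(format_predictions("That was cool!", [1], labels))  # positive
```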

Example

>>> from setfit import SetFitModel
>>> model = SetFitModel.from_pretrained(...)
>>> model(["What a boring display", "Exhilarating through and through", "I'm wowed!"])
['negative', 'positive', 'positive']
>>> model("That was cool!")
'positive'

label2id

( )

Return a mapping from string labels to integer IDs.

id2label

( )

Return a mapping from integer IDs to string labels.
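Assuming, as the labels parameter of from_pretrained describes, that the i-th entry of labels names integer class i, both mappings can be derived as in this sketch; build_label_maps is a hypothetical helper, not a SetFit method.

```python
from typing import Dict, List, Tuple


def build_label_maps(labels: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Hypothetical helper: derive label2id and id2label from a labels
    list, assuming label i corresponds to integer class ID i."""
    label2id = {label: idx for idx, label in enumerate(labels)}
    id2label = {idx: label for idx, label in enumerate(labels)}
    return label2id, id2label


label2id, id2label = build_label_maps(["negative", "positive"])
print(label2id)  # {'negative': 0, 'positive': 1}
print(id2label)  # {0: 'negative', 1: 'positive'}
```

The two dictionaries are inverses of each other, so mapping a label to its ID and back returns the original label.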

create_model_card

( path: str model_name: typing.Optional[str] = 'SetFit Model' )

Parameters

path (str) — The path to save the model card to.
model_name (str, optional) — The name of the model. Defaults to SetFit Model.

Create and save a model card for a SetFit model.
