Transformers.js documentation

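You are viewing a version that requires installation from source. If you would like a regular npm install, check out the latest stable version (v3.0.0).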

utils/data-structures

Custom data structures.

These are only used internally, meaning an end user should not need to access anything here.

utils/data-structures
- static
  - .PriorityQueue
    - new PriorityQueue(comparator)
    - .size
    - .isEmpty() ⇒ boolean
    - .peek() ⇒ any
    - .push(...values) ⇒ number
    - .extend(values) ⇒ number
    - .pop() ⇒ any
    - .replace(value) ⇒ *
    - ._siftUpFrom(node)
  - .CharTrie
  - .TokenLattice
    - new TokenLattice(sentence, bosTokenId, eosTokenId)
    - .insert(pos, length, score, tokenId)
    - .viterbi() ⇒ Array.<TokenLatticeNode>
    - .piece(node) ⇒ string
    - .tokens() ⇒ Array.<string>
    - .tokenIds() ⇒ Array.<number>
  - .DictionarySplitter
    - new DictionarySplitter(dictionary)
    - .split(text) ⇒ Array.<string>
  - .LRUCache
    - new LRUCache(capacity)
    - .get(key) ⇒ any
    - .put(key, value)
    - .clear()
- inner
  - ~CharTrieNode
    - new CharTrieNode(isLeaf, children)
    - .default() ⇒ CharTrieNode
  - ~TokenLatticeNode
    - new TokenLatticeNode(tokenId, nodeId, pos, length, score)
    - .clone() ⇒ TokenLatticeNode

utils/data-structures.PriorityQueue

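An efficient heap-based implementation of a priority queue. It uses an array-based binary heap, where the root is at index 0 and the children of node i are at indices 2i + 1 and 2i + 2, respectively.

Adapted from the following sources:

https://stackoverflow.com/a/42919752/13989043 (original)
https://github.com/belladoreai/llama-tokenizer-js (minor improvements)

Kind: static class of utils/data-structures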

.PriorityQueue
- new PriorityQueue(comparator)
- .size
- .isEmpty() ⇒ boolean
- .peek() ⇒ any
- .push(...values) ⇒ number
- .extend(values) ⇒ number
- .pop() ⇒ any
- .replace(value) ⇒ *
- ._siftUpFrom(node)
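
The array layout described above can be illustrated with a minimal, self-contained sketch. This is not the library's source: the class name `SketchPriorityQueue`, the private helpers, and the comparator convention (`comparator(a, b)` returns true when `a` has higher priority than `b`) are assumptions made for illustration.

```javascript
// Illustrative sketch of an array-based binary heap; not the library's code.
class SketchPriorityQueue {
  // Assumption: comparator(a, b) is true when `a` outranks `b`.
  // The default (a > b) gives max-heap behaviour.
  constructor(comparator = (a, b) => a > b) {
    this._heap = [];
    this._comparator = comparator;
  }
  get size() { return this._heap.length; }
  isEmpty() { return this.size === 0; }
  peek() { return this._heap[0]; }
  push(...values) { return this.extend(values); }
  extend(values) {
    for (const value of values) {
      this._heap.push(value);
      this._siftUp(this.size - 1);
    }
    return this.size;
  }
  pop() {
    const top = this.peek();
    const last = this._heap.pop();
    if (this.size > 0) {
      // Move the last element to the root and restore the heap property.
      this._heap[0] = last;
      this._siftDown(0);
    }
    return top;
  }
  _higher(i, j) { return this._comparator(this._heap[i], this._heap[j]); }
  _swap(i, j) { [this._heap[i], this._heap[j]] = [this._heap[j], this._heap[i]]; }
  _siftUp(node) {
    while (node > 0) {
      const parent = (node - 1) >> 1; // parent of node i is floor((i - 1) / 2)
      if (!this._higher(node, parent)) break;
      this._swap(node, parent);
      node = parent;
    }
  }
  _siftDown(node) {
    for (;;) {
      const left = 2 * node + 1;  // children of node i: 2i + 1 and 2i + 2
      const right = 2 * node + 2;
      let best = node;
      if (left < this.size && this._higher(left, best)) best = left;
      if (right < this.size && this._higher(right, best)) best = right;
      if (best === node) break;
      this._swap(node, best);
      node = best;
    }
  }
}

const queue = new SketchPriorityQueue();
queue.push(3, 10, 7);
const drained = [];
while (!queue.isEmpty()) drained.push(queue.pop());
console.log(drained); // [10, 7, 3]
```

Passing `(a, b) => a < b` as the comparator would turn the same sketch into a min-heap.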

new PriorityQueue(comparator)

Create a new PriorityQueue.

Param	Type	Description
comparator	`function`	Comparator function used to determine priority. Defaults to a MaxHeap.

priorityQueue.size

The size of the queue.

Kind: instance property of PriorityQueue

priorityQueue.isEmpty() ⇒ boolean

Check if the queue is empty.

Kind: instance method of PriorityQueue
Returns: boolean - true if the queue is empty, false otherwise.

priorityQueue.peek() ⇒ any

Return the element with the highest priority in the queue.

Kind: instance method of PriorityQueue
Returns: any - The highest priority element in the queue.

priorityQueue.push(...values) ⇒ number

Add one or more elements to the queue.

Kind: instance method of PriorityQueue
Returns: number - The new size of the queue.

Param	Type	Description
...values	`any`	The values to push into the queue.

priorityQueue.extend(values) ⇒ number

Add multiple elements to the queue.

Kind: instance method of PriorityQueue
Returns: number - The new size of the queue.

Param	Type	Description
values	`Array.<any>`	The values to push into the queue.

priorityQueue.pop() ⇒ any

Remove and return the element with the highest priority in the queue.

Kind: instance method of PriorityQueue
Returns: any - The element with the highest priority in the queue.

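priorityQueue.replace(value) ⇒ *

Replace the element with the highest priority in the queue with a new value.

Kind: instance method of PriorityQueue
Returns: * - The value that was replaced.

Param	Type	Description
value	`*`	The new value.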
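
As a rough illustration of replacing the top element, the sketch below overwrites the root of an array-based heap and then sifts the new root down until the heap property holds again. The function name `replaceTop` and the `higher` predicate are hypothetical; how the library restores the heap internally is not stated here.

```javascript
// Sketch: replace the root of an array-based heap and restore the heap property.
// `higher(a, b)` is assumed to be true when `a` outranks `b` (max-heap by default).
function replaceTop(heap, value, higher = (a, b) => a > b) {
  const replaced = heap[0];
  heap[0] = value;
  let node = 0;
  for (;;) {
    const left = 2 * node + 1;
    const right = 2 * node + 2;
    let best = node;
    if (left < heap.length && higher(heap[left], heap[best])) best = left;
    if (right < heap.length && higher(heap[right], heap[best])) best = right;
    if (best === node) break; // neither child outranks the current node
    [heap[node], heap[best]] = [heap[best], heap[node]];
    node = best;
  }
  return replaced;
}

const heapArray = [9, 5, 8, 1]; // already a valid max-heap
const replacedValue = replaceTop(heapArray, 2);
console.log(replacedValue, heapArray); // 9 [8, 5, 2, 1]
```

Compared with a `pop()` followed by a `push()`, this needs only one pass over the heap.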

priorityQueue._siftUpFrom(node)

Helper function to sift up from a given node.

Kind: instance method of PriorityQueue

Param	Type	Description
node	`number`	The index of the node to start sifting up from.

utils/data-structures.CharTrie

A trie structure for efficiently storing and searching for strings.

Kind: static class of utils/data-structures

.CharTrie

charTrie.extend(texts)

Adds one or more texts to the trie.

Kind: instance method of CharTrie

Param	Type	Description
texts	`Array.<string>`	The strings to add to the trie.

charTrie.push(text)

Adds text to the trie.

Kind: instance method of CharTrie

Param	Type	Description
text	`string`	The string to add to the trie.

charTrie.commonPrefixSearch(text)

Searches the trie for all strings with a common prefix of text.

Kind: instance method of CharTrie

Param	Type	Description
text	`string`	The common prefix to search for.
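
The three methods can be pictured with a small self-contained sketch. The class names `SketchTrieNode` and `SketchCharTrie` are hypothetical, and the search semantics are an assumption: here `commonPrefixSearch` walks the trie along the characters of `text` and yields every stored string that is a prefix of it.

```javascript
// Illustrative character trie; not the library's source.
class SketchTrieNode {
  constructor(isLeaf = false, children = new Map()) {
    this.isLeaf = isLeaf;     // true when a stored string ends at this node
    this.children = children; // maps a character to the next node
  }
}

class SketchCharTrie {
  constructor() {
    this.root = new SketchTrieNode();
  }
  extend(texts) {
    for (const text of texts) this.push(text);
  }
  push(text) {
    let node = this.root;
    for (const ch of text) {
      let child = node.children.get(ch);
      if (child === undefined) {
        child = new SketchTrieNode();
        node.children.set(ch, child);
      }
      node = child;
    }
    node.isLeaf = true;
  }
  // Assumed semantics: yield each stored string that is a prefix of `text`.
  *commonPrefixSearch(text) {
    let node = this.root;
    let prefix = "";
    for (const ch of text) {
      node = node.children.get(ch);
      if (node === undefined) return;
      prefix += ch;
      if (node.isLeaf) yield prefix;
    }
  }
}

const trie = new SketchCharTrie();
trie.extend(["a", "ab", "abc", "b"]);
const matches = [...trie.commonPrefixSearch("abd")];
console.log(matches); // ["a", "ab"]
```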

utils/data-structures.TokenLattice

A lattice data structure to be used for tokenization.

Kind: static class of utils/data-structures

.TokenLattice
- new TokenLattice(sentence, bosTokenId, eosTokenId)
- .insert(pos, length, score, tokenId)
- .viterbi() ⇒ Array.<TokenLatticeNode>
- .piece(node) ⇒ string
- .tokens() ⇒ Array.<string>
- .tokenIds() ⇒ Array.<number>

new TokenLattice(sentence, bosTokenId, eosTokenId)

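Creates a new TokenLattice instance.

Param	Type	Description
sentence	`string`	The input sentence to be tokenized.
bosTokenId	`number`	The beginning-of-sequence token ID.
eosTokenId	`number`	The end-of-sequence token ID.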

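tokenLattice.insert(pos, length, score, tokenId)

Inserts a new token node into the token lattice.

Kind: instance method of TokenLattice

Param	Type	Description
pos	`number`	The starting position of the token.
length	`number`	The length of the token.
score	`number`	The score of the token.
tokenId	`number`	The token ID of the token.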

tokenLattice.viterbi() ⇒ Array.<TokenLatticeNode>

Implements the Viterbi algorithm to compute the most likely sequence of tokens.

Kind: instance method of TokenLattice
Returns: Array.<TokenLatticeNode> - The most likely sequence of tokens.

tokenLattice.piece(node) ⇒ string

Kind: instance method of TokenLattice
Returns: string - The piece of the sentence that the given node spans.

Param	Type
node	`TokenLatticeNode`

tokenLattice.tokens() ⇒ Array.<string>

Kind: instance method of TokenLattice
Returns: Array.<string> - The most likely sequence of tokens.

tokenLattice.tokenIds() ⇒ Array.<number>

Kind: instance method of TokenLattice
Returns: Array.<number> - The most likely sequence of token IDs.
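
To make the Viterbi step more concrete, here is a simplified dynamic-programming sketch. It is not the library's implementation: it omits the BOS/EOS nodes and the node objects, and the function name `bestSegmentation` and the `{ pos, length, score }` candidate shape are hypothetical. It assumes scores are additive (for example log-probabilities) and that every candidate has a length of at least 1.

```javascript
// Sketch: choose the highest-scoring way to cover `sentence` with candidate tokens.
function bestSegmentation(sentence, candidates) {
  const chars = Array.from(sentence);
  const n = chars.length;
  // best[i] is the best total score of any token path covering chars[0..i).
  const best = new Array(n + 1).fill(-Infinity);
  // back[i] is the candidate that ends at i on that best path.
  const back = new Array(n + 1).fill(null);
  best[0] = 0;
  // Visit candidates by start position, so best[pos] is final before it is used.
  const ordered = [...candidates].sort((a, b) => a.pos - b.pos);
  for (const cand of ordered) {
    if (best[cand.pos] === -Infinity) continue; // start position is unreachable
    const end = cand.pos + cand.length;
    const total = best[cand.pos] + cand.score;
    if (total > best[end]) {
      best[end] = total;
      back[end] = cand;
    }
  }
  if (back[n] === null) return []; // no path covers the whole sentence
  // Walk the back-pointers from the end of the sentence to the start.
  const path = [];
  for (let i = n; i > 0; i = back[i].pos) path.unshift(back[i]);
  return path.map((c) => chars.slice(c.pos, c.pos + c.length).join(""));
}

const pieces = bestSegmentation("abc", [
  { pos: 0, length: 1, score: -1.0 }, // "a"
  { pos: 1, length: 1, score: -1.0 }, // "b"
  { pos: 2, length: 1, score: -1.0 }, // "c"
  { pos: 0, length: 2, score: -1.5 }, // "ab"
  { pos: 1, length: 2, score: -1.2 }, // "bc"
  { pos: 0, length: 3, score: -4.0 }, // "abc"
]);
console.log(pieces); // ["a", "bc"]
```

In this example the path "a" + "bc" scores -2.2, which beats "ab" + "c" (-2.5), "a" + "b" + "c" (-3.0) and "abc" (-4.0).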

utils/data-structures.DictionarySplitter

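A data structure which uses a trie to split a string into tokens based on a dictionary. It can also use a regular expression to preprocess the input text before splitting.

NOTE: To ensure multi-byte characters are handled correctly, we operate at the byte level instead of the character level.

Kind: static class of utils/data-structures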

.DictionarySplitter
- new DictionarySplitter(dictionary)
- .split(text) ⇒ Array.<string>

new DictionarySplitter(dictionary)

Param	Type	Description
dictionary	`Array.<string>`	The dictionary of words to use for splitting.

dictionarySplitter.split(text) ⇒ Array.<string>

Splits the input text into tokens based on the dictionary.

Kind: instance method of DictionarySplitter
Returns: Array.<string> - An array of tokens.

Param	Type	Description
text	`string`	The input text to split.
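
One way to picture dictionary-based splitting is the sketch below. The class name `SketchDictionarySplitter` is hypothetical, and the matching policy is an assumption: at each position it takes the longest dictionary entry that matches and emits any text between matches unchanged. For brevity it walks JavaScript string indices (UTF-16 code units) rather than bytes, and it skips the optional regular-expression preprocessing.

```javascript
// Illustrative trie-based splitter; not the library's source.
class SketchDictionarySplitter {
  constructor(dictionary) {
    this.root = { children: new Map(), word: null };
    for (const word of dictionary) {
      let node = this.root;
      for (let k = 0; k < word.length; ++k) {
        const ch = word[k];
        if (!node.children.has(ch)) {
          node.children.set(ch, { children: new Map(), word: null });
        }
        node = node.children.get(ch);
      }
      node.word = word; // marks the end of a dictionary entry
    }
  }
  split(text) {
    const result = [];
    let start = 0; // start of the current unmatched run
    let i = 0;
    while (i < text.length) {
      // Find the longest dictionary entry that begins at position i.
      let node = this.root;
      let match = null;
      for (let j = i; j < text.length; ++j) {
        node = node.children.get(text[j]);
        if (node === undefined) break;
        if (node.word !== null) match = node.word;
      }
      if (match !== null) {
        if (i > start) result.push(text.slice(start, i)); // flush unmatched text
        result.push(match);
        i += match.length;
        start = i;
      } else {
        ++i;
      }
    }
    if (start < text.length) result.push(text.slice(start));
    return result;
  }
}

const splitter = new SketchDictionarySplitter(["<s>", "</s>", "<sep>"]);
const parts = splitter.split("<s>hello<sep>world</s>");
console.log(parts); // ["<s>", "hello", "<sep>", "world", "</s>"]
```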

utils/data-structures.LRUCache

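A simple Least Recently Used (LRU) cache implementation in JavaScript. This cache stores key-value pairs and evicts the least recently used item when the capacity is exceeded.

Kind: static class of utils/data-structures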

.LRUCache
- new LRUCache(capacity)
- .get(key) ⇒ any
- .put(key, value)
- .clear()

new LRUCache(capacity)

Creates an LRUCache instance.

Param	Type	Description
capacity	`number`	The maximum number of items the cache can hold.

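lruCache.get(key) ⇒ any

Retrieves the value associated with the given key and marks the key as recently used.

Kind: instance method of LRUCache
Returns: any - The value associated with the key, or undefined if the key does not exist.

Param	Type	Description
key	`any`	The key to retrieve.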

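lruCache.put(key, value)

Inserts or updates a key-value pair in the cache. If the key already exists, its value is updated and the key is marked as recently used. If the cache exceeds its capacity, the least recently used item is evicted.

Kind: instance method of LRUCache

Param	Type	Description
key	`any`	The key to add or update.
value	`any`	The value to associate with the key.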

lruCache.clear()

Clears the cache.

Kind: instance method of LRUCache
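
The behaviour described for `get`, `put` and `clear` can be sketched with a `Map`, which iterates its keys in insertion order. The class name `SketchLRUCache` is hypothetical, and using a `Map` for recency tracking is an assumption of this sketch, not a statement about the library's internals.

```javascript
// Illustrative LRU cache built on Map insertion order; not the library's source.
class SketchLRUCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.cache = new Map();
  }
  get(key) {
    if (!this.cache.has(key)) return undefined;
    const value = this.cache.get(key);
    // Re-insert the entry so it becomes the most recently used one.
    this.cache.delete(key);
    this.cache.set(key, value);
    return value;
  }
  put(key, value) {
    if (this.cache.has(key)) this.cache.delete(key);
    this.cache.set(key, value);
    if (this.cache.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.cache.keys().next().value;
      this.cache.delete(oldest);
    }
  }
  clear() {
    this.cache.clear();
  }
}

const cache = new SketchLRUCache(2);
cache.put("a", 1);
cache.put("b", 2);
cache.get("a");    // "a" is now the most recently used key
cache.put("c", 3); // exceeds the capacity, so "b" is evicted
console.log(cache.get("b")); // undefined
```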

utils/data-structures~CharTrieNode

Represents a node in a character trie.

Kind: inner class of utils/data-structures

~CharTrieNode
- new CharTrieNode(isLeaf, children)
- .default() ⇒ CharTrieNode

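new CharTrieNode(isLeaf, children)

Create a new CharTrieNode.

Param	Type	Description
isLeaf	`boolean`	Whether the node is a leaf node or not.
children	`Map.<string, CharTrieNode>`	A map containing the node's children, where the key is a character and the value is a `CharTrieNode`.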

CharTrieNode.default() ⇒ CharTrieNode

Returns a new CharTrieNode instance with default values.

Kind: static method of CharTrieNode
Returns: CharTrieNode - A new CharTrieNode instance with isLeaf set to false and an empty children map.

utils/data-structures~TokenLatticeNode

Kind: inner class of utils/data-structures

~TokenLatticeNode
- new TokenLatticeNode(tokenId, nodeId, pos, length, score)
- .clone() ⇒ TokenLatticeNode

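new TokenLatticeNode(tokenId, nodeId, pos, length, score)

Represents a node in a token lattice for a given sentence.

Param	Type	Description
tokenId	`number`	The ID of the token associated with this node.
nodeId	`number`	The ID of this node.
pos	`number`	The starting position of the token in the sentence.
length	`number`	The length of the token.
score	`number`	The score associated with the token.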

tokenLatticeNode.clone() ⇒ TokenLatticeNode

Returns a clone of this node.

Kind: instance method of TokenLatticeNode
Returns: TokenLatticeNode - A clone of this node.
