Transformers.js documentation


You are viewing the main version, which requires installation from source. If you want a regular npm installation, check out the latest stable version (v3.0.0).
Get started

Transformers.js

State-of-the-art Machine Learning for the Web

Run 🤗 Transformers directly in your browser, with no need for a server!

Transformers.js is designed to be functionally equivalent to Hugging Face's transformers python library, meaning you can run the same pretrained models using a very similar API. The models support common tasks in different modalities, such as:

  • 📝 Natural Language Processing: text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation.
  • 🖼️ Computer Vision: image classification, object detection, segmentation, and depth estimation.
  • 🗣️ Audio: automatic speech recognition, audio classification, and text-to-speech.
  • 🐙 Multimodal: embeddings, zero-shot audio classification, zero-shot image classification, and zero-shot object detection.

Transformers.js uses ONNX Runtime to run models in the browser. The best part is that you can easily convert your pretrained PyTorch, TensorFlow, or JAX models to ONNX using 🤗 Optimum.

For more information, check out the full documentation.

Quick tour

It's super easy to translate from existing code! Just like the python library, we support the pipeline API. Pipelines group together a pretrained model with preprocessing of inputs and postprocessing of outputs, making them the easiest way to run models with the library.

Python (original)    Javascript (ours)
from transformers import pipeline

# Allocate a pipeline for sentiment-analysis
pipe = pipeline('sentiment-analysis')

out = pipe('I love transformers!')
# [{'label': 'POSITIVE', 'score': 0.999806941}]
import { pipeline } from '@huggingface/transformers';

// Allocate a pipeline for sentiment-analysis
const pipe = await pipeline('sentiment-analysis');

const out = await pipe('I love transformers!');
// [{'label': 'POSITIVE', 'score': 0.999817686}]

You can also use a different model by specifying the model id or path as the second argument to the pipeline function. For example:

// Use a different model for sentiment-analysis
const pipe = await pipeline('sentiment-analysis', 'Xenova/bert-base-multilingual-uncased-sentiment');

By default, when running in the browser, the model will run on your CPU (via WASM). If you would like to run the model on your GPU (via WebGPU), you can do this by setting device: 'webgpu', for example:

// Run the model on WebGPU
const pipe = await pipeline('sentiment-analysis', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english', {
  device: 'webgpu',
});

For more information, check out the WebGPU guide.

The WebGPU API is still experimental in many browsers, so if you run into any issues, please file a bug report.
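Because WebGPU support varies by browser, a common defensive pattern is to feature-detect it before requesting the 'webgpu' device and fall back to WASM otherwise. The helper below is an illustrative sketch, not part of the Transformers.js API; it only checks for the standard navigator.gpu entry point:

```javascript
// Illustrative sketch (not part of the Transformers.js API): choose
// 'webgpu' when the standard WebGPU entry point (navigator.gpu) exists,
// otherwise fall back to the WASM backend.
function pickDevice(nav) {
  return nav && 'gpu' in nav ? 'webgpu' : 'wasm';
}

// In the browser, you might then write (assuming '@huggingface/transformers'):
// const pipe = await pipeline('sentiment-analysis', model, { device: pickDevice(navigator) });
```

Passing the global navigator object keeps the decision in one place, so the same code path works in browsers with and without WebGPU enabled.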

In resource-constrained environments, such as web browsers, it is advisable to use a quantized version of the model to lower bandwidth and optimize performance. This can be achieved by adjusting the dtype option, which allows you to select the appropriate data type for your model. While the available options may vary depending on the specific model, typical choices include "fp32" (default for WebGPU), "fp16", "q8" (default for WASM), and "q4". For more information, check out the quantization guide.

// Run the model at 4-bit quantization
const pipe = await pipeline('sentiment-analysis', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english', {
  dtype: 'q4',
});

Contents

The documentation is organized into 4 sections:

  1. GET STARTED provides a quick tour of the library and installation instructions to get up and running.
  2. TUTORIALS are a great place to start if you're a beginner! We also include sample applications for you to play around with!
  3. DEVELOPER GUIDES show you how to use the library to achieve a specific goal.
  4. API REFERENCE describes all classes and functions, as well as their available parameters and types.

Examples

Want to jump straight in? Get started with one of our sample applications/templates, which can be found here.

Name Description Links
Whisper Web Speech recognition with Whisper Code, Demo
Doodle Dash Real-time sketch-recognition game Blog, Code, Demo
Code Playground In-browser code completion website Code, Demo
Semantic Image Search (client-side) Search for images with text Code, Demo
Semantic Image Search (server-side) Search for images with text (Supabase) Code, Demo
Vanilla JavaScript In-browser object detection Video, Code, Demo
React Multilingual translation website Code, Demo
Text to Speech (client-side) In-browser speech synthesis Code, Demo
Browser extension Text classification extension Code
Electron Text classification application Code
Next.js (client-side) Sentiment analysis (in-browser inference) Code, Demo
Next.js (server-side) Sentiment analysis (Node.js inference) Code, Demo
Node.js Sentiment analysis API Code
Demo site A collection of demos Code, Demo

Check out the Transformers.js template on Hugging Face to get started in one click!

Supported tasks/models

Here is the list of all tasks and architectures currently supported by Transformers.js. If you don't see your task/model listed here, or it is not yet supported, feel free to open up a feature request here.

To find compatible models on the Hub, select the "transformers.js" library tag in the filter menu (or visit this link). You can refine your search by selecting the task you're interested in (e.g., text-classification).
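The same filtered listing is also reachable programmatically through the Hub's public REST endpoint at huggingface.co/api/models. The query builder below is a sketch; treat the library and pipeline_tag parameter names as assumptions based on that public API rather than a documented Transformers.js feature:

```javascript
// Sketch: build a Hub API query for models tagged for Transformers.js,
// optionally narrowed to a task. The endpoint and parameter names follow
// the public huggingface.co REST API (an assumption, not a library API).
function hubModelsUrl(task) {
  const url = new URL('https://huggingface.co/api/models');
  url.searchParams.set('library', 'transformers.js');
  if (task) url.searchParams.set('pipeline_tag', task); // e.g. 'text-classification'
  return url.toString();
}

// fetch(hubModelsUrl('text-classification')).then(r => r.json()) would then
// yield the matching model entries.
```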

Tasks

Natural Language Processing

Task ID Description Supported?
Fill-Mask fill-mask Masking some of the words in a sentence and predicting which words should replace those masks. (docs) (models)
Question Answering question-answering Retrieving the answer to a question from a given text. (docs) (models)
Sentence Similarity sentence-similarity Determining how similar two texts are. (docs) (models)
Summarization summarization Producing a shorter version of a document while preserving its important information. (docs) (models)
Table Question Answering table-question-answering Answering a question about information from a given table.
Text Classification text-classification or sentiment-analysis Assigning a label or class to a given text. (docs) (models)
Text Generation text-generation Producing new text by predicting the next word in a sequence. (docs) (models)
Text-to-text Generation text2text-generation Converting one text sequence into another text sequence. (docs) (models)
Token Classification token-classification or ner Assigning a label to each token in a text. (docs) (models)
Translation translation Converting text from one language to another. (docs) (models)
Zero-Shot Classification zero-shot-classification Classifying text into classes that are unseen during training. (docs) (models)
Feature Extraction feature-extraction Transforming raw data into numerical features that can be processed while preserving the information in the original dataset. (docs) (models)
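Many of the text pipelines above resolve to an array of { label, score } objects, as in the quick-tour example. As a small illustration of post-processing such output, the helper below (which is not part of the library, just a sketch over that documented output shape) picks the highest-scoring label:

```javascript
// Illustrative helper (not part of Transformers.js): pick the entry with
// the highest score from a pipeline result of { label, score } objects.
function topLabel(results) {
  if (!Array.isArray(results) || results.length === 0) {
    throw new Error('expected a non-empty array of { label, score } objects');
  }
  // reduce keeps whichever entry has the larger score as it scans the array
  return results.reduce((best, r) => (r.score > best.score ? r : best));
}

// topLabel([{ label: 'POSITIVE', score: 0.9998 }, { label: 'NEGATIVE', score: 0.0002 }])
// → { label: 'POSITIVE', score: 0.9998 }
```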

Vision

Task ID Description Supported?
Background Removal background-removal Isolating the main subject of an image by removing or making the background transparent. (docs) (models)
Depth Estimation depth-estimation Predicting the depth of objects present in an image. (docs) (models)
Image Classification image-classification Assigning a label or class to an entire image. (docs) (models)
Image Segmentation image-segmentation Dividing an image into segments where each pixel is mapped to an object. This task has multiple variants, such as instance segmentation, panoptic segmentation and semantic segmentation. (docs) (models)
Image-to-Image image-to-image Transforming a source image to match the characteristics of a target image or a target image domain. (docs) (models)
Mask Generation mask-generation Generating masks for the objects in an image.
Object Detection object-detection Identifying objects of certain defined classes within an image. (docs) (models)
Video Classification n/a Assigning a label or class to an entire video.
Unconditional Image Generation n/a Generating images with no condition in any context (like a prompt text or another image).
Image Feature Extraction image-feature-extraction Transforming raw data into numerical features that can be processed while preserving the information in the original image. (docs) (models)
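Object-detection pipelines typically yield an array of objects of the shape { score, label, box }, and a common post-processing step is to discard low-confidence detections. The helper below is an illustrative sketch over that output shape, not a library function:

```javascript
// Illustrative helper (not part of Transformers.js): keep only detections
// whose confidence score meets a threshold. Object-detection pipelines
// typically produce objects of the shape { score, label, box }.
function filterDetections(detections, threshold = 0.9) {
  return detections.filter(d => d.score >= threshold);
}

// filterDetections([{ score: 0.97, label: 'cat', box: {} },
//                   { score: 0.42, label: 'dog', box: {} }])
// keeps only the 'cat' detection at the default threshold.
```

Lowering the threshold trades precision for recall, which is often useful when rendering bounding boxes interactively.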

Audio

Task ID Description Supported?
Audio Classification audio-classification Assigning a label or class to a given audio clip. (docs) (models)
Audio-to-Audio n/a Generating audio from an input audio source.
Automatic Speech Recognition automatic-speech-recognition Transcribing a given audio clip into text. (docs) (models)
Text-to-Speech text-to-speech or text-to-audio Generating natural-sounding speech given text input. (docs) (models)
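When timestamps are requested, automatic-speech-recognition pipelines can return chunked output with per-segment timing. The formatter below is an illustrative sketch; it assumes chunks of the shape { timestamp: [startSeconds, endSeconds], text }, which is the typical Whisper-style chunk layout rather than a guaranteed contract:

```javascript
// Illustrative sketch (not part of Transformers.js): render timestamped
// ASR chunks as one line per segment, assuming the shape
// { timestamp: [startSeconds, endSeconds], text }.
function formatChunks(chunks) {
  return chunks
    .map(c => `[${c.timestamp[0].toFixed(2)}-${c.timestamp[1].toFixed(2)}] ${c.text.trim()}`)
    .join('\n');
}

// formatChunks([{ timestamp: [0, 1.5], text: ' Hello world.' }])
// → '[0.00-1.50] Hello world.'
```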

Tabular

Task ID Description Supported?
Tabular Classification n/a Classifying a target category (a group) based on a set of attributes.
Tabular Regression n/a Predicting a numerical value given a set of attributes.

Multimodal

Task ID Description Supported?
Document Question Answering document-question-answering Answering questions about document images. (docs) (models)
Image-to-Text image-to-text Outputting text from a given image. (docs) (models)
Text-to-Image text-to-image Generating images from input text.
Visual Question Answering visual-question-answering Answering open-ended questions based on an image.
Zero-Shot Audio Classification zero-shot-audio-classification Classifying audio into classes that are unseen during training. (docs) (models)
Zero-Shot Image Classification zero-shot-image-classification Classifying images into classes that are unseen during training. (docs) (models)
Zero-Shot Object Detection zero-shot-object-detection Identifying objects of classes that are unseen during training. (docs) (models)

Reinforcement Learning

Task ID Description Supported?
Reinforcement Learning n/a Learning from actions by interacting with an environment through trial and error and receiving rewards (negative or positive) as feedback.

Models

  1. ALBERT (from Google Research and the Toyota Technological Institute at Chicago) released with the paper ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
  2. Audio Spectrogram Transformer (from MIT) released with the paper AST: Audio Spectrogram Transformer, by Yuan Gong, Yu-An Chung, James Glass.
  3. BART (from Facebook) released with the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
  4. BEiT (from Microsoft) released with the paper BEiT: BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong, Furu Wei.
  5. BERT (from Google) released with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
  6. Blenderbot (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
  7. BlenderbotSmall (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
  8. BLOOM (from the BigScience workshop) released by the BigScience Workshop.
  9. CamemBERT (from Inria/Facebook/Sorbonne) released with the paper CamemBERT: a Tasty French Language Model by Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
  10. Chinese-CLIP (from OFA-Sys) released with the paper Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese by An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou.
  11. CLAP (from LAION-AI) released with the paper Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation by Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov.
  12. CLIP (from OpenAI) released with the paper Learning Transferable Visual Models From Natural Language Supervision by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
  13. CLIPSeg (from the University of Göttingen) released with the paper Image Segmentation Using Text and Image Prompts by Timo Lüddecke and Alexander Ecker.
  14. CodeGen (from Salesforce) released with the paper A Conversational Paradigm for Program Synthesis by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
  15. CodeLlama (from MetaAI) released with the paper Code Llama: Open Foundation Models for Code by Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve.
  16. Cohere (from Cohere) released with the paper Command-R: Retrieval Augmented Generation at Production Scale by Cohere.
  17. ConvBERT (from YituTech) released with the paper ConvBERT: Improving BERT with Span-based Dynamic Convolution by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
  18. ConvNeXT (from Facebook AI) released with the paper A ConvNet for the 2020s by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
  19. ConvNeXTV2 (from Facebook AI) released with the paper ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
  20. DAC (from Descript) released with the paper Descript Audio Codec: High-Fidelity Audio Compression with Improved RVQGAN by Rithesh Kumar, Prem Seetharaman, Alejandro Luebs, Ishaan Kumar, Kundan Kumar.
  21. DeBERTa (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
  22. DeBERTa-v2 (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
  23. Decision Transformer (from Berkeley/Facebook/Google) released with the paper Decision Transformer: Reinforcement Learning via Sequence Modeling by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
  24. DeiT (from Facebook) released with the paper Training data-efficient image transformers & distillation through attention by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
  25. Depth Anything (from the University of Hong Kong and TikTok) released with the paper Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data by Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao.
  26. Depth Pro (from Apple) released with the paper Depth Pro: Sharp Monocular Metric Depth in Less Than a Second by Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R. Richter, Vladlen Koltun.
  27. DETR (from Facebook) released with the paper End-to-End Object Detection with Transformers by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
  28. DINOv2 (from Meta AI) released with the paper DINOv2: Learning Robust Visual Features without Supervision by Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski.
  29. DINOv2 with Registers (from Meta AI) released with the paper DINOv2 with Registers by Timothée Darcet, Maxime Oquab, Julien Mairal, Piotr Bojanowski.
  30. DistilBERT (from HuggingFace) released with the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into DistilGPT2, RoBERTa into DistilRoBERTa, Multilingual BERT into DistilmBERT, and a German version of DistilBERT.
  31. DiT (from Microsoft Research) released with the paper DiT: Self-supervised Pre-training for Document Image Transformer by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
  32. Donut (from NAVER) released with the paper OCR-free Document Understanding Transformer by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
  33. DPT (from Intel Labs) released with the paper Vision Transformers for Dense Prediction by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
  34. EfficientNet (from Google Brain) released with the paper EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks by Mingxing Tan, Quoc V. Le.
  35. ELECTRA (from Google Research/Stanford University) released with the paper ELECTRA: Pre-training text encoders as discriminators rather than generators by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
  36. ESM (from Meta AI) are transformer protein language models. ESM-1b was released with the paper Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. ESM-1v was released with the paper Language models enable zero-shot prediction of the effects of mutations on protein function by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. ESM-2 and ESMFold were released with the paper Language models of protein sequences at the scale of evolution enable accurate structure prediction by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
  37. EXAONE (from LG AI Research) released with the papers EXAONE 3.0 7.8B Instruction Tuned Language Model and EXAONE 3.5: Series of Large Language Models for Real-world Use Cases by the LG AI Research team.
  38. Falcon (from the Technology Innovation Institute) released by Almazrouei, Ebtesam, Alobeidli, Hamza, Alshamsi, Abdulaziz, Cappelli, Alessandro, Cojocaru, Ruxandra, Debbah, Merouane, Goffinet, Etienne, Heslow, Daniel, Launay, Julien, Malartic, Quentin, Noune, Badreddine, Pannier, Baptiste and Penedo, Guilherme.
  39. FastViT (from Apple) released with the paper FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization by Pavan Kumar Anasosalu Vasu, James Gabriel, Jeff Zhu, Oncel Tuzel and Anurag Ranjan.
  40. FLAN-T5 (from Google AI) released in the repository google-research/t5x by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei.
  41. Florence2 (from Microsoft) released with the paper Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks by Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan.
  42. Gemma (from Google) released with the paper Gemma: Open Models Based on Gemini Technology and Research by the Gemma Google team.
  43. Gemma2 (from Google) released with the paper Gemma2: Open Models Based on Gemini Technology and Research by the Gemma Google team.
  44. Gemma3 (from Google) released with the paper Introducing Gemma 3: The most capable model you can run on a single GPU or TPU by the Gemma Google team.
  45. GLM (from the GLM Team, THUDM & ZhipuAI) released with the paper ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools by Team GLM: Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Dan Zhang, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Jingyu Sun, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, Mingdao Liu, Minlie Huang, Peng Zhang, Qinkai Zheng, Rui Lu, Shuaiqi Duan, Shudan Zhang, Shulin Cao, Shuxun Yang, Weng Lam Tam, Wenyi Zhao, Xiao Liu, Xiao Xia, Xiaohan Zhang, Xiaotao Gu, Xin Lv, Xinghan Liu, Xinyi Liu, Xinyue Yang, Xixuan Song, Xunkai Zhang, Yifan An, Yifan Xu, Yilin Niu, Yuantao Yang, Yueyan Li, Yushi Bai, Yuxiao Dong, Zehan Qi, Zhaoyu Wang, Zhen Yang, Zhengxiao Du, Zhenyu Hou, Zihan Wang.
  46. GLPN (from KAIST) released with the paper Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
  47. GPT Neo (from EleutherAI) released in the repository EleutherAI/gpt-neo by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
  48. GPT NeoX (from EleutherAI) released with the paper GPT-NeoX-20B: An Open-Source Autoregressive Language Model by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach.
  49. GPT-2 (from OpenAI) released with the paper Language Models are Unsupervised Multitask Learners by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever.
  50. GPT-J (from EleutherAI) released in the repository kingoflolz/mesh-transformer-jax by Ben Wang and Aran Komatsuzaki.
  51. GPTBigCode (from BigCode) released with the paper SantaCoder: don't reach for the stars! by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra.
  52. Granite (from IBM) released with the paper Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler by Yikang Shen, Matthew Stallone, Mayank Mishra, Gaoyuan Zhang, Shawn Tan, Aditya Prasad, Adriana Meza Soria, David D. Cox, Rameswar Panda.
  53. Grounding DINO (from IDEA-Research) released with the paper Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection by Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang.
  54. GroupViT (from UCSD, NVIDIA) released with the paper GroupViT: Semantic Segmentation Emerges from Text Supervision by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
  55. Helium (from the Kyutai Team) released with the blog post Announcing Helium-1 Preview by the Kyutai Team.
  56. HerBERT (from Allegro.pl, AGH University of Science and Technology) released with the paper KLEJ: Comprehensive Benchmark for Polish Language Understanding by Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik.
  57. Hiera (from Meta) released with the paper Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles by Chaitanya Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, Po-Yao Huang, Vaibhav Aggarwal, Arkabandhu Chowdhury, Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, Christoph Feichtenhofer.
  58. Hubert (from Facebook) released with the paper HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
  59. I-JEPA (from Meta) released with the paper Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture by Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, Nicolas Ballas.
  60. Idefics3 (from Hugging Face) released with the paper Building and better understanding vision-language models: insights and future directions by Hugo Laurençon, Andrés Marafioti, Victor Sanh, Léo Tronchon.
  61. JAIS (from Core42) released with the paper Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models by Neha Sengupta, Sunil Kumar Sahu, Bokang Jia, Satheesh Katipomu, Haonan Li, Fajri Koto, William Marshall, Gurpreet Gosal, Cynthia Liu, Zhiming Chen, Osama Mohammed Afzal, Samta Kamboj, Onkar Pandit, Rahul Pal, Lalit Pradhan, Zain Muhammad Mujahid, Massa Baali, Xudong Han, Sondos Mahmoud Bsharat, Alham Fikri Aji, Zhiqiang Shen, Zhengzhong Liu, Natalia Vassilieva, Joel Hestness, Andy Hock, Andrew Feldman, Jonathan Lee, Andrew Jackson, Hector Xuguang Ren, Preslav Nakov, Timothy Baldwin, Eric Xing.
  62. Janus (from DeepSeek) released with the paper Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation by Chengyue Wu, Xiaokang Chen, Zhiyu Wu, Yiyang Ma, Xingchao Liu, Zizheng Pan, Wen Liu, Zhenda Xie, Xingkai Yu, Chong Ruan, Ping Luo.
  63. JinaCLIP (from Jina AI) released with the paper Jina CLIP: Your CLIP Model Is Also Your Text Retriever by Andreas Koukounas, Georgios Mastrapas, Michael Günther, Bo Wang, Scott Martens, Isabelle Mohr, Saba Sturua, Mohammad Kalim Akram, Joan Fontanals Martínez, Saahil Ognawala, Susana Guzman, Maximilian Werk, Nan Wang, Han Xiao.
  64. LiteWhisper (from the University of Washington, Kotoba Technologies) released with the paper LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation by Keisuke Kamahori, Jungo Kasai, Noriyuki Kojima, Baris Kasikci.
  65. LongT5 (from Google AI) released with the paper LongT5: Efficient Text-To-Text Transformer for Long Sequences by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.
  66. LLaMA (from the FAIR team of Meta AI) released with the paper LLaMA: Open and Efficient Foundation Language Models by Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample.
  67. Llama2 (from the FAIR team of Meta AI) released with the paper Llama2: Open Foundation and Fine-Tuned Chat Models by Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom.
  68. LLaVa (from Microsoft Research & University of Wisconsin-Madison) released with the paper Visual Instruction Tuning by Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee.
  69. LLaVA-OneVision (from ByteDance & NTU & CUHK & HKUST) released with the paper LLaVA-OneVision: Easy Visual Task Transfer by Bo Li, Yuanhan Zhang, Dong Guo, Renrui Zhang, Feng Li, Hao Zhang, Kaichen Zhang, Yanwei Li, Ziwei Liu, Chunyuan Li.
  70. M2M100 (from Facebook) released with the paper Beyond English-Centric Multilingual Machine Translation by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
  71. MarianMT Machine translation models trained using OPUS data by Jörg Tiedemann. The Marian Framework is being developed by the Microsoft Translator Team.
  72. MaskFormer (来自 Meta 和 UIUC) 发布,并附带论文 Per-Pixel Classification is Not All You Need for Semantic Segmentation,作者为 Bowen Cheng, Alexander G. Schwing, Alexander Kirillov。
  73. mBART (来自 Facebook) 发布,并附带论文 Multilingual Denoising Pre-training for Neural Machine Translation,作者为 Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer。
  74. mBART-50 (来自 Facebook) 发布,并附带论文 Multilingual Translation with Extensible Multilingual Pretraining and Finetuning,作者为 Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan。
  75. Metric3D 发布,并附带论文 Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image,作者为 Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Gang Yu, Kaixuan Wang, Xiaozhi Chen, Chunhua Shen。
  76. Metric3Dv2 发布,并附带论文 Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation,作者为 Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Kaixuan Wang, Hao Chen, Gang Yu, Chunhua Shen, Shaojie Shen。
  77. MusicGen (来自 Meta) 发布,并附带论文 Simple and Controllable Music Generation,作者为 Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi 和 Alexandre Défossez。
  78. MGP-STR (来自 阿里巴巴研究) 发布,并附带论文 Multi-Granularity Prediction for Scene Text Recognition,作者为 Peng Wang, Cheng Da, 和 Cong Yao。
  79. Mimi (来自 Kyutai) 发布,并附带论文 Moshi: a speech-text foundation model for real-time dialogue,作者为 Alexandre Défossez, Laurent Mazaré, Manu Orsini, Amélie Royer, Patrick Pérez, Hervé Jégou, Edouard Grave 和 Neil Zeghidour。
  80. Mistral (来自 Mistral AI) 由 Mistral AI 团队发布:Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed。
  81. MMS (来自 Facebook) 发布,并附带论文 Scaling Speech Technology to 1,000+ Languages,作者为 Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli。
  82. MobileBERT (来自 CMU/Google Brain) 发布,并附带论文 MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices,作者为 Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, 和 Denny Zhou。
  83. MobileCLIP (来自 Apple) 发布,并附带论文 MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training,作者为 Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel。
  84. MobileLLM (来自 Meta) 发布,并附带论文 MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases,作者为 Zechun Liu, Changsheng Zhao, Forrest Iandola, Chen Lai, Yuandong Tian, Igor Fedorov, Yunyang Xiong, Ernie Chang, Yangyang Shi, Raghuraman Krishnamoorthi, Liangzhen Lai, Vikas Chandra。
  85. MobileNetV1 (来自 Google Inc.) 发布,并附带论文 MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications,作者为 Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam。
  86. MobileNetV2 (来自 Google Inc.) 发布,并附带论文 MobileNetV2: Inverted Residuals and Linear Bottlenecks,作者为 Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen。
  87. MobileNetV3 (来自 Google Inc.) 发布,并附带论文 Searching for MobileNetV3,作者为 Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam。
  88. MobileNetV4 (来自 Google Inc.) 发布,并附带论文 MobileNetV4 - Universal Models for the Mobile Ecosystem,作者为 Danfeng Qin, Chas Leichner, Manolis Delakis, Marco Fornoni, Shixin Luo, Fan Yang, Weijun Wang, Colby Banbury, Chengxi Ye, Berkin Akin, Vaibhav Aggarwal, Tenghui Zhu, Daniele Moro, Andrew Howard。
  89. MobileViT (来自 Apple) 发布,并附带论文 MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer,作者为 Sachin Mehta 和 Mohammad Rastegari。
  90. MobileViTV2 (来自 Apple) 发布,并附带论文 Separable Self-attention for Mobile Vision Transformers,作者为 Sachin Mehta 和 Mohammad Rastegari。
  91. ModernBERT (来自 Answer.AI 和 LightOn) 随论文 Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference 发布,作者为 Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, Said Taghadouini, Alexis Gallagher, Raja Biswas, Faisal Ladhak, Tom Aarsen, Nathan Cooper, Griffin Adams, Jeremy Howard, Iacopo Poli。
  92. Moondream1 在仓库 moondream 中发布,作者为 vikhyat。
  93. Moonshine (来自 Useful Sensors) 随论文 Moonshine: Speech Recognition for Live Transcription and Voice Commands 发布,作者为 Nat Jeffries, Evan King, Manjunath Kudlur, Guy Nicholson, James Wang, Pete Warden。
  94. MPNet (来自 Microsoft Research) 随论文 MPNet: Masked and Permuted Pre-training for Language Understanding 发布,作者为 Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu。
  95. MPT (来自 MosaicML) 随仓库 llm-foundry 发布,作者为 MosaicML NLP 团队。
  96. MT5 (来自 Google AI) 随论文 mT5: A massively multilingual pre-trained text-to-text transformer 发布,作者为 Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel。
  97. NLLB (来自 Meta) 随论文 No Language Left Behind: Scaling Human-Centered Machine Translation 发布,作者为 NLLB 团队。
  98. Nougat (来自 Meta AI) 随论文 Nougat: Neural Optical Understanding for Academic Documents 发布,作者为 Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic。
  99. OLMo (来自 Ai2) 随论文 OLMo: Accelerating the Science of Language Models 发布,作者为 Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi。
  100. OLMo2 (来自 Ai2) 随博客 OLMo 2: The best fully open language model to date 发布,作者为 Ai2 OLMo 团队。
  101. OpenELM (来自 Apple) 随论文 OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework 发布,作者为 Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari。
  102. OPT (来自 Meta AI) 随论文 OPT: Open Pre-trained Transformer Language Models 发布,作者为 Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen 等。
  103. OWL-ViT (来自 Google AI) 随论文 Simple Open-Vocabulary Object Detection with Vision Transformers 发布,作者为 Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf 和 Neil Houlsby。
  104. OWLv2 (来自 Google AI) 随论文 Scaling Open-Vocabulary Object Detection 发布,作者为 Matthias Minderer, Alexey Gritsenko, Neil Houlsby。
  105. PaliGemma (来自 Google) 随论文 PaliGemma: A versatile 3B VLM for transferPaliGemma 2: A Family of Versatile VLMs for Transfer 发布,作者为 PaliGemma Google 团队。
  106. PatchTSMixer (来自 IBM) 随论文 TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting 发布,作者为 Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong, Jayant Kalagnanam。
  107. PatchTST (from Princeton University, IBM) released with the paper A Time Series is Worth 64 Words: Long-term Forecasting with Transformers by Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, Jayant Kalagnanam.
  108. Phi (from Microsoft) released with the papers Textbooks Are All You Need by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, and Textbooks Are All You Need II: phi-1.5 technical report by Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee.
  109. Phi3 (from Microsoft) released with the paper Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone by Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Parul Chopra, Allie Del Giorno, Gustavo de Rosa, Matthew Dixon, Ronen Eldan, Dan Iter, Amit Garg, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Jamie Huynh, Mojan Javaheripi, Xin Jin, Piero Kauffmann, Nikos Karampatziakis, Dongwoo Kim, Mahoud Khademi, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Chen Liang, Weishung Liu, Eric Lin, Zeqi Lin, Piyush Madan, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Xia Song, Masahiro Tanaka, Xin Wang, Rachel Ward, Guanhua Wang, Philipp Witte, Michael Wyatt, Can Xu, Jiahang Xu, Sonali Yadav, Fan Yang, Ziyi Yang, Donghan Yu, Chengruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou.
  110. Phi3V (from Microsoft) released with the paper Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone by Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Qin Cai, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Weizhu Chen, Yen-Chun Chen, Yi-Ling Chen, Hao Cheng, Parul Chopra, Xiyang Dai, Matthew Dixon, Ronen Eldan, Victor Fragoso, Jianfeng Gao, Mei Gao, Min Gao, Amit Garg, Allie Del Giorno, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Wenxiang Hu, Jamie Huynh, Dan Iter, Sam Ade Jacobs, Mojan Javaheripi, Xin Jin, Nikos Karampatziakis, Piero Kauffmann, Mahoud Khademi, Dongwoo Kim, Young Jin Kim, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Yunsheng Li, Chen Liang, Lars Liden, Xihui Lin, Zeqi Lin, Ce Liu, Liyuan Liu, Mengchen Liu, Weishung Liu, Xiaodong Liu, Chong Luo, Piyush Madan, Ali Mahmoudzadeh, David Majercak, Matt Mazzola, Caio César Teodoro Mendes, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Liliang Ren, Gustavo de Rosa, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Yelong Shen, Swadheen Shukla, Xia Song, Masahiro Tanaka, Andrea Tupini, Praneetha Vaddamanu, Chunyu Wang, Guanhua Wang, Lijuan Wang, Shuohang Wang, Xin Wang, Yu Wang, Rachel Ward, Wen Wen, Philipp Witte, Haiping Wu, Xiaoxia Wu, Michael Wyatt, Bin Xiao, Can Xu, Jiahang Xu, Weijian Xu, Jilong Xue, Sonali Yadav, Fan Yang, Jianwei Yang, Yifan Yang, Ziyi Yang, Donghan Yu, Lu Yuan, Chenruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou.
  111. PVT (from Nanjing University, The University of Hong Kong, etc.) released with the paper Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions by Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao.
  112. PyAnnote released in the repository pyannote/pyannote-audio by Hervé Bredin.
  113. Qwen2 (from the Qwen team, Alibaba Group) released with the paper Qwen Technical Report by Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou and Tianhang Zhu.
  114. Qwen2-VL (from the Qwen team, Alibaba Group) released with the paper Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond by Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, Jingren Zhou.
  115. ResNet (from Microsoft Research) released with the paper Deep Residual Learning for Image Recognition by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
  116. RF-DETR (from Roboflow) released with the blog post RF-DETR: A SOTA Real-Time Object Detection Model by Peter Robicheaux, James Gallagher, Joseph Nelson, Isaac Robinson.
  117. RoBERTa (from Facebook), released together with the paper RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
  118. RoFormer (from ZhuiyiTechnology), released together with the paper RoFormer: Enhanced Transformer with Rotary Position Embedding by Jianlin Su, Yu Lu, Shengfeng Pan, Bo Wen and Yunfeng Liu.
  119. RT-DETR (from Baidu), released together with the paper DETRs Beat YOLOs on Real-time Object Detection by Yian Zhao, Wenyu Lv, Shangliang Xu, Jinman Wei, Guanzhong Wang, Qingqing Dang, Yi Liu, Jie Chen.
  120. RT-DETRv2 (from Baidu), released together with the paper RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer by Wenyu Lv, Yian Zhao, Qinyao Chang, Kui Huang, Guanzhong Wang, Yi Liu.
  121. Sapiens (from Meta AI) released with the paper Sapiens: Foundation for Human Vision Models by Rawal Khirodkar, Timur Bagautdinov, Julieta Martinez, Su Zhaoen, Austin James, Peter Selednik, Stuart Anderson, Shunsuke Saito.
  122. SegFormer (from NVIDIA) released with the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
  123. Segment Anything (from Meta AI) released with the paper Segment Anything by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick.
  124. SigLIP (from Google AI) released with the paper Sigmoid Loss for Language Image Pre-Training by Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer.
  125. SmolVLM (from Hugging Face) released with the blog posts SmolVLM - small yet mighty Vision Language Model and SmolVLM Grows Smaller – Introducing the 250M & 500M Models! by the Hugging Face TB Research team.
  126. SNAC (from Papla Media, ETH Zurich) released with the paper SNAC: Multi-Scale Neural Audio Codec by Hubert Siuzdak, Florian Grötschla, Luca A. Lanzendörfer.
  127. SpeechT5 (from Microsoft Research) released with the paper SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei.
  128. SqueezeBERT (from Berkeley) released with the paper SqueezeBERT: What can computer vision teach NLP about efficient neural networks? by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
  129. StableLm (from Stability AI) released with the paper StableLM 3B 4E1T (Technical Report) by Jonathan Tow, Marco Bellagente, Dakota Mahan, Carlos Riquelme Ruiz, Duy Phung, Maksym Zhuravinskyi, Nathan Cooper, Nikhil Pinnaparaju, Reshinth Adithyan, and James Baicoianu.
  130. Starcoder2 (from the BigCode team) released with the paper StarCoder 2 and The Stack v2: The Next Generation by Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, Evgenii Zheltonozhskii, Nii Osae Osae Dade, Wenhao Yu, Lucas Krauß, Naman Jain, Yixuan Su, Xuanli He, Manan Dey, Edoardo Abati, Yekun Chai, Niklas Muennighoff, Xiangru Tang, Muhtasham Oblokulov, Christopher Akiki, Marc Marone, Chenghao Mou, Mayank Mishra, Alex Gu, Binyuan Hui, Tri Dao, Armel Zebaze, Olivier Dehaene, Nicolas Patry, Canwen Xu, Julian McAuley, Han Hu, Torsten Scholak, Sebastien Paquet, Jennifer Robinson, Carolyn Jane Anderson, Nicolas Chapados, Mostofa Patwary, Nima Tajbakhsh, Yacine Jernite, Carlos Muñoz Ferrandis, Lingming Zhang, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, and Harm de Vries.
  131. StyleTTS 2 (from Columbia University) released with the paper StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models by Yinghao Aaron Li, Cong Han, Vinay S. Raghavan, Gavin Mischler, Nima Mesgarani.
  132. Swin Transformer (from Microsoft) released with the paper Swin Transformer: Hierarchical Vision Transformer using Shifted Windows by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
  133. Swin2SR (from University of Würzburg) released with the paper Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration by Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte.
  134. T5 (from Google AI) released with the paper Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li and Peter J. Liu.
  135. T5v1.1 (from Google AI) released in the repository google-research/text-to-text-transfer-transformer by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li and Peter J. Liu.
  136. Table Transformer (from Microsoft Research) released with the paper PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents by Brandon Smock, Rohith Pesala, Robin Abraham.
  137. TrOCR (from Microsoft), released together with the paper TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
  138. Ultravox (from Fixie.ai) released in the repository fixie-ai/ultravox by the Fixie.ai team.
  139. UniSpeech (from Microsoft Research) released with the paper UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
  140. UniSpeechSat (from Microsoft Research) released with the paper UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
  141. Vision Transformer (ViT) (from Google AI) released with the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
  142. ViTMAE (from Meta AI) released with the paper Masked Autoencoders Are Scalable Vision Learners by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
  143. ViTMatte (from HUST-VL) released with the paper ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers by Jingfeng Yao, Xinggang Wang, Shusheng Yang, Baoyuan Wang.
  144. ViTMSN (from Meta AI) released with the paper Masked Siamese Networks for Label-Efficient Learning by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas.
  145. ViTPose (from The University of Sydney) released with the paper ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation by Yufei Xu, Jing Zhang, Qiming Zhang, Dacheng Tao.
  146. VITS (from Kakao Enterprise) released with the paper Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech by Jaehyeon Kim, Jungil Kong, Juhee Son.
  147. Wav2Vec2 (from Facebook AI) released with the paper wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
  148. Wav2Vec2-BERT (from Meta AI) released with the paper Seamless: Multilingual Expressive and Streaming Speech Translation by the Seamless Communication team.
  149. WavLM (from Microsoft Research) released with the paper WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
  150. Whisper (from OpenAI) released with the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever.
  151. XLM (from Facebook) released together with the paper Cross-lingual Language Model Pretraining by Guillaume Lample and Alexis Conneau.
  152. XLM-RoBERTa (from Facebook AI), released together with the paper Unsupervised Cross-lingual Representation Learning at Scale by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
  153. YOLOS (from Huazhong University of Science & Technology) released with the paper You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.