Changelog

Jan 19, 2025

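Fix loading of LeViT safetensor weights, remove conversion code that should have been deactivated
Add 'SO150M' ViT weights trained with SBB recipes, decent results, but not the optimal shape for ImageNet-12k/1k pretrain/fine-tune
- vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k_ft_in1k - 86.7% top-1
- vit_so150m_patch16_reg4_gap_384.sbb_e250_in12k_ft_in1k - 87.4% top-1
- vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k
Miscellaneous typing, typo, etc. cleanup
1.0.14 release to get the above LeViT fix out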

Jan 9, 2025

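Add support to train and validate in pure bfloat16 or float16
wandb project name argument added by https://github.com/caojiaolong, use arg.experiment for name
Fix old issue with checkpoint saving not working on filesystems without hard-link support (e.g. FUSE fs mounts)
1.0.13 release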

Jan 6, 2025

Add a torch.utils.checkpoint.checkpoint() wrapper in timm.models that defaults to use_reentrant=False, unless TIMM_REENTRANT_CKPT=1 is set in the environment.
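As a rough illustration of the opt-out described above, the sketch below resolves a use_reentrant value from the environment. The helper name `resolve_use_reentrant` and its `explicit` override are hypothetical and not part of timm; only the TIMM_REENTRANT_CKPT variable name and the False default come from the entry above. How timm treats values other than 1 is not stated here, so the sketch enables reentrant mode only for the literal value 1.

```python
import os
from typing import Mapping, Optional


def resolve_use_reentrant(
    explicit: Optional[bool] = None,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Pick a use_reentrant value (hypothetical helper, not timm's code)."""
    # Assumption: an explicit caller choice wins over the environment.
    if explicit is not None:
        return explicit
    env = os.environ if env is None else env
    # Default is non-reentrant checkpointing unless the variable is set to 1.
    return env.get("TIMM_REENTRANT_CKPT", "0").strip() == "1"


print(resolve_use_reentrant(env={}))
print(resolve_use_reentrant(env={"TIMM_REENTRANT_CKPT": "1"}))
```

The resolved value would then be forwarded as the use_reentrant keyword of torch.utils.checkpoint.checkpoint().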

Dec 31, 2024

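convnext_nano 384x384 ImageNet-12k pretrain and fine-tune. https://huggingface.co/models?search=convnext_nano%20r384
Add AIM-v2 encoders from https://github.com/apple/ml-aim, see on Hub: https://huggingface.co/models?search=timm%20aimv2
Add PaliGemma2 encoders from https://github.com/google-research/big_vision to existing PaliGemma, see on Hub: https://huggingface.co/models?search=timm%20pali2
Add missing L/14 DFN2B 39B CLIP ViT, vit_large_patch14_clip_224.dfn2b_s39b
Fix existing RmsNorm layer and fn to match the standard formulation, use the PT 2.5 implementation when possible. Move the old implementation to a SimpleNorm layer, which is LN without centering or bias. Only two timm models used it, and they have been updated.
Allow overriding the cache_dir argument for model creation
Pass through trust_remote_code for the HF datasets wrapper
inception_next_atto model added by its creator
Adan optimizer caution option, and Lamb decoupled weight decay option
Some feature_info metadata fixed by https://github.com/brianhou0208
All OpenCLIP and JAX (CLIP, SigLIP, Pali, etc.) model weights that used load-time remapping were given their own HF Hub instances so that they work with hf-hub: based loading, and thus with the new Transformers TimmWrapperModel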
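To make the RmsNorm / SimpleNorm distinction in this entry concrete, here is a small pure-Python sketch. It assumes "LN without centering or bias" means dividing by the LayerNorm standard deviation without subtracting the mean; that reading, the function names, and the omission of the learned scale are assumptions for illustration, not timm's code.

```python
import math
from typing import List, Sequence


def rms_norm(x: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Standard RMSNorm without a learned scale: x / sqrt(mean(x^2) + eps)."""
    mean_sq = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_sq + eps) for v in x]


def simple_norm(x: Sequence[float], eps: float = 1e-6) -> List[float]:
    """LayerNorm-style scaling without centering: x / sqrt(var(x) + eps)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)  # population variance
    return [v / math.sqrt(var + eps) for v in x]


x = [1.0, 2.0, 3.0, 4.0]
print(rms_norm(x))     # scaled by the root mean square of x
print(simple_norm(x))  # scaled by the standard deviation of x
```

The two agree on zero-mean inputs and diverge as the mean moves away from zero, which is why swapping one for the other changes model outputs.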

Nov 28, 2024

More optimizers
- Add MARS optimizer (https://arxiv.org/abs/2411.10438, https://github.com/AGI-Arena/MARS)
- Add LaProp optimizer (https://arxiv.org/abs/2002.04839, https://github.com/Z-T-WANG/LaProp-Optimizer)
- Add masking from 'Cautious Optimizers' (https://arxiv.org/abs/2411.16085, https://github.com/kyleliang919/C-Optim) to Adafactor, Adafactor Big Vision, AdamW (legacy), Adopt, Lamb, LaProp, Lion, NadamW, RMSPropTF, SGDW
- Clean up some docstrings and type annotations for the optimizers and factory
Add MobileNet-V4 Conv Medium models pretrained on in12k and fine-tuned on in1k @ 384x384
Add small cs3darknet, quite good for its speed
- https://huggingface.co/timm/cs3darknet_focus_s.ra4_e3600_r256_in1k
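The 'Cautious Optimizers' masking mentioned in this entry can be sketched in plain Python as follows. This reflects my reading of the idea (drop update components whose sign disagrees with the current gradient, then rescale by the kept fraction); the function name, the rescaling, and the 1e-3 floor are assumptions, and the per-optimizer code in timm may differ.

```python
from typing import List, Sequence


def cautious_mask(
    update: Sequence[float],
    grad: Sequence[float],
    min_scale: float = 1e-3,
) -> List[float]:
    """Zero update entries that disagree in sign with grad, then rescale."""
    # Keep an entry only when the proposed update and the gradient agree in sign.
    mask = [1.0 if u * g > 0 else 0.0 for u, g in zip(update, grad)]
    # Rescale by the fraction of kept entries (floored to avoid division by ~0).
    scale = max(sum(mask) / len(mask), min_scale)
    return [u * m / scale for u, m in zip(update, mask)]


update = [0.5, -0.2, 0.1, 0.4]  # e.g. a momentum buffer
grad = [1.0, 0.3, 0.2, -0.1]    # current gradient
print(cautious_mask(update, grad))
```

Here the second and fourth entries are dropped because their signs disagree, and the surviving entries are doubled because half of the mask is active.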

Nov 12, 2024

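Optimizer factory refactor
- New factory works by registering optimizers using an OptimInfo dataclass with some key traits
- Add list_optimizers, get_optimizer_class, get_optimizer_info to the reworked create_optimizer_v2 fn to explore optimizers, get info or class
- Deprecate optim.optim_factory, move fns to optim/_optim_factory.py and optim/_param_groups.py, and encourage import via timm.optim
Add Adopt (https://github.com/iShohei220/adopt) optimizer
Add 'Big Vision' variant of Adafactor (https://github.com/google-research/big_vision/blob/main/big_vision/optax.py) optimizer
Fix original Adafactor to pick better factorization dims for convolutions
Tweak LAMB optimizer to use improvements in torch.where functionality since the original, refactor clipping a bit
Dynamic img size support in vit, deit, eva improved to support resize from non-square patch grids, thanks https://github.com/wojtke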
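The registration pattern described in the optimizer factory refactor above can be illustrated with a toy registry. Everything here (ToyOptimInfo, ToyRegistry, the fields, the fake optimizer classes) is hypothetical and only mirrors the idea of a dataclass-backed registry with list / info / class lookups; it is not timm's implementation or its field layout.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ToyOptimInfo:
    """Hypothetical stand-in for an optimizer info record."""
    name: str
    opt_class: type
    description: str = ""
    has_momentum: bool = False


class ToyRegistry:
    """Toy registry keyed by lower-cased optimizer name."""

    def __init__(self) -> None:
        self._infos: Dict[str, ToyOptimInfo] = {}

    def register(self, info: ToyOptimInfo) -> None:
        self._infos[info.name.lower()] = info

    def list_optimizers(self) -> List[str]:
        return sorted(self._infos)

    def get_optimizer_info(self, name: str) -> ToyOptimInfo:
        return self._infos[name.lower()]

    def get_optimizer_class(self, name: str) -> type:
        return self.get_optimizer_info(name).opt_class


class FakeSGD:
    """Placeholder optimizer class for the example."""


class FakeAdamW:
    """Placeholder optimizer class for the example."""


registry = ToyRegistry()
registry.register(ToyOptimInfo("sgd", FakeSGD, "toy SGD", has_momentum=True))
registry.register(ToyOptimInfo("adamw", FakeAdamW, "toy AdamW"))
print(registry.list_optimizers())
```

Keeping traits on the info record lets a factory decide which keyword arguments apply to a given optimizer without hard-coding per-name branches.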

Oct 31, 2024

Add a set of new, very well trained ResNet and ResNet-V2 18/34 (basic block) weights. See https://huggingface.co/blog/rwightman/resnet-trick-or-treat

Oct 19, 2024

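Clean up torch amp usage to avoid cuda-specific calls, merge support for Ascend (NPU) devices from MengqingCao that should now work in PyTorch 2.5 with the new device extension autoloading feature. Also tested Intel Arc (XPU) in PyTorch 2.5, and it (mostly) worked.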

Oct 16, 2024

Fix error on importing from the deprecated path timm.models.registry, increase priority of existing deprecation warnings so they are visible
Port weights of InternViT-300M (https://huggingface.co/OpenGVLab/InternViT-300M-448px) to timm as vit_intern300m_patch14_448

Oct 14, 2024

Pre-activation (ResNetV2) versions of the 18/18d/34/34d ResNet model defs added by request (weights pending)
Release 1.0.10

Oct 11, 2024

MambaOut (https://github.com/yuweihao/MambaOut) models and weights added. A cheeky take on SSM vision models without the SSM (essentially ConvNeXt with gating). A mix of original weights + custom variations and weights.

model	img_size	top1	top5	param_count
mambaout_base_plus_rw.sw_e150_r384_in12k_ft_in1k	384	87.506	98.428	101.66
mambaout_base_plus_rw.sw_e150_in12k_ft_in1k	288	86.912	98.236	101.66
mambaout_base_plus_rw.sw_e150_in12k_ft_in1k	224	86.632	98.156	101.66
mambaout_base_tall_rw.sw_e500_in1k	288	84.974	97.332	86.48
mambaout_base_wide_rw.sw_e500_in1k	288	84.962	97.208	94.45
mambaout_base_short_rw.sw_e500_in1k	288	84.832	97.27	88.83
mambaout_base.in1k	288	84.72	96.93	84.81
mambaout_small_rw.sw_e450_in1k	288	84.598	97.098	48.5
mambaout_small.in1k	288	84.5	96.974	48.49
mambaout_base_wide_rw.sw_e500_in1k	224	84.454	96.864	94.45
mambaout_base_tall_rw.sw_e500_in1k	224	84.434	96.958	86.48
mambaout_base_short_rw.sw_e500_in1k	224	84.362	96.952	88.83
mambaout_base.in1k	224	84.168	96.68	84.81
mambaout_small.in1k	224	84.086	96.63	48.49
mambaout_small_rw.sw_e450_in1k	224	84.024	96.752	48.5
mambaout_tiny.in1k	288	83.448	96.538	26.55
mambaout_tiny.in1k	224	82.736	96.1	26.55
mambaout_kobe.in1k	288	81.054	95.718	9.14
mambaout_kobe.in1k	224	79.986	94.986	9.14
mambaout_femto.in1k	288	79.848	95.14	7.3
mambaout_femto.in1k	224	78.87	94.408	7.3

SigLIP SO400M ViT fine-tunes on ImageNet-1k @ 378x378, adding a 378x378 option for the existing SigLIP 384x384 models
- vit_so400m_patch14_siglip_378.webli_ft_in1k - 89.42 top-1
- vit_so400m_patch14_siglip_gap_378.webli_ft_in1k - 89.03
SigLIP SO400M ViT encoder from the recent multilingual (i18n) variant, patch16 @ 256x256 (https://huggingface.co/timm/ViT-SO400M-16-SigLIP-i18n-256). OpenCLIP update pending.
Add two ConvNeXt 'Zepto' models and weights (one with overlapped stem and one with patch stem). Uses RMSNorm, smaller than the previous 'Atto', 2.2M params.
- convnext_zepto_rms_ols.ra4_e3600_r224_in1k - 73.20 top-1 @ 224
- convnext_zepto_rms.ra4_e3600_r224_in1k - 72.81 @ 224

Sept 2024

Add a suite of tiny test models for improved unit tests and niche low-resource applications (https://huggingface.co/blog/rwightman/timm-tiny-test)
Add MobileNetV4-Conv-Small (0.5x) model (https://huggingface.co/posts/rwightman/793053396198664)
- mobilenetv4_conv_small_050.e3000_r224_in1k - 65.81 top-1 @ 256, 64.76 @ 224
Add MobileNetV3-Large variants trained with the MNV4 Small recipe
- mobilenetv3_large_150d.ra4_e3600_r256_in1k - 81.81 @ 320, 80.94 @ 256
- mobilenetv3_large_100.ra4_e3600_r224_in1k - 77.16 @ 256, 76.31 @ 224

Aug 21, 2024

Updated SBB ViT models trained on ImageNet-12k and fine-tuned on ImageNet-1k, challenging quite a number of much larger, slower models

model	top1	top5	param_count	img_size
vit_mediumd_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k	87.438	98.256	64.11	384
vit_mediumd_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k	86.608	97.934	64.11	256
vit_betwixt_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k	86.594	98.02	60.4	384
vit_betwixt_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k	85.734	97.61	60.4	256

MobileNet-V1 1.25, EfficientNet-B1, and ResNet50-D weights with the MNV4 baseline challenge recipe

model	top1	top5	param_count	img_size
resnet50d.ra4_e3600_r224_in1k	81.838	95.922	25.58	288
efficientnet_b1.ra4_e3600_r240_in1k	81.440	95.700	7.79	288
resnet50d.ra4_e3600_r224_in1k	80.952	95.384	25.58	224
efficientnet_b1.ra4_e3600_r240_in1k	80.406	95.152	7.79	240
mobilenetv1_125.ra4_e3600_r224_in1k	77.600	93.804	6.27	256
mobilenetv1_125.ra4_e3600_r224_in1k	76.924	93.234	6.27	224

Add SAM2 (HieraDet) backbone arch and weight loading support
Add Hiera Small weights trained with abswin pos embed on in12k and fine-tuned on 1k

model	top1	top5	param_count
hiera_small_abswin_256.sbb2_e200_in12k_ft_in1k	84.912	97.260	35.01
hiera_small_abswin_256.sbb2_pd_e200_in12k_ft_in1k	84.560	97.106	35.01

Aug 8, 2024

Add RDNet ('DenseNets Reloaded', https://arxiv.org/abs/2403.19588), thanks Donghyun Kim

July 28, 2024

Add mobilenet_edgetpu_v2_m weights with the ra4 mnv4-small based recipe. 80.1% top-1 @ 224 and 80.7 @ 256.
Release 1.0.8

July 26, 2024

More MobileNet-v4 weights, ImageNet-12k pretrain with fine-tunes, and anti-aliased ConvLarge models

model	top1	top1_err	top5	top5_err	param_count	img_size
mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k	84.99	15.01	97.294	2.706	32.59	544
mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k	84.772	15.228	97.344	2.656	32.59	480
mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k	84.64	15.36	97.114	2.886	32.59	448
mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k	84.314	15.686	97.102	2.898	32.59	384
mobilenetv4_conv_aa_large.e600_r384_in1k	83.824	16.176	96.734	3.266	32.59	480
mobilenetv4_conv_aa_large.e600_r384_in1k	83.244	16.756	96.392	3.608	32.59	384
mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k	82.99	17.01	96.67	3.33	11.07	320
mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k	82.364	17.636	96.256	3.744	11.07	256

Impressive MobileNet-V1 and EfficientNet-B0 baseline challenges (https://huggingface.co/blog/rwightman/mobilenet-baselines)

model	top1	top1_err	top5	top5_err	param_count	img_size
efficientnet_b0.ra4_e3600_r224_in1k	79.364	20.636	94.754	5.246	5.29	256
efficientnet_b0.ra4_e3600_r224_in1k	78.584	21.416	94.338	5.662	5.29	224
mobilenetv1_100h.ra4_e3600_r224_in1k	76.596	23.404	93.272	6.728	5.28	256
mobilenetv1_100.ra4_e3600_r224_in1k	76.094	23.906	93.004	6.996	4.23	256
mobilenetv1_100h.ra4_e3600_r224_in1k	75.662	24.338	92.504	7.496	5.28	224
mobilenetv1_100.ra4_e3600_r224_in1k	75.382	24.618	92.312	7.688	4.23	224

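Prototype of set_input_size() added to vit and swin v1/v2 models to allow changing image size, patch size, window size after model creation.
Improved support in swin for different size handling: in addition to set_input_size, always_partition and strict_img_size args have been added to __init__ to allow more flexible input size constraints.
Fix out-of-order indices info for the intermediate 'Getter' feature wrapper, and check for out-of-range indices in the same.
Add several tiny models (under 0.5M params) for testing that are actually trained on ImageNet-1k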

model	top1	top1_err	top5	top5_err	param_count	img_size	crop_pct
test_efficientnet.r160_in1k	47.156	52.844	71.726	28.274	0.36	192	1.0
test_byobnet.r160_in1k	46.698	53.302	71.674	28.326	0.46	192	1.0
test_efficientnet.r160_in1k	46.426	53.574	70.928	29.072	0.36	160	0.875
test_byobnet.r160_in1k	45.378	54.622	70.572	29.428	0.46	160	0.875
test_vit.r160_in1k	42.0	58.0	68.664	31.336	0.37	192	1.0
test_vit.r160_in1k	40.822	59.178	67.212	32.788	0.37	160	0.875

Fix vit reg token init, thanks Promisery
Other misc fixes

June 24, 2024

3 more MobileNetV4 hybrid weights with a different MQA weight init scheme

model	top1	top1_err	top5	top5_err	param_count	img_size
mobilenetv4_hybrid_large.ix_e600_r384_in1k	84.356	15.644	96.892	3.108	37.76	448
mobilenetv4_hybrid_large.ix_e600_r384_in1k	83.990	16.010	96.702	3.298	37.76	384
mobilenetv4_hybrid_medium.ix_e550_r384_in1k	83.394	16.606	96.760	3.240	11.07	448
mobilenetv4_hybrid_medium.ix_e550_r384_in1k	82.968	17.032	96.474	3.526	11.07	384
mobilenetv4_hybrid_medium.ix_e550_r256_in1k	82.492	17.508	96.278	3.722	11.07	320
mobilenetv4_hybrid_medium.ix_e550_r256_in1k	81.446	18.554	95.704	4.296	11.07	256

florence2 weight loading in DaViT model

June 12, 2024

MobileNetV4 models and initial set of timm trained weights added

model	top1	top1_err	top5	top5_err	param_count	img_size
mobilenetv4_hybrid_large.e600_r384_in1k	84.266	15.734	96.936	3.064	37.76	448
mobilenetv4_hybrid_large.e600_r384_in1k	83.800	16.200	96.770	3.230	37.76	384
mobilenetv4_conv_large.e600_r384_in1k	83.392	16.608	96.622	3.378	32.59	448
mobilenetv4_conv_large.e600_r384_in1k	82.952	17.048	96.266	3.734	32.59	384
mobilenetv4_conv_large.e500_r256_in1k	82.674	17.326	96.31	3.69	32.59	320
mobilenetv4_conv_large.e500_r256_in1k	81.862	18.138	95.69	4.31	32.59	256
mobilenetv4_hybrid_medium.e500_r224_in1k	81.276	18.724	95.742	4.258	11.07	256
mobilenetv4_conv_medium.e500_r256_in1k	80.858	19.142	95.768	4.232	9.72	320
mobilenetv4_hybrid_medium.e500_r224_in1k	80.442	19.558	95.38	4.62	11.07	224
mobilenetv4_conv_blur_medium.e500_r224_in1k	80.142	19.858	95.298	4.702	9.72	256
mobilenetv4_conv_medium.e500_r256_in1k	79.928	20.072	95.184	4.816	9.72	256
mobilenetv4_conv_medium.e500_r224_in1k	79.808	20.192	95.186	4.814	9.72	256
mobilenetv4_conv_blur_medium.e500_r224_in1k	79.438	20.562	94.932	5.068	9.72	224
mobilenetv4_conv_medium.e500_r224_in1k	79.094	20.906	94.77	5.23	9.72	224
mobilenetv4_conv_small.e2400_r224_in1k	74.616	25.384	92.072	7.928	3.77	256
mobilenetv4_conv_small.e1200_r224_in1k	74.292	25.708	92.116	7.884	3.77	256
mobilenetv4_conv_small.e2400_r224_in1k	73.756	26.244	91.422	8.578	3.77	224
mobilenetv4_conv_small.e1200_r224_in1k	73.454	26.546	91.34	8.66	3.77	224

Apple MobileCLIP (https://arxiv.org/pdf/2311.17049, FastViT and ViT-B) image tower model support and weights added (part of OpenCLIP support).
ViTamin (https://arxiv.org/abs/2404.02132) CLIP image tower model and weights added (part of OpenCLIP support).
OpenAI CLIP Modified ResNet image tower modelling and weight support (via ByobNet). Refactor AttentionPool2d.

May 14, 2024

Support loading PaliGemma jax weights into SigLIP ViT models with average pooling.
Add Hiera models from Meta (https://github.com/facebookresearch/hiera).
Add normalize= flag for transforms, return non-normalized torch.Tensor with original dtype (for chug)
Version 1.0.3 release

May 11, 2024

Searching for Better ViT Baselines (For the GPU Poor) weights and vit variants released. Exploring model shapes between Tiny and Base.

model	top1	top5	param_count	img_size
vit_mediumd_patch16_reg4_gap_256.sbb_in12k_ft_in1k	86.202	97.874	64.11	256
vit_betwixt_patch16_reg4_gap_256.sbb_in12k_ft_in1k	85.418	97.48	60.4	256
vit_mediumd_patch16_rope_reg1_gap_256.sbb_in1k	84.322	96.812	63.95	256
vit_betwixt_patch16_rope_reg4_gap_256.sbb_in1k	83.906	96.684	60.23	256
vit_base_patch16_rope_reg1_gap_256.sbb_in1k	83.866	96.67	86.43	256
vit_medium_patch16_rope_reg1_gap_256.sbb_in1k	83.81	96.824	38.74	256
vit_betwixt_patch16_reg4_gap_256.sbb_in1k	83.706	96.616	60.4	256
vit_betwixt_patch16_reg1_gap_256.sbb_in1k	83.628	96.544	60.4	256
vit_medium_patch16_reg4_gap_256.sbb_in1k	83.47	96.622	38.88	256
vit_medium_patch16_reg1_gap_256.sbb_in1k	83.462	96.548	38.88	256
vit_little_patch16_reg4_gap_256.sbb_in1k	82.514	96.262	22.52	256
vit_wee_patch16_reg1_gap_256.sbb_in1k	80.256	95.360	13.42	256
vit_pwee_patch16_reg1_gap_256.sbb_in1k	80.072	95.136	15.25	256
vit_mediumd_patch16_reg4_gap_256.sbb_in12k	N/A	N/A	64.11	256
vit_betwixt_patch16_reg4_gap_256.sbb_in12k	N/A	N/A	60.4	256

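AttentionExtract helper added to extract attention maps from timm models. See the example in https://github.com/huggingface/pytorch-image-models/discussions/1232#discussioncomment-9320949
forward_intermediates() API refined and added to more models, including some ConvNets that have other extraction methods.
1017 of 1047 model architectures support features_only=True feature extraction. The remaining 34 architectures can be supported, but based on priority requests.
Remove torch.jit.script annotated functions, including old JIT activations. They conflict with dynamo, and dynamo does a much better job when used.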

April 11, 2024

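Prepping for a long overdue 1.0 release, things have been stable for a while now.
Significant feature that has been missing for a while: features_only=True support for ViT models with flat hidden states or non-standard module layouts (so far covering 'vit_*', 'twins_*', 'deit*', 'beit*', 'mvitv2*', 'eva*', 'samvit_*', 'flexivit*')
The above feature support is achieved through a new forward_intermediates() API that can be used with a feature wrapping module or directly.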

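import timm
import torch

model = timm.create_model('vit_base_patch16_224')
input = torch.randn(2, 3, 224, 224)  # batch of 2, matching the shapes printed below
final_feat, intermediates = model.forward_intermediates(input)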
output = model.forward_head(final_feat)  # pooling + classifier head

print(final_feat.shape)
torch.Size([2, 197, 768])

for f in intermediates:
    print(f.shape)
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])

print(output.shape)
torch.Size([2, 1000])

model = timm.create_model('eva02_base_patch16_clip_224', pretrained=True, img_size=512, features_only=True, out_indices=(-3, -2,))
output = model(torch.randn(2, 3, 512, 512))

for o in output:
    print(o.shape)
torch.Size([2, 768, 32, 32])
torch.Size([2, 768, 32, 32])

TinyCLIP vision tower weights added, thanks Thien Tran

Feb 19, 2024

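Next-ViT models added. Adapted from https://github.com/bytedance/Next-ViT
HGNet and PP-HGNetV2 models added. Adapted from https://github.com/PaddlePaddle/PaddleClas by SeeFun
Removed setup.py, moved to a pyproject.toml based build supported by PDM
Add updated model EMA impl using _for_each for less overhead
Support device args in train script for non-GPU devices
Other misc fixes and small additions
Min supported Python version increased to 3.8
Release 0.9.16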

Jan 8, 2024

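Datasets and transform refactoring

HuggingFace streaming (iterable) dataset support (--dataset hfids:org/dataset)
Webdataset wrapper tweaks for improved split info fetching, can auto-fetch splits from a supported HF hub webdataset
Tested HF datasets and webdataset wrapper streaming from HF hub with recent timm ImageNet uploads to https://huggingface.co/timm
Make input and target column/field keys consistent across datasets and pass via args
Full monochrome support when using e.g. --input-size 1 224 224 or --in-chans 1, sets PIL image conversion appropriately in the dataset
Improved several alternate crop and resize transforms (ResizeKeepRatio, RandomCropOrPad, etc.) for use in the PixParse document AI project
Add SimCLR style color jitter prob along with grayscale and gaussian blur options to augmentations and args
Allow training without a validation set (--val-split '') in the train script
Add --bce-sum (sum over class dim) and --bce-pos-weight (positive weighting) args for training, as they are common BCE loss tweaks I was often hard-coding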
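The two BCE tweaks named in the last item can be illustrated with a plain-Python loss. This is the conventional formulation (positive-term weighting as in PyTorch's BCEWithLogitsLoss pos_weight, and a per-sample sum over classes followed by a batch mean); how the train script combines the flags is not stated in this entry, so the function below and its argument names are assumptions.

```python
import math
from typing import Sequence


def bce_with_logits(
    logits: Sequence[Sequence[float]],
    targets: Sequence[Sequence[float]],
    pos_weight: float = 1.0,
    sum_classes: bool = False,
) -> float:
    """Binary cross-entropy on logits, averaged over the batch."""
    per_sample = []
    for row_z, row_t in zip(logits, targets):
        losses = []
        for z, t in zip(row_z, row_t):
            # Numerically stable log(sigmoid(z)).
            if z >= 0:
                log_p = -math.log1p(math.exp(-z))
            else:
                log_p = z - math.log1p(math.exp(z))
            log_not_p = log_p - z  # log(1 - sigmoid(z))
            losses.append(-(pos_weight * t * log_p + (1.0 - t) * log_not_p))
        # Sum over the class dim, or average over it (the default here).
        per_sample.append(sum(losses) if sum_classes else sum(losses) / len(losses))
    return sum(per_sample) / len(per_sample)


logits = [[0.0, 0.0]]
targets = [[1.0, 0.0]]
print(bce_with_logits(logits, targets))
print(bce_with_logits(logits, targets, sum_classes=True))
print(bce_with_logits(logits, targets, pos_weight=2.0))
```

Summing over classes makes the loss scale with the number of classes, which changes the effective learning rate relative to the class-mean form.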

Nov 23, 2023

Added EfficientViT-Large models, thanks SeeFun
Fix Python 3.7 compat, will be dropping support for it soon
Other misc fixes
Release 0.9.12

Nov 20, 2023

Added significant flexibility for Hugging Face Hub based timm models via the model_args config entry. model_args will be passed as kwargs through to models on creation.
- See the example at https://huggingface.co/gaunernst/vit_base_patch16_1024_128.audiomae_as2m_ft_as20k/blob/main/config.json
- Usage: https://github.com/huggingface/pytorch-image-models/discussions/2035
Updated imagenet eval and test set csv files with latest models
vision_transformer.py typing and doc cleanup by Laureηt
0.9.11 release
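A minimal sketch of the model_args idea from this entry: a config carries a model_args mapping that is forwarded as keyword arguments at model creation. The config values, the helper name build_create_kwargs, and the rule that caller overrides win are all made up for illustration; see the linked config.json for a real example.

```python
import json
from typing import Any, Dict

# Hypothetical config contents; only the model_args key comes from the entry above.
config_json = """
{
  "architecture": "vit_base_patch16_224",
  "num_classes": 10,
  "model_args": {"img_size": 128, "in_chans": 1}
}
"""


def build_create_kwargs(config: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Merge model_args from a config with caller overrides (assumed precedence)."""
    kwargs = dict(config.get("model_args", {}))  # copy so the config is not mutated
    kwargs.update(overrides)
    return kwargs


config = json.loads(config_json)
print(build_create_kwargs(config))
print(build_create_kwargs(config, img_size=256))
```

The resulting dictionary is what would be splatted into the model constructor as keyword arguments.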

Nov 3, 2023

DFN (Data Filtering Networks) and MetaCLIP ViT weights added
DINOv2 'register' ViT model weights added (https://huggingface.co/papers/2309.16588, https://huggingface.co/papers/2304.07193)
Add quickgelu ViT variants for the OpenAI, DFN, MetaCLIP weights that use it (less efficient)
Improved typing added to ResNet, MobileNet-v3, thanks to Aryan
ImageNet-12k fine-tuned (from LAION-2B CLIP) convnext_xxlarge
0.9.9 release

Oct 20, 2023

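SigLIP image tower weights supported in vision_transformer.py.
- Great potential for fine-tuning and downstream feature use.
Experimental 'register' support in vit models as per Vision Transformers Need Registers
Updated RepViT with new weight release. Thanks wangao
Add patch resizing support (on pretrained weight load) to Swin models
0.9.8 release pending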

Sep 1, 2023

TinyViT added by SeeFun
Fix EfficientViT (MIT) to use torch.autocast so it works back to PT 1.10
0.9.7 release

Aug 28, 2023

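Add dynamic img size support to models in vision_transformer.py, vision_transformer_hybrid.py, deit.py, and eva.py without breaking backward compat.
- Add dynamic_img_size=True to args at model creation time to allow changing the grid size (interpolate abs and/or ROPE pos embed each forward pass).
- Add dynamic_img_pad=True to allow image sizes that aren't divisible by patch size (pad bottom right to patch size each forward pass).
- Enabling either dynamic mode will break FX tracing unless the PatchEmbed module is added as a leaf.
- The existing method of resizing position embedding by passing a different img_size on creation (interpolate pretrained embed weights once) still works.
- The existing method of changing patch_size on creation (resize pretrained patch_embed weights once) still works.
- Example validation cmd python validate.py --data-dir /imagenet --model vit_base_patch16_224 --amp --amp-dtype bfloat16 --img-size 255 --crop-pct 1.0 --model-kwargs dynamic_img_size=True dynamic_img_pad=True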
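The padding arithmetic implied by dynamic_img_pad in the list above can be checked with a few lines of Python. The helper name is hypothetical; it only computes how much bottom/right padding brings an image up to the next multiple of the patch size and what patch grid results, which is why the 255-pixel example command works with a patch size of 16.

```python
from typing import Tuple


def pad_to_patch_multiple(height: int, width: int, patch_size: int) -> Tuple[int, int, int, int]:
    """Return (pad_bottom, pad_right, grid_h, grid_w) for a given patch size."""
    pad_h = (patch_size - height % patch_size) % patch_size
    pad_w = (patch_size - width % patch_size) % patch_size
    grid_h = (height + pad_h) // patch_size
    grid_w = (width + pad_w) // patch_size
    return pad_h, pad_w, grid_h, grid_w


# A 255x255 input with 16x16 patches needs one row and one column of padding.
print(pad_to_patch_multiple(255, 255, 16))
# A 224x224 input is already divisible and needs none.
print(pad_to_patch_multiple(224, 224, 16))
```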

Aug 25, 2023

Many new models since last release
- FastViT - https://arxiv.org/abs/2303.14189
- MobileOne - https://arxiv.org/abs/2206.04040
- InceptionNeXt - https://arxiv.org/abs/2303.16900
- RepGhostNet - https://arxiv.org/abs/2211.06088 (thanks https://github.com/ChengpengChen)
- GhostNetV2 - https://arxiv.org/abs/2211.12905 (thanks https://github.com/yehuitang)
- EfficientViT (MSRA) - https://arxiv.org/abs/2305.07027 (thanks https://github.com/seefun)
- EfficientViT (MIT) - https://arxiv.org/abs/2205.14756 (thanks https://github.com/seefun)
Add --reparam arg to benchmark.py, onnx_export.py, and validate.py to trigger layer reparameterization / fusion for models with any one of reparameterize(), switch_to_deploy() or fuse()
- Including FastViT, MobileOne, RepGhostNet, EfficientViT (MSRA), RepViT, RepVGG, and LeViT
Preparing 0.9.6 'back to school' release
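The --reparam behaviour described in this entry amounts to walking a model and calling whichever of the three method names a layer exposes. The sketch below shows that duck-typed dispatch on a toy tree of plain Python objects; the classes and the reparameterize_tree function are hypothetical stand-ins, not timm's utility, and real fuse() methods may return a replacement module, which this sketch ignores.

```python
from typing import Iterable, List

REPARAM_METHODS = ("reparameterize", "switch_to_deploy", "fuse")


class Node:
    """Minimal stand-in for a module that owns child modules."""

    def __init__(self, children: Iterable["Node"] = ()) -> None:
        self.children: List["Node"] = list(children)
        self.deployed = False


class RepBlock(Node):
    def reparameterize(self) -> None:
        self.deployed = True


class DeployBlock(Node):
    def switch_to_deploy(self) -> None:
        self.deployed = True


def reparameterize_tree(node: Node) -> int:
    """Call the first matching method on each node; return how many were converted."""
    count = 0
    for name in REPARAM_METHODS:
        method = getattr(node, name, None)
        if callable(method):
            method()
            count += 1
            break
    for child in node.children:
        count += reparameterize_tree(child)
    return count


model = Node([RepBlock(), Node([DeployBlock()])])
print(reparameterize_tree(model))
```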

Aug 11, 2023

Swin, MaxViT, CoAtNet, and BEiT models support resizing of image/window size on creation, with adaptation of pretrained weights
Example validation cmd to test with non-square resize python validate.py --data-dir /imagenet --model swin_base_patch4_window7_224.ms_in22k_ft_in1k --amp --amp-dtype bfloat16 --input-size 3 256 320 --model-kwargs window_size=8,10 img_size=256,320

Aug 3, 2023

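Add GluonCV weights for HRNet w18_small and w18_small_v2. Converted by SeeFun
Fix selecsls* model naming regression
Patch and position embedding for ViT/EVA works for bfloat16/float16 weights on load (or activations for on-the-fly resize)
v0.9.5 release prep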

July 27, 2023

Added timm trained seresnextaa201d_32x8d.sw_in12k_ft_in1k_384 weights (and .sw_in12k pretrain) with 87.3% top-1 on ImageNet-1k, the best ImageNet ResNet family model I'm aware of.
RepViT model and weights (https://arxiv.org/abs/2307.09283) added by wangao
I-JEPA ViT feature weights (no classifier) added by SeeFun
SAM-ViT (segment anything) feature weights (no classifier) added by SeeFun
Add support for alternative feat extraction methods and negative (-ve) indices to EfficientNet
Add NAdamW optimizer
Misc fixes

May 11, 2023

timm 0.9 released, transition from 0.8.xdev releases

May 10, 2023

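Hugging Face Hub downloading is now the default, 1132 models on https://huggingface.co/timm, 1163 weights in timm
DINOv2 vit feature backbone weights added, thanks to Leng Yue
FB MAE vit feature backbone weights added
OpenCLIP DataComp-XL L/14 feat backbone weights added
MetaFormer (poolformer-v2, caformer, convformer, updated poolformer (v1)) with weights added by Fredo Guan
Experimental get_intermediate_layers function on vit/deit models for grabbing hidden states (inspired by the DINO impl). This is WIP and may change significantly... feedback welcome.
Model creation throws an error if pretrained=True and no weights exist (instead of continuing with random initialization)
Fix regression with inception / nasnet TF sourced weights with 1001 classes in the original classifiers
bitsandbytes (https://github.com/TimDettmers/bitsandbytes) optimizers added to factory, use the bnb prefix, e.g. bnbadam8bit
Misc cleanup and fixes
Final testing before switching to 0.9 and bringing timm out of pre-release state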

April 27, 2023

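97% of timm models uploaded to HF Hub and almost all updated to support multi-weight pretrained configs
Minor cleanup and refactoring of another batch of models as multi-weight support was added. More fused_attn (F.sdpa) and features_only support, and torchscript fixes.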

April 21, 2023

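Gradient accumulation support added to train script and tested (--grad-accum-steps), thanks Taeksang Kim
More weights on HF Hub (cspnet, cait, volo, xcit, tresnet, hardcorenas, densenet, dpn, vovnet, xception_aligned)
Added --head-init-scale and --head-init-bias to train.py to scale the classifier head and set a fixed bias for fine-tune
Remove all InplaceABN (inplace_abn) use, replaced use in tresnet with standard BatchNorm (modified weights accordingly).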
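For readers unfamiliar with the gradient accumulation mentioned in this entry, the toy example below shows the underlying arithmetic on a one-parameter linear model: averaging the gradients of equally sized micro-batches (each scaled by 1 / accum_steps) reproduces the full-batch gradient. The functions are illustrative only, and whether the train script scales the loss exactly this way is an assumption.

```python
from typing import Sequence


def grad_mse(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Gradient of mean((w * x - y)^2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w: float, xs: Sequence[float], ys: Sequence[float], accum_steps: int) -> float:
    """Accumulate micro-batch gradients, each scaled by 1 / accum_steps."""
    micro = len(xs) // accum_steps  # assumes len(xs) is divisible by accum_steps
    total = 0.0
    for step in range(accum_steps):
        chunk = slice(step * micro, (step + 1) * micro)
        total += grad_mse(w, xs[chunk], ys[chunk]) / accum_steps
    return total


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(grad_mse(1.5, xs, ys))
print(accumulated_grad(1.5, xs, ys, accum_steps=2))
```

The optimizer step is then taken once per accum_steps micro-batches, giving a larger effective batch size at the same memory cost.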

April 12, 2023

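Add ONNX export script, validate script, and helpers that I've had kicking around for a long time. Tweak 'same' padding for better export with recent ONNX + pytorch.
Refactor dropout args for vit and vit-like models, separating drop_rate into drop_rate (classifier dropout), proj_drop_rate (block mlp / out projections), pos_drop_rate (position embedding dropout), attn_drop_rate (attention dropout). Also add patch dropout (FLIP) to vit and eva models.
Fused F.scaled_dot_product_attention support for more vit models, add env var (TIMM_FUSED_ATTN) to control, and a config interface to enable/disable
Add EVA-CLIP backbones with image tower weights, all the way up to the 4B param 'enormous' model, and the 336x336 OpenAI ViT model that was missed.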

April 5, 2023

All ResNet models pushed to Hugging Face Hub with multi-weight support
- All past timm trained weights added with recipe based tags to differentiate
- All ResNet strikes back A1/A2/A3 (seed 0) and R50 example B/C1/C2/D weights available
- Add torchvision v2 recipe weights to existing torchvision originals
- See the comparison table in https://huggingface.co/timm/seresnextaa101d_32x8d.sw_in12k_ft_in1k_288#model-comparison
New ImageNet-12k + ImageNet-1k fine-tunes available for a few anti-aliased ResNet models
- resnetaa50d.sw_in12k_ft_in1k - 81.7 @ 224, 82.6 @ 288
- resnetaa101d.sw_in12k_ft_in1k - 83.5 @ 224, 84.1 @ 288
- seresnextaa101d_32x8d.sw_in12k_ft_in1k - 86.0 @ 224, 86.5 @ 288
- seresnextaa101d_32x8d.sw_in12k_ft_in1k_288 - 86.5 @ 288, 86.7 @ 320

March 31, 2023

Add first ConvNext-XXLarge CLIP -> IN-1k fine-tune and IN-12k intermediate fine-tunes for convnext-base/large CLIP models.

model	top1	top5	img_size	param_count	gmacs	macts
convnext_xxlarge.clip_laion2b_soup_ft_in1k	88.612	98.704	256	846.47	198.09	124.45
convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384	88.312	98.578	384	200.13	101.11	126.74
convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_320	87.968	98.47	320	200.13	70.21	88.02
convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384	87.138	98.212	384	88.59	45.21	84.49
convnext_base.clip_laion2b_augreg_ft_in12k_in1k	86.344	97.97	256	88.59	20.09	37.55

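Add EVA-02 MIM pretrained and fine-tuned weights, push to HF Hub and update model cards for all EVA models. First model over 90% top-1 (99% top-5)! Check out the original code and weights at https://github.com/baaivision/EVA for more details on their work blending MIM, CLIP, and many model, dataset, and train recipe tweaks.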

model	top1	top5	param_count	img_size
eva02_large_patch14_448.mim_m38m_ft_in22k_in1k	90.054	99.042	305.08	448
eva02_large_patch14_448.mim_in22k_ft_in22k_in1k	89.946	99.01	305.08	448
eva_giant_patch14_560.m30m_ft_in22k_in1k	89.792	98.992	1014.45	560
eva02_large_patch14_448.mim_in22k_ft_in1k	89.626	98.954	305.08	448
eva02_large_patch14_448.mim_m38m_ft_in1k	89.57	98.918	305.08	448
eva_giant_patch14_336.m30m_ft_in22k_in1k	89.56	98.956	1013.01	336
eva_giant_patch14_336.clip_ft_in1k	89.466	98.82	1013.01	336
eva_large_patch14_336.in22k_ft_in22k_in1k	89.214	98.854	304.53	336
eva_giant_patch14_224.clip_ft_in1k	88.882	98.678	1012.56	224
eva02_base_patch14_448.mim_in22k_ft_in22k_in1k	88.692	98.722	87.12	448
eva_large_patch14_336.in22k_ft_in1k	88.652	98.722	304.53	336
eva_large_patch14_196.in22k_ft_in22k_in1k	88.592	98.656	304.14	196
eva02_base_patch14_448.mim_in22k_ft_in1k	88.23	98.564	87.12	448
eva_large_patch14_196.in22k_ft_in1k	87.934	98.504	304.14	196
eva02_small_patch14_336.mim_in22k_ft_in1k	85.74	97.614	22.13	336
eva02_tiny_patch14_336.mim_in22k_ft_in1k	80.658	95.524	5.76	336

Multi-weight and HF hub for DeiT and MLP-Mixer based models

March 22, 2023

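More weights pushed to HF Hub along with multi-weight support, including: regnet.py, rexnet.py, byobnet.py, resnetv2.py, swin_transformer.py, swin_transformer_v2.py, swin_transformer_v2_cr.py
Swin Transformer models support feature extraction (NCHW feat maps for swinv2_cr_*, and NHWC for all others) and spatial embedding outputs.
FocalNet (from https://github.com/microsoft/FocalNet) models and weights added with significant refactoring, feature extraction, and no fixed resolution / sizing constraint
RegNet weights increased with HF hub push, SWAG, SEER, and torchvision v2 weights. SEER is pretty poor in performance for its model size, but possibly useful.
More ImageNet-12k pretrained and 1k fine-tuned timm weights
- rexnetr_200.sw_in12k_ft_in1k - 82.6 @ 224, 83.2 @ 288
- rexnetr_300.sw_in12k_ft_in1k - 84.0 @ 224, 84.5 @ 288
- regnety_120.sw_in12k_ft_in1k - 85.0 @ 224, 85.4 @ 288
- regnety_160.lion_in12k_ft_in1k - 85.6 @ 224, 86.0 @ 288
- regnety_160.sw_in12k_ft_in1k - 85.6 @ 224, 86.0 @ 288 (compared to SWAG PT + 1k FT this is the same, BUT at much lower res; blows SEER FT away)
Model name deprecation + remapping functionality added (a milestone for bringing 0.8.x out of pre-release). Mappings being added...
Minor bug fixes and improvements.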

Feb 26, 2023

Add ConvNeXt-XXLarge CLIP pretrained image tower weights for fine-tune and features (fine-tuning TBD), see model card
Update convnext_xxlarge default LayerNorm eps to 1e-5 (for CLIP weights, improved stability)
0.8.15dev0

Feb 20, 2023

Add 320x320 convnext_large_mlp.clip_laion2b_ft_320 and convnext_large_mlp.clip_laion2b_ft_soup_320 CLIP image tower weights for features and fine-tune
0.8.13dev0 pypi release for latest changes, with move to huggingface org

Feb 16, 2023

safetensor checkpoint support added
Add ideas from 'Scaling Vision Transformers to 22 B. Params' (https://arxiv.org/abs/2302.05442): qk norm, RmsNorm, parallel block
Add F.scaled_dot_product_attention support (PyTorch 2.0 only) to vit_*, vit_relpos_*, coatnet / maxxvit (to start)
Lion optimizer (with multi-tensor option) added (https://arxiv.org/abs/2302.06675)
Gradient checkpointing works with features_only=True

Feb 7, 2023

New inference benchmark numbers added in the results folder.
Add convnext LAION CLIP trained weights and initial set of in1k fine-tunes
- convnext_base.clip_laion2b_augreg_ft_in1k - 86.2% @ 256x256
- convnext_base.clip_laiona_augreg_ft_in1k_384 - 86.5% @ 384x384
- convnext_large_mlp.clip_laion2b_augreg_ft_in1k - 87.3% @ 256x256
- convnext_large_mlp.clip_laion2b_augreg_ft_in1k_384 - 87.9% @ 384x384
Add DaViT models. Supports features_only=True. Adapted from https://github.com/dingmyu/davit by Fredo.
Use a common NormMlpClassifierHead across MaxViT, ConvNeXt, DaViT
Add EfficientFormer-V2 model, update EfficientFormer, and refactor LeViT (closely related architectures). Weights on HF hub.
- New EfficientFormer-V2 arch, significant refactor from the original at (https://github.com/snap-research/EfficientFormer). Supports features_only=True.
- Minor updates to EfficientFormer.
- Refactor LeViT models to stages, add features_only=True support to new conv variants, weight remap required.
Move ImageNet meta-data (synsets, indices) from /results to timm/data/_info.
Add ImageNetInfo / DatasetInfo classes to provide labelling for various ImageNet classifier layouts in timm
- Update inference.py to use them, try: python inference.py --data-dir /folder/to/images --model convnext_small.in12k --label-type detail --topk 5
Ready for 0.8.10 pypi pre-release (final testing).

Jan 20, 2023

Add two convnext 12k -> 1k fine-tunes at 384x384
- convnext_tiny.in12k_ft_in1k_384 - 85.1 @ 384
- convnext_small.in12k_ft_in1k_384 - 86.2 @ 384
Push all MaxxViT weights to HF hub, and add new ImageNet-12k -> 1k fine-tunes for rw base MaxViT and CoAtNet 1/2 models

model	top1	top5	samples / sec	Params (M)	GMAC	Act (M)
maxvit_xlarge_tf_512.in21k_ft_in1k	88.53	98.64	21.76	475.77	534.14	1413.22
maxvit_xlarge_tf_384.in21k_ft_in1k	88.32	98.54	42.53	475.32	292.78	668.76
maxvit_base_tf_512.in21k_ft_in1k	88.20	98.53	50.87	119.88	138.02	703.99
maxvit_large_tf_512.in21k_ft_in1k	88.04	98.40	36.42	212.33	244.75	942.15
maxvit_large_tf_384.in21k_ft_in1k	87.98	98.56	71.75	212.03	132.55	445.84
maxvit_base_tf_384.in21k_ft_in1k	87.92	98.54	104.71	119.65	73.80	332.90
maxvit_rmlp_base_rw_384.sw_in12k_ft_in1k	87.81	98.37	106.55	116.14	70.97	318.95
maxxvitv2_rmlp_base_rw_384.sw_in12k_ft_in1k	87.47	98.37	149.49	116.09	72.98	213.74
coatnet_rmlp_2_rw_384.sw_in12k_ft_in1k	87.39	98.31	160.80	73.88	47.69	209.43
maxvit_rmlp_base_rw_224.sw_in12k_ft_in1k	86.89	98.02	375.86	116.14	23.15	92.64
maxxvitv2_rmlp_base_rw_224.sw_in12k_ft_in1k	86.64	98.02	501.03	116.09	24.20	62.77
maxvit_base_tf_512.in1k	86.60	97.92	50.75	119.88	138.02	703.99
coatnet_2_rw_224.sw_in12k_ft_in1k	86.57	97.89	631.88	73.87	15.09	49.22
maxvit_large_tf_512.in1k	86.52	97.88	36.04	212.33	244.75	942.15
coatnet_rmlp_2_rw_224.sw_in1k	86.49	97.90	620.58	73.88	15.18	54.78
maxvit_base_tf_384.in1k	86.29	97.80	101.09	119.65	73.80	332.90
maxvit_large_tf_384.in1k	86.23	97.69	70.56	212.03	132.55	445.84
maxvit_small_tf_512.in1k	86.10	97.76	88.63	69.13	67.26	383.77
maxvit_tiny_tf_512.in1k	85.67	97.58	144.25	31.05	33.49	257.59
maxvit_small_tf_384.in1k	85.54	97.46	188.35	69.02	35.87	183.65
maxvit_tiny_tf_384.in1k	85.11	97.38	293.46	30.98	17.53	123.42
maxvit_large_tf_224.in1k	84.93	96.97	247.71	211.79	43.68	127.35
coatnet_rmlp_1_rw2_224.sw_in12k_ft_in1k	84.90	96.96	1025.45	41.72	8.11	40.13
maxvit_base_tf_224.in1k	84.85	96.99	358.25	119.47	24.04	95.01
maxxvit_rmlp_small_rw_256.sw_in1k	84.63	97.06	575.53	66.01	14.67	58.38
coatnet_rmlp_2_rw_224.sw_in1k	84.61	96.74	625.81	73.88	15.18	54.78
maxvit_rmlp_small_rw_224.sw_in1k	84.49	96.76	693.82	64.90	10.75	49.30
maxvit_small_tf_224.in1k	84.43	96.83	647.96	68.93	11.66	53.17
maxvit_rmlp_tiny_rw_256.sw_in1k	84.23	96.78	807.21	29.15	6.77	46.92
coatnet_1_rw_224.sw_in1k	83.62	96.38	989.59	41.72	8.04	34.60
maxvit_tiny_rw_224.sw_in1k	83.50	96.50	1100.53	29.06	5.11	33.11
maxvit_tiny_tf_224.in1k	83.41	96.59	1004.94	30.92	5.60	35.78
coatnet_rmlp_1_rw_224.sw_in1k	83.36	96.45	1093.03	41.69	7.85	35.47
maxxvitv2_nano_rw_256.sw_in1k	83.11	96.33	1276.88	23.70	6.26	23.05
maxxvit_rmlp_nano_rw_256.sw_in1k	83.03	96.34	1341.24	16.78	4.37	26.05
maxvit_rmlp_nano_rw_256.sw_in1k	82.96	96.26	1283.24	15.50	4.47	31.92
maxvit_nano_rw_256.sw_in1k	82.93	96.23	1218.17	15.45	4.46	30.28
coatnet_bn_0_rw_224.sw_in1k	82.39	96.19	1600.14	27.44	4.67	22.04
coatnet_0_rw_224.sw_in1k	82.39	95.84	1831.21	27.44	4.43	18.73
coatnet_rmlp_nano_rw_224.sw_in1k	82.05	95.87	2109.09	15.15	2.62	20.34
coatnext_nano_rw_224.sw_in1k	81.95	95.92	2525.52	14.70	2.47	12.80
coatnet_nano_rw_224.sw_in1k	81.70	95.64	2344.52	15.14	2.41	15.41
maxvit_rmlp_pico_rw_256.sw_in1k	80.53	95.21	1594.71	7.52	1.85	24.86

Jan 11, 2023

Update ConvNeXt ImageNet-12k pretrain series with two new fine-tuned weights (and pre FT .in12k tags)
- convnext_nano.in12k_ft_in1k - 82.3 @ 224, 82.9 @ 288 (previously released)
- convnext_tiny.in12k_ft_in1k - 84.2 @ 224, 84.5 @ 288
- convnext_small.in12k_ft_in1k - 85.2 @ 224, 85.3 @ 288

Jan 6, 2023

Finally got around to adding --model-kwargs and --opt-kwargs to scripts to pass through rare args directly to model classes from the cmd line
- train.py --data-dir /imagenet --model resnet50 --amp --model-kwargs output_stride=16 act_layer=silu
- train.py --data-dir /imagenet --model vit_base_patch16_clip_224 --img-size 240 --amp --model-kwargs img_size=240 patch_size=12
Clean up some popular models to better support arg passthrough / merge with model configs, more to go.
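The key=value passthrough shown in the commands above can be approximated with a small argparse action. This is a sketch under assumptions: the ParseKeyValue class and the use of ast.literal_eval to turn 16 into an int and 8,10 into a tuple are illustrative choices, not a copy of the parsing code used by the scripts.

```python
import argparse
import ast
from typing import Any, Dict, List, Optional


class ParseKeyValue(argparse.Action):
    """Hypothetical argparse action: collect key=value tokens into a dict."""

    def __call__(self, parser, namespace, values, option_string=None):
        parsed: Dict[str, Any] = {}
        for token in values:
            key, sep, raw = token.partition("=")
            if not sep:
                parser.error(f"expected key=value, got {token!r}")
            try:
                # Numbers, tuples, booleans, etc. become Python literals.
                parsed[key] = ast.literal_eval(raw)
            except (ValueError, SyntaxError):
                # Bare words such as silu stay as strings.
                parsed[key] = raw
        setattr(namespace, self.dest, parsed)


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", default="resnet50")
    parser.add_argument("--model-kwargs", nargs="*", default={}, action=ParseKeyValue)
    return parser.parse_args(argv)


args = parse_args(["--model", "resnet50", "--model-kwargs", "output_stride=16", "act_layer=silu"])
print(args.model_kwargs)
```

The parsed dictionary would then be passed as extra keyword arguments when the model is created.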

Jan 5, 2023

ConvNeXt-V2 models and weights added to existing convnext.py
- Paper: ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
- Reference impl: https://github.com/facebookresearch/ConvNeXt-V2 (NOTE: weights currently CC-BY-NC)

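Dec 23, 2022 🎄☃

Add FlexiViT models and weights from https://github.com/google-research/big_vision (check out the paper at https://arxiv.org/abs/2212.08013)
- NOTE: currently resizing is static on model creation, on-the-fly dynamic / train patch size sampling is a WIP
Many more models updated to multi-weight and downloadable via HF hub now (convnext, efficientnet, mobilenet, vision_transformer*, beit)
More model pretrained tags and adjustments, some model names changed (working on deprecation translations, consider the main branch a DEV branch right now, use 0.6.x for stable use)
More ImageNet-12k (subset of 22k) pretrain models popping up: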
- efficientnet_b5.in12k_ft_in1k - 85.9 @ 448x448
- vit_medium_patch16_gap_384.in12k_ft_in1k - 85.5 @ 384x384
- vit_medium_patch16_gap_256.in12k_ft_in1k - 84.5 @ 256x256
- convnext_nano.in12k_ft_in1k - 82.9 @ 288x288

Dec 8, 2022

Add 'EVA l' to vision_transformer.py, MAE style ViT-L/14 MIM pretrain with EVA-CLIP targets, FT on ImageNet-1k (with ImageNet-22k intermediate for some)
- original source: https://github.com/baaivision/EVA

model	top1	param_count	gmac	macts	hub
eva_large_patch14_336.in22k_ft_in22k_in1k	89.2	304.5	191.1	270.2	link
eva_large_patch14_336.in22k_ft_in1k	88.7	304.5	191.1	270.2	link
eva_large_patch14_196.in22k_ft_in22k_in1k	88.6	304.1	61.6	63.5	link
eva_large_patch14_196.in22k_ft_in1k	87.9	304.1	61.6	63.5	link

Dec 6, 2022

Add 'EVA g', BEiT style ViT-g/14 model weights with both MIM pretrain and CLIP pretrain to beit.py.
- original source: https://github.com/baaivision/EVA
- paper: https://arxiv.org/abs/2211.07636

model	top1	param_count	gmac	macts	hub
eva_giant_patch14_560.m30m_ft_in22k_in1k	89.8	1014.4	1906.8	2577.2	link
eva_giant_patch14_336.m30m_ft_in22k_in1k	89.6	1013	620.6	550.7	link
eva_giant_patch14_336.clip_ft_in1k	89.4	1013	620.6	550.7	link
eva_giant_patch14_224.clip_ft_in1k	89.1	1012.6	267.2	192.6	link

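Dec 5, 2022

Pre-release (0.8.0dev0) of multi-weight support (model_arch.pretrained_tag). Install with pip install --pre timm
- vision_transformer, maxvit, convnext are the first three model impls with support
- model names are changing with this (previous _21k, etc. fns will merge), still sorting out deprecation handling
- bugs are likely, but I need feedback so please try it out
- if stability is needed, please use 0.6.x pypi releases or clone from the 0.6.x branch
Support for PyTorch 2.0 compile is added in train/validate/inference/benchmark, use the --torchcompile argument
Inference script allows more control over output, select k for top-class index + prob json, csv or parquet output
Add a full set of fine-tuned CLIP image tower weights from both LAION-2B and original OpenAI CLIP models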

model	top1	param_count	gmac	macts	hub
vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k	88.6	632.5	391	407.5	link
vit_large_patch14_clip_336.openai_ft_in12k_in1k	88.3	304.5	191.1	270.2	link
vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k	88.2	632	167.4	139.4	link
vit_large_patch14_clip_336.laion2b_ft_in12k_in1k	88.2	304.5	191.1	270.2	link
vit_large_patch14_clip_224.openai_ft_in12k_in1k	88.2	304.2	81.1	88.8	link
vit_large_patch14_clip_224.laion2b_ft_in12k_in1k	87.9	304.2	81.1	88.8	link
vit_large_patch14_clip_224.openai_ft_in1k	87.9	304.2	81.1	88.8	link
vit_large_patch14_clip_336.laion2b_ft_in1k	87.9	304.5	191.1	270.2	link
vit_huge_patch14_clip_224.laion2b_ft_in1k	87.6	632	167.4	139.4	link
vit_large_patch14_clip_224.laion2b_ft_in1k	87.3	304.2	81.1	88.8	link
vit_base_patch16_clip_384.laion2b_ft_in12k_in1k	87.2	86.9	55.5	101.6	link
vit_base_patch16_clip_384.openai_ft_in12k_in1k	87	86.9	55.5	101.6	link
vit_base_patch16_clip_384.laion2b_ft_in1k	86.6	86.9	55.5	101.6	link
vit_base_patch16_clip_384.openai_ft_in1k	86.2	86.9	55.5	101.6	link
vit_base_patch16_clip_224.laion2b_ft_in12k_in1k	86.2	86.6	17.6	23.9	link
vit_base_patch16_clip_224.openai_ft_in12k_in1k	85.9	86.6	17.6	23.9	link
vit_base_patch32_clip_448.laion2b_ft_in12k_in1k	85.8	88.3	17.9	23.9	link
vit_base_patch16_clip_224.laion2b_ft_in1k	85.5	86.6	17.6	23.9	link
vit_base_patch32_clip_384.laion2b_ft_in12k_in1k	85.4	88.3	13.1	16.5	link
vit_base_patch16_clip_224.openai_ft_in1k	85.3	86.6	17.6	23.9	link
vit_base_patch32_clip_384.openai_ft_in12k_in1k	85.2	88.3	13.1	16.5	link
vit_base_patch32_clip_224.laion2b_ft_in12k_in1k	83.3	88.2	4.4	5	link
vit_base_patch32_clip_224.laion2b_ft_in1k	82.6	88.2	4.4	5	link
vit_base_patch32_clip_224.openai_ft_in1k	81.9	88.2	4.4	5	link
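
The names in the table above follow the model_arch.pretrained_tag scheme introduced with multi-weight support. As a minimal sketch of how such a name decomposes, the helper below splits on the first dot. Note that split_model_name is a hypothetical illustration, not a timm function.

```python
from typing import Tuple


def split_model_name(name: str) -> Tuple[str, str]:
    """Split 'model_arch.pretrained_tag' into (arch, tag).

    Hypothetical helper for illustration only; a name without a dot
    yields an empty tag.
    """
    arch, _, tag = name.partition(".")
    return arch, tag


arch, tag = split_model_name("vit_base_patch16_clip_224.laion2b_ft_in12k_in1k")
print(arch)  # vit_base_patch16_clip_224
print(tag)   # laion2b_ft_in12k_in1k
```

With timm installed, the full dotted name is passed as-is, e.g. timm.create_model('vit_base_patch16_clip_224.laion2b_ft_in12k_in1k', pretrained=True).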

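Port of MaxViT Tensorflow weights from the official impl at https://github.com/google-research/maxvit
- There were larger than expected drops for the upscaled 384/512 in21k fine-tune weights, possibly a missing detail, but the 21k FT did seem sensitive to small preprocessing differences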

model	top1	param_count	gmac	macts	hub
maxvit_xlarge_tf_512.in21k_ft_in1k	88.5	475.8	534.1	1413.2	link
maxvit_xlarge_tf_384.in21k_ft_in1k	88.3	475.3	292.8	668.8	link
maxvit_base_tf_512.in21k_ft_in1k	88.2	119.9	138	704	link
maxvit_large_tf_512.in21k_ft_in1k	88	212.3	244.8	942.2	link
maxvit_large_tf_384.in21k_ft_in1k	88	212	132.6	445.8	link
maxvit_base_tf_384.in21k_ft_in1k	87.9	119.6	73.8	332.9	link
maxvit_base_tf_512.in1k	86.6	119.9	138	704	link
maxvit_large_tf_512.in1k	86.5	212.3	244.8	942.2	link
maxvit_base_tf_384.in1k	86.3	119.6	73.8	332.9	link
maxvit_large_tf_384.in1k	86.2	212	132.6	445.8	link
maxvit_small_tf_512.in1k	86.1	69.1	67.3	383.8	link
maxvit_tiny_tf_512.in1k	85.7	31	33.5	257.6	link
maxvit_small_tf_384.in1k	85.5	69	35.9	183.6	link
maxvit_tiny_tf_384.in1k	85.1	31	17.5	123.4	link
maxvit_large_tf_224.in1k	84.9	211.8	43.7	127.4	link
maxvit_base_tf_224.in1k	84.9	119.5	24	95	link
maxvit_small_tf_224.in1k	84.4	68.9	11.7	53.2	link
maxvit_tiny_tf_224.in1k	83.4	30.9	5.6	35.8	link

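Oct 15, 2022

Train and validation script enhancements
Non-GPU (i.e. CPU) device support
SLURM compatibility for train script
HF datasets support (via ReaderHfds)
TFDS/WDS dataloading improvements (sample padding/wrap for distributed use fixed with respect to the sample count estimate)
in_chans != 3 support for scripts / loader
Adan optimizer
Per-step LR scheduling can be enabled via args
Dataset 'parsers' renamed to 'readers', more descriptive of purpose
AMP args changed: APEX via --amp-impl apex, bfloat16 supported via --amp-dtype bfloat16
main branch switched to 0.7.x version, 0.6.x branch forked for stable releases that only add weights
master -> main branch rename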
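
To illustrate what per-step LR scheduling changes compared with per-epoch updates, the sketch below evaluates a cosine schedule at a fractional epoch position. It is a generic illustration, not timm's scheduler code; cosine_lr, lr_per_epoch, lr_per_step and all parameter names are hypothetical.

```python
import math


def cosine_lr(progress: float, base_lr: float = 0.1, min_lr: float = 0.0) -> float:
    """Cosine-annealed LR for progress in [0, 1] (hypothetical helper)."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def lr_per_epoch(epoch: int, num_epochs: int) -> float:
    # LR is updated once per epoch, so it is constant for every step in that epoch.
    return cosine_lr(epoch / num_epochs)


def lr_per_step(epoch: int, step: int, steps_per_epoch: int, num_epochs: int) -> float:
    # LR is updated on every optimizer step, using the fractional epoch position.
    total_steps = num_epochs * steps_per_epoch
    return cosine_lr((epoch * steps_per_epoch + step) / total_steps)


# Within epoch 0 the per-epoch LR stays flat while the per-step LR already decays.
for step in (0, 50, 99):
    print(step, lr_per_epoch(0, 10), round(lr_per_step(0, step, 100, 10), 6))
```

The difference matters most for short runs or long epochs, where a per-epoch schedule moves in a few coarse jumps.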

Oct 10, 2022

More weights in the maxxvit series, including the first ConvNeXt block based coatnext and maxxvit experiments:
- coatnext_nano_rw_224 - 82.0 @ 224 (G) -- (uses ConvNeXt conv block, no BatchNorm)
- maxxvit_rmlp_nano_rw_256 - 83.0 @ 256, 83.7 @ 320 (G) (uses ConvNeXt conv block, no BN)
- maxvit_rmlp_small_rw_224 - 84.5 @ 224, 85.1 @ 320 (G)
- maxxvit_rmlp_small_rw_256 - 84.6 @ 256, 84.9 @ 288 (G) -- could be trained better, hparams need tuning (uses ConvNeXt block, no BN)
- coatnet_rmlp_2_rw_224 - 84.6 @ 224, 85 @ 320 (T)
- NOTE: official MaxVit weights (in1k) have been released at https://github.com/google-research/maxvit -- some extra work is needed to port and adapt them, since my impl was created independently of theirs and has a few small differences, plus the whole TF same padding fun.

Sept 23, 2022

LAION-2B CLIP image towers supported as pretrained backbones for fine-tune or feature extraction (no classifier)
- vit_base_patch32_224_clip_laion2b
- vit_large_patch14_224_clip_laion2b
- vit_huge_patch14_224_clip_laion2b
- vit_giant_patch14_224_clip_laion2b

Sept 7, 2022

Hugging Face timm docs home now exists, look for more here in the future
Add BEiT-v2 weights for base and large 224x224 models from https://github.com/microsoft/unilm/tree/master/beit2
Add more weights in the maxxvit series, including a pico (7.5M params, 1.9 GMACs) and two tiny variants:
- maxvit_rmlp_pico_rw_256 - 80.5 @ 256, 81.3 @ 320 (T)
- maxvit_tiny_rw_224 - 83.5 @ 224 (G)
- maxvit_rmlp_tiny_rw_256 - 84.2 @ 256, 84.8 @ 320 (T)

Aug 29, 2022

MaxVit window size scales with img_size by default. Add new RelPosMlp MaxViT weight that leverages this:
- maxvit_rmlp_nano_rw_256 - 83.0 @ 256, 83.6 @ 320 (T)
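
As a rough sketch of what scaling the window size with img_size means, the window (partition) size can be derived from the input resolution instead of being fixed. The ratio of 32 used here is an assumption for illustration, and window_size_for is a hypothetical helper, not a timm function.

```python
from typing import Tuple


def window_size_for(img_size: Tuple[int, int], partition_ratio: int = 32) -> Tuple[int, int]:
    """Derive an attention window size from the image size.

    partition_ratio=32 is an assumed value for illustration only.
    """
    return img_size[0] // partition_ratio, img_size[1] // partition_ratio


for size in (224, 256, 320):
    print(size, window_size_for((size, size)))
```

Under this assumption, a 256x256 input gets 8x8 windows and a 320x320 input gets 10x10 windows.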

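Aug 26, 2022

CoAtNet (https://arxiv.org/abs/2106.04803) and MaxVit (https://arxiv.org/abs/2204.01697) timm original models
- both are found in the maxxvit.py model def, which contains numerous experiments outside the scope of the original papers
- an unfinished Tensorflow version from the MaxVit authors can be found at https://github.com/google-research/maxvit
Initial CoAtNet and MaxVit timm pretrained weights (working on more):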
- coatnet_nano_rw_224 - 81.7 @ 224 (T)
- coatnet_rmlp_nano_rw_224 - 82.0 @ 224, 82.8 @ 320 (T)
- coatnet_0_rw_224 - 82.4 (T) — NOTE timm ‘0’ coatnets have 2 more 3rd stage blocks
- coatnet_bn_0_rw_224 - 82.4 (T)
- maxvit_nano_rw_256 - 82.9 @ 256 (T)
- coatnet_rmlp_1_rw_224 - 83.4 @ 224, 84 @ 320 (T)
- coatnet_1_rw_224 - 83.6 @ 224 (G)
- (T) = TPU trained with bits_and_tpu branch training code, (G) = GPU trained
GCVit (weights adapted from https://github.com/NVlabs/GCVit, code 100% timm re-write for license purposes)
MViT-V2 (multi-scale vit, adapted from https://github.com/facebookresearch/mvit)
EfficientFormer (adapted from https://github.com/snap-research/EfficientFormer)
PyramidVisionTransformer-V2 (adapted from https://github.com/whai362/PVT)
'Fast Norm' support for LayerNorm and GroupNorm that avoids float32 upcast with AMP (uses APEX LN if available for a further boost)

Aug 15, 2022

ConvNeXt atto weights added
- convnext_atto - 75.7 @ 224, 77.0 @ 288
- convnext_atto_ols - 75.9 @ 224, 77.2 @ 288

Aug 5, 2022

More custom ConvNeXt smaller model defs with weights
- convnext_femto - 77.5 @ 224, 78.7 @ 288
- convnext_femto_ols - 77.9 @ 224, 78.9 @ 288
- convnext_pico - 79.5 @ 224, 80.4 @ 288
- convnext_pico_ols - 79.5 @ 224, 80.5 @ 288
- convnext_nano_ols - 80.9 @ 224, 81.6 @ 288
Updated EdgeNeXt to improve ONNX export, add new base variant and weights from the original (https://github.com/mmaaz60/EdgeNeXt)

July 28, 2022

Add freshly minted DeiT-III Medium (width=512, depth=12, num_heads=8) model weights. Thanks Hugo Touvron!

July 27, 2022

All runtime benchmark and validation result csv files are finally up-to-date!
A few more weights and model defs added:
- darknetaa53 - 79.8 @ 256, 80.5 @ 288
- convnext_nano - 80.8 @ 224, 81.5 @ 288
- cs3sedarknet_l - 81.2 @ 256, 81.8 @ 288
- cs3darknet_x - 81.8 @ 256, 82.2 @ 288
- cs3sedarknet_x - 82.2 @ 256, 82.7 @ 288
- cs3edgenet_x - 82.2 @ 256, 82.7 @ 288
- cs3se_edgenet_x - 82.8 @ 256, 83.5 @ 320
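cs3* weights above were all trained on TPU with the bits_and_tpu branch. Thanks to the TRC program!
Add output_stride=8 and 16 support to ConvNeXt (dilation)
Fixed deit3 models not being able to resize pos_emb
Version 0.6.7 PyPi release (with the above bug fixes and new weights since 0.6.5)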

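July 8, 2022

More models, more fixes

Official research models (with weights) added:
- EdgeNeXt from (https://github.com/mmaaz60/EdgeNeXt)
- MobileViT-V2 from (https://github.com/apple/ml-cvnets)
- DeiT III (Revenge of the ViT) from (https://github.com/facebookresearch/deit)
My own models:
- Small ResNet defs added by request, with 1 block repeats for both basic and bottleneck (resnet10 and resnet14)
- CspNet refactored with dataclass config, simplified CrossStage3 (cs3) option. These are closer to YOLO-v5+ backbone defs.
- More relative position vit fiddling. Two srelpos (shared relative position) models trained, and a medium with class token.
- Add an alternate downsample mode to EdgeNeXt and train a small model. Better than the original small, but not their new USI trained weights.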
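
The resnet10 and resnet14 names follow from the usual ResNet layer-counting convention: the convs inside the blocks of the four stages, plus the stem conv and the final classifier (downsample shortcuts are not counted). A small worked check, where resnet_depth is a hypothetical helper written for this note:

```python
from typing import Sequence


def resnet_depth(blocks_per_stage: Sequence[int], convs_per_block: int) -> int:
    """Count layers the way ResNet names do: stage convs + stem conv + fc.

    convs_per_block is 2 for a basic block and 3 for a bottleneck block.
    """
    return sum(blocks_per_stage) * convs_per_block + 2


print(resnet_depth([1, 1, 1, 1], convs_per_block=2))  # 10 -> resnet10 (basic)
print(resnet_depth([1, 1, 1, 1], convs_per_block=3))  # 14 -> resnet14 (bottleneck)
print(resnet_depth([3, 4, 6, 3], convs_per_block=3))  # 50 -> resnet50
```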
My own model weight results (all ImageNet-1k training)
- resnet10t - 66.5 @ 176, 68.3 @ 224
- resnet14t - 71.3 @ 176, 72.3 @ 224
- resnetaa50 - 80.6 @ 224 , 81.6 @ 288
- darknet53 - 80.0 @ 256, 80.5 @ 288
- cs3darknet_m - 77.0 @ 256, 77.6 @ 288
- cs3darknet_focus_m - 76.7 @ 256, 77.3 @ 288
- cs3darknet_l - 80.4 @ 256, 80.9 @ 288
- cs3darknet_focus_l - 80.3 @ 256, 80.9 @ 288
- vit_srelpos_small_patch16_224 - 81.1 @ 224, 82.1 @ 320
- vit_srelpos_medium_patch16_224 - 82.3 @ 224, 83.1 @ 320
- vit_relpos_small_patch16_cls_224 - 82.6 @ 224, 83.6 @ 320
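- edgenext_small_rw - 79.6 @ 224, 80.4 @ 320
cs3, darknet, and vit_*relpos weights above were all trained on TPU thanks to the TRC program! The rest were trained on overheating GPUs.
Hugging Face Hub support fixes verified, demo notebook TBA
Pretrained weights / configs can be loaded externally (i.e. from local disk) with support for head adaptation.
Add support to change image extensions scanned by timm datasets/readers. See (https://github.com/rwightman/pytorch-image-models/pull/1274#issuecomment-1178303103)
Default ConvNeXt LayerNorm impl now uses F.layer_norm(x.permute(0, 2, 3, 1), ...).permute(0, 3, 1, 2) via LayerNorm2d in all cases.
- a bit slower than the previous custom impl on some hardware (e.g. Ampere with CL), but overall fewer regressions across a wider range of hardware / PyTorch versions.
- previous impl exists as LayerNormExp2d in models/layers/norm.py
Numerous bug fixes
Currently testing for imminent PyPi 0.6.x release
LeViT pretraining of larger models is still a WIP, they don't train well / easily without distillation. Time to add distill support (finally)?
ImageNet-22k weight training + finetune ongoing, work on multi-weight support is (slowly) chugging along (there are a LOT of weights, sigh) ...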

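May 13, 2022

Official Swin-V2 models and weights added from (https://github.com/microsoft/Swin-Transformer). Cleaned up to support torchscript.
Some refactoring of the existing timm Swin-V2-CR impl, will likely do a bit more to bring parts closer to official and decide whether to merge some aspects.
More Vision Transformer relative position / residual post-norm experiments (all trained on TPU thanks to the TRC program)
- vit_relpos_small_patch16_224 - 81.5 @ 224, 82.5 @ 320 -- rel pos, layer scale, no class token, avg pool
- vit_relpos_medium_patch16_rpn_224 - 82.3 @ 224, 83.1 @ 320 -- rel pos + res-post-norm, no class token, avg pool
- vit_relpos_medium_patch16_224 - 82.5 @ 224, 83.3 @ 320 -- rel pos, layer scale, no class token, avg pool
- vit_relpos_base_patch16_gapcls_224 - 82.8 @ 224, 83.9 @ 320 -- rel pos, layer scale, class token, avg pool (by mistake)
Bring the 512 dim, 8-head 'medium' ViT model variant back to life (after using it in a pre DeiT 'small' model for the first ViT impl back in 2020)
Add ViT relative position support for switching between the existing impl and some additions in the official Swin-V2 impl for future trials
Sequencer2D impl (https://arxiv.org/abs/2205.01972), added via PR from the author (https://github.com/okojoalg)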

May 2, 2022

Vision Transformer experiments adding relative position (Swin-V2 log-coord) (vision_transformer_relpos.py) and residual post-norm branches (from Swin-V2) (vision_transformer*.py)
- vit_relpos_base_patch32_plus_rpn_256 - 79.5 @ 256, 80.6 @ 320 -- rel pos + extended width + res-post-norm, no class token, avg pool
- vit_relpos_base_patch16_224 - 82.5 @ 224, 83.6 @ 320 -- rel pos, layer scale, no class token, avg pool
- vit_base_patch16_rpn_224 - 82.3 @ 224 -- rel pos + res-post-norm, no class token, avg pool
Vision Transformer refactor to remove the representation layer, which was only used in the initial vit and rarely used since with newer pretrain (i.e. How to Train Your ViT)
vit_* models support removal of class token, use of global average pool, use of fc_norm (ala beit, mae).

April 22, 2022

timm models are now officially supported in fast.ai! Just in time for the new Practical Deep Learning course. timmdocs documentation link updated to timm.fast.ai.
Two more model weights added in the TPU trained series. Some In22k pretrain still in progress.
- seresnext101d_32x8d - 83.69 @ 224, 84.35 @ 288
- seresnextaa101d_32x8d (anti-aliased with AvgPool2d) - 83.85 @ 224, 84.57 @ 288

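March 23, 2022

Add ParallelBlock and LayerScale options to base vit models to support the model configs in Three things everyone should know about ViT
convnext_tiny_hnf (head norm first) weights trained with (close to) A2 recipe, 82.2% top-1, could do better with more epochs.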

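March 21, 2022

Merge norm_norm_norm. IMPORTANT: this update for the coming 0.6.x release will likely de-stabilize the master branch for a while. Branch 0.5.x or a previous 0.5.x release can be used if stability is required.
Significant weights update (all TPU trained) as described in this release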
- regnety_040 - 82.3 @ 224, 82.96 @ 288
- regnety_064 - 83.0 @ 224, 83.65 @ 288
- regnety_080 - 83.17 @ 224, 83.86 @ 288
- regnetv_040 - 82.44 @ 224, 83.18 @ 288 (timm pre-act)
- regnetv_064 - 83.1 @ 224, 83.71 @ 288 (timm pre-act)
- regnetz_040 - 83.67 @ 256, 84.25 @ 320
- regnetz_040h - 83.77 @ 256, 84.5 @ 320 (with extra fc in head)
- resnetv2_50d_gn - 80.8 @ 224, 81.96 @ 288 (pre-act GroupNorm)
- resnetv2_50d_evos - 80.77 @ 224, 82.04 @ 288 (pre-act EvoNormS)
- regnetz_c16_evos - 81.9 @ 256, 82.64 @ 320 (EvoNormS)
- regnetz_d8_evos - 83.42 @ 256, 84.04 @ 320 (EvoNormS)
- xception41p - 82 @ 299 (timm pre-act)
- xception65 - 83.17 @ 299
- xception65p - 83.14 @ 299 (timm pre-act)
- resnext101_64x4d - 82.46 @ 224, 83.16 @ 288
- seresnext101_32x8d - 83.57 @ 224, 84.270 @ 288
- resnetrs200 - 83.85 @ 256, 84.44 @ 320
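Fixed HuggingFace hub support, laid initial groundwork for alternative 'config sources' for pretrained model definitions and weights (generic local file / remote url support soon)
SwinTransformer-V2 implementation added. Submitted by Christoph Reich. Training experiments and model changes by myself are ongoing, so expect compat breaks.
Swin-S3 (AutoFormerV2) models / weights added from https://github.com/microsoft/Cream/tree/main/AutoFormerV2
MobileViT models with weights adapted from https://github.com/apple/ml-cvnets
PoolFormer models with weights adapted from https://github.com/sail-sg/poolformer
VOLO models with weights adapted from https://github.com/sail-sg/volo
Significant work experimenting with non-BatchNorm norm layers such as EvoNorm, FilterResponseNorm, GroupNorm, etc
Enhanced support for alternate norm + act ('NormAct') layers added to a number of models, especially EfficientNet/MobileNetV3, RegNet, and aligned Xception
Grouped conv support added to the EfficientNet family
Add 'group matching' API to all models to allow grouping model parameters for application of 'layer-wise' LR decay, lr scale added to LR scheduler
Gradient checkpointing support added to many models
forward_head(x, pre_logits=False) fn added to all models to allow separate calls of forward_features + forward_head
All vision transformer and vision MLP models updated to return non-pooled / non-token selected features from forward_features; for consistency with CNN models, token selection or pooling is now applied in forward_head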
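
The 'layer-wise' LR decay mentioned above can be sketched as follows: once parameters are grouped by layer, each group gets an LR scale that shrinks geometrically toward the input. This is the commonly used rule written as a standalone illustration; layer_lr_scales is a hypothetical helper, not timm's implementation.

```python
from typing import List


def layer_lr_scales(num_layers: int, layer_decay: float) -> List[float]:
    """Per-layer LR multipliers; index 0 is the group closest to the input.

    The last group keeps scale 1.0 and each earlier group is multiplied
    by layer_decay once more.
    """
    last = num_layers - 1
    return [layer_decay ** (last - i) for i in range(num_layers)]


base_lr = 1e-3
for i, scale in enumerate(layer_lr_scales(4, 0.5)):
    print(f"layer group {i}: lr = {base_lr * scale:.6f}")
```

The effect is that early, more generic layers move less during fine-tuning than the head.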

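Feb 2, 2022

Chris Hughes posted an exhaustive run through of timm on his blog yesterday. Well worth a read. Getting Started with PyTorch Image Models (timm): A Practitioner's Guide
I'm currently prepping to merge the norm_norm_norm branch back to master (ver 0.6.x) in the next week or so.
- The changes are more extensive than usual and may destabilize and break some model API use (aiming for full backwards compat). So, beware of pip install git+https://github.com/rwightman/pytorch-image-models installs!
- 0.5.x releases and a 0.5.x branch will remain stable with a cherry pick or two until the dust clears. Recommend sticking to the pypi install for a bit if you want stable.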

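Jan 14, 2022

Version 0.5.4 release to be pushed to pypi. It's been a while since the last pypi update, and riskier changes will be merged to the main branch soon...
Add ConvNeXT models with weights from the official impl (https://github.com/facebookresearch/ConvNeXt), with a few perf tweaks, compatible with timm features
Tried training a few small (~1.8-3M param) / mobile optimized models, a few are good so far, more on the way...
- mnasnet_small - 65.6 top-1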
- mobilenetv2_050 - 65.9
- lcnet_100/075/050 - 72.1 / 68.8 / 63.1
- semnasnet_075 - 73
- fbnetv3_b/d/g - 79.1 / 79.7 / 82.0
TinyNet models added by rsomani95
LCNet added via MobileNetV3 architecture
