Still training ViT on all the tokens? Tsinghua & UCLA propose dynamic token sparsification to cut computation at inference time

Paper link: https://arxiv.org/abs/2106.02034
Project link: https://github.com/raoyongming/DynamicViT

01
Motivation

Vision transformers apply self-attention to every token at every layer, yet the final prediction often depends on only a subset of the most informative tokens. DynamicViT exploits this redundancy: lightweight prediction modules score each token and prune the uninformative ones hierarchically as the network deepens, cutting inference-time computation substantially at a negligible cost in accuracy.

02
Method

2.1 Overview

The framework inserts a lightweight prediction module before selected transformer blocks. Each module estimates an importance score for every remaining token, and low-scoring tokens are pruned progressively, stage by stage. Because hard pruning is neither differentiable nor batch-friendly, pruning is only simulated during training via an attention mask (see 2.3); at inference the pruned tokens are actually discarded, so the computation saving is realized.
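
As a structural sketch, the skeleton below shows where the prediction modules sit relative to the transformer blocks. It is a minimal, runnable outline only: the class name, the nn.Identity stand-ins for the blocks, and the stage locations are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class DynamicViTSketch(nn.Module):
    """Skeleton showing where prediction modules sit between blocks."""

    def __init__(self, depth=12, prune_at=(3, 6, 9)):
        super().__init__()
        # Stand-ins for standard transformer encoder blocks.
        self.blocks = nn.ModuleList(nn.Identity() for _ in range(depth))
        self.prune_at = set(prune_at)  # pruning-stage locations (illustrative)

    def forward(self, x):              # x: (B, N, C) patch tokens
        B, N, _ = x.shape
        mask = x.new_ones(B, N)        # decision mask, all tokens kept at first
        for i, blk in enumerate(self.blocks):
            if i in self.prune_at:
                # A prediction module (2.2) would rescore the tokens and
                # shrink `mask` here; at inference, masked tokens are
                # physically removed instead of masked (2.4).
                pass
            x = blk(x)
        return x, mask
```
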
2.2 Hierarchical Token Sparsification with Prediction Modules

Sparsification is tracked with a binary decision mask D̂ ∈ {0,1}^N over the N tokens, initialized to all ones. At each pruning stage, the prediction module computes a local feature for every token and a global feature obtained by masked average pooling over the currently kept tokens; the concatenated pair is mapped to per-token keep/drop probabilities. To keep the binary decision differentiable, it is sampled with the Gumbel-softmax trick, and the mask is updated by elementwise multiplication so that a dropped token is never revived in later stages.
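
A minimal PyTorch sketch of such a prediction module follows. The class name TokenPredictor, the hidden width, and the exact MLP layout are illustrative assumptions rather than the official code; only the local/global-feature structure and the Gumbel-softmax sampling follow the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenPredictor(nn.Module):
    """Lightweight module scoring each token for keep/drop (a sketch)."""

    def __init__(self, embed_dim: int = 384, hidden_dim: int = 192):
        super().__init__()
        self.local_mlp = nn.Sequential(
            nn.LayerNorm(embed_dim), nn.Linear(embed_dim, hidden_dim), nn.GELU(),
        )
        self.global_mlp = nn.Sequential(
            nn.LayerNorm(embed_dim), nn.Linear(embed_dim, hidden_dim), nn.GELU(),
        )
        # Two logits per token: [keep, drop].
        self.head = nn.Sequential(
            nn.Linear(hidden_dim * 2, hidden_dim), nn.GELU(),
            nn.Linear(hidden_dim, 2),
        )

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) token features; mask: (B, N) float 0/1 decision mask.
        local_feat = self.local_mlp(x)                     # per-token feature
        # Masked average pooling over the tokens that are still kept.
        weights = mask / mask.sum(dim=1, keepdim=True).clamp(min=1e-6)
        global_feat = (self.global_mlp(x) * weights.unsqueeze(-1)).sum(dim=1)
        global_feat = global_feat.unsqueeze(1).expand(-1, x.shape[1], -1)
        logits = self.head(torch.cat([local_feat, global_feat], dim=-1))
        return logits                                      # (B, N, 2)

def sample_keep_decision(logits: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Differentiable binary decision: hard one-hot forward, soft gradients.
    onehot = F.gumbel_softmax(logits, tau=1.0, hard=True)  # (B, N, 2)
    keep = onehot[..., 0]                                  # 1 = keep
    return keep * mask                                     # never revive dropped tokens
```
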
2.3 End-to-end Optimization with Attention Masking

During training, tokens cannot simply be deleted (batches would become ragged), nor just zeroed out: softmax attention normalizes over all tokens, so a zeroed token would still perturb the attention weights of the kept ones. The paper instead keeps tensor shapes fixed and masks the attention matrix with G, where G_ij = 1 only if token j is kept or j = i, computing the softmax over the unmasked entries only. Dropped tokens thus become invisible to the kept tokens, while each dropped token still attends to itself so that its own computation stays well defined.
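
A sketch of this masked softmax, assuming multi-head inputs of shape (B, H, N, D) and a float 0/1 keep mask; the function name and the numerical-stability details are mine, not the official code.

```python
import torch

def masked_attention(q, k, v, keep_mask):
    # q, k, v: (B, H, N, D); keep_mask: (B, N) float, 1 = token kept.
    B, H, N, D = q.shape
    scores = q @ k.transpose(-2, -1) / D ** 0.5          # (B, H, N, N)
    # G[..., i, j] = 1 iff token j is kept or j == i, so a dropped token
    # is invisible to others but still attends to itself.
    G = keep_mask[:, None, None, :].expand(B, 1, N, N)
    eye = torch.eye(N, device=q.device).view(1, 1, N, N)
    G = torch.maximum(G, eye)
    # Masked softmax: exclude masked keys, renormalize the rest.
    scores = scores - scores.amax(dim=-1, keepdim=True)  # numerical stability
    exp = scores.exp() * G
    attn = exp / exp.sum(dim=-1, keepdim=True).clamp(min=1e-6)
    return attn @ v
```
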
2.4 Training and Inference

Training combines the usual classification loss with auxiliary terms: a ratio loss that pushes the fraction of kept tokens at each pruning stage toward a target keeping ratio ρ, plus two self-distillation losses (KL divergence on the logits and MSE on the surviving token features) that take the full, unpruned backbone as the teacher. At inference the mask machinery is dropped: at each stage only the top-⌈ρN⌉ tokens ranked by keep probability are retained, and the rest are actually discarded. With ρ = 0.7 and three stages, for example, 70%, 49%, and 34.3% of the tokens survive the successive stages.

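Sketches of the ratio loss and of the inference-time top-k pruning under these definitions; the helper names (ratio_loss, prune_tokens) and the exact weighting of the loss terms are placeholders.

```python
import torch

# Ratio loss: an MSE sketch pushing the kept-token fraction at each pruning
# stage toward its target ratio (rho, rho**2, ... under the setup above).
def ratio_loss(masks, base_ratio=0.7):
    loss = 0.0
    for s, mask in enumerate(masks):                 # mask: (B, N) float, 1 = kept
        target = base_ratio ** (s + 1)
        kept = mask.mean(dim=1)                      # fraction kept per sample
        loss = loss + ((kept - target) ** 2).mean()
    return loss / len(masks)

# Inference: actually discard tokens by keeping the top-k scored ones,
# so the FLOPs saving is realized (no masking needed at test time).
def prune_tokens(x, keep_logits, ratio=0.7):
    # x: (B, N, C); keep_logits: (B, N, 2) from the prediction module.
    B, N, C = x.shape
    k = max(1, int(N * ratio))
    scores = keep_logits.softmax(dim=-1)[..., 0]     # keep probability
    idx = scores.topk(k, dim=1).indices              # (B, k)
    return x.gather(1, idx.unsqueeze(-1).expand(B, k, C))
```
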
03
Experiments

3.1 Main results

On ImageNet, hierarchically pruning 66% of the input tokens reduces FLOPs by 31%–37% and improves throughput by over 40% on DeiT-S-style backbones, while the drop in top-1 accuracy stays within 0.5%.

3.2 Comparisons with the state of the art

Built on DeiT and LV-ViT backbones, the DynamicViT models trace a better accuracy/complexity trade-off than representative CNNs and vision transformers of comparable size.

3.3 Analysis
DynamicViT for model scaling

Token sparsification also adds a new axis to model scaling: applying DynamicViT to a slightly larger backbone can match or beat a smaller dense model at similar computational cost.

Visualizations

Visualizing the tokens kept after each of the three pruning stages shows the network progressively discarding background tokens and concentrating computation on the informative foreground regions.

Comparisons of different sparsification strategies

At equal keeping ratios, the learned dynamic (input-dependent) sparsification outperforms static alternatives that remove a fixed set of token positions, confirming that which tokens matter depends on the input.

04
Author Introduction
Operator of the FightingCV public account. Research interests: multimodal content understanding, focusing on tasks that combine the vision and language modalities and on promoting real-world applications of vision-language models.
Zhihu / WeChat public account: FightingCV
