Paper Roundup | CVPR 2021 Vision Transformer Papers and Projects

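Preface
At the start of 2021, research interest in Visual Transformers reached an unprecedented peak and has stayed high since. At CVPR 2021, a top computer vision conference, there were more than 40 vision Transformer papers.
This post rounds up the vision Transformer papers currently drawing the most attention. For more CVPR 2021 papers and open-source code, see the earlier roundups:

Paper Roundup | CVPR 2021 Semantic Segmentation Papers

Paper Roundup | CVPR 2021 Object Detection Papers (65 papers)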

CVPR 2021: Vision Transformer Papers

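36 Vision Transformer papers, covering: image classification, object detection, instance segmentation, semantic segmentation, action recognition, autonomous driving, keypoint matching, object tracking, NAS, low-level vision, HOI, interpretability, layout generation, retrieval, text detection, and more.

1. End-to-End Human Pose and Mesh Reconstruction with Transformers

  • Paper: https://arxiv.org/abs/2012.09760
  • Code: https://github.com/microsoft/MeshTransformer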

2. Temporal-Relational CrossTransformers for Few-Shot Action Recognition

  • Paper: https://arxiv.org/abs/2101.06184
  • Code: https://github.com/tobyperrett/trx

3. Kaleido-BERT: Vision-Language Pre-training on Fashion Domain

  • Paper: https://arxiv.org/abs/2103.16110
  • Code: https://github.com/mczhuge/Kaleido-BERT

4. HOTR: End-to-End Human-Object Interaction Detection with Transformers

  • Paper: https://arxiv.org/abs/2104.13682

5. Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

  • Paper: https://arxiv.org/abs/2104.09224
  • Code: https://github.com/autonomousvision/transfuser

6. Pose Recognition with Cascade Transformers

  • Paper: https://arxiv.org/abs/2104.06976

  • Code: https://github.com/mlpc-ucsd/PRTR

7. Variational Transformer Networks for Layout Generation

  • Paper: https://arxiv.org/abs/2104.02416

8. LoFTR: Detector-Free Local Feature Matching with Transformers

  • Homepage: https://zju3dv.github.io/loftr/

  • Paper: https://arxiv.org/abs/2104.00680

  • Code: https://github.com/zju3dv/LoFTR

9. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

  • Paper: https://arxiv.org/abs/2012.15840

  • Code: https://github.com/fudan-zvg/SETR

10. Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers

  • Paper: https://arxiv.org/abs/2103.16553

11. Transformer Tracking

  • Paper: https://arxiv.org/abs/2103.15436

  • Code: https://github.com/chenxin-dlut/TransT

12. HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers

  • Paper(Oral): None
  • Code: https://github.com/dingmyu/HR-NAS

13. MIST: Multiple Instance Spatial Transformer

  • Paper: https://arxiv.org/abs/1811.10725
  • Code: None

14. Multimodal Motion Prediction with Stacked Transformers

  • Paper: https://arxiv.org/abs/2103.11624
  • Code: https://decisionforce.github.io/mmTransformer

15. Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning

  • Paper: https://www.amazon.science/publications/

  • Code: https://github.com/amzn/image-to-recipe-transformers

16. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

  • Paper(Oral): https://arxiv.org/abs/2103.11681

  • Code: https://github.com/594422814/TransformerTrack

17. Pre-Trained Image Processing Transformer

  • Paper: https://arxiv.org/abs/2012.00364

18. End-to-End Video Instance Segmentation with Transformers

  • Paper(Oral): https://arxiv.org/abs/2011.14503

  • Code: https://github.com/Epiphqny/VisTR

19. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

  • Paper(Oral): https://arxiv.org/abs/2011.09094

  • Code: https://github.com/dddzg/up-detr

20. End-to-End Human Object Interaction Detection with HOI Transformer

  • Paper: https://arxiv.org/abs/2103.04503

  • Code: https://github.com/bbepoch/HoiTransformer

21. Transformer Interpretability Beyond Attention Visualization

  • Paper: https://arxiv.org/abs/2012.09838

  • Code: https://github.com/hila-chefer/Transformer-Explainability

22. Line Segment Detection Using Transformers without Edges

23. SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation

  • Paper(Oral): https://arxiv.org/abs/2101.08833

  • Code: https://github.com/dukebw/SSTVOS

24. Facial Action Unit Detection With Transformers

25. Topological Planning With Transformers for Vision-and-Language Navigation

  • Paper: https://arxiv.org/abs/2012.05292

26. Adaptive Image Transformer for One-Shot Object Detection

27. Taming Transformers for High-Resolution Image Synthesis

  • Homepage: https://compvis.github.io/taming-transformers/

  • Paper(Oral): https://arxiv.org/abs/2012.09841

  • Code: https://github.com/CompVis/taming-transformers

28. Self-Supervised Video Hashing via Bidirectional Transformers

29. Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos

  • Paper(Oral): https://hehefan.github.io/pdfs/p4transformer.pdf

30. General Multi-Label Image Classification With Transformers

  • Paper: https://arxiv.org/abs/2011.14027

31. Bottleneck Transformers for Visual Recognition

  • Paper: https://arxiv.org/abs/2101.11605

32. VLN↻BERT: A Recurrent Vision-and-Language BERT for Navigation

  • Paper(Oral): https://arxiv.org/abs/2011.13922

  • Code: https://github.com/YicongHong/Recurrent-VLN-BERT

33. Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

  • Paper(Oral): https://arxiv.org/abs/2102.06183

  • Code: https://github.com/jayleicn/ClipBERT

34. Self-attention based Text Knowledge Mining for Text Detection

  • Code: https://github.com/CVI-SZU/STKM

35. SSAN: Separable Self-Attention Network for Video Representation Learning

36. Scaling Local Self-Attention For Parameter Efficient Visual Backbones

  • Paper(Oral): https://arxiv.org/abs/2103.12731

CVPR 2021 papers and code collection:

https://github.com/statisticszhang/CVPR2021-Papers-with-Code
