Paper Roundup | CVPR 2021 Vision Transformer Papers and Projects
CVPR 2021: Vision Transformer Papers
1. End-to-End Human Pose and Mesh Reconstruction with Transformers
Paper: https://arxiv.org/abs/2012.09760 Code: https://github.com/microsoft/MeshTransformer

2. Temporal-Relational CrossTransformers for Few-Shot Action Recognition
Paper: https://arxiv.org/abs/2101.06184 Code: https://github.com/tobyperrett/trx
3. Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
Paper: https://arxiv.org/abs/2103.16110 Code: https://github.com/mczhuge/Kaleido-BERT
4. HOTR: End-to-End Human-Object Interaction Detection with Transformers
Paper: https://arxiv.org/abs/2104.13682
5. Multi-Modal Fusion Transformer for End-to-End Autonomous Driving
Paper: https://arxiv.org/abs/2104.09224 Code: https://github.com/autonomousvision/transfuser
6. Pose Recognition with Cascade Transformers
Paper: https://arxiv.org/abs/2104.06976
Code: https://github.com/mlpc-ucsd/PRTR

7. Variational Transformer Networks for Layout Generation
Paper: https://arxiv.org/abs/2104.02416
8. LoFTR: Detector-Free Local Feature Matching with Transformers
Homepage: https://zju3dv.github.io/loftr/
Paper: https://arxiv.org/abs/2104.00680
Code: https://github.com/zju3dv/LoFTR
9. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Paper: https://arxiv.org/abs/2012.15840
Code: https://github.com/fudan-zvg/SETR

10. Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
Paper: https://arxiv.org/abs/2103.16553
11. Transformer Tracking
Paper: https://arxiv.org/abs/2103.15436
Code: https://github.com/chenxin-dlut/TransT

12. HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers
Paper(Oral): None Code: https://github.com/dingmyu/HR-NAS
13. MIST: Multiple Instance Spatial Transformer
Paper: https://arxiv.org/abs/1811.10725 Code: None
14. Multimodal Motion Prediction with Stacked Transformers
Paper: https://arxiv.org/abs/2103.11624 Code: https://decisionforce.github.io/mmTransformer
15. Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-Supervised Learning
Paper: https://www.amazon.science/publications/
Code: https://github.com/amzn/image-to-recipe-transformers
16. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
Paper(Oral): https://arxiv.org/abs/2103.11681
Code: https://github.com/594422814/TransformerTrack

17. Pre-Trained Image Processing Transformer
Paper: https://arxiv.org/abs/2012.00364
18. End-to-End Video Instance Segmentation with Transformers
Paper(Oral): https://arxiv.org/abs/2011.14503
Code: https://github.com/Epiphqny/VisTR

19. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
Paper(Oral): https://arxiv.org/abs/2011.09094
Code: https://github.com/dddzg/up-detr
20. End-to-End Human Object Interaction Detection with HOI Transformer
Paper: https://arxiv.org/abs/2103.04503
Code: https://github.com/bbepoch/HoiTransformer
21. Transformer Interpretability Beyond Attention Visualization
Paper: https://arxiv.org/abs/2012.09838
Code: https://github.com/hila-chefer/Transformer-Explainability

22. Line Segment Detection Using Transformers without Edges
23. SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
Paper(Oral): https://arxiv.org/abs/2101.08833
Code: https://github.com/dukebw/SSTVOS

24. Facial Action Unit Detection With Transformers
25. Topological Planning With Transformers for Vision-and-Language Navigation
Paper: https://arxiv.org/abs/2012.05292
26. Adaptive Image Transformer for One-Shot Object Detection
27. Taming Transformers for High-Resolution Image Synthesis
Homepage: https://compvis.github.io/taming-transformers/
Paper(Oral): https://arxiv.org/abs/2012.09841
Code: https://github.com/CompVis/taming-transformers

28. Self-Supervised Video Hashing via Bidirectional Transformers
29. Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos
Paper(Oral): https://hehefan.github.io/pdfs/p4transformer.pdf
30. General Multi-Label Image Classification With Transformers
Paper: https://arxiv.org/abs/2011.14027
31. Bottleneck Transformers for Visual Recognition
Paper: https://arxiv.org/abs/2101.11605

32. VLN BERT: A Recurrent Vision-and-Language BERT for Navigation
Paper(Oral): https://arxiv.org/abs/2011.13922
Code: https://github.com/YicongHong/Recurrent-VLN-BERT
33. Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Paper(Oral): https://arxiv.org/abs/2102.06183
Code: https://github.com/jayleicn/ClipBERT
34. Self-attention based Text Knowledge Mining for Text Detection
Code: https://github.com/CVI-SZU/STKM
35. SSAN: Separable Self-Attention Network for Video Representation Learning
Paper(Oral): https://arxiv.org/abs/2103.12731
CVPR 2021 papers and code collection:
https://github.com/statisticszhang/CVPR2021-Papers-with-Code
