PVT v2: Improved baselines with pyramid vision transformer

Authors: Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lü, Ping Luo, Ling Shao
Journal: Computational Visual Media
Year: 2022
Citations: 2,132

Abstract

Transformers have recently led to encouraging progress in computer vision. In this work, we present new baselines that improve the original Pyramid Vision Transformer (PVT v1) with three designs: (i) a linear-complexity attention layer, (ii) an overlapping patch embedding, and (iii) a convolutional feed-forward network. With these modifications, PVT v2 reduces the computational complexity of PVT v1 to linear and provides significant improvements on fundamental vision tasks such as classification, detection, and segmentation. In particular, PVT v2 achieves comparable or better performance than recent work such as the Swin transformer. We hope this work will facilitate state-of-the-art transformer research in computer vision. Code is available at https://github.com/whai362/PVT.
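
To make the "linear complexity" claim concrete, the sketch below counts how many query-key pairs a single attention layer scores on an H x W grid of tokens. It is an illustration under stated assumptions, not the authors' implementation. The function name `attention_pairs`, the mode labels, and the default `reduction=8` are hypothetical. The fixed 7 x 7 pooled grid follows the full paper's description of its linear attention layer and is not stated in the abstract above.

```python
def attention_pairs(height, width, mode, reduction=8, pool_size=7):
    """Count query-key pairs scored by one attention layer on a height x width token grid.

    Illustrative only: this counts attention scores and ignores projection costs.
    """
    num_queries = height * width
    if mode == "full":
        # Standard self-attention: every token attends to every token.
        num_keys = num_queries
    elif mode == "spatial_reduction":
        # Assumed PVT v1 style: keys/values are downsampled by `reduction` per axis,
        # so the key count still grows with the input size.
        num_keys = (height // reduction) * (width // reduction)
    elif mode == "linear":
        # Assumed PVT v2 style: keys/values are pooled to a fixed pool_size x pool_size grid,
        # so the key count is constant and the total grows linearly with the token count.
        num_keys = pool_size * pool_size
    else:
        raise ValueError(f"unknown mode: {mode}")
    return num_queries * num_keys


if __name__ == "__main__":
    for side in (56, 112, 224):
        counts = {m: attention_pairs(side, side, m)
                  for m in ("full", "spatial_reduction", "linear")}
        print(side, counts)
```

Doubling each side of the grid quadruples the number of tokens. Under these assumptions the "linear" count also quadruples, while the "full" and "spatial_reduction" counts grow sixteenfold, which is the difference between linear and quadratic scaling in the number of tokens.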
