A copy of this work was available on the public web and has been preserved in the Wayback Machine; the capture dates from 2022.
The file type is application/pdf.
EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers
[article] · 2022 · arXiv pre-print
Self-attention-based models such as vision transformers (ViTs) have emerged as a highly competitive architectural alternative to convolutional neural networks (CNNs) in computer vision. Despite increasingly strong variants with ever-higher recognition accuracies, existing ViTs are typically demanding in computation and model size due to the quadratic complexity of self-attention. Although several successful design choices (e.g., the convolutions and hierarchical multi-stage structure) of prior
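The quadratic complexity noted in the abstract comes from the N × N score matrix that self-attention builds over N tokens. The sketch below is a minimal, unprojected single-head attention (a real ViT layer would apply learned query/key/value projections and multiple heads); it is only meant to make the O(N²·d) scaling concrete.

```python
import numpy as np

def self_attention(x):
    """Plain single-head self-attention over N tokens of dimension d.

    The (N, N) score matrix is what makes the cost quadratic in the
    token count N: both memory and FLOPs scale as O(N^2 * d).
    """
    n, d = x.shape
    # Illustration only: use x directly as queries, keys, and values
    # (a real layer would first apply learned linear projections).
    scores = x @ x.T / np.sqrt(d)                    # shape (N, N)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ x                               # shape (N, d)

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 32))
out = self_attention(x)
```

Doubling the number of tokens N quadruples the size of the score matrix, which is why ViTs become expensive at the high token counts produced by fine-grained image patching.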
arXiv:2205.03436v2