A Data-scalable Transformer for Medical Image Segmentation: Architecture, Model Efficiency, and Benchmark [article]

Yunhe Gao, Mu Zhou, Di Liu, Zhennan Yan, Shaoting Zhang, Dimitris N. Metaxas
2022 arXiv pre-print
Transformer, as a new generation of neural architecture, has demonstrated remarkable performance in natural language processing and computer vision. However, existing vision Transformers struggle to learn with limited medical data and are unable to generalize on diverse medical image tasks. To tackle these challenges, we present UTNetV2 as a data-scalable Transformer towards generalizable medical image segmentation. The key designs incorporate desirable inductive bias, hierarchical modeling with linear-complexity attention, and multi-scale feature fusion in a spatially and semantically global manner. UTNetV2 can learn across tiny- to large-scale data without pre-training. Extensive experiments demonstrate the potential of UTNetV2 as a general segmentation backbone, outperforming CNNs and vision Transformers on three public datasets with multiple modalities (e.g., CT and MRI) and diverse medical targets (e.g., healthy organ, diseased tissue, and tumor). We make the data processing, models, and evaluation pipeline publicly available, offering solid baselines and unbiased comparisons for promoting a wide range of downstream clinical applications.
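To illustrate the kind of linear-complexity attention the abstract refers to, below is a minimal PyTorch sketch of efficient self-attention over 2D feature maps, assuming keys and values are spatially downsampled by a fixed reduction ratio so cost grows linearly with the number of query pixels rather than quadratically. The module name, the reduction ratio, and the strided-convolution projection are illustrative assumptions, not the authors' exact UTNetV2 implementation.

```python
# Hypothetical sketch of linear-complexity (efficient) self-attention on a 2D feature map.
# Assumption: keys/values are computed on a spatially reduced grid (factor `reduction`),
# so each query attends to (H*W)/reduction^2 locations instead of all H*W.
import torch
import torch.nn as nn


class EfficientSelfAttention2D(nn.Module):
    def __init__(self, dim, num_heads=4, reduction=8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.q = nn.Conv2d(dim, dim, kernel_size=1)
        # Strided projection shrinks the key/value grid by `reduction` per side.
        self.kv = nn.Conv2d(dim, dim * 2, kernel_size=reduction, stride=reduction)
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x):                       # x: (B, C, H, W)
        B, C, H, W = x.shape
        q = self.q(x).reshape(B, self.num_heads, self.head_dim, H * W)
        k, v = self.kv(x).chunk(2, dim=1)       # each (B, C, H/r, W/r)
        k = k.reshape(B, self.num_heads, self.head_dim, -1)
        v = v.reshape(B, self.num_heads, self.head_dim, -1)
        # Attention scores: (H*W) queries against the reduced key set.
        attn = (q.transpose(-2, -1) @ k) * self.scale
        attn = attn.softmax(dim=-1)
        out = (attn @ v.transpose(-2, -1)).transpose(-2, -1)
        return self.proj(out.reshape(B, C, H, W))


if __name__ == "__main__":
    x = torch.randn(1, 64, 32, 32)
    print(EfficientSelfAttention2D(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```

In a hierarchical encoder-decoder, a block like this could be applied at each resolution stage, with the attended features fused across scales; again, this is only a sketch of the general technique, not the published architecture.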
arXiv:2203.00131v3 fatcat:dmuh4yga4rahzjjdy4ttg7eei4