Kidney Tumor Detection from CT Images Using Vision Transformer

田中 亨; 鈴木 淳晟; 亀谷 由隆; 山田 啓一; 堀田 一弘; 高橋 友一; 佐々 直人; 松川 宜久; 岩野 信吾; 山本 徳則

doi:10.11517/jsaisigtwo.2022.AIMED-012_05

Abstract

Convolutional neural networks (CNNs) have been adopted as standard deep learning models in medical image analysis owing to their ability to automatically extract high-level features from training images. Recently, Vision Transformer (ViT) models have been proposed, which implement the Transformer architecture originally developed for natural language processing. Given their high predictive performance, we built a couple of ViT models to detect kidney cancer based on computed tomography (CT) images. Experimental results show that our ViT models outperformed conventional CNNs in terms of detection accuracy with various types of CT images. Moreover, we visualized the attention maps of our ViT models to help understand the basis for their detection output.
