Organizer: The Japanese Society for Artificial Intelligence
Conference: The 2023 (37th) Annual Conference of the Japanese Society for Artificial Intelligence
Edition: 37
Venue: Kumamoto-jo Hall + online
Dates: 2023/06/06 - 2023/06/09
Perceiver is a deep learning model that can be applied to a variety of modalities: it can process various forms of input and output, such as images, speech, and natural language, with the same architecture. However, Perceiver is computationally more expensive than other models, which makes it difficult to train in environments with limited fast parallel computing resources. In this study, we aim to reduce the computational cost so that the model can be trained in a short time outside of large-scale computing systems. To this end, we first show that speed-up methods proposed for the Transformer are also effective for Perceiver. In particular, the gated attention unit proposed in FLASH reduces computational complexity without sacrificing accuracy, and the resulting accelerated model achieves accuracy comparable to that of the original model in a limited computing environment. In experiments on the ImageNet image recognition task, we demonstrate that the proposed method reduces training time compared with conventional methods without a significant loss of accuracy. The resulting model can process arbitrary forms of input and output quickly in a low-cost computing environment.
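For readers unfamiliar with the gated attention unit (GAU) from FLASH, it replaces the usual multi-head self-attention plus feed-forward pair with a single, cheaper block that combines a gated linear unit with single-head, squared-ReLU attention. The following is a minimal PyTorch sketch of such a layer; the module name, the hyperparameters (expansion_factor, query_key_dim), and the omission of relative position bias are illustrative assumptions, not the exact configuration used in this work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttentionUnit(nn.Module):
    """Minimal gated attention unit (GAU) sketch, after FLASH (Hua et al., 2022)."""

    def __init__(self, dim, expansion_factor=2, query_key_dim=128):
        super().__init__()
        hidden = dim * expansion_factor
        self.to_uv = nn.Linear(dim, hidden * 2)    # gate U and value V
        self.to_z = nn.Linear(dim, query_key_dim)  # shared base for queries and keys
        # per-dimension scale/offset that turn Z into Q and K
        self.q_scale = nn.Parameter(torch.ones(query_key_dim))
        self.q_offset = nn.Parameter(torch.zeros(query_key_dim))
        self.k_scale = nn.Parameter(torch.ones(query_key_dim))
        self.k_offset = nn.Parameter(torch.zeros(query_key_dim))
        self.to_out = nn.Linear(hidden, dim)

    def forward(self, x):                          # x: (batch, seq_len, dim)
        n = x.shape[1]
        u, v = F.silu(self.to_uv(x)).chunk(2, dim=-1)
        z = F.silu(self.to_z(x))
        q = z * self.q_scale + self.q_offset
        k = z * self.k_scale + self.k_offset
        # single-head attention with squared-ReLU scores, normalized by length
        attn = F.relu(torch.einsum('bnd,bmd->bnm', q, k) / n) ** 2
        out = u * torch.einsum('bnm,bme->bne', attn, v)  # gate the attended values
        return self.to_out(out)
```

Because the block uses a single head and a small query/key dimension instead of full multi-head attention with a separate feed-forward network, it requires fewer parameters and less computation per layer, which is the property exploited here to speed up training.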