Host: The Japanese Society for Artificial Intelligence
Name: The 35th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 35
Location: [in Japanese]
Date: June 08, 2021 - June 11, 2021
Quantization techniques that approximate floating-point values with a small number of bits have attracted attention as a way to reduce the model size and speed up inference of pre-trained language models such as BERT. However, activations (the inputs to each layer) are usually quantized to 8 bits, and it is empirically known that maintaining accuracy with fewer than 8 bits is difficult. In this study, we identify outliers in BERT's intermediate representations as a key obstacle and propose a ternarization method that can handle outliers in the activations of each layer of pre-trained BERT. Experimental results show that a model with ternarized weights and activations outperforms the previous method on language modeling and downstream tasks.
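
As a rough illustration of the general idea of outlier-aware activation ternarization (not the method proposed in this paper), the sketch below clips extreme activation values before mapping each entry to {-1, 0, +1} times a single scale. The percentile-based clipping and the 0.7 * mean|x| zeroing threshold are assumptions borrowed from common ternary-quantization heuristics.

```python
import numpy as np

def ternarize_activation(x, clip_percentile=99.0):
    """Illustrative outlier-aware ternarization (assumed heuristics, not the paper's method).

    Clips activations at a percentile-based magnitude threshold so that rare
    outliers do not inflate the quantization scale, then maps each value to
    {-1, 0, +1} multiplied by a single scaling factor.
    """
    # Clip magnitudes at a high percentile to suppress outliers (assumption).
    clip = np.percentile(np.abs(x), clip_percentile)
    x_clipped = np.clip(x, -clip, clip)

    # Zero out small values; the 0.7 * mean|x| threshold follows the
    # ternary-weight-network heuristic and is an assumption here.
    delta = 0.7 * np.mean(np.abs(x_clipped))
    mask = np.abs(x_clipped) > delta

    # Scale alpha = mean magnitude of the surviving non-zero entries.
    alpha = np.abs(x_clipped[mask]).mean() if mask.any() else 0.0
    return alpha * np.sign(x_clipped) * mask

# Example: activations with a few injected outliers.
rng = np.random.default_rng(0)
acts = rng.normal(size=1000)
acts[:5] *= 50.0
print(ternarize_activation(acts)[:10])
```

Without the clipping step, the few injected outliers would dominate the mean-magnitude statistics and push most ordinary activations below the zeroing threshold, which is the failure mode the abstract attributes to sub-8-bit activation quantization.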