This paper compares Japanese and multilingual language models (LMs) on a Japanese pronoun reference resolution task to identify which properties of an LM contribute to Japanese pronoun resolution. Specifically, we tackle the Japanese Winograd Schema Challenge task (WSC task), a well-known pronoun reference resolution task. The Japanese WSC task requires inter-sentential analysis, which is more challenging than intra-sentential analysis. A previous study evaluated pre-trained multilingual LMs on a multilingual WSC task that includes Japanese, comparing results by training language. However, that study did not compare individual pre-trained LMs, focusing instead on the effect of the training language. Furthermore, it did not investigate which factors (e.g., model size, pre-training settings, or multilingualism) improve performance. In our study, we compare the inter-sentential analysis performance of several pre-trained LMs, including multilingual ones, on the Japanese WSC task. Our results confirm that XLM, an LM pre-trained on multiple languages, performs best among all the LMs considered, which we attribute to the amount of data used in pre-training.