In this paper, we apply learning under covariate shift to the problem of unsupervised domain adaptation for word sense disambiguation (WSD). This is a weighted learning method in which the probability density ratio w(x) = P_T(x)/P_S(x) is used as the weight of an instance. However, w(x) tends to be small in WSD tasks. To address this problem, we calculate w(x) by estimating P_T(x) and P_S(x), where P_S(x) is estimated by treating the combination of the source domain corpus and the target domain corpus as the source domain corpus. In the experiments, we use three domains in BCCWJ, namely OC (Yahoo! Chiebukuro), PB (books), and PN (newspapers), and 16 target words provided by the Japanese WSD task in SemEval-2. For calculating w(x), we also use uLSIF, which directly estimates w(x) without estimating P_T(x) or P_S(x). Moreover, we use the “p power” method and the “relative probability density ratio” method to boost the obtained probability density ratio. The experiments show that our method is effective.
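
As an illustration of the weighting schemes named above, the following is a minimal sketch and not the authors' implementation. The abstract does not give the formulas, so the sketch assumes the commonly used definitions: the "p power" weight is w(x)^p with 0 < p < 1, and the relative probability density ratio is P_T(x) / (alpha * P_T(x) + (1 - alpha) * P_S(x)) with 0 <= alpha < 1. The function names, the values of p and alpha, and the toy density values are all hypothetical.

```python
import numpy as np


def density_ratio(p_target, p_source, eps=1e-12):
    """Plain density ratio w(x) = P_T(x) / P_S(x), guarded against division by zero."""
    p_target = np.asarray(p_target, dtype=float)
    p_source = np.asarray(p_source, dtype=float)
    return p_target / np.maximum(p_source, eps)


def p_power(w, p=0.5):
    """Assumed 'p power' form: raise the ratio to a power p in (0, 1)."""
    return np.power(np.asarray(w, dtype=float), p)


def relative_density_ratio(p_target, p_source, alpha=0.5, eps=1e-12):
    """Assumed relative ratio: P_T / (alpha * P_T + (1 - alpha) * P_S)."""
    p_target = np.asarray(p_target, dtype=float)
    p_source = np.asarray(p_source, dtype=float)
    mixture = alpha * p_target + (1.0 - alpha) * p_source
    return p_target / np.maximum(mixture, eps)


def weighted_log_loss(weights, gold_probs):
    """Instance-weighted negative log-likelihood used in weighted learning."""
    weights = np.asarray(weights, dtype=float)
    gold_probs = np.asarray(gold_probs, dtype=float)
    return float(-np.sum(weights * np.log(gold_probs)))


# Hypothetical density estimates for three source-domain instances.
p_t = np.array([0.02, 0.10, 0.30])
p_s = np.array([0.20, 0.10, 0.10])

w_plain = density_ratio(p_t, p_s)
w_power = p_power(w_plain, p=0.5)
w_relative = relative_density_ratio(p_t, p_s, alpha=0.5)

# Hypothetical model probabilities assigned to the gold sense of each instance.
loss = weighted_log_loss(w_relative, [0.7, 0.6, 0.9])
```

Under these assumed definitions, both variants raise a weight that is below 1 (the first instance here), and the relative ratio is additionally bounded above by 1/alpha whenever alpha > 0.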