The research aimed to evaluate how reliably the major Parts-of-Speech (PoS), i.e.,
Noun, Verb, Adjective and Adverb, can be predicted from surface word forms (more
concretely, from their ending character n-grams, n = 2, 3, 4) in English and Czech,
and to compare the results for the two languages. It was conducted with two objectives.
First, it sought to establish the hypothesis that the degree to which PoS can be
reliably predicted from surface word forms varies drastically among languages, although
no effective measure of this predictability has yet been implemented. (If a language
has a high degree of predictability of PoS from surface word forms, we can say it has
a high form-function transparency in PoS recognition.) Second, it sought to show that,
other things being equal, English is a language whose vocabulary is relatively hard to
acquire, insofar as good predictability of PoS from word forms facilitates vocabulary
acquisition (admittedly an unconfirmed hypothesis).
Results of Formal Concept Analysis (Ganter and Wille 1999) applied to the English
and Czech data suggest that ending character n-grams of English words predict the
major PoS, i.e., N, V, Adj and Adv, noticeably less reliably than those of Czech words,
because the endings are highly ambiguous among these categories in English. This means
that vocabulary acquisition can be significantly harder in English than in Czech, other
things being equal. The results also suggest that English is one of those languages in
which effective PoS recognition requires a multi-word processing strategy.
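
To make the notion of predicting PoS from ending character n-grams concrete, the following is a minimal sketch. It does not use Formal Concept Analysis, the method applied in the study. Instead it swaps in a simple majority-class baseline: for each ending n-gram, it counts how often the most frequent PoS occurs. The function names and the toy word list are hypothetical illustrations, not data or code from the study.

```python
from collections import Counter, defaultdict


def ending_ngrams(word, sizes=(2, 3, 4)):
    """Return the ending character n-grams of a word, one per n in sizes.

    Sizes longer than the word are skipped.
    """
    return [word[-n:] for n in sizes if len(word) >= n]


def suffix_predictability(tagged_words, n):
    """Share of words whose PoS equals the majority PoS of their ending n-gram.

    This is an assumed stand-in measure, not the measure used in the study.
    """
    counts = defaultdict(Counter)
    for word, pos in tagged_words:
        if len(word) >= n:
            counts[word[-n:]][pos] += 1
    total = sum(sum(c.values()) for c in counts.values())
    if total == 0:
        return 0.0
    # For each ending, the majority PoS "predicts" that many words correctly.
    hits = sum(max(c.values()) for c in counts.values())
    return hits / total


# Hypothetical toy data, labelled by hand for illustration only.
toy = [
    ("quickly", "Adv"), ("friendly", "Adj"),
    ("walked", "V"), ("wicked", "Adj"),
    ("kindness", "N"), ("darkness", "N"),
]

if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, round(suffix_predictability(toy, n), 3))
```

In this toy list, the endings "ly" and "ed" are each shared by two different PoS, which lowers the score for n = 2. Longer endings separate the words more cleanly, but on real data they would also occur far less often.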