Since the publication of Sag et al. (2002), the NLP community has been aware that one of the most crucial problems in NLP is how to cope with idiosyncratic multiword expressions, which occur in authentic sentences with unexpectedly high frequency. The idiosyncrasy of such expressions is, in principle, twofold: one aspect is idiomaticity, i.e., non-compositionality of meaning, and the other is the strong probabilistic binding of word combinations. Accordingly, many attempts have been made in the NLP field to extract these expressions from corpora, mostly by statistical methods. However, presumably because they are difficult to extract correctly without human insight, no reliable, extensive resource has yet become available. The authors recognized the crucial importance of such irregular expressions around 1970 and started to develop a machine dictionary containing Japanese idioms, idiom-like expressions, and other multiword expressions that consist of frequently co-occurring words. In this paper, we give an overview of the first version of the dictionary, JDMWE (Japanese Dictionary of Multi-Word Expressions). It has about 104,000 head entries and is characterized by:
1. the wide notational, syntactic, and semantic variety of the expressions it contains,
2. the syntactic function and structure given for each entry expression, and
3. the possibility of internal modification, indicated for each component word of the entry expression.