Advanced search results
Search query: "10th ANNIVERSARY BEST"
Showing results 1-8 of 8
  • Gongye Jin, Daisuke Kawahara, Sadao Kurohashi
    自然言語処理 (Journal of Natural Language Processing)
    2014, Vol. 21, No. 6, pp. 1163-1182
    Published: 2014/12/15
    Released: 2015/03/15
    Journal: free access
    Many knowledge acquisition tasks depend tightly on fundamental analysis technologies such as part-of-speech (POS) tagging and parsing. Dependency parsing, in particular, has been widely employed for acquiring knowledge related to predicate-argument structures. For such tasks, dependency parsing performance can determine the quality of the acquired knowledge, regardless of the target language. Reducing dependency parsing errors and selecting high-quality dependencies are therefore of primary importance. In this study, we present a language-independent approach for automatically selecting high-quality dependencies from automatic parses. By considering several aspects that affect the accuracy of dependency parsing, we created a set of features for supervised classification of reliable dependencies. Experimental results on seven languages show that our approach can effectively select high-quality dependencies from dependency parses.
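The supervised selection of reliable dependencies described above can be sketched as a small feature-based classifier. This is a minimal sketch under stated assumptions: the feature names, the perceptron learner, and the input format are illustrative, not the paper's actual feature set or classifier.

```python
# Minimal sketch: supervised selection of reliable dependency arcs.
# Features and learner are illustrative, not the paper's actual setup.

def features(dep):
    """Map one dependency arc to a sparse feature dict.

    `dep` is an assumed dict with head/modifier POS tags and distance.
    """
    return {
        "pos_pair=%s_%s" % (dep["head_pos"], dep["mod_pos"]): 1.0,
        "dist=%d" % min(dep["distance"], 5): 1.0,
        "root" if dep["head_pos"] == "ROOT" else "nonroot": 1.0,
    }

def train_perceptron(examples, epochs=10):
    """Simple (non-averaged) perceptron over (dep, is_reliable) pairs."""
    w = {}
    for _ in range(epochs):
        for dep, label in examples:
            f = features(dep)
            score = sum(w.get(k, 0.0) * v for k, v in f.items())
            pred = 1 if score >= 0 else 0
            if pred != label:
                delta = 1.0 if label == 1 else -1.0
                for k, v in f.items():
                    w[k] = w.get(k, 0.0) + delta * v
    return w

def select_reliable(deps, w):
    """Keep only arcs the classifier scores as reliable."""
    return [d for d in deps
            if sum(w.get(k, 0.0) * v
                   for k, v in features(d).items()) >= 0]
```

In this reading, the classifier acts as a filter over the parser's output, trading recall for precision of the retained arcs.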
  • Weiqi Gu, Haiyue Song, Chenhui Chu, Sadao Kurohashi
    Journal of Information Processing
    2023, Vol. 31, pp. 299-307
    Published: 2023
    Released: 2023/05/15
    Journal: free access

    Video-guided machine translation, a type of multimodal machine translation, aims to use video content as auxiliary information to address the word sense ambiguity problem in machine translation. Previous studies only use features from pre-trained action detection models as motion representations of the video, resolving verb sense ambiguity while neglecting noun sense ambiguity. To address this, we propose a video-guided machine translation system using both spatial and motion representations. For the spatial part, we propose a hierarchical attention network to model the spatial information from the object level to the video level. We investigate and discuss spatial features extracted from objects with pre-trained convolutional neural network models and spatial concept features extracted from object labels and attributes with pre-trained language models. We further investigate spatial feature filtering by referring to the corresponding source sentences. Experiments on the VATEX dataset show that our system achieves a 35.86 BLEU-4 score, 0.51 points higher than the single model of the SOTA method. Experiments on the How2 dataset further verify the generalization ability of our proposed system.
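The object-level-to-video-level pooling can be illustrated with a plain-Python sketch of hierarchical dot-product attention. The scoring function and the tiny vector dimensionality are assumptions for illustration; the paper's network learns its attention parameters.

```python
import math

# Minimal sketch of hierarchical attention pooling: attend over
# per-object vectors within each frame, then over the resulting
# frame vectors. Dot-product scoring is an illustrative stand-in.

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(query, vectors):
    """Return the attention-weighted average of `vectors`."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, vectors))
            for d in range(dim)]

def hierarchical_attention(query, video):
    """`video` is a list of frames; each frame is a list of per-object
    feature vectors. Pool objects into frame vectors, then pool the
    frame vectors into one video-level vector."""
    frame_vecs = [attend(query, objects) for objects in video]
    return attend(query, frame_vecs)
```

The two-stage structure is the point: object detail is summarized per frame before frames compete for attention at the video level.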

  • Chenhui Chu, Raj Dabre, Sadao Kurohashi
    Journal of Information Processing
    2018, Vol. 26, pp. 529-538
    Published: 2018
    Released: 2018/07/15
    Journal: free access

    Neural machine translation (NMT) has shown very promising results when large parallel corpora are available. However, for low-resource domains, vanilla NMT cannot give satisfactory performance because it overfits the small parallel corpora. Two categories of domain adaptation approaches have been proposed for low-resource NMT: adaptation using out-of-domain parallel corpora, and adaptation using in-domain monolingual corpora. In this paper, we conduct a comprehensive empirical comparison of the methods in both categories. For domain adaptation using out-of-domain parallel corpora, we further propose a novel domain adaptation method named mixed fine tuning, which combines two existing methods, fine tuning and multi-domain NMT. For domain adaptation using in-domain monolingual corpora, we compare two existing methods, language model fusion and synthetic data generation. In addition, we propose a method that combines these two categories. We empirically compare all the methods and discuss their benefits and shortcomings. To the best of our knowledge, this is the first comprehensive empirical comparison of domain adaptation methods for NMT.
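The data-preparation side of mixed fine tuning (continue training on a mix of the out-of-domain corpus and the oversampled in-domain corpus, with a domain tag on each source sentence) can be sketched as below. The tag strings and the size-matching oversampling rule are assumptions for illustration.

```python
# Minimal sketch of mixed-fine-tuning data preparation. Tag tokens
# ("<2out>", "<2in>") and the oversampling rule are illustrative.

def tag_corpus(pairs, tag):
    """Prepend a domain tag token to every source sentence."""
    return [("%s %s" % (tag, src), tgt) for src, tgt in pairs]

def mixed_fine_tuning_corpus(out_domain, in_domain):
    """Oversample the small in-domain corpus to roughly match the
    out-of-domain corpus size, then concatenate the tagged corpora.
    An NMT model pre-trained on `out_domain` would then continue
    training on this mixed corpus."""
    factor = max(1, len(out_domain) // max(1, len(in_domain)))
    mixed = tag_corpus(out_domain, "<2out>")
    mixed += tag_corpus(in_domain * factor, "<2in>")
    return mixed
```

Keeping the out-of-domain data in the fine-tuning mix is what distinguishes this from plain fine tuning, which trains on the in-domain corpus alone and overfits it.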

  • Chenhui Chu, Yu Shen, Fabien Cromieres, Sadao Kurohashi
    自然言語処理 (Journal of Natural Language Processing)
    2017, Vol. 24, No. 2, pp. 267-296
    Published: 2017/03/15
    Released: 2017/06/15
    Journal: free access

    Ideally, tree-to-tree machine translation (MT), which utilizes syntactic parse trees on both the source and target sides, could preserve non-local structure and thus generate fluent and accurate translations. In practice, however, high-quality parsers for both the source and target languages are difficult to obtain; moreover, even with high-quality parsers on both sides, the trees can be non-isomorphic because of differences in annotation criteria between the two languages. The lack of isomorphism between the parse trees makes it difficult to extract translation rules, which severely limits the performance of tree-to-tree MT. In this article, we present an approach that projects dependency parse trees from the language side with a high-quality parser to the side with a low-quality parser, to improve the isomorphism of the parse trees. We first project a subset of the dependencies with high confidence to make a partial parse tree, and then complement the remaining dependencies with partial parsing constrained by the already projected dependencies. Experiments conducted on the Japanese-Chinese and English-Chinese language pairs show that our proposed method significantly improves performance on both language pairs.
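The high-confidence projection step can be sketched by carrying dependency arcs across a word alignment, keeping only arcs whose endpoints are unambiguously aligned. The one-to-one-alignment criterion here is an assumed stand-in for the paper's actual confidence scoring.

```python
# Minimal sketch of projecting dependency arcs across a word
# alignment: an arc (head h, modifier m) on the source side is
# projected to (a[h], a[m]) on the target side only when both words
# have exactly one alignment link. The confidence criterion is an
# illustrative simplification.

def project_dependencies(src_arcs, alignment, tgt_len):
    """`alignment` maps each source index to a list of target indices.
    Returns the target-side arcs recoverable with high confidence;
    the remaining target words would later be attached by constrained
    partial parsing."""
    projected = []
    for head, mod in src_arcs:
        h_links = alignment.get(head, [])
        m_links = alignment.get(mod, [])
        if len(h_links) == 1 and len(m_links) == 1:
            th, tm = h_links[0], m_links[0]
            if th != tm and th < tgt_len and tm < tgt_len:
                projected.append((th, tm))
    return projected
```

Arcs that fail the criterion are deliberately left out: producing a partial tree and deferring the uncertain attachments is what the constrained-parsing stage is for.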

  • Chenhui Chu, Yu Shen, Fabien Cromieres, Sadao Kurohashi
    Information and Media Technologies
    2017, Vol. 12, pp. 172-201
    Published: 2017
    Released: 2017/09/15
    Journal: free access

    Ideally, tree-to-tree machine translation (MT), which utilizes syntactic parse trees on both the source and target sides, could preserve non-local structure and thus generate fluent and accurate translations. In practice, however, high-quality parsers for both the source and target languages are difficult to obtain; moreover, even with high-quality parsers on both sides, the trees can be non-isomorphic because of differences in annotation criteria between the two languages. The lack of isomorphism between the parse trees makes it difficult to extract translation rules, which severely limits the performance of tree-to-tree MT. In this article, we present an approach that projects dependency parse trees from the language side with a high-quality parser to the side with a low-quality parser, to improve the isomorphism of the parse trees. We first project a subset of the dependencies with high confidence to make a partial parse tree, and then complement the remaining dependencies with partial parsing constrained by the already projected dependencies. Experiments conducted on the Japanese-Chinese and English-Chinese language pairs show that our proposed method significantly improves performance on both language pairs.

  • Mo Shen, Daisuke Kawahara, Sadao Kurohashi
    自然言語処理 (Journal of Natural Language Processing)
    2016, Vol. 23, No. 3, pp. 235-266
    Published: 2016/06/15
    Released: 2016/09/15
    Journal: free access

    Chinese word segmentation is an initial and important step in Chinese language processing. Recent advances in machine learning techniques have boosted the performance of Chinese word segmentation systems, yet the identification of out-of-vocabulary words remains a major problem in this field. Recent research has attempted to address this problem by exploiting characteristics of frequent substrings in unlabeled data. We propose a simple yet effective approach for extracting a specific type of frequent substring, called maximized substrings, which provide good estimates of unknown word boundaries. In the task of Chinese word segmentation, we use these substrings, extracted from large-scale unlabeled data, to improve segmentation accuracy. The effectiveness of this approach is demonstrated through experiments using various datasets from different domains. In the task of unknown word extraction, we apply post-processing techniques that effectively reduce the noise in the extracted substrings. We demonstrate the effectiveness and efficiency of our approach by comparing the results with a widely applied Chinese word recognition method from a previous study.
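One plausible reading of "maximized substrings" is: frequent substrings whose count strictly exceeds the count of every one-character extension, so they cannot be grown without losing occurrences. The definition, the length cap, and the frequency threshold below are assumptions for illustration, not the paper's exact criteria.

```python
from collections import Counter

# Minimal sketch of extracting "maximized substrings" under an
# assumed definition: substrings that occur at least `min_count`
# times and strictly more often than any one-character extension.

def substring_counts(corpus, max_len=4):
    """Count all substrings up to `max_len` characters."""
    counts = Counter()
    for text in corpus:
        for i in range(len(text)):
            for j in range(i + 1, min(i + max_len, len(text)) + 1):
                counts[text[i:j]] += 1
    return counts

def maximized_substrings(corpus, min_count=2, max_len=4):
    counts = substring_counts(corpus, max_len)
    result = []
    for s, c in counts.items():
        if c < min_count or len(s) == max_len:
            continue
        wider = [c2 for s2, c2 in counts.items()
                 if len(s2) == len(s) + 1 and s in s2]
        if all(c > c2 for c2 in wider):
            result.append(s)
    return result
```

Strings that keep their full count under extension (e.g. a fragment that only ever occurs inside one longer string) are rejected, which is why such substrings tend to align with word boundaries.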

  • Chenhui Chu, Toshiaki Nakazawa, Sadao Kurohashi
    自然言語処理 (Journal of Natural Language Processing)
    2015, Vol. 22, No. 3, pp. 139-170
    Published: 2015/06/16
    Released: 2015/12/14
    Journal: free access
    Parallel corpora are crucial for statistical machine translation (SMT); however, they are quite scarce for most language pairs and domains. As comparable corpora are far more available, many studies have been conducted to extract parallel sentences from them for SMT. Parallel sentence extraction relies heavily on bilingual lexicons, which are also very scarce. We propose a parallel sentence extraction system based on unsupervised bilingual lexicon extraction, which first extracts bilingual lexicons from comparable corpora and then extracts parallel sentences using those lexicons. Our bilingual lexicon extraction method combines a topic model and context-based methods in an iterative process. The proposed method does not rely on any prior knowledge, and its performance can be improved iteratively. The parallel sentence extraction method uses a binary classifier for parallel sentence identification, and the extracted bilingual lexicons improve the classifier's performance. Experiments conducted on Wikipedia data indicate that the proposed bilingual lexicon extraction method greatly outperforms existing methods, and the extracted bilingual lexicons significantly improve parallel sentence extraction performance for SMT.
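The lexicon-driven identification step can be sketched as a coverage-threshold filter: score a candidate sentence pair by the fraction of source words whose lexicon translation appears on the target side. The real system trains a binary classifier over richer features; the threshold rule and lexicon format below are illustrative assumptions.

```python
# Minimal sketch of lexicon-based parallel sentence identification.
# The coverage score and fixed threshold stand in for the paper's
# trained binary classifier.

def coverage(src_words, tgt_words, lexicon):
    """Fraction of source words with a lexicon translation present
    in the target sentence."""
    tgt = set(tgt_words)
    hits = sum(1 for w in src_words
               if any(t in tgt for t in lexicon.get(w, [])))
    return hits / len(src_words) if src_words else 0.0

def extract_parallel(pairs, lexicon, threshold=0.5):
    """Keep candidate pairs whose source-to-target coverage clears
    the threshold."""
    return [(src, tgt) for src, tgt in pairs
            if coverage(src, tgt, lexicon) >= threshold]
```

This also shows why lexicon quality matters: a better lexicon raises coverage scores for true parallel pairs, improving the extractor's precision-recall trade-off.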
  • Mo Shen, Daisuke Kawahara, Sadao Kurohashi
    Information and Media Technologies
    2016, Vol. 11, pp. 181-212
    Published: 2016
    Released: 2016/12/15
    Journal: free access

    Chinese word segmentation is an initial and important step in Chinese language processing. Recent advances in machine learning techniques have boosted the performance of Chinese word segmentation systems, yet the identification of out-of-vocabulary words remains a major problem in this field. Recent research has attempted to address this problem by exploiting characteristics of frequent substrings in unlabeled data. We propose a simple yet effective approach for extracting a specific type of frequent substring, called maximized substrings, which provide good estimates of unknown word boundaries. In the task of Chinese word segmentation, we use these substrings, extracted from large-scale unlabeled data, to improve segmentation accuracy. The effectiveness of this approach is demonstrated through experiments using various datasets from different domains. In the task of unknown word extraction, we apply post-processing techniques that effectively reduce the noise in the extracted substrings. We demonstrate the effectiveness and efficiency of our approach by comparing the results with a widely applied Chinese word recognition method from a previous study.
