The classification of dialog acts in users' utterances is an important fundamental technique in open-domain conversational systems. Most previous studies on dialog act classification were based on supervised machine learning; however, the characteristics of individual dialog acts were not considered. Some features for machine learning may increase the classification accuracy for a particular dialog act while decreasing the accuracy for other dialog acts. In this study, an appropriate feature set is defined for each dialog act to improve the performance of dialog act classification. First, 28 features are proposed as an initial set. Second, for each dialog act, an optimal feature set is identified by removing ineffective features from the initial set. Third, binary classifiers that judge whether a dialog act is suitable for a given utterance are trained using the optimized feature sets. Finally, one dialog act is chosen based on the results provided by the binary classifiers. The reliability of the binary classifiers' judgments is also considered. Experimental results showed that our proposed method significantly outperformed a baseline trained with a single feature set.
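Below is a minimal sketch of the overall classification scheme: one binary classifier per dialog act, each trained on its own reduced feature subset, with the final act chosen from the most confident classifier. The dialog-act labels, feature indices, and toy data are hypothetical stand-ins, not the paper's 28 features or corpus.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy data: 200 utterances represented by 6 candidate features.
X = rng.random((200, 6))
acts = ["question", "statement", "greeting"]
y = rng.integers(0, len(acts), size=200)

# Hypothetical optimized feature subsets, one per dialog act
# (in the paper, ineffective features are removed from the initial set).
feature_sets = {
    "question": [0, 1, 3],
    "statement": [1, 2, 4, 5],
    "greeting": [0, 5],
}

# One binary (one-vs-rest) classifier per dialog act, restricted
# to that act's feature subset.
classifiers = {}
for i, act in enumerate(acts):
    cols = feature_sets[act]
    clf = SVC(probability=True).fit(X[:, cols], (y == i).astype(int))
    classifiers[act] = (clf, cols)

def classify(x):
    # Choose the dialog act whose binary classifier is most confident.
    scores = {
        act: clf.predict_proba(x[cols].reshape(1, -1))[0, 1]
        for act, (clf, cols) in classifiers.items()
    }
    return max(scores, key=scores.get)

print(classify(rng.random(6)))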
This paper proposes a method for building a sentiment dictionary for textual analysis in finance using only news and stock price data. To obtain word polarity from stock price fluctuations, we calculate stock price returns following the announcements of news articles. We construct learners with support vector regression, using the stock price returns as supervised labels for the news articles, and build a sentiment dictionary by extracting word polarity from the learners. Furthermore, we examine whether our sentiment dictionary is effective in classifying news articles as negative or positive. We found that the dictionary is also effective in classifying news articles provided by news media other than the one we used to construct it. In addition, we found that it is difficult to classify news articles on dates two trading days away from the news announcement date.
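The following sketch shows one way such a dictionary can be derived, assuming a bag-of-words representation and a linear support vector regressor whose learned weights are read off as word polarities; the articles and returns below are hypothetical, whereas the paper uses post-announcement stock price returns as the supervised labels.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVR

# Hypothetical news headlines with hypothetical post-announcement returns.
articles = [
    "profit rises on strong demand",
    "company reports record loss",
    "sales fall amid weak outlook",
    "earnings beat analyst forecasts",
]
returns = [0.02, -0.03, -0.015, 0.025]

vec = CountVectorizer()
X = vec.fit_transform(articles)

# Regress returns on word counts with support vector regression.
svr = LinearSVR(C=1.0).fit(X, returns)

# Word polarity = the weight the regressor learned for that word.
dictionary = dict(zip(vec.get_feature_names_out(), svr.coef_))
for word, polarity in sorted(dictionary.items(), key=lambda kv: kv[1]):
    print(f"{word:10s} {polarity:+.3f}")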
We developed a cross-lingual recommender system that uses collaborative filtering with English-Japanese translation pairs of product names to help non-Japanese, English-speaking buyers visiting Japanese shopping websites. Customer purchase histories from an English shopping site and from a Japanese shopping site were used in the experiments. Two experiments were conducted to evaluate the system: (1) two-fold cross-validation in which half of the translation pairs were masked, and (2) experiments in which all of the translation pairs were used. In the first set of experiments, the precision, recall, and mean reciprocal rank (MRR) of the system were evaluated to assess its general performance. We also investigated the effect of formatting the translation pairs and the performance according to the type of feature value in the vectors (binary versus rating values). The second experiment, in contrast, showed what kinds of items were recommended in a more realistic scenario. The results reveal that masked items were found more efficiently than with a bestseller recommender system and, further, that in the more realistic scenario the system could find items available only on the Japanese site that appeared to be related to the buyers' interests.
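A minimal sketch of the cross-lingual setting is given below: translation pairs map Japanese product names into the English item space so that purchase histories from both sites share one user-item matrix, and item-based collaborative filtering is run on top. The product names, translation pairs, and purchase histories are invented for illustration, and the binary feature values correspond to one of the two settings compared in the paper.

import numpy as np

# Hypothetical English-Japanese translation pairs of product names.
translation_pairs = {"緑茶": "green tea", "炊飯器": "rice cooker"}

def canonical(item):
    # Map a Japanese product name to its English counterpart if paired.
    return translation_pairs.get(item, item)

# Purchase histories from both sites, merged into one item space.
purchases = {
    "en_user1": {"green tea", "matcha bowl"},
    "en_user2": {"rice cooker", "green tea"},
    "jp_user1": {canonical("緑茶"), canonical("炊飯器"), "bento box"},
}

items = sorted({i for basket in purchases.values() for i in basket})
index = {item: k for k, item in enumerate(items)}

# Binary user-item matrix (the paper also compares rating values).
M = np.zeros((len(purchases), len(items)))
for u, basket in enumerate(purchases.values()):
    for item in basket:
        M[u, index[item]] = 1.0

# Item-item cosine similarity.
norm = np.linalg.norm(M, axis=0, keepdims=True)
sim = (M.T @ M) / np.clip(norm.T * norm, 1e-9, None)

def recommend(user, topn=2):
    owned = purchases[user]
    scores = sim[:, [index[i] for i in owned]].sum(axis=1)
    ranked = [items[k] for k in np.argsort(-scores) if items[k] not in owned]
    return ranked[:topn]

print(recommend("en_user1"))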
Domain adaptation is a major challenge when machine translation is applied to practical tasks. In this study, we present domain adaptation methods for machine translation that assume multiple domains. The proposed methods combine two types of models: a corpus-concatenated model covering multiple domains and single-domain models that are accurate but sparse in specific domains. We combine the advantages of both model types using feature augmentation for domain adaptation in machine learning, whereas the conventional feature-augmentation method for machine translation uses a single model. Our experimental results show that the translation quality of the proposed method improved on, or matched, that of the single-domain models. The proposed method is extremely effective in low-resource domains. Even in domains with a million bilingual sentences, the translation quality was at least preserved and even improved in some domains. These results demonstrate that state-of-the-art domain adaptation can be realized with appropriate model selection and appropriate settings, even when standard log-linear models are used.
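The sketch below illustrates the feature-augmentation idea in the log-linear setting: each candidate's features get a shared copy plus a copy that is active only in the candidate's own domain, so a single weight vector can learn both general and domain-specific preferences. The domain names and the example feature vector (a corpus-concatenated model score, a single-domain model score, and a length penalty) are hypothetical.

import numpy as np

domains = ["patent", "news", "chat"]

def augment(features, domain):
    # Duplicate the feature vector into a shared block plus one
    # block per domain; only the active domain's block is non-zero.
    d = len(features)
    out = np.zeros(d * (1 + len(domains)))
    out[:d] = features                          # shared (general) block
    k = domains.index(domain)
    out[d * (k + 1): d * (k + 2)] = features    # active domain block
    return out

# Hypothetical log-linear features of one translation candidate.
h = np.array([-4.2, -3.8, -0.5])
print(augment(h, "news"))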
In this paper, we describe a novel method for joint word alignment and symmetrization. Starting from initial parameters given by simple IBM models, we synchronously parse the parallel sentence pair under bracketing transduction grammar constraints. Our two-phase method achieves nearly the same run-time as fast_align while delivering better alignments for distantly related language pairs such as English–Japanese. We show how to integrate the method into a standard phrase-based SMT pipeline. Although the alignment quality results are mixed, by forcing all words to be aligned (1-to-many/many-to-1), our method significantly reduces the phrase table size with no difference in translation quality and even outperforms fast_align in some end-to-end translation experiments.
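The sketch below is not the BTG-based parser itself; it only illustrates the full-coverage symmetrization constraint the paper credits for its smaller phrase tables, using a simple heuristic (intersect the two directional alignments, grow with union links, then attach any still-unaligned word to its nearest link). The toy alignments are invented.

def symmetrize_full_coverage(f2e, e2f, src_len, tgt_len):
    # f2e, e2f: sets of (src, tgt) index pairs from the two aligner directions.
    links = set(f2e) & set(e2f)            # high-precision intersection
    union = set(f2e) | set(e2f)

    # Grow: add union links that cover still-unaligned words.
    for s, t in sorted(union):
        covered_s = {i for i, _ in links}
        covered_t = {j for _, j in links}
        if s not in covered_s or t not in covered_t:
            links.add((s, t))

    # Force every remaining word onto the nearest existing link
    # (1-to-many / many-to-1, no unaligned words).
    covered_s = {i for i, _ in links}
    covered_t = {j for _, j in links}
    for s in range(src_len):
        if s not in covered_s:
            _, j = min(links, key=lambda l: abs(l[0] - s))
            links.add((s, j))
    for t in range(tgt_len):
        if t not in covered_t:
            i, _ = min(links, key=lambda l: abs(l[1] - t))
            links.add((i, t))
    return sorted(links)

# Toy directional alignments for a 4-word source and 4-word target sentence.
f2e = {(0, 0), (1, 2), (2, 1)}
e2f = {(0, 0), (1, 2), (3, 3)}
print(symmetrize_full_coverage(f2e, e2f, 4, 4))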