Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 31, Issue 2
Preface (Non Peer-Reviewed)
General Paper (Peer-Reviewed)
  • Zizheng Zhang, Masato Mita, Mamoru Komachi
    2024 Volume 31 Issue 2 Pages 328-348
    Published: 2024
    Released on J-STAGE: June 15, 2024
    JOURNAL FREE ACCESS

    Cloze tests play an essential role in language assessment and help language learners improve their skills. In this paper, we propose a novel task called Cloze Quality Estimation (CQE): evaluating whether a cloze test is of sufficiently high quality for language assessment, based on two important factors, reliability and sufficiency. As a first step, we created a new dataset named CELA for the CQE task, which contains English cloze tests and corresponding quality judgments annotated by native English speakers: 2,597 instances for the reliability aspect and 1,730 for the sufficiency aspect. We tested baseline evaluation methods on the dataset and found that methods focusing only on the answer options do not perform well on this challenging task, especially for reliability detection. Additional features, such as the context of the questions, are expected to improve detection performance.

    Download PDF (339K)
  • Sora Tarumoto, Koki Hatagaki, Rina Miyata, Tomoyuki Kajiwara, Takashi ...
    2024 Volume 31 Issue 2 Pages 349-373
    Published: 2024
    Released on J-STAGE: June 15, 2024
    JOURNAL FREE ACCESS

    This study evaluates ChatGPT’s ability to generate Japanese on text-to-text generation tasks. ChatGPT is one of the large language models that can be adapted to a variety of natural language processing tasks in an interactive manner. While its language-generating ability has been quantitatively evaluated in a variety of tasks in English, it has not yet been fully evaluated in Japanese. This paper reports the evaluation results of ChatGPT’s ability to generate Japanese in typical text-to-text generation tasks such as machine translation, summarization, and text simplification, comparing it with conventional supervised methods. Experimental results showed that ChatGPT underperformed existing supervised models in automatic evaluation for all tasks, but tended to outperform those models in human evaluation. Our detailed analysis revealed that while ChatGPT outputs high-quality Japanese sentences in general, it fails to meet some of the detailed requirements of each task.

    Download PDF (512K)
  • Hiroyuki Deguchi, Taro Watanabe, Yusuke Matsui, Masao Utiyama, Hideki ...
    2024 Volume 31 Issue 2 Pages 374-406
    Published: 2024
    Released on J-STAGE: June 15, 2024
    JOURNAL FREE ACCESS

    k-nearest-neighbor machine translation (kNN-MT) (Khandelwal et al. 2021) boosts the translation quality of trained neural machine translation (NMT) models by incorporating an example search into the decoding algorithm. However, decoding is seriously time-consuming, roughly 100 to 1,000 times slower than standard NMT, because neighbor tokens are retrieved from all target tokens of the parallel data at each timestep. In this paper, we propose “Subset kNN-MT”, which improves the decoding speed of kNN-MT with two methods: (1) retrieving neighbor target tokens from a subset consisting of the neighbor sentences of the input sentence rather than from all sentences, and (2) an efficient distance computation technique suited to subset neighbor search that uses a look-up table. Our Subset kNN-MT achieved a speed-up of up to 134.2 times and a BLEU improvement of up to 1.6 points over kNN-MT on the WMT’19 De-En translation task, domain adaptation tasks in De-En and En-Ja translation, and the Flores101 multilingual translation task.

    Download PDF (767K)
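
    A minimal, illustrative sketch of the subset idea described above, under assumed data structures: sentence-level keys for the datastore, a mapping from each sentence to its token-level entries, and a brute-force distance computation standing in for the paper’s look-up-table technique. The function and parameter names (retrieve_subset, knn_token_probs, n_sents, temperature) are placeholders, not the authors’ implementation.

    import numpy as np

    def retrieve_subset(src_vec, sent_keys, sent_to_token_ids, n_sents=16):
        # Pick the datastore entries that belong to the n_sents sentences
        # whose keys are closest to the encoded input sentence.
        d = np.linalg.norm(sent_keys - src_vec, axis=1)
        nearest = np.argsort(d)[:n_sents]
        return np.concatenate([sent_to_token_ids[s] for s in nearest])

    def knn_token_probs(query, token_keys, token_values, subset_ids, vocab_size, k=8, temperature=10.0):
        # k-NN search restricted to the subset; distances are softened into a
        # distribution over target tokens, to be interpolated with the NMT model.
        d = np.linalg.norm(token_keys[subset_ids] - query, axis=1)
        topk = np.argsort(d)[:k]
        probs = np.zeros(vocab_size)
        for i in topk:
            probs[token_values[subset_ids[i]]] += np.exp(-d[i] / temperature)
        return probs / probs.sum()
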
  • Kosuke Nishida, Naoki Yoshinaga, Kyosuke Nishida
    2024 Volume 31 Issue 2 Pages 407-432
    Published: 2024
    Released on J-STAGE: June 15, 2024
    JOURNAL FREE ACCESS

    Although named entity recognition (NER) assists in extracting domain-specific entities from text (e.g., artists in the music domain), it is expensive to create a large amount of training data or a structured knowledge base to perform accurate NER in the target domain. Here, we propose a self-adaptive NER method that retrieves external knowledge from unstructured text to learn the usage of entities that have not been learned well. To retrieve knowledge useful for NER, we designed an effective two-stage model that retrieves unstructured knowledge using uncertain entities as queries. Our model first predicts the entities in the input and identifies those whose predictions are not confident. It then retrieves knowledge using these uncertain entities as queries and concatenates the retrieved text with the original input to revise the prediction. Experiments on the CrossNER datasets demonstrated that our model outperforms strong baselines by 2.35 points in F1. We confirmed that knowledge retrieval is important for the NER task and that retrieval based on prediction confidence is particularly useful when the model has long-tail entity knowledge through pre-training.

    Download PDF (441K)
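
    A schematic sketch of the two-stage loop described above, with hypothetical tagger and retriever interfaces (predict and search are assumed method names, not the paper’s code): tag the input, collect low-confidence entities, retrieve unstructured text with them as queries, and re-tag the augmented input.

    def self_adaptive_ner(tokens, tagger, retriever, conf_threshold=0.7, sep="[SEP]"):
        # First stage: initial prediction with a confidence score per entity.
        entities = tagger.predict(tokens)  # [(span_text, label, confidence), ...]
        uncertain = [e for e in entities if e[2] < conf_threshold]
        if not uncertain:
            return entities
        # Retrieve unstructured knowledge using the uncertain entity mentions as queries.
        passages = []
        for span_text, _, _ in uncertain:
            passages.extend(retriever.search(span_text, top_k=1))
        # Second stage: revise the prediction on the knowledge-augmented input.
        augmented = tokens + [sep] + " ".join(passages).split()
        return tagger.predict(augmented)
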
  • Miyu Oba, Tatsuki Kuribayashi, Hiroki Ouchi, Taro Watanabe
    2024 Volume 31 Issue 2 Pages 433-455
    Published: 2024
    Released on J-STAGE: June 15, 2024
    JOURNAL FREE ACCESS

    The success of neural language models (LMs) has considerably increased attention on their language acquisition. This work focuses on the second language (L2) acquisition of LMs, whereas previous studies have typically explored their first language (L1) acquisition. Specifically, we trained bilingual LMs under a scenario similar to human L2 acquisition and analyzed their cross-lingual transfer from linguistic perspectives. Our exploratory experiments demonstrated that L1 pre-training accelerated linguistic generalization in L2, and that language transfer configurations (for example, the choice of L1 and the presence of parallel texts) substantially affected generalization. These findings clarify the (non-)humanlike aspects of their L2 acquisition.

    Download PDF (4303K)
  • Kazuki Tani, Akihiro Tamura, Tomoyuki Kajiwara, Takashi Ninomiya, Tsun ...
    2024 Volume 31 Issue 2 Pages 456-478
    Published: 2024
    Released on J-STAGE: June 15, 2024
    JOURNAL FREE ACCESS

    We aim to construct Japanese (Ja)-to-English (En) multi-level complexity-controllable machine translation (MCMT), i.e., a Ja-to-En MT model that controls the output complexity at more than two levels. There is no test dataset for Ja-to-En MCMT, since existing studies have focused on the English-Spanish language pair. Therefore, we construct a test dataset for Ja-to-En MCMT using the Newsela corpus, a set of English news articles written at multiple complexity levels, together with their manual translations. This study also proposes a multi-reference-based learning method for MCMT. While conventional methods use a single reference translation as a training instance, the proposed method compares multiple target-language sentences with the same content at different complexity levels and trains an MCMT model so that the loss for a sentence at the target complexity level is lower than that for a sentence at a non-target complexity level. Evaluations on the test dataset constructed in this study show that BLEU improves by 0.94 points compared with the conventional multitask model.

    Download PDF (721K)
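
    A rough sketch of one way to realize the multi-reference objective described above (the exact loss form, margin, and weighting are assumptions, not the authors’ formulation): keep the usual cross-entropy on the target-level reference and add a ranking term that penalizes the model when that reference does not receive a lower loss than a reference at a non-target complexity level.

    import torch

    def multi_reference_loss(ce_target, ce_nontarget, margin=0.1, alpha=1.0):
        # ce_target / ce_nontarget: per-sentence cross-entropy of the reference at the
        # target complexity level and at a non-target level, respectively.
        ranking = torch.clamp(ce_target - ce_nontarget + margin, min=0.0)
        return ce_target + alpha * ranking
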
  • Keisuke Shirai, Hirotaka Kameko, Shinsuke Mori
    2024 Volume 31 Issue 2 Pages 479-503
    Published: 2024
    Released on J-STAGE: June 15, 2024
    JOURNAL FREE ACCESS

    Comprehension of procedural texts by machines is essential for reasoning about the steps in the texts and for automating the procedures with robots. Previous work has focused on the cooking domain and proposed the recipe flow graph (r-FG) to represent an understanding of recipe texts with annotations. An r-FG is defined as a directed acyclic graph whose nodes are expressions related to procedures and whose edges are the relationships between the nodes. Previous work also proposed a framework that predicts r-FG representations in two steps: node prediction and edge prediction. Despite these advances, the idea has only been applied to the cooking domain. This work proposes the wikiHow flow graph (w-FG) to represent an understanding of open-domain procedural texts. w-FG is compatible with r-FG, and existing r-FG annotations in the cooking domain can be automatically converted into w-FG. We introduce a novel dataset, the w-FG corpus, built from wikiHow articles to evaluate flow graph prediction accuracy in domains other than cooking. Experimental results show that domain adaptation from the cooking domain to the target domain enables prediction of nodes with more than 75.0% accuracy and edges with more than 61.8% accuracy.

    Download PDF (1118K)
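
    An illustrative data structure for a flow graph in the r-FG / w-FG style, assumed for exposition only: procedure-related expressions become labeled nodes, and labeled directed edges between them form a directed acyclic graph.

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        node_id: int
        surface: str  # expression in the text, e.g. an action or an object
        tag: str      # node type label

    @dataclass
    class FlowGraph:
        nodes: dict = field(default_factory=dict)  # node_id -> Node
        edges: list = field(default_factory=list)  # (src_id, dst_id, relation_label)

        def add_node(self, node):
            self.nodes[node.node_id] = node

        def add_edge(self, src, dst, label):
            self.edges.append((src, dst, label))
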
  • Taichi Ishiwatari, Jun Goto, Hiroaki Yamada, Takenobu Tokunaga
    2024 Volume 31 Issue 2 Pages 504-533
    Published: 2024
    Released on J-STAGE: June 15, 2024
    JOURNAL FREE ACCESS

    Interest in emotion recognition in conversations (ERC) has been increasing in various fields, because it can be used to analyze conversations conducted via social media and to build emotional and empathetic dialogue systems. In ERC, utterances with the same surface form can express different emotions depending on the conversational context. A typical solution to this issue is to encode contextual information by concatenating a series of utterances and feeding them to a classifier model. In this paper, we propose a method that incorporates an external database into the classifier model. Given a target utterance, we search the training dataset for utterances that are semantically similar to the target. The retrieved utterances are used to compute a probability distribution over emotion labels, which is combined with the classifier model’s distribution by a weighted linear sum. Furthermore, instead of a constant coefficient, we propose dynamic weight coefficients that depend on the target utterance. Experimental results on three ERC datasets show that our method outperforms the baselines.

    Download PDF (2924K)
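
    A minimal sketch of the interpolation described above, with hypothetical inputs: a distribution built from the emotion labels and similarity scores of retrieved training utterances is mixed with the classifier’s distribution, and the mixing weight lam may be predicted per utterance (the dynamic variant) rather than kept constant.

    import numpy as np

    def retrieval_distribution(neighbor_labels, neighbor_sims, num_labels):
        # Turn the labels of retrieved similar utterances into a distribution,
        # weighting each neighbor by its similarity to the target utterance.
        probs = np.zeros(num_labels)
        for label, sim in zip(neighbor_labels, neighbor_sims):
            probs[label] += sim
        return probs / probs.sum()

    def combine(classifier_probs, retrieval_probs, lam):
        # Weighted linear sum of the two distributions over emotion labels.
        return (1.0 - lam) * classifier_probs + lam * retrieval_probs
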
  • Masato Neishi, Naoki Yoshinaga
    2024 Volume 31 Issue 2 Pages 534-567
    Published: 2024
    Released on J-STAGE: June 15, 2024
    JOURNAL FREE ACCESS

    Recent advances in the pre-training and fine-tuning paradigm have brought significant gains in several natural language processing tasks, including machine translation (MT), particularly in low-resource situations. However, leveraging out-of-domain data is reported to be less effective, or sometimes even harmful, for MT in high-resource situations, where further improvement is still needed. In this study, we focus on domain-specific dedicated neural machine translation (NMT) models, which still have an advantage in high-resource situations with respect to translation quality and inference cost. We revisit in-domain pre-training of the embedding layers of Transformer-based NMT models, in which the embeddings are pre-trained on the same training data as the target translation task, given the large impact of the domain discrepancy between pre-training and fine-tuning (or training) in MT. Experiments on two translation tasks, ASPEC English-to-Japanese and WMT2017 English-to-German, demonstrate that in-domain pre-training of the embedding layers of a Transformer-based NMT model improves performance without any negative impact and contributes to earlier convergence in training. Additional experiments confirmed that pre-training the encoder’s embedding layer is more important than pre-training the decoder’s, and that the effect does not vanish as the training data size increases. An analysis of the embeddings revealed that the pre-training of the embedding layers has a large impact on low-frequency tokens.

    Download PDF (291K)
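
    A simple sketch of one plausible realization of in-domain embedding pre-training (not necessarily the authors’ exact setup): embeddings pre-trained on the same in-domain training data are copied into the Transformer NMT embedding layer before translation training, and the layer remains trainable.

    import torch
    import torch.nn as nn

    def load_pretrained_embeddings(embedding_layer: nn.Embedding, vocab, pretrained_vectors):
        # vocab: token -> index; pretrained_vectors: token -> 1-D tensor of matching dimension.
        with torch.no_grad():
            for token, idx in vocab.items():
                if token in pretrained_vectors:
                    embedding_layer.weight[idx] = pretrained_vectors[token]
        # Only the initialization changes; the embeddings are still updated during training.
        embedding_layer.weight.requires_grad_(True)
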
  • Yuki Ishii, Minoru Sasaki
    2024 Volume 31 Issue 2 Pages 568-589
    Published: 2024
    Released on J-STAGE: June 15, 2024
    JOURNAL FREE ACCESS

    In this study, we focus on automatic synonym detection targeting the senses of two input Japanese words, such as the sense of the word “うまい”, which means “good” or “excellent”, and the sense of the word “じょうず”, which means “skillful at doing something”. Lexical knowledge obtained from a thesaurus aids the development of natural language processing techniques. Previous studies on the acquisition of lexical knowledge in Japanese have focused on detecting relations between words, such as synonyms, hypernyms, and antonyms. However, these studies primarily detected synonymy between words, not between word senses. To address this problem, we propose a synonym detection method for two Japanese words that uses Sentence-BERT similarity and sense definitions extracted from a Japanese dictionary. To evaluate the effectiveness of the proposed method, we conducted synonym detection experiments using an evaluation dataset built from the Iwanami Japanese Dictionary and the Japanese thesaurus “Bunrui-Goi-Hyo”. The experimental results indicate that the proposed method effectively determines synonyms for word senses.

    Download PDF (867K)
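
    A minimal sketch of the core similarity computation: encode the dictionary definitions of two word senses with Sentence-BERT and treat the senses as synonymous when the cosine similarity exceeds a threshold. The model name and threshold below are placeholders, not the paper’s configuration.

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # placeholder multilingual model

    def senses_are_synonymous(definition_a, definition_b, threshold=0.8):
        # Encode the two sense definitions and compare them by cosine similarity.
        emb = model.encode([definition_a, definition_b], convert_to_tensor=True)
        return util.cos_sim(emb[0], emb[1]).item() >= threshold
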
  • Rina Miyata, Hyuga Koretaka, Hiroki Yamauchi, Daiki Yanamoto, Tomoyuki ...
    2024 Volume 31 Issue 2 Pages 590-609
    Published: 2024
    Released on J-STAGE: June 15, 2024
    JOURNAL FREE ACCESS

    In this study, we construct and release a Japanese parallel corpus for text simplification. Existing Japanese corpora for this task have been constructed by non-experts, and there is no high-quality, large-scale corpus constructed by experts. We constructed a large-scale sentence-level parallel corpus by manually aligning sentences in articles that had been simplified by experts. Our human evaluation revealed that the parallel corpus simplified by experts contains a more diverse set of simplification operations than those built by non-experts. We also found that our parallel corpus is simplified fluently and adequately.

    Download PDF (1172K)
  • Naoki Minamibata, Akihiro Tamura, Tsuneo Kato
    2024 Volume 31 Issue 2 Pages 610-636
    Published: 2024
    Released on J-STAGE: June 15, 2024
    JOURNAL FREE ACCESS

    In the field of neural machine translation (NMT), translation performance has been improved by using named entity (NE) information. Earlier studies proposed two promising approaches to NE-based NMT: a “tagging model” that inserts NE tags into sentences and an “embedding model” that incorporates NE embeddings into word embeddings. While the embedding model improves translation performance by using target-side NE information in addition to source-side NE information, tagging models use only source-side NE information. Therefore, this study proposes a new tagging model that uses both source- and target-side NE information. Moreover, this study proposes an ensemble of our tagging model and an embedding model, which generates a target-language sentence based on the average of the output probabilities of the two models. Evaluations on the WMT 2014 English (En)↔German (De) and WMT 2020 English (En)↔Japanese (Ja) translation tasks showed that the proposed tagging model outperformed an existing tagging model (by up to +0.76, +1.59, +0.96, and +0.65 BLEU for En-to-De, De-to-En, En-to-Ja, and Ja-to-En, respectively).

    Download PDF (1332K)
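
    A schematic sketch of the ensemble described above, with the two models’ outputs abstracted as next-token distributions over a shared target vocabulary: at each decoding step the distributions are averaged and the next token is chosen from the averaged distribution.

    import numpy as np

    def ensemble_step(tagging_probs, embedding_probs):
        # Average the two next-token distributions and pick the most probable token.
        avg = 0.5 * (tagging_probs + embedding_probs)
        return int(np.argmax(avg)), avg
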
  • Tomoki Sugimoto, Yasumasa Onoe, Hitomi Yanaka
    2024 Volume 31 Issue 2 Pages 637-679
    Published: 2024
    Released on J-STAGE: June 15, 2024
    JOURNAL FREE ACCESS

    Temporal inference, i.e., natural language inference involving time, is a challenging task because of the complex interaction of various time-related linguistic phenomena, such as tense and aspect. Although various temporal inference datasets have been provided to assess the temporal inference ability of language models, their primary focus is on English and only on a few linguistic phenomena. Therefore, whether Japanese language models can generalize diverse temporal inference patterns is yet to be understood. In this research, we constructed a controlled Japanese temporal inference dataset considering aspect (Jamp_sp), which includes a variety of temporal inference patterns. The training and test data in Jamp_sp can be controlled based on problem attributes such as temporal inference patterns and time formats, thereby allowing a detailed analysis of the generalization capacity of the language models. To accomplish this objective, we trained the language models on the training data before and after the split, and evaluated them on our test data. The results demonstrate that Jamp_sp is a challenging dataset not only for discriminative language models but also for current generative language models, such as GPT-4, and that there is room for improvement in the generalization capacity of these models.

    Download PDF (12299K)
  • Ryota Miyano, Tomoyuki Kajiwara, Yuki Arase
    2024 Volume 31 Issue 2 Pages 680-706
    Published: 2024
    Released on J-STAGE: June 15, 2024
    JOURNAL FREE ACCESS

    In language generation, reranking improves the quality of generated sentences by re-scoring the top-N hypotheses. Reranking assumes that higher-quality hypotheses exist in the N-best list. We relax this assumption to a more practical one: partially higher-quality hypotheses exist in the N-best list, but they may be imperfect as whole sentences. We propose a method that produces high-quality outputs by integrating high-quality fragments found in the N-best list. Specifically, we first obtain the N-best hypotheses and estimate the quality of each token. We then decode again under lexical constraints, using the words predicted to be wrong as negative constraints and those predicted to be correct as positive constraints. This method produces sentences that contain the correct words found in the N-best output and exclude the wrong ones. Empirical experiments on paraphrase generation, summarization, translation, and constrained text generation confirmed that our method outperforms strong N-best reranking methods on the paraphrase generation and summarization tasks.

    Download PDF (845K)
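
    A rough sketch of the constraint-building step described above, with a hypothetical token-level quality estimator and thresholds (the decoder call at the end is likewise illustrative): high-quality tokens from the N-best become positive constraints and low-quality tokens become negative constraints for a second, lexically constrained decoding pass.

    def build_constraints(nbest, quality_estimator, hi=0.8, lo=0.2):
        positive, negative = set(), set()
        for hypothesis in nbest:
            for token, score in quality_estimator(hypothesis):  # [(token, quality), ...]
                if score >= hi:
                    positive.add(token)
                elif score <= lo:
                    negative.add(token)
        # A token judged both ways across hypotheses is left unconstrained.
        return positive - negative, negative - positive

    # Illustrative second pass:
    # pos, neg = build_constraints(nbest, quality_estimator)
    # output = decoder.generate(source, positive_constraints=pos, negative_constraints=neg)
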
  • Keiyu Nagafuchi, Yasutomo Kimura, Kazuma Kadowaki, Kenji Araki
    2024 Volume 31 Issue 2 Pages 707-732
    Published: 2024
    Released on J-STAGE: June 15, 2024
    JOURNAL FREE ACCESS

    In this study, we collected minutes of national and local assemblies published on the web and constructed a large corpus from them. We then developed pre-trained language models, with several derivatives, adapted to the Japanese political domain using the constructed corpus of meeting records. Our models outperformed conventional models on tasks in the political domain and performed comparably on tasks outside it. We also showed that increasing the number of training steps during domain adaptation with additional pre-training significantly improves performance. Furthermore, leveraging the corpus used for the initial pre-training enhances performance in the adapted domain while maintaining performance in non-adapted domains.

    Download PDF (510K)
Society Column (Non Peer-Reviewed)
Information (Non Peer-Reviewed)