IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Latest issue
Showing 1-22 of 22 articles in the selected issue
Special Section on Information and Communication Technology to Support Hyperconnectivity
  • Tokuro MATSUO
    2024, Vol. E107.D, No. 4, pp. 424-425
    Published: 2024/04/01
    Released: 2024/04/01
    Free access
  • Yihan DONG, Shiyao DING, Takayuki ITO
    Article type: PAPER
    2024, Vol. E107.D, No. 4, pp. 426-433
    Published: 2024/04/01
    Released: 2024/04/01
    Free access

    This paper presents the design and implementation of an automated multi-phase facilitation agent based on a large language model (LLM) that realizes inclusive facilitation and uses the LLM efficiently to facilitate realistic discussions. Large-scale discussion support systems have been widely studied and implemented because they enable many people to discuss remotely, 24 hours a day, 7 days a week. Furthermore, automated facilitation artificial intelligence (AI) agents have been developed because they can efficiently facilitate large-scale discussions. For example, D-Agree is a large-scale discussion support system in which an automated facilitation AI agent facilitates discussion among people. The current automated facilitation agent was designed following the structure of the issue-based information system (IBIS), and the IBIS-based agent has been shown to have superior performance. However, several problems with the IBIS-based agent still need to be addressed. In this paper, we focus on the following three problems: 1) The IBIS-based agent was designed only to encourage participants' posts by replying to existing posts, without considering the different behaviours of participants with diverse characteristics; as a result, the discussion is sometimes insufficient. 2) The facilitation messages generated by the IBIS-based agent were not natural enough, so participants were not sufficiently encouraged and did not follow the flow of the discussion on a topic. 3) Because the IBIS-based agent is not combined with an LLM, a design for controlling the LLM is needed. To solve these problems, this paper proposes the design of a phase-based facilitation framework. 
Specifically, we propose two key designs: one is a multi-phase facilitation agent built on the framework to address problem 1); the other is the combination with an LLM to address problems 2) and 3). In particular, the language model GPT-3.5 is used for this combination through the corresponding OpenAI APIs. Furthermore, we demonstrate the improvements of our facilitation agent framework through evaluations and a case study. We also describe how our framework differs from LangChain, which provides generic features for utilizing LLMs.

  • Sofia SAHAB, Jawad HAQBEEN, Takayuki ITO
    Article type: PAPER
    2024, Vol. E107.D, No. 4, pp. 434-442
    Published: 2024/04/01
    Released: 2024/04/01
    Free access

    Despite the increasing use of conversational artificial intelligence (AI) in online discussion environments, few studies have explored the application of AI as a facilitator in forming problem-solving debates and influencing opinions in cross-venue scenarios, particularly in diverse and war-ravaged countries. This study aims to investigate the impact of AI on enhancing participant engagement and collaborative problem-solving in online-mediated discussion environments, especially in diverse and heterogeneous discussion settings such as five cities in Afghanistan. We assess the extent to which AI participation in online conversations succeeds by examining the depth of discussions and participants' contributions, comparing discussions facilitated by AI with those not facilitated by AI across different venues. The results are discussed with respect to forming and changing opinions with and without AI-mediated communication. The findings indicate that the number of opinions generated in AI-facilitated discussions differs significantly from that in discussions without AI support. Additionally, statistical analyses reveal quantitative differences in the sentiment of online discourse when conversational AI is present compared with when it is absent. These findings contribute to a better understanding of the role of AI-mediated discussions and offer several practical and social implications, paving the way for future developments and improvements.

  • Yoshiko ARIMA
    Article type: PAPER
    2024, Vol. E107.D, No. 4, pp. 443-450
    Published: 2024/04/01
    Released: 2024/04/01
    Free access

    Both group process studies and collective intelligence studies are concerned with the question of which performs better: the crowd or its best members. This can be seen as a matter of democracy versus dictatorship. Evidence about the growth potential of crowds and experts can help in making correct predictions and can benefit humanity. In the collective intelligence experimental paradigm, the ability of experts or best members is compared with the accuracy of the crowd average. In this research (n = 620), using repeated trials of simple tasks, we compare how close the class average (an index of collective intelligence) and the best member (the one whose answer was closest to the correct answer) come to the correct answer. The results indicated that, for the cognition task, collective intelligence improved to the level of the best member through repeated trials without feedback, whereas for the prediction task it depended on the ability of the best members. The present study suggests that best members are superior to crowds in the prediction task, provided that they are free from social influence. However, machine learning results suggest that the best members among us cannot easily be identified beforehand, because they emerge through repeated trials.
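The comparison used in this paradigm can be sketched for a single trial: the crowd estimate is the mean of all answers, and the best member is the participant whose answer lies closest to the correct value. The function name and the numbers below are illustrative assumptions, not data or code from the study.

```python
from statistics import mean
from typing import List, Tuple


def compare_crowd_and_best(answers: List[float], truth: float) -> Tuple[float, float, int]:
    """Return (crowd_error, best_error, best_index) for one trial.

    answers: one numeric estimate per participant (hypothetical data).
    truth: the correct answer for the task.
    """
    crowd_estimate = mean(answers)            # index of collective intelligence
    crowd_error = abs(crowd_estimate - truth)
    # The best member is the participant whose answer is closest to the truth.
    best_index = min(range(len(answers)), key=lambda i: abs(answers[i] - truth))
    best_error = abs(answers[best_index] - truth)
    return crowd_error, best_error, best_index


if __name__ == "__main__":
    # Made-up estimates for one trial whose correct value is 100.
    print(compare_crowd_and_best([80, 95, 104, 120, 130], 100))
```

Repeating this over trials and tracking both errors shows whether the crowd catches up with the best member, which is the kind of trend the abstract reports for the cognition task.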

  • Shiyao DING, Takayuki ITO
    Article type: PAPER
    2024, Vol. E107.D, No. 4, pp. 451-458
    Published: 2024/04/01
    Released: 2024/04/01
    Free access

    Despite recent advances in using meta-learning to address the generalization challenges of graph neural networks (GNNs), their performance on argumentation mining tasks, such as argument classification, remains relatively limited. This is primarily because the pattern knowledge intrinsic to argumentation structures is under-utilized. To address this issue, our study proposes a two-stage, pattern-based meta-GNN method, in contrast to conventional pattern-free meta-GNN approaches. In the first stage, our method learns a high-level pattern representation that captures the pattern knowledge within an argumentation structure and then predicts edge types. In the second stage, it uses a meta-learning framework to train a meta-learner based on the predicted edge types, which allows rapid generalization to novel argumentation graphs. Experiments on real English discussion datasets spanning diverse topics demonstrate that our proposed method substantially outperforms conventional pattern-free GNN approaches, a significant step forward in this domain.

  • Muhammad FAWAD RAHIM, Tessai HAYAMA
    Article type: PAPER
    2024, Vol. E107.D, No. 4, pp. 459-467
    Published: 2024/04/01
    Released: 2024/04/01
    Free access

    In recent years, location-based technologies for ubiquitous environments have aimed to provide services tailored to each purpose based on information about an individual's current location. Such advanced location-based services require an estimation technology that can accurately recognize and predict the movements of people and objects. Although the global positioning system (GPS) is already the standard outdoor positioning technology and underlies many services, several indoor positioning techniques using conventional wireless sensors such as Wi-Fi, RFID, and Bluetooth have also been considered. However, conventional wireless indoor positioning is prone to noise, and the large range of estimated indoor locations makes it difficult to identify human activities precisely. We propose a method to mine user activity patterns from time-series data of users' locations in an indoor environment using ultra-wideband (UWB) sensors. A UWB sensor is useful for indoor positioning owing to its high noise immunity and measurement accuracy; however, to our knowledge, estimation and prediction of human indoor activities using UWB sensors have not yet been addressed. The proposed method consists of three steps: 1) obtaining time-series data of the user's location using a UWB sensor attached to the user, and then estimating the areas where the user has stayed; 2) associating each stay area with a nearby activity landmark and assigning indoor activities; and 3) mining the user's activity patterns based on the user's indoor activities and their transitions. We evaluated the proposed method by investigating the accuracy of estimating the user's stay areas with a UWB sensor and by observing the results of activity pattern mining applied to actual laboratory members over 30 days. 
The results showed that the proposed method estimates stay areas more precisely than a comparison method, the time-based clustering algorithm, and that it can appropriately reveal the user's activity patterns in a real environment.
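For reference, a generic time-based clustering of indoor positions, in the spirit of the comparison method, can be sketched as follows: consecutive samples that stay within a distance threshold of their running centroid for at least a minimum duration form one stay. The thresholds (1 m, 60 s), the sample format, and the function name are assumptions; this is not the proposed UWB method.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float, float]       # (timestamp in s, x in m, y in m)
Stay = Tuple[float, float, float, float]  # (start, end, centre x, centre y)


def detect_stays(samples: List[Sample], max_dist: float = 1.0,
                 min_duration: float = 60.0) -> List[Stay]:
    """Group consecutive samples into stays using a distance and a duration threshold."""
    stays: List[Stay] = []
    cluster: List[Sample] = []

    def centroid(points: List[Sample]) -> Tuple[float, float]:
        n = len(points)
        return (sum(p[1] for p in points) / n, sum(p[2] for p in points) / n)

    def flush() -> None:
        # Keep the current run only if it lasted long enough to count as a stay.
        if cluster and cluster[-1][0] - cluster[0][0] >= min_duration:
            cx, cy = centroid(cluster)
            stays.append((cluster[0][0], cluster[-1][0], cx, cy))

    for p in samples:
        if cluster:
            cx, cy = centroid(cluster)
            # A sample far from the running centroid closes the current run.
            if math.hypot(p[1] - cx, p[2] - cy) > max_dist:
                flush()
                cluster = []
        cluster.append(p)
    flush()
    return stays
```

Because such a rule depends directly on positioning noise, it illustrates why noisy indoor positions make stay areas hard to estimate precisely.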

  • Li HE, Jingxuan ZHAO, Jianyong DUAN, Hao WANG, Xin LI
    Article type: PAPER
    2024, Vol. E107.D, No. 4, pp. 468-476
    Published: 2024/04/01
    Released: 2024/04/01
    Free access

    In natural language understanding, intent detection and slot filling are widely used to understand user queries. However, current methods tend to rely on single words and sentences to understand complex semantic concepts and can consider only local information within the sentence. Therefore, they usually cannot capture long-distance dependencies well and often fail to recognize complex intentions in sentences. To solve the long-distance dependency problem, this paper uses ConceptNet as an external knowledge source and introduces its extensive semantic information into a multi-intent detection and slot filling model. Specifically, for a given sentence, the most relevant conceptual knowledge is selected based on confidence scores and semantic relationships to enrich the sentence, and an information-rich concept context map is constructed. Then, a multi-head graph attention mechanism is used to strengthen context correlation and improve the model's semantic understanding. The experimental results indicate that the model significantly outperforms other models on the MixATIS and MixSNIPS multi-intent datasets.
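The abstract does not detail its graph attention layer. As background, the standard multi-head graph attention formulation (GAT) can be sketched in NumPy: each head scores neighbour pairs with LeakyReLU(a·[Wh_i ‖ Wh_j]), normalizes the scores with a softmax over neighbours, and the heads' outputs are concatenated. The shapes, random weights, and the omission of a final nonlinearity are assumptions, not the authors' model.

```python
import numpy as np


def leaky_relu(x: np.ndarray, slope: float = 0.2) -> np.ndarray:
    return np.where(x > 0, x, slope * x)


def gat_layer(h, adj, weights, attn, slope=0.2):
    """Multi-head graph attention with concatenated heads.

    h: (N, F) node features; adj: (N, N) 0/1 adjacency including self-loops;
    weights: list of (F, F') matrices W_k; attn: list of (2F',) vectors a_k.
    """
    outputs = []
    for w, a in zip(weights, attn):
        z = h @ w                                     # projected features, (N, F')
        f_out = z.shape[1]
        src = z @ a[:f_out]                           # part of the score from node i
        dst = z @ a[f_out:]                           # part of the score from node j
        e = leaky_relu(src[:, None] + dst[None, :], slope)
        e = np.where(adj > 0, e, -np.inf)             # attend only to neighbours
        e = e - e.max(axis=1, keepdims=True)          # numerical stability
        alpha = np.exp(e)
        alpha /= alpha.sum(axis=1, keepdims=True)     # softmax over neighbours
        outputs.append(alpha @ z)                     # attention-weighted sum
    return np.concatenate(outputs, axis=1)
```

In a concept context graph, the nodes could be the sentence's tokens plus the retrieved ConceptNet concepts; how the paper builds that graph is not specified here.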

  • Takumi HASEGAWA, Tessai HAYAMA
    Article type: PAPER
    2024, Vol. E107.D, No. 4, pp. 477-485
    Published: 2024/04/01
    Released: 2024/04/01
    Free access

    E-learning, which can be used anywhere and at any time, is very convenient and has been introduced to improve learning efficiency. However, ensuring a sufficient completion rate has been a major challenge. Current forms of e-learning require learners to be introspective, deliberate, and logical, and have proven unsuitable for many learners, as reflected in low completion rates. We therefore developed an e-learning system that incorporates a fill-in-the-blank concept map to deepen learners' understanding of the learning content while they watch learning videos. The developed system promotes reflective and logical active learning by having learners fill in blank labels on concept maps using the video content and the labels associated with the blanks. A laboratory experiment comparing it with a conventional video-based learning system confirmed that the developed system encouraged learners to perform more system operations for rechecking the learning content and to better understand the learning content while watching the learning video. As the next step, a field experiment is needed to investigate the usefulness and effectiveness of the developed system in actual environments in order to improve its practicality. In this study, we introduced the developed system into two classes of a university course and investigated learners' understanding of the learning content, their system operations, and the usefulness of the developed system, comparing these with the results of the laboratory experiment. The results showed that, as in the laboratory experiment, the developed system supported understanding of the learning content and each of its functions was useful in the field experiment. 
On the other hand, students in the field experiment rated the usefulness of the developed system lower than those in the laboratory experiment; their system operations during learning suggest that fewer students in the field experiment attempted to thoroughly understand the learning content.

  • Kazuma TAKAHASHI, Wen GU, Koichi OTA, Shinobu HASEGAWA
    Article type: PAPER
    2024, Vol. E107.D, No. 4, pp. 486-494
    Published: 2024/04/01
    Released: 2024/04/01
    Free access

    In academic presentations, the structural design of a presentation is critical for making it logical and understandable. However, because presentation structures can be created so flexibly, it is difficult for novice researchers to construct the structure an academic presentation requires. To help novice researchers revise and improve their presentation structure, we propose an academic presentation structure modification support system based on the structural elements of the presentation slides. In the proposed system, we build a presentation structural elements model (PSEM) that represents the essential structural elements and their relations to clarify the ideal structure of an academic presentation. Based on the PSEM, we also designed two evaluation indices for academic presentation structure. To evaluate the proposed system with real-world data, we built a web application that generates evaluations of and feedback on academic presentation slides. The experimental results demonstrate the effectiveness of the proposed system.

  • Li HE, Xiaowu ZHANG, Jianyong DUAN, Hao WANG, Xin LI, Liang ZHAO
    Article type: PAPER
    2024, Vol. E107.D, No. 4, pp. 495-504
    Published: 2024/04/01
    Released: 2024/04/01
    Free access

    Chinese spelling correction (CSC) models detect and correct typos in text based on the misspelled character and its context. Recently, BERT-based models have dominated research on Chinese spelling correction. However, these methods focus only on the semantic information of the text during pretraining and neglect learning to correct spelling errors. Moreover, when a text contains multiple incorrect characters, the context introduces noisy information, making it difficult for the model to accurately detect the positions of the incorrect characters and leading to false corrections. To address these limitations, we apply the multimodal pre-trained language model ChineseBERT to the task of spelling correction. We propose a self-distillation learning-based pretraining strategy in which a confusion set is used to construct text containing erroneous characters, allowing the model to jointly learn how to understand language and how to correct spelling errors. Additionally, we introduce a single-channel masking mechanism to mitigate the noise caused by incorrect characters. This mechanism masks the semantic encoding channel while preserving the phonetic and glyph encoding channels, reducing the noise introduced by incorrect characters during prediction. Finally, experiments on widely used benchmarks show that our model outperforms state-of-the-art methods by a remarkable margin.
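The abstract does not say exactly how erroneous text is built from the confusion set; one common approach is to replace characters at random with confusable ones and record the changed positions as correction targets. The toy confusion set, the replacement rate, and the function name below are assumptions for illustration.

```python
import random
from typing import Dict, List, Tuple


def corrupt_with_confusion_set(text: str, confusion: Dict[str, List[str]],
                               rate: float, rng: random.Random) -> Tuple[str, List[int]]:
    """Replace some characters with confusable ones.

    Returns the corrupted text and the indices that were changed, so that
    (corrupted, original) pairs can serve as spelling-correction training data.
    """
    chars = list(text)
    changed = []
    for i, ch in enumerate(chars):
        candidates = confusion.get(ch)
        if candidates and rng.random() < rate:
            chars[i] = rng.choice(candidates)  # pick a phonetically/visually similar character
            changed.append(i)
    return "".join(chars), changed


if __name__ == "__main__":
    # Toy confusion set: characters sharing a pronunciation.
    toy_confusion = {"他": ["她", "它"], "在": ["再"]}
    print(corrupt_with_confusion_set("他在学校", toy_confusion, 0.5, random.Random(1)))
```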

  • Honghui YUAN, Keiji YANAI
    Article type: PAPER
    2024, Vol. E107.D, No. 4, pp. 505-514
    Published: 2024/04/01
    Released: 2024/04/01
    Free access

    Deep learning techniques are used to transform the style of images and produce diverse images. In the field of text style transformation, many previous studies have attempted to generate stylized text using deep learning networks. However, to achieve multiple style transformations for text images, previously proposed methods either require training multiple networks or cannot be guided by style images. Thus, in this study we focused on multi-style transformation of text images, using style images to guide generation. We propose a multiple-style transformation network for text style transfer, which we refer to as the Multi-Style Shape Matching GAN (Multi-Style SMGAN). The proposed method generates multiple styles of text images with a single model trained only once, and allows users to control the text style through style images. The proposed method applies conditions to the network so that all styles can be distinguished effectively, and the generation of each styled text can be controlled by these conditions. The network is optimized so that the conditional information is transmitted effectively throughout the network. The proposed method was evaluated experimentally on a large number of text images, and the results show that the trained model can generate multiple-style text in real time according to the style image. In addition, the results of a user survey indicate that the proposed method produces higher-quality results than existing methods.

Regular Section
  • Yuan LI, Tingting HU, Ryuji FUCHIKAMI, Takeshi IKENAGA
    Article type: PAPER
    Field: Computer System
    2024, Vol. E107.D, No. 4, pp. 515-524
    Published: 2024/04/01
    Released: 2024/04/01
    Free access

    A 1-millisecond (1-ms) vision system, which processes videos at 1000 frames per second (FPS) with a delay of at most 1 ms per frame, plays an increasingly important role in fields such as robotics and factory automation. Superpixel segmentation, one of the most widely used image oversegmentation methods, is a crucial pre-processing step for reducing computation in various computer vision applications. Among superpixel methods, simple linear iterative clustering (SLIC) has been widely adopted owing to its simplicity, effectiveness, and computational efficiency. However, the iterative assignment and update steps in SLIC make high processing speeds difficult to achieve. To address this limitation and develop a SLIC superpixel segmentation system with a 1-ms delay, this paper proposes grid-sample-based temporal iteration. Leveraging the high frame rate of the input video, the proposed method distributes the iterations over the temporal domain, ensuring that the system's delay stays within one frame. Additionally, grid sample information is added as initialization information to the obtained superpixel centers to enhance the stability of the superpixels. Furthermore, a selective-label-propagation-based pipeline architecture is proposed to compute all possible label propagations in parallel. This eliminates the data dependency between adjacent pixels and enables a fully pipelined system. The evaluation results demonstrate that the proposed superpixel segmentation system achieves boundary recall and under-segmentation error comparable to those of the original SLIC algorithm. In terms of label consistency, the proposed system surpasses state-of-the-art superpixel segmentation methods. Moreover, in terms of hardware performance, the proposed system processes 1000-FPS images with a delay of 0.985 ms/frame.
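As background for the SLIC steps mentioned above, the standard software formulation can be sketched: seeds are placed on a regular grid with step S = sqrt(N / K), and each pixel is assigned to the nearest centre within a 2S x 2S window under the distance D = sqrt(d_c² + (d_s / S)² m²), where d_c is the CIELAB colour distance, d_s the spatial distance, and m the compactness. This is not the paper's temporal-iteration hardware pipeline; the function names and the (l, a, b, x, y) tuple layout are assumptions.

```python
import math
from typing import List, Tuple

Feature = Tuple[float, float, float, float, float]  # (l, a, b, x, y)


def grid_centers(width: int, height: int, k: int) -> List[Tuple[float, float]]:
    """Place roughly k seed positions on a regular grid with step S = sqrt(N / k)."""
    s = math.sqrt(width * height / k)
    centers = []
    y = s / 2
    while y < height:
        x = s / 2
        while x < width:
            centers.append((x, y))
            x += s
        y += s
    return centers


def slic_distance(pixel: Feature, center: Feature, s: float, m: float) -> float:
    """SLIC distance D = sqrt(d_c^2 + (d_s / S)^2 * m^2)."""
    dc = math.dist(pixel[:3], center[:3])  # colour distance in CIELAB
    ds = math.dist(pixel[3:], center[3:])  # spatial distance in pixels
    return math.sqrt(dc ** 2 + (ds / s) ** 2 * m ** 2)


def assign(pixel: Feature, centers: List[Feature], s: float, m: float) -> int:
    """Index of the nearest centre among those inside the 2S x 2S search window (-1 if none)."""
    best, best_d = -1, math.inf
    for idx, c in enumerate(centers):
        if abs(pixel[3] - c[3]) <= s and abs(pixel[4] - c[4]) <= s:
            d = slic_distance(pixel, c, s, m)
            if d < best_d:
                best, best_d = idx, d
    return best
```

In ordinary SLIC, the assignment and the centre update are repeated several times per image; the abstract's contribution is spreading those iterations across consecutive frames.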

  • Wei ZHENG, Hao HU, Tengfei CHEN, Fengyu YANG, Xin FAN, Peng XIAO
    Article type: PAPER
    Field: Software Engineering
    2024, Vol. E107.D, No. 4, pp. 525-536
    Published: 2024/04/01
    Released: 2024/04/01
    Free access

    Providing students with useful feedback on faulty programs can effectively help them fix their programs. Spectrum-based fault localization (SBFL), a widely studied and lightweight technique, automatically ranks program statements by suspiciousness to help users find potential faults in a program. However, the performance of SBFL on student programs is not satisfactory. To improve the accuracy of SBFL on student programs, we propose a novel Multi-Correct Programs based Fault Localization (MCPFL) approach. Specifically, we first collected the correct programs submitted by students to the online judge (OJ) system, grouped by programming problem number, removed highly similar correct programs based on code similarity, and stored the remaining ones together with the faulty program to be localized to construct a program set. Afterward, we analyzed the suspiciousness of the terms in the faulty program using term frequency-inverse document frequency (TF-IDF). Finally, we designed a formula that calculates a suspiciousness weight for each program statement based on the number of input variables in the statement and applied this weight to the spectrum-based fault localization formula. To evaluate the effectiveness of MCPFL, we conducted empirical studies on six student program datasets collected from our OJ system, and the results showed that MCPFL effectively improves traditional SBFL methods. In particular, on the EXAM metric, our approach yields an average improvement of 27.51% for the Dstar formula.
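For reference, the baseline SBFL pipeline that MCPFL reweights can be sketched with the DStar formula, susp(s) = ef^* / (ep + nf) with the commonly used exponent * = 2, and the EXAM metric (the fraction of statements inspected until the faulty one is reached). The coverage matrix is a toy example, ties keep statement order, and the MCPFL weighting itself is not shown because its formula is not given in the abstract.

```python
from typing import List, Sequence


def dstar(ef: int, ep: int, nf: int, star: int = 2) -> float:
    """DStar suspiciousness ef^star / (ep + nf); infinite when only failing tests cover s."""
    denom = ep + nf
    if denom == 0:
        return float("inf") if ef > 0 else 0.0
    return ef ** star / denom


def rank_statements(coverage: Sequence[Sequence[bool]], failed: Sequence[bool]) -> List[int]:
    """Rank statement indices from most to least suspicious.

    coverage[t][s] is True if test t executes statement s; failed[t] marks failing tests.
    """
    n_stmts = len(coverage[0])
    total_failed = sum(failed)
    scores = []
    for s in range(n_stmts):
        ef = sum(1 for t, cov in enumerate(coverage) if cov[s] and failed[t])
        ep = sum(1 for t, cov in enumerate(coverage) if cov[s] and not failed[t])
        nf = total_failed - ef  # failing tests that do not execute s
        scores.append(dstar(ef, ep, nf))
    # sorted() is stable, so tied statements keep their original order.
    return sorted(range(n_stmts), key=lambda s: scores[s], reverse=True)


def exam(ranking: List[int], faulty: int) -> float:
    """EXAM score: fraction of statements examined up to and including the faulty one."""
    return (ranking.index(faulty) + 1) / len(ranking)
```

A lower EXAM score is better, since it means fewer statements must be inspected before the fault is found.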

  • Yu WANG, Liangyong YANG, Jilian ZHANG, Xuelian DENG
    Article type: PAPER
    Field: Data Engineering, Web Information Systems
    2024, Vol. E107.D, No. 4, pp. 537-543
    Published: 2024/04/01
    Released: 2024/04/01
    Free access

    Cloud computing has become the mainstream computing paradigm. More and more data owners (DOs) choose to outsource their data to a cloud service provider (CSP), which is responsible for data management and query processing on behalf of the DO, so as to cut the DO's operational costs. However, in real-world applications the CSP may be untrusted, so query results returned from the CSP must be authenticated. In this paper, we consider the problem of authenticating approximate string query results in the context of database outsourcing. Based on the Merkle Hash Tree (MHT) and the trie, we propose an authenticated tree structure named MTrie for authenticating approximate string query results. We design efficient algorithms for query processing and query result authentication. To verify the effectiveness of our method, we conducted extensive experiments on real datasets, and the results show that our proposed method can effectively authenticate approximate string query results.
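The exact layout of MTrie is not described in the abstract. As background, the plain Merkle Hash Tree part can be sketched: the data owner publishes (and signs) the root hash, the CSP returns a result together with the sibling hashes on its path, and the client recomputes the root. SHA-256 and the rule of promoting an unpaired node unchanged are illustrative choices.

```python
import hashlib
from typing import List, Tuple


def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level: List[bytes]) -> List[bytes]:
    # Hash adjacent pairs; an unpaired last node is promoted unchanged.
    return [_hash(level[i] + level[i + 1]) if i + 1 < len(level) else level[i]
            for i in range(0, len(level), 2)]


def merkle_root(leaves: List[bytes]) -> bytes:
    level = [_hash(leaf) for leaf in leaves]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves: List[bytes], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True when the sibling is on the left."""
    level = [_hash(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        sibling = index ^ 1
        if sibling < len(level):
            proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from a returned leaf and its proof (client side)."""
    node = _hash(leaf)
    for sibling, is_left in proof:
        node = _hash(sibling + node) if is_left else _hash(node + sibling)
    return node == root
```

Production designs usually also domain-separate leaf and internal-node hashes, which this sketch omits for brevity.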

  • Masaru YAMASHITA
    Article type: PAPER
    Field: Pattern Recognition
    2024, Vol. E107.D, No. 4, pp. 544-550
    Published: 2024/04/01
    Released: 2024/04/01
    Free access

    Because the lung sounds of patients with pulmonary disease can include abnormal sounds, called adventitious sounds, the objective of this study was to automatically detect abnormal sounds in auscultatory sounds. To this end, we modeled the acoustic features of normal lung sounds from healthy people and abnormal lung sounds from patients using Gaussian mixture model (GMM)-hidden Markov models (HMMs), and distinguished between normal and abnormal lung sounds. In our previous study, we constructed left-to-right GMM-HMMs with a limited number of states. Because abnormal sounds that occur intermittently and repeatedly were modeled with a limited number of states, those GMM-HMMs could not express the acoustic features of abnormal sounds. Furthermore, because the analysis frame length and interval were long, the GMM-HMMs could not express the acoustic features of short segments such as heart sounds. Consequently, the classification rate for normal and abnormal respiration was low (86.60%). In this study, we propose constructing ergodic GMM-HMMs with a repetitive structure for intermittent sounds. We also examined a suitable frame length and frame interval for analyzing acoustic features. Using the ergodic GMM-HMM, which can express in detail the acoustic features of abnormal sounds and heart sounds that occur repeatedly, the classification rate increased to 89.34%. These results demonstrate the effectiveness of the proposed method.
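The difference between the two HMM topologies can be illustrated with transition matrices and the forward algorithm. A left-to-right model can never return to an earlier state, while an ergodic model can revisit states, which suits sounds that recur intermittently. The sketch takes per-frame emission likelihoods as given (in a GMM-HMM they would come from each state's Gaussian mixture); the state counts, probabilities, and emission values are made-up assumptions, not the paper's configuration.

```python
import math
from typing import List


def left_to_right(n: int, p_stay: float = 0.6) -> List[List[float]]:
    """Each state either stays or moves to the next one; the last state only stays."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i + 1 < n:
            a[i][i], a[i][i + 1] = p_stay, 1.0 - p_stay
        else:
            a[i][i] = 1.0
    return a


def ergodic(n: int) -> List[List[float]]:
    """Every state can reach every state (uniform here), allowing repeated returns."""
    return [[1.0 / n] * n for _ in range(n)]


def forward_log_likelihood(pi: List[float], a: List[List[float]],
                           b: List[List[float]]) -> float:
    """Scaled forward algorithm; b[t][j] is the emission likelihood of frame t in state j."""
    n = len(pi)
    alpha = [pi[j] * b[0][j] for j in range(n)]
    log_lik = 0.0
    for t in range(len(b)):
        if t > 0:
            alpha = [sum(alpha[i] * a[i][j] for i in range(n)) * b[t][j] for j in range(n)]
        scale = sum(alpha)          # rescale each frame to avoid underflow
        log_lik += math.log(scale)
        alpha = [x / scale for x in alpha]
    return log_lik
```

Classification then compares the log-likelihoods of a recording under the normal and abnormal models and picks the larger.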

  • Haijun LIANG, Yukun LI, Jianguo KONG, Qicong HAN, Chengyu YU
    Article type: PAPER
    Field: Speech and Hearing
    2024, Vol. E107.D, No. 4, pp. 551-558
    Published: 2024/04/01
    Released: 2024/04/01
    Free access

    Air traffic control (ATC) communication suffers from issues such as strong electromagnetic interference, fast speech rate, and low intelligibility, which pose challenges for downstream tasks such as automatic speech recognition (ASR). This article investigates how to enhance the audio quality and intelligibility of civil aviation speech through speech enhancement methods, thereby improving the accuracy of speech recognition and supporting the digitalization of civil aviation. We propose a speech enhancement model called DIUnet_V (DenseNet & Inception & U-Net & Volume) that combines time-frequency and time-domain methods to handle the specific characteristics of civil aviation speech, such as predominant electromagnetic interference and fast speech rate. For model evaluation, we assess the denoising and enhancement effects using three metrics: signal-to-noise ratio (SNR), mean opinion score (MOS), and speech recognition error rate. On a simulated ATC training recording dataset, DIUnet_Volume10 achieved an SNR of 7.3861, an improvement of 4.5663 over the original U-Net model. Because clean speech is absent in the ATC working environment, which makes it difficult to calculate SNR accurately, we propose evaluating the denoising effect indirectly through the recognition performance of an ATC speech recognition system. On a real ATC speech dataset, the built speech recognition system achieved an average word error rate 1.79% (absolute) lower and an average sentence error rate 3% (absolute) lower on DIUnet_V-processed speech than on unprocessed speech.
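The evaluation measures mentioned above can be computed with their textbook definitions: SNR in dB against a clean reference (possible only when clean speech exists, as in the simulated set), and word and sentence error rates of recognizer output otherwise. The paper's exact scoring tools are not stated; the function names and the example phrase are assumptions.

```python
import math
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over tokens (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, w in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # deletion
                         cur[j - 1] + 1,           # insertion
                         prev[j - 1] + (r != w))   # substitution or match
        prev = cur
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edit distance divided by the reference length."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def ser(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Sentence error rate: share of sentences with at least one word error."""
    wrong = sum(1 for r, h in zip(refs, hyps) if r.split() != h.split())
    return wrong / len(refs)


def snr_db(clean: Sequence[float], estimate: Sequence[float]) -> float:
    """SNR in dB of an enhanced signal against the clean reference."""
    signal = sum(s * s for s in clean)
    noise = sum((s - e) ** 2 for s, e in zip(clean, estimate))
    return 10 * math.log10(signal / noise)


if __name__ == "__main__":
    print(wer("climb flight level three five zero", "climb level three five zero"))
```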

  • Yuuki AOIKE, Masashi KIYOMI, Yasuaki KOBAYASHI, Yota OTACHI
    Article type: LETTER
    Field: Fundamentals of Information Systems
    2024, Vol. E107.D, No. 4, pp. 559-563
    Published: 2024/04/01
    Released: 2024/04/01
    Free access

    In this note, we consider the problem of finding a step-by-step transformation between two longest increasing subsequences of a sequence, called Longest Increasing Subsequence Reconfiguration. We give a polynomial-time algorithm for deciding whether there is a reconfiguration sequence between two longest increasing subsequences of a sequence. This implies that Independent Set Reconfiguration and Token Sliding are polynomial-time solvable on permutation graphs, provided that the two input independent sets are of maximum size among all independent sets in the input graph. We also consider a special case in which the underlying permutation graph of the input sequence is bipartite. In this case, we give a polynomial-time algorithm for finding a shortest reconfiguration sequence (if one exists).
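The basic objects in this problem can be checked in O(n log n) time. In the permutation graph of a sequence of distinct values (i < j adjacent when seq[i] > seq[j]), increasing subsequences are exactly the independent sets, which is why the result carries over to Independent Set Reconfiguration. The sketch computes the LIS length by patience sorting and tests whether an index set is a longest increasing subsequence; the one-element exchange test is a hypothetical token-jumping-style step for illustration, not the note's algorithm.

```python
from bisect import bisect_left
from typing import List, Sequence


def lis_length(seq: Sequence[int]) -> int:
    """Length of a longest strictly increasing subsequence (patience sorting)."""
    tails: List[int] = []  # tails[k] = smallest tail of an increasing subsequence of length k+1
    for x in seq:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)


def is_lis(seq: Sequence[int], indices: Sequence[int]) -> bool:
    """True if the index set picks a longest increasing subsequence of seq."""
    idx = sorted(indices)
    values = [seq[i] for i in idx]
    increasing = all(a < b for a, b in zip(values, values[1:]))
    return increasing and len(set(idx)) == len(idx) == lis_length(seq)


def differ_by_one(i1: Sequence[int], i2: Sequence[int]) -> bool:
    """Hypothetical single move: the two index sets differ in exactly one element."""
    s1, s2 = set(i1), set(i2)
    return len(s1 - s2) == 1 and len(s2 - s1) == 1
```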

  • Yeongwoo HA, Seongbeom PARK, Jieun LEE, Sangeun OH
    Article type: LETTER
    Field: Information Network
    2024, Vol. E107.D, No. 4, pp. 564-568
    Published: 2024/04/01
    Released: 2024/04/01
    Free access

    With the recent advances in IoT, there is a growing interest in multi-surface computing, where a mobile app can cooperatively utilize multiple devices' surfaces. We propose a novel framework that seamlessly augments mobile apps with multi-surface computing capabilities. It enables various apps to employ multiple surfaces with acceptable performance.

  • Zhengwei XIA, Yun LIU, Xiaoyun WANG, Feiyun ZHANG, Rui CHEN, Weiwei JI ...
    Article type: LETTER
    Field: Image Processing and Video Processing
    2024, Vol. E107.D, No. 4, pp. 569-573
    Published: 2024/04/01
    Released: 2024/04/01
    Free access

    Infrared and visible image fusion can combine thermal radiation information and textures to provide a high-quality fused image. In this letter, we propose a hybrid variational fusion model to achieve this. Specifically, an ℓ0 term is adopted to preserve the highlighted targets with salient gradient variation in the infrared image, an ℓ1 term is used to suppress noise in the fused image, and an ℓ2 term is employed to keep the textures of the visible image. Experimental results demonstrate the superiority of the proposed variational model, and our results have sharper textures with less noise.

  • Wocheng XIAO, Lingyu LIANG, Jianyong CHEN, Tao WANG
    Article type: LETTER
    Field: Image Recognition, Computer Vision
    2024, Vol. E107.D, No. 4, pp. 574-578
    Published: 2024/04/01
    Released: 2024/04/01
    Free access

    Video text detection (VTD) aims to localize text instances in videos and has wide applications in downstream tasks. To deal with the variance across scenes and text instances, existing VTD methods typically integrate multiple models and feature fusion strategies. A VTD method consisting of sophisticated components can effectively improve detection accuracy but may be too slow for real-time applications. This paper aims to achieve real-time VTD with an adaptive, lightweight end-to-end framework. Unlike previous methods that represent text in the spatial domain, we model text instances in the Fourier domain. Specifically, we propose a scale-aware Fourier Contour Embedding method, which not only models arbitrarily shaped text contours in videos as compact signatures but also adaptively selects proper scales for features in the backbone during training. Then, we construct VTD-FCENet to achieve real-time VTD; it encodes temporal correlations of adjacent frames with scale-aware FCE in a lightweight and adaptive manner. Quantitative evaluations were conducted on the ICDAR2013 Video, Minetto, and YVT benchmark datasets, and the results show that our VTD-FCENet not only achieves state-of-the-art or competitive detection accuracy but also enables real-time text detection on HD videos.
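Fourier Contour Embedding represents a closed text contour by a few Fourier coefficients. A minimal NumPy sketch of that encoding (not the scale-aware variant or the VTD-FCENet network) treats contour points as complex numbers x + iy, keeps the coefficients for frequencies -K..K, and reconstructs the contour from them. The choice of K, the number of sample points, and the function names are assumptions.

```python
import numpy as np


def fourier_signature(contour: np.ndarray, k: int) -> np.ndarray:
    """Coefficients c_{-k}..c_{k} of a closed contour given as an (N, 2) array of (x, y)."""
    z = contour[:, 0] + 1j * contour[:, 1]  # contour points as complex numbers
    n = len(z)
    coeffs = np.fft.fft(z) / n              # c_m for m = 0..N-1
    freqs = np.arange(-k, k + 1)
    return coeffs[freqs % n]                # negative frequencies wrap to the end


def reconstruct(signature: np.ndarray, n_points: int) -> np.ndarray:
    """Resample the contour from its 2k+1 coefficients at n_points evenly spaced positions."""
    k = (len(signature) - 1) // 2
    freqs = np.arange(-k, k + 1)
    t = np.arange(n_points) / n_points
    z = (signature[None, :] * np.exp(2j * np.pi * freqs[None, :] * t[:, None])).sum(axis=1)
    return np.stack([z.real, z.imag], axis=1)
```

A small K gives a compact, fixed-length signature regardless of how many points the original contour had, which is what makes this representation attractive for a lightweight detector.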

  • Hao CHEN, Zhe-Ming LU, Jie LIU
    Article type: LETTER
    Field: Image Recognition, Computer Vision
    2024, Vol. E107.D, No. 4, pp. 579-583
    Published: 2024/04/01
    Released: 2024/04/01
    Free access

    This Letter addresses the problem of counting monkeys' head swings using deep learning. There are currently very few papers on monkey detection, and even fewer on counting monkeys' head swings. This research attempts to fill this gap by calculating the head swing frequency of monkeys through deep learning, extending a traditional object detection algorithm. After analyzing the object detection results, we localize the monkey's actions over a period of time. This Letter analyzes the task of counting monkeys' head swings and proposes a standard that accurately describes a monkey's head swing. Guided by this standard, the head swing counting accuracy on 50 test videos reaches 94.23%.

  • Zezhong LI, Fuji REN
    Article type: LETTER
    Field: Natural Language Processing
    2024, Vol. E107.D, No. 4, pp. 584-587
    Published: 2024/04/01
    Released: 2024/04/01
    Free access

    Compared with subword-based neural machine translation (NMT), character-based NMT eschews linguistically motivated segmentation and operates directly on the raw character sequence, in a more fully end-to-end manner. This property is particularly attractive for machine translation (MT) between Japanese and Chinese, both of which are written as consecutive logographic characters without explicit word boundaries. However, one disadvantage remains to be addressed: a character carries less meaning than a subword, so character-based models must be capable of sense discrimination. Specifically, sense ambiguities exist on both the source side and the target side. The former has been partially addressed by deep encoders and several existing works, but the latter, interestingly, is rarely discussed. To address this problem, we propose two simple yet effective methods: a non-parametric pre-clustering for sense induction, and a joint model that performs sense discrimination and NMT training simultaneously. Extensive experiments on Japanese↔Chinese MT show that our proposed methods consistently outperform strong baselines and verify the effectiveness of sense-discriminated representations for character-based NMT.
