IEICE Transactions on Information and Systems

Special Section on Information and Communication Technology to Support Hyperconnectivity

FOREWORD

Tokuro MATSUO

2024 Volume E107.D Issue 4 Pages 424-425
Published: 2024/04/01
Released on J-STAGE: 2024/04/01

DOI: https://doi.org/10.1587/transinf.2023IHF0001

Free access

An Automated Multi-Phase Facilitation Agent Based on LLM

Yihan DONG, Shiyao DING, Takayuki ITO

Article type: PAPER
2024 Volume E107.D Issue 4 Pages 426-433
Published: 2024/04/01
Released on J-STAGE: 2024/04/01

DOI: https://doi.org/10.1587/transinf.2023IHP0011

Free access

This paper presents the design and implementation of an automated multi-phase facilitation agent based on a large language model (LLM), aiming at inclusive facilitation and efficient use of the LLM to facilitate realistic discussions. Large-scale discussion support systems have been widely studied and implemented because they enable many people to discuss remotely, 24 hours a day and 7 days a week. Furthermore, automated facilitation artificial intelligence (AI) agents have been realized because they can efficiently facilitate large-scale discussions. For example, D-Agree is a large-scale discussion support system in which an automated facilitation AI agent facilitates discussion among people. The current automated facilitation agent was designed following the structure of the issue-based information system (IBIS), and the IBIS-based agent has been shown to have superior performance. However, several problems with the IBIS-based agent need to be addressed. In this paper, we focus on the following three problems: 1) The IBIS-based agent was designed only to promote other participants' posts by replying to existing posts; it does not consider the different behaviours of participants with diverse characteristics, so the discussion is sometimes insufficient. 2) The facilitation messages generated by the IBIS-based agent were not natural enough, so participants were not sufficiently encouraged and did not follow the flow of discussion on a topic. 3) Since the IBIS-based agent is not combined with an LLM, a design for controlling the LLM is necessary. To solve these problems, this paper proposes the design of a phase-based facilitation framework. Specifically, we propose two significant designs: one is a multi-phase facilitation agent built on the framework to address problem 1); the other is the combination with an LLM to address problems 2) and 3). In particular, the language model “GPT-3.5” is used for the combination through the corresponding APIs from OpenAI. Furthermore, we demonstrate the improvement achieved by our facilitation agent framework through evaluations and a case study. We also present the differences between our framework and LangChain, which has generic features for utilizing LLMs.

Conversational AI as a Facilitator Improves Participant Engagement and Problem-Solving in Online Discussion: Sharing Evidence from Five Cities in Afghanistan

Sofia SAHAB, Jawad HAQBEEN, Takayuki ITO

Article type: PAPER
2024 Volume E107.D Issue 4 Pages 434-442
Published: 2024/04/01
Released on J-STAGE: 2024/04/01

DOI: https://doi.org/10.1587/transinf.2023IHP0014

Free access

Despite the increasing use of conversational artificial intelligence (AI) in online discussion environments, few studies explore the application of AI as a facilitator in forming problem-solving debates and influencing opinions in cross-venue scenarios, particularly in diverse and war-ravaged countries. This study aims to investigate the impact of AI on enhancing participant engagement and collaborative problem-solving in online-mediated discussion environments, especially in diverse and heterogeneous discussion settings, such as the five cities in Afghanistan. We seek to assess the extent to which AI participation in online conversations succeeds by examining the depth of discussions and participants' contributions, comparing discussions facilitated by AI with those not facilitated by AI across different venues. The results are discussed with respect to forming and changing opinions with and without AI-mediated communication. The findings indicate that the number of opinions generated in AI-facilitated discussions significantly differs from discussions without AI support. Additionally, statistical analyses reveal quantitative disparities in online discourse sentiments when conversational AI is present compared to when it is absent. These findings contribute to a better understanding of the role of AI-mediated discussions and offer several practical and social implications, paving the way for future developments and improvements.

Learning from Repeated Trials without Feedback: Can Collective Intelligence Outperform the Best Members?

Yoshiko ARIMA

Article type: PAPER
2024 Volume E107.D Issue 4 Pages 443-450
Published: 2024/04/01
Released on J-STAGE: 2024/04/01

DOI: https://doi.org/10.1587/transinf.2023IHP0001

Free access

Both group process studies and collective intelligence studies are concerned with “which of the crowds and the best members perform better.” This can be seen as a matter of democracy versus dictatorship. Having evidence of the growth potential of crowds and experts can be useful in making correct predictions and can benefit humanity. In the collective intelligence experimental paradigm, the ability of experts or best members is compared with the accuracy of the crowd average. In this research (n = 620), using repeated trials of simple tasks, we compare the answer of the class average (the index of collective intelligence) with that of the best member (the one whose answer was closest to the correct answer). The results indicated that, for the cognition task, collective intelligence improved to the level of the best member through repeated trials without feedback; for the prediction task, however, it depended on the ability of the best members. The present study suggested the superiority of best members over crowds for the prediction task, on the premise of being free from social influence. However, the machine learning results suggest that the best members among us cannot be easily found beforehand because they appear through repeated trials.
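
The comparison described in this abstract, crowd average versus best member, can be illustrated with a minimal Python sketch. The function names and the sample numbers are hypothetical and are not taken from the paper; the sketch only shows how the two errors would be computed for one trial.

```python
from statistics import mean


def crowd_error(estimates, truth):
    """Absolute error of the crowd average (the collective-intelligence index)."""
    return abs(mean(estimates) - truth)


def best_member_error(estimates, truth):
    """Absolute error of the single estimate closest to the correct answer."""
    return min(abs(e - truth) for e in estimates)


# Hypothetical answers from four participants on one trial.
estimates = [42.0, 55.0, 47.0, 61.0]
truth = 50.0
crowd = crowd_error(estimates, truth)       # error of the average
best = best_member_error(estimates, truth)  # error of the closest member
```

With these made-up numbers the average happens to be closer to the truth than any individual, which is the situation in which the crowd outperforms the best member.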

Pattern-Based Meta Graph Neural Networks for Argument Classifications

Shiyao DING, Takayuki ITO

Article type: PAPER
2024 Volume E107.D Issue 4 Pages 451-458
Published: 2024/04/01
Released on J-STAGE: 2024/04/01

DOI: https://doi.org/10.1587/transinf.2023IHP0013

Free access

Despite recent advancements in utilizing meta-learning for addressing the generalization challenges of graph neural networks (GNN), their performance in argumentation mining tasks, such as argument classifications, remains relatively limited. This is primarily due to the under-utilization of potential pattern knowledge intrinsic to argumentation structures. To address this issue, our study proposes a two-stage, pattern-based meta-GNN method in contrast to conventional pattern-free meta-GNN approaches. Initially, our method focuses on learning a high-level pattern representation to effectively capture the pattern knowledge within an argumentation structure and then predicts edge types. It then utilizes a meta-learning framework in the second stage, designed to train a meta-learner based on the predicted edge types. This feature allows for rapid generalization to novel argumentation graphs. Through experiments on real English discussion datasets spanning diverse topics, our results demonstrate that our proposed method substantially outperforms conventional pattern-free GNN approaches, signifying a significant stride forward in this domain.

Mining User Activity Patterns from Time-Series Data Obtained from UWB Sensors in Indoor Environments

Muhammad FAWAD RAHIM, Tessai HAYAMA

Article type: PAPER
2024 Volume E107.D Issue 4 Pages 459-467
Published: 2024/04/01
Released on J-STAGE: 2024/04/01

DOI: https://doi.org/10.1587/transinf.2023IHP0002

Free access

In recent years, location-based technologies for ubiquitous environments have aimed to realize services tailored to each purpose based on information about an individual's current location. To establish such advanced location-based services, an estimation technology that can accurately recognize and predict the movements of people and objects is necessary. While the global positioning system (GPS) is already used as a standard outdoor positioning technology and many services have been realized with it, several techniques using conventional wireless sensors such as Wi-Fi, RFID, and Bluetooth have been considered for indoor positioning. However, conventional wireless indoor positioning is prone to the effects of noise, and the large range of estimated indoor locations makes it difficult to identify human activities precisely. We propose a method to mine user activity patterns from time-series data of a user's locations in an indoor environment using ultra-wideband (UWB) sensors. A UWB sensor is useful for indoor positioning due to its high noise immunity and measurement accuracy; however, to our knowledge, estimation and prediction of human indoor activities using UWB sensors have not yet been addressed. The proposed method consists of three steps: 1) obtaining time-series data of the user's location using a UWB sensor attached to the user, and then estimating the areas where the user has stayed; 2) associating each area of the user's stay with a nearby landmark of activity and assigning indoor activities; and 3) mining the user's activity patterns based on the user's indoor activities and their transitions. We conducted experiments to evaluate the proposed method by investigating the accuracy of estimating the user's area of stay using a UWB sensor and observing the results of activity pattern mining applied to actual laboratory members over 30 days. The results showed that the proposed method is superior to a comparison method, a time-based clustering algorithm, in estimating the stay areas precisely, and that it can reveal the user's activity patterns appropriately in an actual environment.
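
Step 1 of the method above, estimating stay areas from a location time series, can be pictured with a generic stay-point detector. The sketch below is an illustration under stated assumptions, not the paper's algorithm and not necessarily the time-based clustering baseline it compares against: the `radius` and `min_duration` thresholds and the `(t, x, y)` sample layout are hypothetical.

```python
from math import hypot


def stay_areas(samples, radius=1.0, min_duration=60.0):
    """Detect stays in a time-sorted list of (t_seconds, x_m, y_m) samples.

    A stay is a run of consecutive samples that remain within `radius`
    metres of the first sample of the run for at least `min_duration` seconds.
    Returns a list of (t_start, t_end, centroid_x, centroid_y).
    """
    stays = []
    i, n = 0, len(samples)
    while i < n:
        j = i + 1
        # Extend the window while samples stay close to the anchor sample i.
        while j < n and hypot(samples[j][1] - samples[i][1],
                              samples[j][2] - samples[i][2]) <= radius:
            j += 1
        duration = samples[j - 1][0] - samples[i][0]
        if duration >= min_duration:
            window = samples[i:j]
            cx = sum(s[1] for s in window) / len(window)
            cy = sum(s[2] for s in window) / len(window)
            stays.append((samples[i][0], samples[j - 1][0], cx, cy))
            i = j  # continue after the detected stay
        else:
            i += 1  # too short to count as a stay; slide the anchor forward
    return stays
```

Each detected centroid could then be matched to the nearest landmark, which corresponds to step 2 in the abstract.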

Conceptual Knowledge Enhanced Model for Multi-Intent Detection and Slot Filling

Li HE, Jingxuan ZHAO, Jianyong DUAN, Hao WANG, Xin LI

Article type: PAPER
2024 Volume E107.D Issue 4 Pages 468-476
Published: 2024/04/01
Released on J-STAGE: 2024/04/01

DOI: https://doi.org/10.1587/transinf.2023IHP0004

Free access

In Natural Language Understanding, intent detection and slot filling have been widely used to understand user queries. However, current methods tend to rely on single words and sentences to understand complex semantic concepts, and can only consider local information within the sentence. Therefore, they usually cannot capture long-distance dependencies well and are prone to problems where complex intentions in sentences are difficult to recognize. In order to solve the problem of long-distance dependency of the model, this paper uses ConceptNet as an external knowledge source and introduces its extensive semantic information into the multi-intent detection and slot filling model. Specifically, for a certain sentence, based on confidence scores and semantic relationships, the most relevant conceptual knowledge is selected to equip the sentence, and a concept context map with rich information is constructed. Then, the multi-head graph attention mechanism is used to strengthen context correlation and improve the semantic understanding ability of the model. The experimental results indicate that the model has significantly improved performance compared to other models on the MixATIS and MixSNIPS multi-intent datasets.

Practical Application of an e-Learning Support System Incorporating a Fill-in-the-Blank Question-Type Concept Map

Takumi HASEGAWA, Tessai HAYAMA

Article type: PAPER
2024 Volume E107.D Issue 4 Pages 477-485
Published: 2024/04/01
Released on J-STAGE: 2024/04/01

DOI: https://doi.org/10.1587/transinf.2023IHP0012

Free access

E-learning, which can be used anywhere and at any time, is very convenient and has been introduced to improve learning efficiency. However, securing a high completion rate has been a major challenge. Recent forms of e-learning require learners to be introspective, deliberate, and logical, and have proven to be incompatible with many learners with low completion rates. Thus, we developed an e-learning system that incorporates a fill-in-the-blank question-type concept map to deepen learners' understanding of the learning content while watching learning videos. The developed system promotes active learning reflectively and logically by allowing learners to answer blank question labels on concept maps from video content and labels associated with the blank question labels. We confirmed in a laboratory experiment, by comparison with a conventional video-based learning system, that the developed system encouraged learners to perform more system operations for rechecking the learning content and to better understand the learning content while watching the learning video. As the next step, a field experiment is needed to investigate the usefulness and effectiveness of the developed system in actual environments in order to boost its practicality. In this study, we introduced the developed system into two classes of a university course and investigated the level of understanding of the learning content, the system operations, and the usefulness of the developed system, comparing them with those in the laboratory experiment. The results showed that, in the field experiment as in the laboratory experiment, the developed system supported understanding of the learning content and each function was useful. On the other hand, the students in the field experiment rated the usefulness of the developed system lower than those in the laboratory experiment; their system operations during learning suggest that fewer students in the field experiment attempted to thoroughly understand the learning content than in the laboratory experiment.

An Academic Presentation Support System Utilizing Structural Elements

Kazuma TAKAHASHI, Wen GU, Koichi OTA, Shinobu HASEGAWA

Article type: PAPER
2024 Volume E107.D Issue 4 Pages 486-494
Published: 2024/04/01
Released on J-STAGE: 2024/04/01

DOI: https://doi.org/10.1587/transinf.2023IHP0006

Free access

In academic presentation, the structure design of presentation is critical for making the presentation logical and understandable. However, it is difficult for novice researchers to construct required academic presentation structure due to the flexibility in structure creation. To help novice researchers revise and improve their presentation structure, we propose an academic presentation structure modification support system based on structural elements of the presentation slides. In the proposed system, we build a presentation structural elements model (PSEM) that represents the essential structural elements and their relations to clarify the ideal structure of academic presentation. Based on the PSEM, we also designed two evaluation indices to evaluate the academic presentation structure. To evaluate the proposed system with real-world data, we construct a web application that generates evaluation and feedback to academic presentation slides. The experimental results demonstrate the effectiveness of the proposed system.

PSDSpell: Pre-Training with Self-Distillation Learning for Chinese Spelling Correction

Li HE, Xiaowu ZHANG, Jianyong DUAN, Hao WANG, Xin LI, Liang ZHAO

Article type: PAPER
2024 Volume E107.D Issue 4 Pages 495-504
Published: 2024/04/01
Released on J-STAGE: 2024/04/01

DOI: https://doi.org/10.1587/transinf.2023IHP0005

Free access

Chinese spelling correction (CSC) models detect and correct a text typo based on the misspelled character and its context. Recently, BERT-based models have dominated research on Chinese spelling correction. However, these methods focus only on the semantic information of the text during the pretraining stage, neglecting to learn how to correct spelling errors. Moreover, when the text contains multiple incorrect characters, the context introduces noisy information, making it difficult for the model to accurately detect the positions of the incorrect characters and leading to false corrections. To address these limitations, we apply the multimodal pre-trained language model ChineseBert to the task of spelling correction. We propose a self-distillation learning-based pretraining strategy, where a confusion set is used to construct text containing erroneous characters, allowing the model to jointly learn how to understand language and correct spelling errors. Additionally, we introduce a single-channel masking mechanism to mitigate the noise caused by the incorrect characters. This mechanism masks the semantic encoding channel while preserving the phonetic and glyph encoding channels, reducing the noise introduced by incorrect characters during the prediction process. Finally, experiments are conducted on widely used benchmarks. Our model achieves superior performance against state-of-the-art methods by a remarkable gain.
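
The idea of using a confusion set to construct text containing erroneous characters can be sketched as follows. This is a minimal illustration, not the paper's pretraining pipeline: the function name `corrupt`, the replacement `rate`, and the tiny confusion set are all hypothetical.

```python
import random


def corrupt(text, confusion_set, rate=0.15, seed=0):
    """Replace some characters with confusable ones to create noisy training text.

    confusion_set maps a character to a list of characters it is easily
    confused with. Each character that has candidates is replaced with
    probability `rate`; the output always has the same length as the input,
    so positions in the clean and noisy text stay aligned.
    """
    rng = random.Random(seed)  # seeded for reproducible corruption
    out = []
    for ch in text:
        candidates = confusion_set.get(ch)
        if candidates and rng.random() < rate:
            out.append(rng.choice(candidates))
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical one-entry confusion set: two characters with the same pronunciation.
confusion = {"在": ["再"]}
noisy = corrupt("我在家", confusion, rate=1.0)
```

Pairs of clean and corrupted sentences produced this way could serve as supervision for a correction objective, since the position of every injected error is known.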

Multi-Style Shape Matching GAN for Text Images

Honghui YUAN, Keiji YANAI

Article type: PAPER
2024 Volume E107.D Issue 4 Pages 505-514
Published: 2024/04/01
Released on J-STAGE: 2024/04/01

DOI: https://doi.org/10.1587/transinf.2023IHP0010

Free access

Deep learning techniques are used to transform the style of images and produce diverse images. In the text style transformation field, many previous studies attempted to generate stylized text using deep learning networks. However, to achieve multiple style transformations for text images, the methods proposed in previous studies require learning multiple networks or cannot be guided by style images. Thus, in this study we focused on multistyle transformation of text images using style images to guide the generation of results. We propose a multiple-style transformation network for text style transfer, which we refer to as the Multi-Style Shape Matching GAN (Multi-Style SMGAN). The proposed method generates multiple styles of text images using a single model by training the model only once, and allows users to control the text style according to style images. The proposed method implements conditions to the network such that all styles can be distinguished effectively in the network, and the generation of each styled text can be controlled according to these conditions. The proposed network is optimized such that the conditional information can be transmitted effectively throughout the network. The proposed method was evaluated experimentally on a large number of text images, and the results show that the trained model can generate multiple-style text in realtime according to the style image. In addition, the results of a user survey study indicate that the proposed method produces higher quality results compared to existing methods.

Regular Section

Grid Sample Based Temporal Iteration for Fully Pipelined 1-ms SLIC Superpixel Segmentation System

Yuan LI, Tingting HU, Ryuji FUCHIKAMI, Takeshi IKENAGA

Article type: PAPER
Subject area: Computer System
2024 Volume E107.D Issue 4 Pages 515-524
Published: 2024/04/01
Released on J-STAGE: 2024/04/01

DOI: https://doi.org/10.1587/transinf.2023EDP7128

Free access

A 1 millisecond (1-ms) vision system, which processes videos at 1000 frames per second (FPS) within 1ms/frame delay, plays an increasingly important role in fields such as robotics and factory automation. Superpixel as one of the most extensively employed image oversegmentation methods is a crucial pre-processing step for reducing computations in various computer vision applications. Among the different superpixel methods, simple linear iterative clustering (SLIC) has gained widespread adoption due to its simplicity, effectiveness, and computational efficiency. However, the iterative assignment and update steps in SLIC make it challenging to achieve high processing speed. To address this limitation and develop a SLIC superpixel segmentation system with a 1ms delay, this paper proposes grid sample based temporal iteration. By leveraging the high frame rate of the input video, the proposed method distributes the iterations into the temporal domain, ensuring that the system's delay keeps within one frame. Additionally, grid sample information is added as initialization information to the obtained superpixel centers for enhancing the stability of superpixels. Furthermore, a selective label propagation based pipeline architecture is proposed for parallel computation of all the possibilities of label propagation. This eliminates data dependency between adjacent pixels and enables a fully pipelined system. The evaluation results demonstrate that the proposed superpixel segmentation system achieves boundary recall and under-segmentation error comparable to the original SLIC algorithm. When considering label consistency, the proposed system surpasses the performance of state-of-the-art superpixel segmentation methods. Moreover, in terms of hardware performance, the proposed system processes 1000FPS images with 0.985ms/frame delay.
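
For readers unfamiliar with SLIC, the assignment step mentioned above compares each pixel with nearby cluster centers using a combined color and spatial distance. The sketch below shows the commonly cited form of that distance from the original SLIC algorithm; it is not the hardware pipeline proposed in the paper, and the tuple layout `(l, a, b, x, y)` and the default compactness value are assumptions made for illustration.

```python
from math import sqrt


def slic_distance(pixel, center, grid_interval, compactness=10.0):
    """Combined SLIC distance between a pixel and a cluster center.

    pixel and center are (l, a, b, x, y) tuples: CIELAB color plus image
    coordinates. grid_interval is the initial spacing S between cluster
    centers, and compactness (m) weights spatial against color proximity:
    D = sqrt(d_color^2 + (d_space / S)^2 * m^2).
    """
    d_color_sq = sum((p - c) ** 2 for p, c in zip(pixel[:3], center[:3]))
    d_space_sq = sum((p - c) ** 2 for p, c in zip(pixel[3:], center[3:]))
    return sqrt(d_color_sq + d_space_sq * (compactness / grid_interval) ** 2)
```

In the iterative algorithm, each pixel takes the label of the center with the smallest distance, and centers are then recomputed from their assigned pixels. It is this assignment-update loop that the paper spreads across consecutive frames.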

Boosting Spectrum-Based Fault Localization via Multi-Correct Programs in Online Programming

Wei ZHENG, Hao HU, Tengfei CHEN, Fengyu YANG, Xin FAN, Peng XIAO

Article type: PAPER
Subject area: Software Engineering
2024 Volume E107.D Issue 4 Pages 525-536
Published: 2024/04/01
Released on J-STAGE: 2024/04/01

DOI: https://doi.org/10.1587/transinf.2023EDP7164

Free access

Providing students with useful feedback on faulty programs can effectively help them fix those programs. Spectrum-Based Fault Localization (SBFL), a widely studied and lightweight technique, can automatically generate a ranking of statements by suspiciousness value to help users find potential faults in a program. However, the performance of SBFL on student programs is not satisfactory. To improve the accuracy of SBFL on student programs, we propose a novel Multi-Correct Programs based Fault Localization (MCPFL) approach. Specifically, we first collected the correct programs submitted by students on the OJ system according to the programming problem numbers, removed highly similar correct programs based on code similarity, and then stored the remainder together with the faulty program to be located to construct a set of programs. Afterward, we analyzed the suspiciousness of the terms in the faulty program through Term Frequency-Inverse Document Frequency (TF-IDF). Finally, we designed a formula to calculate the suspiciousness weight of program statements based on the number of input variables in the statement and applied this weight to the spectrum-based fault localization formula. To evaluate the effectiveness of MCPFL, we conducted empirical studies on six student program datasets collected in our OJ system, and the results showed that MCPFL can effectively improve the traditional SBFL methods. In particular, on the EXAM metric, our approach improves by an average of 27.51% on the Dstar formula.
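
As background for the Dstar formula mentioned above, the following sketch computes the standard DStar suspiciousness score from a coverage spectrum and ranks statements by it. It shows only the baseline SBFL formula, not the MCPFL weighting proposed in the paper; the function names, the `spectrum` layout, and the handling of a zero denominator are assumptions.

```python
def dstar(ef, ep, nf, star=2):
    """DStar suspiciousness of one statement.

    ef: failed tests that execute the statement
    ep: passed tests that execute the statement
    nf: failed tests that do not execute the statement
    Score = ef**star / (ep + nf).
    """
    denom = ep + nf
    if denom == 0:
        # Executed by every failing test and by no passing test.
        return float("inf") if ef > 0 else 0.0
    return ef ** star / denom


def rank_statements(spectrum, total_failed, star=2):
    """Rank statement ids from most to least suspicious.

    spectrum maps a statement id to (ef, ep); nf is derived as total_failed - ef.
    """
    scores = {s: dstar(ef, ep, total_failed - ef, star)
              for s, (ef, ep) in spectrum.items()}
    return sorted(scores, key=lambda s: scores[s], reverse=True)
```

A weighting scheme such as the one the abstract describes would multiply or otherwise adjust these per-statement scores before ranking.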

A Trie-Based Authentication Scheme for Approximate String Queries

Yu WANG, Liangyong YANG, Jilian ZHANG, Xuelian DENG

Article type: PAPER
Subject area: Data Engineering, Web Information Systems
2024 Volume E107.D Issue 4 Pages 537-543
Published: 2024/04/01
Released on J-STAGE: 2024/04/01

DOI: https://doi.org/10.1587/transinf.2023EDP7185

Free access

Cloud computing has become the mainstream computing paradigm nowadays. More and more data owners (DO) choose to outsource their data to a cloud service provider (CSP), who is responsible for data management and query processing on behalf of DO, so as to cut down operational costs for the DO. However, in real-world applications, CSP may be untrusted, hence it is necessary to authenticate the query result returned from the CSP. In this paper, we consider the problem of approximate string query result authentication in the context of database outsourcing. Based on Merkle Hash Tree (MHT) and Trie, we propose an authenticated tree structure named MTrie for authenticating approximate string query results. We design efficient algorithms for query processing and query result authentication. To verify effectiveness of our method, we have conducted extensive experiments on real datasets and the results show that our proposed method can effectively authenticate approximate string query results.
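
The Merkle Hash Tree that MTrie builds on can be illustrated with a short sketch that computes a root digest over a list of records. This is a generic binary MHT, not the MTrie structure from the paper; the use of SHA-256 and the convention of duplicating the last node on odd-sized levels are assumptions chosen for the example.

```python
import hashlib


def _h(data: bytes) -> bytes:
    """Hash helper; SHA-256 is an assumed choice."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Return the Merkle root of a non-empty list of byte strings.

    Leaves are hashed individually, then adjacent digests are concatenated
    and hashed level by level until a single root digest remains.
    """
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last digest on odd levels
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

In an outsourcing setting, the data owner would sign the root, and a client would recompute it from the returned records plus the sibling digests supplied by the provider. Any tampered record changes the recomputed root.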

Construction of Ergodic GMM-HMMs for Classification between Healthy Individuals and Patients Suffering from Pulmonary Disease

Masaru YAMASHITA

Article type: PAPER
Subject area: Pattern Recognition
2024 Volume E107.D Issue 4 Pages 544-550
Published: 2024/04/01
Released on J-STAGE: 2024/04/01

DOI: https://doi.org/10.1587/transinf.2023EDP7040

Free access

Owing to the several cases wherein abnormal sounds, called adventitious sounds, are included in the lung sounds of a patient suffering from pulmonary disease, the objective of this study was to automatically detect abnormal sounds from auscultatory sounds. To this end, we expressed the acoustic features of the normal lung sounds of healthy people and abnormal lung sounds of patients using Gaussian mixture model (GMM)-hidden Markov models (HMMs), and distinguished between normal and abnormal lung sounds. In our previous study, we constructed left-to-right GMM-HMMs with a limited number of states. Because we expressed abnormal sounds that occur intermittently and repeatedly using limited states, the GMM-HMMs could not express the acoustic features of abnormal sounds. Furthermore, because the analysis frame length and intervals were long, the GMM-HMMs could not express the acoustic features of short time segments, such as heart sounds. Therefore, the classification rate of normal and abnormal respiration was low (86.60%). In this study, we propose the construction of ergodic GMM-HMMs with a repetitive structure for intermittent sounds. Furthermore, we considered a suitable frame length and frame interval to analyze acoustic features. Using the ergodic GMM-HMM, which can express the acoustic features of abnormal sounds and heart sounds that occur repeatedly in detail, the classification rate increased (89.34%). The results obtained in this study demonstrated the effectiveness of the proposed method.

Enhancing Speech Quality in Air Traffic Control Communication Using DIUnet_V-Based Speech Enhancement Techniques

Haijun LIANG, Yukun LI, Jianguo KONG, Qicong HAN, Chengyu YU

Article type: PAPER
Subject area: Speech and Hearing
2024 Volume E107.D Issue 4 Pages 551-558
Published: 2024/04/01
Released on J-STAGE: 2024/04/01

DOI: https://doi.org/10.1587/transinf.2023EDP7110

Free access

Air Traffic Control (ATC) communication suffers from issues such as high electromagnetic interference, fast speech rate, and low intelligibility, which pose challenges for downstream tasks like Automatic Speech Recognition (ASR). This article aims to research how to enhance the audio quality and intelligibility of civil aviation speech through speech enhancement methods, thereby improving the accuracy of speech recognition and providing support for the digitalization of civil aviation. We propose a speech enhancement model called DIUnet_V (DenseNet & Inception & U-Net & Volume) that combines both time-frequency and time-domain methods to effectively handle the specific characteristics of civil aviation speech, such as predominant electromagnetic interference and fast speech rate. For model evaluation, we assess the denoising and enhancement effects using three metrics: Signal-to-Noise Ratio (SNR), Mean Opinion Score (MOS), and speech recognition error rate. On a simulated ATC training recording dataset, DIUnet_Volume10 achieved an SNR value of 7.3861, showing a 4.5663 improvement compared to the original U-net model. To address the challenge of the absence of clean speech in the ATC working environment, which makes it difficult to accurately calculate SNR, we propose evaluating the denoising effects indirectly based on the recognition performance of an ATC speech recognition system. On a real ATC speech dataset, the average word error rate decreased by 1.79% absolute and the average sentence error rate decreased by 3% absolute for DIUnet_V processed speech compared to the unprocessed speech in the built speech recognition system.
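
The SNR metric used in the evaluation above requires a clean reference signal, which is why the abstract notes that it cannot be computed directly on real ATC recordings. The sketch below assumes the common decibel definition of SNR over time-aligned sample sequences; the abstract does not state which exact definition the paper uses, and the function name is hypothetical.

```python
import math


def snr_db(clean, enhanced):
    """Signal-to-noise ratio in dB of an enhanced signal against a clean reference.

    SNR = 10 * log10(sum(clean^2) / sum((clean - enhanced)^2)).
    Both sequences must be time-aligned and of equal length.
    """
    if len(clean) != len(enhanced):
        raise ValueError("signals must have the same length")
    signal_power = sum(s * s for s in clean)
    noise_power = sum((s - e) ** 2 for s, e in zip(clean, enhanced))
    if signal_power == 0:
        raise ValueError("reference signal has zero energy")
    if noise_power == 0:
        return math.inf  # enhanced output is identical to the reference
    return 10 * math.log10(signal_power / noise_power)
```

When no clean reference exists, a downstream measure such as word error rate from a speech recognizer can stand in as an indirect indicator, which is the approach the abstract describes for the real ATC dataset.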

Finding a Reconfiguration Sequence between Longest Increasing Subsequences

Yuuki AOIKE, Masashi KIYOMI, Yasuaki KOBAYASHI, Yota OTACHI

Article type: LETTER
Subject area: Fundamentals of Information Systems
2024 Volume E107.D Issue 4 Pages 559-563
Published: 2024/04/01
Released on J-STAGE: 2024/04/01

DOI: https://doi.org/10.1587/transinf.2023EDL8067

Free access

In this note, we consider the problem of finding a step-by-step transformation between two longest increasing subsequences in a sequence, namely Longest Increasing Subsequence Reconfiguration. We give a polynomial-time algorithm for deciding whether there is a reconfiguration sequence between two longest increasing subsequences in a sequence. This implies that Independent Set Reconfiguration and Token Sliding are polynomial-time solvable on permutation graphs, provided that the two input independent sets are largest among all independent sets in the input graph. We also consider a special case, where the underlying permutation graph of an input sequence is bipartite. In this case, we give a polynomial-time algorithm for finding a shortest reconfiguration sequence (if it exists).
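
As background, the objects being reconfigured are longest increasing subsequences. The sketch below computes only the length of a longest strictly increasing subsequence with the standard binary-search method; it is not the reconfiguration algorithm from the letter, and the function name is hypothetical.

```python
from bisect import bisect_left


def lis_length(seq):
    """Length of a longest strictly increasing subsequence, in O(n log n).

    tails[k] holds the smallest possible last element of an increasing
    subsequence of length k + 1 found so far.
    """
    tails = []
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)   # x extends the longest subsequence seen so far
        else:
            tails[i] = x      # x gives a smaller tail for length i + 1
    return len(tails)
```

A sequence can have many distinct subsequences of this maximum length. The reconfiguration question asks whether one can be turned into another step by step while every intermediate subsequence stays longest.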

App-Level Multi-Surface Framework for Supporting Cross-Platform User Interface Distribution

Yeongwoo HA, Seongbeom PARK, Jieun LEE, Sangeun OH

Article type: LETTER
Subject area: Information Network
2024 Volume E107.D Issue 4 Pages 564-568
Published: 2024/04/01
Released on J-STAGE: 2024/04/01

DOI: https://doi.org/10.1587/transinf.2023EDL8060

Free access

With the recent advances in IoT, there is a growing interest in multi-surface computing, where a mobile app can cooperatively utilize multiple devices' surfaces. We propose a novel framework that seamlessly augments mobile apps with multi-surface computing capabilities. It enables various apps to employ multiple surfaces with acceptable performance.

Infrared and Visible Image Fusion via Hybrid Variational Model

Zhengwei XIA, Yun LIU, Xiaoyun WANG, Feiyun ZHANG, Rui CHEN, Weiwei JI ...

Article type: LETTER
Subject area: Image Processing and Video Processing
2024 Volume E107.D Issue 4 Pages 569-573
Published: 2024/04/01
Released on J-STAGE: 2024/04/01

DOI: https://doi.org/10.1587/transinf.2023EDL8027

Free access

Infrared and visible image fusion can combine thermal radiation information and textures to provide a high-quality fused image. In this letter, we propose a hybrid variational fusion model to achieve this end. Specifically, an ℓ₀ term is adopted to preserve the highlighted targets with salient gradient variation in the infrared image, an ℓ₁ term is used to suppress the noise in the fused image, and an ℓ₂ term is employed to keep the textures of the visible image. Experimental results demonstrate the superiority of the proposed variational model, and our results have sharper textures with less noise.

VTD-FCENet: A Real-Time HD Video Text Detection with Scale-Aware Fourier Contour Embedding

Wocheng XIAO, Lingyu LIANG, Jianyong CHEN, Tao WANG

Article type: LETTER
Subject area: Image Recognition, Computer Vision
2024 Volume E107.D Issue 4 Pages 574-578
Published: 2024/04/01
Released on J-STAGE: 2024/04/01

DOI: https://doi.org/10.1587/transinf.2023EDL8030

Free access

Video text detection (VTD) aims to localize text instances in videos and has wide applications in downstream tasks. To deal with the variance of different scenes and text instances, existing VTD methods typically integrate multiple models and feature fusion strategies. A VTD method consisting of sophisticated components can improve detection accuracy but may be too slow for real-time applications. This paper aims to achieve real-time VTD with an adaptive, lightweight end-to-end framework. Unlike previous methods that represent text in the spatial domain, we model text instances in the Fourier domain. Specifically, we propose a scale-aware Fourier Contour Embedding method, which not only models arbitrarily shaped text contours in videos as compact signatures, but also adaptively selects proper scales for features in a backbone in the training stage. Then, we construct VTD-FCENet to achieve real-time VTD, which encodes temporal correlations of adjacent frames with scale-aware FCE in a lightweight and adaptive manner. Quantitative evaluations were conducted on the ICDAR2013 Video, Minetto and YVT benchmark datasets, and the results show that our VTD-FCENet not only obtains state-of-the-art or competitive detection accuracy, but also allows real-time text detection on HD videos.
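
Representing a closed text contour as a compact signature in the Fourier domain can be pictured with a small discrete Fourier transform over contour points. The sketch below is a generic illustration and not the scale-aware FCE network from the paper: the function names, the number of retained frequencies `k`, and the uniform sampling of contour points are assumptions.

```python
import cmath


def fourier_signature(points, k):
    """Low-frequency Fourier coefficients of a closed contour.

    points: list of (x, y) sampled in order along the contour.
    Returns {frequency: complex coefficient} for frequencies -k..k,
    treating each point as the complex number x + iy.
    """
    n = len(points)
    if n == 0:
        raise ValueError("contour must contain at least one point")
    z = [complex(x, y) for x, y in points]
    return {
        f: sum(z[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n)) / n
        for f in range(-k, k + 1)
    }


def reconstruct(coeffs, n):
    """Rebuild n contour points (as complex numbers) from the signature."""
    return [
        sum(c * cmath.exp(2j * cmath.pi * f * t / n) for f, c in coeffs.items())
        for t in range(n)
    ]
```

The coefficient at frequency 0 is the contour centroid, and a handful of low frequencies already describes smooth shapes well. That compactness is what makes such a signature attractive for a lightweight detector.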

A Monkey Swing Counting Algorithm Based on Object Detection

Hao CHEN, Zhe-Ming LU, Jie LIU

Article type: LETTER
Subject area: Image Recognition, Computer Vision
2024 Volume E107.D Issue 4 Pages 579-583
Published: 2024/04/01
Released on J-STAGE: 2024/04/01

DOI: https://doi.org/10.1587/transinf.2023EDL8055

Free access

This Letter focuses on the problem of counting monkeys' head swings using deep learning. Currently, there are very few papers on monkey detection, and even fewer on counting monkeys' head swings. This research tries to fill the gap by calculating the head swing frequency of monkeys through deep learning, further extending the traditional target detection algorithm. After analyzing the object detection results, we localize the monkey's actions over a period. This Letter analyzes the task of counting monkeys' head swings and proposes a standard that accurately describes a monkey's head swing. Under the guidance of this standard, the head swing counting accuracy on 50 test videos reaches 94.23%.

Sense-Aware Decoder for Character Based Japanese-Chinese NMT

Zezhong LI, Fuji REN

Article type: LETTER
Subject area: Natural Language Processing
2024 Volume E107.D Issue 4 Pages 584-587
Published: 2024/04/01
Released on J-STAGE: 2024/04/01

DOI: https://doi.org/10.1587/transinf.2023EDL8059

Free access

Compared to subword based Neural Machine Translation (NMT), character based NMT eschews linguistically motivated segmentation and operates directly on the raw character sequence, following a more thoroughly end-to-end manner. This property is especially attractive for machine translation (MT) between Japanese and Chinese, both of which use consecutive logographic characters without explicit word boundaries. However, one disadvantage remains to be addressed: a character is a less meaning-bearing unit than a subword, which requires character models to be capable of sense discrimination. Specifically, two types of sense ambiguity exist, in the source language and in the target language, respectively. The former has been partially solved by deep encoders and several existing works. The latter, the ambiguity on the target side, is rarely discussed. To address this problem, we propose two simple yet effective methods: a non-parametric pre-clustering for sense induction and a joint model that performs sense discrimination and NMT training simultaneously. Extensive experiments on Japanese↔Chinese MT show that our proposed methods consistently outperform strong baselines and verify the effectiveness of using sense-discriminated representations for character based NMT.
