-
Yasushi YUMINAKA
2024 Volume E107.D Issue 8 Pages
912
Published: August 01, 2024
Released on J-STAGE: August 01, 2024
JOURNAL
FREE ACCESS
-
Tsutomu SASAO
Article type: PAPER
2024 Volume E107.D Issue 8 Pages
913-921
Published: August 01, 2024
Released on J-STAGE: August 01, 2024
JOURNAL
FREE ACCESS
This paper shows that sum-of-products expression (SOP) minimization yields generalization ability. We show this in three steps. First, various classes of SOPs are generated. Second, minterms of each SOP are randomly selected to generate partially defined functions. Third, the original functions are reconstructed from the partially defined functions by SOP minimization. We consider Achilles heel functions, majority functions, monotone increasing cascade functions, functions generated from random SOPs, monotone increasing random SOPs, circle functions, and globe functions. As for the generalization ability, the presented method is compared with Naive Bayes, multi-level perceptron, support vector machine, JRIP, J48, and random forest. For these functions, in many cases, only 10% of the input combinations are sufficient to reconstruct more than 90% of the truth tables of the original functions.
-
Shinobu NAGAYAMA, Tsutomu SASAO, Jon T. BUTLER
Article type: PAPER
2024 Volume E107.D Issue 8 Pages
922-929
Published: August 01, 2024
Released on J-STAGE: August 01, 2024
JOURNAL
FREE ACCESS
This paper proposes a decomposition method for symmetric multiple-valued functions. It decomposes a given symmetric multiple-valued function into three parts. By using suitable decision diagrams for the three parts, we can represent symmetric multiple-valued functions compactly. By deriving theorems on sizes of the decision diagrams, this paper shows that space complexity of the proposed representation is low. This paper also presents algorithms to construct the decision diagrams for symmetric multiple-valued functions with low time complexity. Experimental results show that the proposed method represents randomly generated symmetric multiple-valued functions more compactly than the conventional representation method using standard multiple-valued decision diagrams. Symmetric multiple-valued functions are a basic class of functions, and thus, their compact representation benefits many applications where they appear.
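As a small illustration of the function class discussed above (not of the paper's decomposition or its decision diagrams), a multiple-valued function is symmetric when its value is unchanged by any permutation of its inputs. The Python sketch below checks this by brute force; the name `is_symmetric` and the example functions are hypothetical and chosen only for illustration.

```python
from itertools import product

def is_symmetric(f, n, r):
    """Check whether f, taking n inputs from {0, ..., r-1}, is symmetric.

    A function is invariant under every input permutation exactly when
    f(x) equals f(sorted(x)) for all x, so one pass over the r**n
    input combinations is enough.
    """
    for x in product(range(r), repeat=n):
        if f(*x) != f(*sorted(x)):
            return False
    return True

# Hypothetical three-valued examples with three inputs each.
sum_mod3 = lambda *x: sum(x) % 3   # depends only on the multiset of inputs
first_input = lambda *x: x[0]      # depends on input order

print(is_symmetric(sum_mod3, 3, 3))
print(is_symmetric(first_input, 3, 3))
```

Because a symmetric function depends only on how many inputs take each value, it has far fewer distinct cases than a general function, which is the property that compact representations exploit.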
-
Martin LUKAC, Saadat NURSULTAN, Georgiy KRYLOV, Oliver KESZOCZE, Abilm ...
Article type: PAPER
2024 Volume E107.D Issue 8 Pages
930-939
Published: August 01, 2024
Released on J-STAGE: August 01, 2024
JOURNAL
FREE ACCESS
With the advent of gated quantum computers and regular structures for qubit layout, methods for placement, routing, noise estimation, and logic-to-hardware mapping are urgently needed. In this paper, we propose a method for quantum circuit layout that is intended to solve such problems when mapping a quantum circuit to a gated quantum computer. The proposed methodology starts by building a Circuit Interaction Graph (CIG) that represents the ideal hardware layout minimizing the distance and path length between the individual qubits. The CIG is also used to introduce a qubit noise model. Once constructed, the CIG is iteratively reduced to a given architecture (qubit coupling model) specifying the neighborhood, qubits, priority, and qubit noise. The introduced constraints allow us to additionally reduce the graph according to preferred weights of desired properties. We propose two different methods of reducing the CIG: iterative reduction or the iterative isomorphism search algorithm. The proposed method is verified and tested on a set of standard benchmarks, with results showing improvement on certain functions while, on average, improving the cost of the implementation over current state-of-the-art methods.
-
Takashi HIRAYAMA, Rin SUZUKI, Katsuhisa YAMANAKA, Yasuaki NISHITANI
Article type: PAPER
2024 Volume E107.D Issue 8 Pages
940-948
Published: August 01, 2024
Released on J-STAGE: August 01, 2024
JOURNAL
FREE ACCESS
We present a time-efficient lower bound κ on the number of gates in Toffoli-based reversible circuits that represent a given reversible logic function. For the characteristic vector s of a reversible logic function, κ(s) closely approximates σ-lb(s), which is known as a relatively efficient lower bound with respect to evaluation time and tightness. The primary contribution of this paper is that κ enables fast computation while maintaining a tightness approximately equal to that of σ-lb. We prove that the discrepancy between κ(s) and σ-lb(s) is at most one by providing upper and lower bounds on σ-lb in terms of κ. Subsequently, we show that κ can be calculated more efficiently than σ-lb. An algorithm for κ(s) with a complexity of O(n) is presented, where n is the dimension of s. Experimental results comparing κ and σ-lb are also given. The results demonstrate that the two lower bounds are equal for most reversible functions, and that the calculation of κ is faster than that of σ-lb by several orders of magnitude.
-
Taisei SAITO, Kota ANDO, Tetsuya ASAI
Article type: PAPER
2024 Volume E107.D Issue 8 Pages
949-957
Published: August 01, 2024
Released on J-STAGE: August 01, 2024
JOURNAL
FREE ACCESS
Neural networks (NNs) fail to perform well or make excessive predictions when predicting out-of-distribution or unseen datasets. In contrast, Bayesian neural networks (BNNs) can quantify the uncertainty of their inference to solve this problem. Nevertheless, BNNs have not been widely adopted owing to their increased memory and computational cost. In this study, we propose a novel approach to extend binary neural networks by introducing a probabilistic interpretation of binary weights, effectively converting them into BNNs. The proposed approach can reduce the number of weights by half compared to the conventional method. A comprehensive comparative analysis with established methods like Monte Carlo dropout and Bayes by backprop was performed to assess the performance and capabilities of our proposed technique in terms of accuracy and capturing uncertainty. Through this analysis, we aim to provide insights into the advantages of this Bayesian extension.
-
Ken ASANO, Masanori NATSUI, Takahiro HANYU
Article type: PAPER
2024 Volume E107.D Issue 8 Pages
958-965
Published: August 01, 2024
Released on J-STAGE: August 01, 2024
JOURNAL
FREE ACCESS
The development of energy-efficient neural network hardware using magnetic tunnel junction (MTJ) devices has been widely investigated. One of the issues in the use of MTJ devices is their large write energy. Since MTJ devices show stochastic behaviors, a large write current applied for a sufficiently long time is required to guarantee the certainty of the information held in MTJ devices. This paper demonstrates that quantized neural networks (QNNs) exhibit high tolerance to bit errors in weights and an output feature map. Since probabilistic switching errors in MTJ devices do not always have a serious effect on the performance of QNNs, large write energy is not required for reliable switching operations of MTJ devices. Based on the evaluation results, we achieve about 80% write-energy reduction on buffer memory compared to the conventional method. In addition, it is demonstrated that binary representation exhibits higher bit-error tolerance than the other data representations in the range of large error rates.
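A common way to study this kind of bit-error tolerance in software is to flip stored bits at random and then re-evaluate the network. The Python sketch below shows only the error-injection step, assuming unsigned fixed-width words and independent flips with a single error rate; the function name `inject_bit_errors` and the parameter values are hypothetical, and this is not the evaluation setup used in the paper.

```python
import random

def inject_bit_errors(words, bit_width, error_rate, rng):
    """Flip each bit of each unsigned word independently with probability error_rate."""
    noisy = []
    for w in words:
        for b in range(bit_width):
            if rng.random() < error_rate:
                w ^= 1 << b  # toggle bit b
        noisy.append(w)
    return noisy

# Hypothetical example: 1000 random 8-bit weights and a 1% bit-error rate.
rng = random.Random(0)
weights = [rng.randrange(256) for _ in range(1000)]
noisy = inject_bit_errors(weights, 8, 0.01, rng)

# Count how many of the 8000 stored bits were actually flipped.
flipped = sum(bin(a ^ b).count("1") for a, b in zip(weights, noisy))
print("flipped bits:", flipped)
```

In a fuller experiment the noisy words would be loaded back into the quantized model and its accuracy compared against the error-free baseline for a range of error rates.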
-
Takao WAHO, Akihisa KOYAMA, Hitoshi HAYASHI
Article type: PAPER
2024 Volume E107.D Issue 8 Pages
966-975
Published: August 01, 2024
Released on J-STAGE: August 01, 2024
JOURNAL
FREE ACCESS
Signal processing using delta-sigma modulated bit streams is reviewed, along with related topics in stochastic computing (SC). The basic signal processing circuits, adders and multipliers, are covered. In particular, the possibility of preserving the noise-shaping properties inherent in delta-sigma modulation during these operations is discussed. Finally, the root mean square error for addition and multiplication is evaluated, and the performance improvement of signal processing in the delta-sigma domain compared with SC is verified.
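To make the comparison between the two bit-stream representations concrete, the Python sketch below encodes a constant value in [0, 1] with a textbook first-order delta-sigma modulator (an accumulator with 1-bit feedback) and with a stochastic-computing style Bernoulli bit stream. It is a minimal illustration under those assumptions, not the adder or multiplier circuits reviewed in the paper; the function names are hypothetical.

```python
import random

def delta_sigma_bits(x, n):
    """First-order delta-sigma modulation of a constant x in [0, 1] into n bits."""
    acc, bits = 0.0, []
    for _ in range(n):
        acc += x                 # integrate the input
        if acc >= 1.0:           # 1-bit quantizer
            bits.append(1)
            acc -= 1.0           # feed the quantized output back
        else:
            bits.append(0)
    return bits

def stochastic_bits(x, n, rng):
    """Stochastic-computing encoding: each bit is 1 with probability x."""
    return [1 if rng.random() < x else 0 for _ in range(n)]

x, n = 0.3, 1000
ds = delta_sigma_bits(x, n)
sc = stochastic_bits(x, n, random.Random(0))
print("delta-sigma error:", abs(sum(ds) / n - x))
print("stochastic error: ", abs(sum(sc) / n - x))
```

For a constant input, the averaged delta-sigma stream stays within about 1/n of the encoded value because the accumulator carries the quantization error forward, whereas the error of the random stream shrinks only on the order of 1/sqrt(n).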
-
Yosuke IIJIMA, Atsunori OKADA, Yasushi YUMINAKA
Article type: PAPER
2024 Volume E107.D Issue 8 Pages
976-984
Published: August 01, 2024
Released on J-STAGE: August 01, 2024
JOURNAL
FREE ACCESS
In high-speed data communication systems, it is important to evaluate the quality of the transmitted signal at the receiver. At high data rates, the transmission line characteristics act as a high-frequency attenuator and contribute to the intersymbol interference (ISI) at the receiver. To evaluate ISI conditions, eye diagrams are widely used to analyze signal quality and visualize the ISI effect as an eye-opening rate. Various types of on-chip eye-opening monitors (EOMs) have been proposed to adjust waveform-shaping circuits. However, the eye diagram evaluation of multi-valued signaling becomes more difficult than that of binary transmission because of the complicated signal transition patterns. Moreover, in severe ISI situations where the eye is completely closed, eye diagram evaluation does not work well. This paper presents a novel evaluation method using two-dimensional (2D) symbol mapping and a linear mixture model (LMM) for multi-valued data transmission. In our proposed method, ISI evaluation can be realized by 2D symbol mapping, and an efficient quantitative analysis can be realized using the LMM. An experimental demonstration of four-level pulse amplitude modulation (PAM-4) data transmission using a 100 m Cat5e cable is presented. The experimental results show that the proposed method can extract features of the ISI effect even though the eye is completely closed in the severe condition.
-
Yasushi YUMINAKA, Kazuharu NAKAJIMA, Yosuke IIJIMA
Article type: PAPER
2024 Volume E107.D Issue 8 Pages
985-991
Published: August 01, 2024
Released on J-STAGE: August 01, 2024
JOURNAL
FREE ACCESS
This study investigates a two/three-dimensional (2D/3D) symbol-mapping technique that evaluates data transmission quality based on a four-level pulse-amplitude modulation (PAM-4) symbol transition. Multi-dimensional symbol transition mapping facilitates the visualization of the degree of intersymbol interference (ISI). The simulation and experimental results demonstrated that the 2D symbol mapping can evaluate the PAM-4 data transmission quality degraded by ISI and visualize the equalization effect. Furthermore, potential applications of 2D mapping and its extension to 3D mapping were explored.
-
Hua HUANG, Yiwen SHAN, Chuan LI, Zhi WANG
Article type: PAPER
Subject area: Fundamentals of Information Systems
2024 Volume E107.D Issue 8 Pages
992-1006
Published: August 01, 2024
Released on J-STAGE: August 01, 2024
JOURNAL
FREE ACCESS
Image denoising is an indispensable step in many high-level tasks in image processing and computer vision. However, traditional low-rank minimization-based methods suffer from a bias problem since only the noisy observation is used to estimate the underlying clean matrix. To overcome this issue, a new low-rank minimization-based method, called nuclear norm minus Frobenius norm rank residual minimization (NFRRM), is proposed for image denoising. The proposed method transforms the ill-posed image denoising problem into rank residual minimization problems by exploiting the nonlocal self-similarity prior. The proposed NFRRM model can accurately estimate the underlying clean matrix by treating each rank residual component flexibly. More importantly, the global optimum of the proposed NFRRM model can be obtained in closed form. Extensive experiments demonstrate that the proposed NFRRM method outperforms many state-of-the-art image denoising methods.
-
Shiyu YANG, Tetsuya KANDA, Daniel M. GERMAN, Yoshiki HIGO
Article type: PAPER
Subject area: Software Engineering
2024 Volume E107.D Issue 8 Pages
1007-1015
Published: August 01, 2024
Released on J-STAGE: August 01, 2024
JOURNAL
FREE ACCESS
Stack Overflow, a leading Q&A platform for developers, is a substantial reservoir of Python code snippets. Nevertheless, the incompatibility issues between Python versions, particularly Python 2 and Python 3, introduce substantial challenges that can potentially jeopardize the utility of these code snippets. This empirical study examines in depth how Python version inconsistencies affect the interpretation and application of Python code snippets on Stack Overflow. It exposes the prevalence of Python version compatibility issues on Stack Overflow. It further emphasizes an apparent deficiency in version-specific identification, a critical element that facilitates the identification and utilization of Python code snippets. These challenges, primarily arising from the lack of backward compatibility between Python's major versions, pose significant hurdles for developers relying on Stack Overflow for code references and learning. This study, therefore, underscores the importance of proactively addressing these compatibility issues in Python code snippets. It advocates for enhanced tools and strategies to assist developers in efficiently navigating the Python version complexities on platforms like Stack Overflow. By highlighting these concerns and providing a potential remedy, we aim to contribute to a more efficient and effective programming experience on Stack Overflow and similar platforms.
-
Gao WANG, Gaoli WANG, Siwei SUN
Article type: PAPER
Subject area: Information Network
2024 Volume E107.D Issue 8 Pages
1016-1028
Published: August 01, 2024
Released on J-STAGE: August 01, 2024
JOURNAL
FREE ACCESS
At Crypto 2019, Gohr first adopted the neural distinguisher for differential cryptanalysis, and this work has since received increasing attention. However, most existing work focuses on improving and applying the neural distinguisher, and studies delving into the intrinsic principles of neural distinguishers are limited. At Eurocrypt 2021, Benamira et al. conducted a study on Gohr's neural distinguisher. But for the neural distinguishers proposed later, such as the r-round neural distinguishers trained with k ciphertext pairs or ciphertext differences, denoted as $ND^{cp}_{k\_{r}}$ (Gohr's neural distinguisher is the special $ND^{cp}_{k\_{r}}$ with k=1) and $ND^{cd}_{k\_{r}}$, such research is lacking. In this work, we study the intrinsic principles of $ND^{cd}_{k\_{r}}$ and $ND^{cp}_{k\_{r}}$ and the relationship between them. First, we explore the working principle of $ND^{cd}_{1\_{r}}$ through a series of experiments and find that it strongly relies on the probability distribution of ciphertext differences. Its operational mechanism bears a strong resemblance to that of $ND^{cp}_{1\_{r}}$ given by Benamira et al. Therefore, we further compare them from the perspective of differential cryptanalysis and sample features, demonstrating that the superior performance of $ND^{cp}_{1\_{r}}$ can be attributed to the relationships between certain ciphertext bits, especially the significant bits. We then extend our investigation to $ND^{cp}_{k\_{r}}$, and show that its ability to recognize samples relies heavily on the average differential probability of k ciphertext pairs and some relationships in the ciphertext itself, but the reliance between k ciphertext pairs is very weak. Finally, in light of the findings of our research, we introduce a strategy to enhance the accuracy of the neural distinguisher by using a fixed difference to generate the negative samples instead of a random one. Through the implementation of this approach, we manage to improve the accuracy of the neural distinguishers by approximately 2% to 8% for 7-round Speck32/64 and 9-round Simon32/64.
-
Zhewei XU, Mizuho IWAIHARA
Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2024 Volume E107.D Issue 8 Pages
1029-1039
Published: August 01, 2024
Released on J-STAGE: August 01, 2024
JOURNAL
FREE ACCESS
Data sparsity has always been a problem in document classification, for which semi-supervised learning and few-shot learning are studied. An even more extreme scenario is to classify documents without any annotated data, using only category names. In this paper, we introduce a nearest neighbor search-based method, Con2Class, to tackle this tough task. We intend to produce embeddings for predefined categories and predict category embeddings for all the unlabeled documents in a unified embedding space, such that categories can be easily assigned by searching for the nearest predefined category in the embedding space. To achieve this, we propose confidence-driven contrastive learning, in which prompt-based templates are designed and an MLM-maintained contrastive loss is newly proposed to finetune a pretrained language model for embedding production. To deal with the issue that no annotated data is available to validate the classification model, we introduce a confidence factor to estimate the classification ability by evaluating the prediction confidence. The language model having the highest confidence factor is used to produce embeddings for similarity evaluation. Pseudo labels are then assigned by searching for the semantically closest category name, and are further used to train a separate classifier following a progressive self-training strategy for final prediction. Our experiments on five representative datasets demonstrate the superiority of our proposed method over the existing approaches.
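The nearest-category assignment step described above can be sketched in a few lines of Python. The example assumes that document and category embeddings already exist and that cosine similarity is the closeness measure; both are assumptions made for illustration, as are the toy vectors and the names `cosine` and `assign_category`. It does not reproduce Con2Class or its training procedure.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def assign_category(doc_vec, category_vecs):
    """Return the name of the category whose embedding is closest to doc_vec."""
    return max(category_vecs, key=lambda name: cosine(doc_vec, category_vecs[name]))

# Toy 3-dimensional embeddings (hypothetical values, for illustration only).
categories = {
    "sports": [1.0, 0.1, 0.0],
    "politics": [0.0, 1.0, 0.2],
}
print(assign_category([0.9, 0.2, 0.0], categories))
print(assign_category([0.1, 0.8, 0.3], categories))
```

Labels assigned this way can then serve as pseudo labels for training a separate classifier, as the abstract describes.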
-
Xianglong LI, Yuan LI, Jieyuan ZHANG, Xinhai XU, Donghong LIU
Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2024 Volume E107.D Issue 8 Pages
1040-1049
Published: August 01, 2024
Released on J-STAGE: August 01, 2024
JOURNAL
FREE ACCESS
In many real-world problems, a complex task is typically composed of a set of subtasks that follow a certain execution order. Traditional multi-agent reinforcement learning methods perform poorly in such multi-task cases, as they consider the whole problem as one task. For such multi-agent multi-task problems, heterogeneous relationships, i.e., subtask-subtask, agent-agent, and subtask-agent, are important characteristics that should be explored to improve learning performance. This paper proposes a dynamic heterogeneous graph-based agent allocation-action learning framework. Specifically, a dynamic heterogeneous graph model is first designed to characterize the variation of heterogeneous relationships over time. Then a multi-subgraph partition method is devised to extract features of heterogeneous graphs. Leveraging the extracted features, a hierarchical framework is designed to learn the dynamic allocation of agents among subtasks, as well as cooperative behaviors. Experimental results demonstrate that our framework outperforms recent representative methods on two challenging tasks, i.e., SAVETHECITY and Google Research Football full game.
-
Hyebong CHOI, Joel SHIN, Jeongho KIM, Samuel YOON, Hyeonmin PARK, Hyej ...
Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2024 Volume E107.D Issue 8 Pages
1050-1058
Published: August 01, 2024
Released on J-STAGE: August 01, 2024
JOURNAL
FREE ACCESS
The design of automobile lamps requires accurate estimation of heat distribution to prevent overheating and deformation of the product. Traditional heat-resistant analysis using Computational Fluid Dynamics (CFD) is time-consuming and requires expertise in thermofluid mechanics, making real-time temperature analysis less accessible to lamp designers. We propose a machine learning-based temperature prediction system for automobile lamp design. We trained our machine learning models using CFD results of various lamp designs, providing lamp designers with real-time heat-resistant analysis. Comprehensive tests on real lamp products demonstrate that our prediction model estimates heat distribution within a minute with accuracy comparable to CFD analysis. Our system visualizes the estimated heat distribution of a car lamp design, supporting quick decision-making by lamp designers. It is expected to shorten the product design process, improving market competitiveness.
-
Haoran LUO, Tengfei SHAO, Shenglei LI, Reiko HISHIYAMA
Article type: PAPER
Subject area: Image Recognition, Computer Vision
2024 Volume E107.D Issue 8 Pages
1059-1069
Published: August 01, 2024
Released on J-STAGE: August 01, 2024
JOURNAL
FREE ACCESS
Makeup transfer is the process of applying the makeup style from one picture (reference) to another (source), allowing for the modification of characters' makeup styles. To meet the diverse makeup needs of individuals or samples, the makeup transfer framework should accurately handle various makeup degrees, ranging from subtle to bold, and exhibit intelligence in adapting to the source makeup. This paper introduces a “3-level” adaptive makeup transfer framework, addressing facial makeup through two sub-tasks: 1. Makeup adaptation, utilizing feature descriptors and eyelid curve algorithms to classify 135 organ-level face shapes; 2. Makeup transfer, achieved by learning the reference picture from three branches (color, highlight, pattern) and applying it to the source picture. The proposed framework, termed “Face Shape Adaptive Makeup Transfer” (FSAMT), demonstrates superior results in makeup transfer output quality, as confirmed by experimental results.
-
Shuto HASEGAWA, Koichiro ENOMOTO, Taeko MIZUTANI, Yuri OKANO, Takenori ...
Article type: PAPER
Subject area: Biological Engineering
2024 Volume E107.D Issue 8 Pages
1070-1078
Published: August 01, 2024
Released on J-STAGE: August 01, 2024
JOURNAL
FREE ACCESS
Melanin, which is responsible for the appearance of spots and freckles, is an important indicator in evaluating skin condition. To assess the efficacy of cosmetics, skin condition scoring is performed by analyzing the distribution and amount of melanin from microscopic images of stratum corneum cells. However, the current practice of diagnosing skin condition using stratum corneum cell images relies heavily on visual evaluation by experts. The goal of this study is to develop a quantitative evaluation system for skin condition based on melanin within unstained stratum corneum cell images. The proposed system utilizes principal component regression to perform five-level scoring, which is then compared with visual evaluation scores to assess the system's usefulness. Additionally, we evaluated the impact of melanin-related indicators obtained from images on the scores, and verified which indicators are effective for evaluation. In conclusion, we confirmed that scoring is possible with an accuracy of more than 60% using a combination of several indicators, which is comparable to the accuracy of visual assessment.
-
Tomoyasu NAKANO, Masataka GOTO
Article type: PAPER
Subject area: Music Information Processing
2024 Volume E107.D Issue 8 Pages
1079-1088
Published: August 01, 2024
Released on J-STAGE: August 01, 2024
JOURNAL
FREE ACCESS
This paper presents MDX-Mixer, which improves music demixing (MDX) performance by leveraging source signals separated by multiple existing MDX models. Deep-learning-based MDX models have improved their separation performances year by year for four kinds of sound sources: “vocals,” “drums,” “bass,” and “other”. Our research question is whether mixing (i.e., weighted sum) the signals separated by state-of-the-art MDX models can obtain either the best of everything or higher separation performance. Previously, in singing voice separation and MDX, there have been studies in which separated signals of the same sound source are mixed with each other using time-invariant or time-varying positive mixing weights. In contrast to those, this study is novel in that it allows for negative weights as well and performs time-varying mixing using all of the separated source signals and the music acoustic signal before separation. The time-varying weights are estimated by modeling the music acoustic signals and their separated signals by dividing them into short segments. In this paper we propose two new systems: one that estimates time-invariant weights using 1×1 convolution, and one that estimates time-varying weights by applying the MLP-Mixer layer proposed in the computer vision field to each segment. The latter model is called MDX-Mixer. Their performances were evaluated based on the source-to-distortion ratio (SDR) using the well-known MUSDB18-HQ dataset. The results show that the MDX-Mixer achieved higher SDR than the separated signals given by three state-of-the-art MDX models.
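The idea of mixing separated signals with weights that may be negative can be illustrated with a toy Python example. It uses the plain energy-ratio form of the source-to-distortion ratio, SDR = 10 log10(sum s^2 / sum (s - s_hat)^2), which is an assumption here (benchmark evaluations may use a more elaborate definition), along with hand-picked time-invariant weights rather than weights estimated by a network. All signals and names are hypothetical.

```python
import math

def sdr_db(reference, estimate):
    """Energy-ratio SDR in dB between a reference signal and an estimate."""
    signal = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal / noise)

def mix(signals, weights):
    """Time-invariant weighted sum of equal-length signals; weights may be negative."""
    length = len(signals[0])
    return [sum(w * s[t] for w, s in zip(weights, signals)) for t in range(length)]

# Toy signals: a "vocals" target and an interfering "other" source.
vocals = [1.0, -1.0, 1.0, -1.0]
other = [0.5, 0.5, -0.5, -0.5]

# Hypothetical separator outputs: the vocals estimate leaks 20% of "other",
# and the "other" estimate recovers 90% of the interfering source.
vocals_est = [v + 0.2 * o for v, o in zip(vocals, other)]
other_est = [0.9 * o for o in other]

# A negative weight on the "other" estimate cancels most of the leakage.
mixed = mix([vocals_est, other_est], [1.0, -0.2])

print("SDR before mixing:", sdr_db(vocals, vocals_est))
print("SDR after mixing: ", sdr_db(vocals, mixed))
```

In this toy case the negative weight removes interference that a positive-only mix could not, which is the motivation for allowing negative weights.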
-
Jiyeon LEE
Article type: LETTER
Subject area: Human-computer Interaction
2024 Volume E107.D Issue 8 Pages
1089-1092
Published: August 01, 2024
Released on J-STAGE: August 01, 2024
JOURNAL
FREE ACCESS
With the rapid advancement of graphics processing units (GPUs), Virtual Reality (VR) experiences have significantly improved, enhancing immersion and realism. However, these advancements also raise security concerns in VR. In this paper, I introduce a new attack leveraging known WebVR vulnerabilities to track the activities of VR users. The proposed attack leverages the user's hand motion information exposed to web attackers, demonstrating the capability to identify consumed content, such as 3D images and videos, and pilfer private drawings created in a 3D drawing app. To achieve this, I employed a machine learning approach to process controller sensor data and devised techniques to extract sensitive activities during the use of target apps. The experimental results demonstrate that the viewed content in the targeted content viewer can be identified with 90% accuracy. Furthermore, I successfully obtained drawing outlines that precisely match the user's original drawings without performance degradation, validating the effectiveness of the attack.
-
Ji XI, Yue XIE, Pengxu JIANG, Wei JIANG
Article type: LETTER
Subject area: Speech and Hearing
2024 Volume E107.D Issue 8 Pages
1093-1096
Published: August 01, 2024
Released on J-STAGE: August 01, 2024
JOURNAL
FREE ACCESS
Currently, a significant portion of acoustic scene categorization (ASC) research is centered around utilizing Convolutional Neural Network (CNN) models. This preference is primarily due to the CNN's ability to effectively extract time-frequency information from audio recordings of scenes by employing spectrum data as input. The expression of many dimensions can be achieved by utilizing 2D spectrum characteristics. Nevertheless, the diverse interpretations of the same object's existence in different positions on the spectrum map can be attributed to the discrepancies between spectrum properties and picture qualities. The lack of distinction between different aspects of input information in CNN-based ASC networks may result in a decline in system performance. Considering this, a feature pyramid segmentation (FPS) approach based on CNN is proposed. The proposed approach uses spectrum features as the input for the model. These features are split based on a preset scale, and each segment-level feature is then fed into the CNN network for learning. The outputs of all feature scales are fused, and the resulting high-level features are fed to a SoftMax classifier to categorize different scenes. The experiment provides evidence to support the efficacy of the FPS strategy and its potential to enhance the performance of the ASC system.
-
Hongliang FU, Qianqian LI, Huawei TAO, Chunhua ZHU, Yue XIE, Ruxue GUO
Article type: LETTER
Subject area: Speech and Hearing
2024 Volume E107.D Issue 8 Pages
1097-1100
Published: August 01, 2024
Released on J-STAGE: August 01, 2024
JOURNAL
FREE ACCESS
Speech emotion recognition (SER) is a key research technology for realizing the third generation of artificial intelligence, and is widely used in human-computer interaction, emotion diagnosis, interpersonal communication, and other fields. However, the aliasing of language and semantic information in speech tends to distort the alignment of emotion features, which affects the performance of cross-corpus SER systems. This paper proposes a cross-corpus SER model based on causal emotion information representation (CEIR). The model uses the reconstruction loss of the deep autoencoder network and the source domain label information to realize the preliminary separation of causal features. Then, the causal correlation matrix is constructed, and the local maximum mean difference (LMMD) feature alignment technology is combined to make the causal features of different dimensions jointly distributed independent. Finally, supervised fine-tuning with labeled data is used to achieve effective extraction of causal emotion information. The experimental results show that the average unweighted average recall (UAR) of the proposed algorithm is increased by 3.4% to 7.01% compared with some of the latest algorithms in the field.
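For readers unfamiliar with the metric, unweighted average recall (UAR) is the mean of the per-class recalls, so every emotion class counts equally regardless of how many samples it has. The Python sketch below computes it for a toy, hypothetical label set; it illustrates the metric only and is unrelated to the paper's model or data.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        indices = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in indices if y_pred[i] == c)
        recalls.append(hits / len(indices))
    return sum(recalls) / len(recalls)

# Imbalanced toy example: a classifier that always predicts "neutral".
y_true = ["neutral"] * 8 + ["angry"] * 2
y_pred = ["neutral"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("accuracy:", accuracy)
print("UAR:", unweighted_average_recall(y_true, y_pred))
```

Here the plain accuracy looks high because the majority class dominates, while the UAR reflects that one of the two classes is never recognized.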
-
Zhi LIU, Heng WANG, Yuan LI, Hongyun LU, Hongyuan JING, Mengmeng ZHANG
Article type: LETTER
Subject area: Image Processing and Video Processing
2024 Volume E107.D Issue 8 Pages
1101-1104
Published: August 01, 2024
Released on J-STAGE: August 01, 2024
JOURNAL
FREE ACCESS
In video-based point cloud compression (V-PCC), the partitioning of the Coding Unit (CU) has ultra-high computational complexity. The Just Noticeable Difference (JND) model is an effective metric to guide this process. However, this paper finds that the performance of traditional JND models is degraded in V-PCC. For the attribute video, due to the pixel-filling operation, the brightness perception capability of the JND model is reduced. For the geometric video, due to the depth-filling operation, the depth perception capability of depth-based JND models (JNDD) is degraded in the boundary area. In this paper, a joint JND model (J_JND) is proposed for the attribute video to improve the brightness perception capacity, and an occupancy map guided JNDD model (O_JNDD) is proposed for the geometric video to improve the depth difference estimation accuracy of the boundaries. Based on the two improved JND models, a fast V-PCC CU partitioning algorithm is proposed with adaptive CU depth prediction. The experimental results show that the proposed algorithm eliminates 27.46% of total coding time at the cost of only 0.36% and 0.75% Bjontegaard Delta rate increment under the geometry Point-to-Point (D1) error and attribute Luma Peak Signal-to-Noise Ratio (PSNR), respectively.
-
Chang SUN, Yitong LIU, Hongwen YANG
Article type: LETTER
Subject area: Biological Engineering
2024 Volume E107.D Issue 8 Pages
1105-1109
Published: August 01, 2024
Released on J-STAGE: August 01, 2024
JOURNAL
FREE ACCESS
Sparse-view CT reconstruction has gained significant attention due to the growing concerns about radiation safety. Although recent deep learning-based image domain reconstruction methods have achieved encouraging performance over iterative methods, effectively capturing intricate details and organ structures while suppressing noise remains challenging. This study presents a novel dual-stream encoder-decoder-based reconstruction network that combines global path reconstruction from the entire image with local path reconstruction from image patches. These two branches interact through an attention module, which enhances visual quality and preserves image details by learning correlations between image features and patch features. Visual and numerical results show that the proposed method has superior reconstruction capabilities to state-of-the-art 180-, 90-, and 45-view CT reconstruction methods.