Journal of Advanced Computational Intelligence and Intelligent Informatics
Online ISSN : 1883-8014
Print ISSN : 1343-0130
ISSN-L : 1883-8014
Volume 28, Issue 1
Showing 1-24 of 24 articles from the selected issue
Regular Papers
  • Marvin Jade Genoguin, Ronnie S. Concepcion II, Andres Philip Mayol, Ar ...
    Article type: Research Paper
    2024, Volume 28, Issue 1, pp. 5-11
    Publication date: 2024/01/20
    Release date: 2024/01/20
    Journal, Open Access

    Extreme weather conditions such as heavy rainfall have been wreaking havoc not only in urban areas but also across entire watersheds. Developing a flood management plan and flood-mitigating structures to alleviate the impacts of flooding is crucial, but it requires intensive and continuous historical data. Data missing because of failures in the equipment that gathers rainfall data can therefore be a problem. Rainfall data are useful not only in designing flood-mitigating structures but also in planning day-to-day activities ahead of time. To address this problem, this paper proposes a predictive model that is able to forecast with a short lead time and predict missing data within the dataset. Three predictive models are compared: a recurrent neural network, Gaussian process regression, and the proposed 6-gene genetic expression-based predictive modeling (MGGP). Twenty-nine years of 24-hour cumulative rainfall data, sourced from the PAGASA Tacloban City weather station, Philippines, were used. The data were cleaned by removing negative values. Two datasets were created: the first (RFDS1) makes use of three indices (year, month, and day), and the second (RFDS2) was orchestrated and transformed to increase correlation and reduce prediction errors and had two additional datasets (ave(t-1,t-2), t-1). Each method used three and five time-based indices. The results show erratic model behavior for all three methods when RFDS1 was used, whereas RFDS2 yielded more stable predictive models. This shows that the data orchestration and transformation greatly improved the correlation and reduced errors. Among the three methods, MGGP showed the best results.
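
    The transformed dataset is described as adding two terms, written in the abstract as ave(t-1,t-2) and t-1. A minimal Python sketch of how such lag-based terms could be derived from a daily rainfall series follows. The function name `add_lag_features`, the tuple layout, and the sample values are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple


def add_lag_features(rain: List[float]) -> List[Tuple[float, float, float]]:
    """Return (mean of previous two days, previous day, current day) rows.

    Assumed reading of "ave(t-1,t-2), t-1": the mean of the two preceding
    observations and the immediately preceding observation.
    """
    rows = []
    # The first two days have no complete two-day history, so start at index 2.
    for t in range(2, len(rain)):
        avg_prev_two = (rain[t - 1] + rain[t - 2]) / 2.0
        rows.append((avg_prev_two, rain[t - 1], rain[t]))
    return rows


# Hypothetical 24-hour cumulative rainfall values in millimetres.
sample = [0.0, 4.0, 10.0, 2.0]
features = add_lag_features(sample)
```

    Each row pairs the two lag terms with the value to be predicted, so a series of n days yields n - 2 usable rows.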

  • Arvin H. Fernando, Laurence A. Gan Lim, Argel A. Bandala, Ryan Rhay P. ...
    Article type: Research Paper
    2024, Volume 28, Issue 1, pp. 12-20
    Publication date: 2024/01/20
    Release date: 2024/01/20
    Journal, Open Access

    Constant demand for sustainable mechanisms to perform heavy and risky tasks has driven more robotics innovations to arise. Modular reconfigurable robotic systems are one of these promising technologies that are continuously being explored. Homogeneous types, to be specific, can accomplish similar missions simultaneously as individual units and heavier missions as an integrated system. This paper presents an analysis of the carrying capacity of a six-wheeled modular multi-agent system using the symbiotic model. The objective is to determine the resulting symbiotic relationship of a given configuration and module state combinations. The results show that the dominant relationship among the trials for the linear traversal mission is commensalism. That is, the system neither benefits from nor is harmed by the symbiosis formed. This is true in both simulated and actual test environments, although the percentage difference is about 12%. MATLAB Simulink was used for simulation, while a Maqueen robot in a 3D-printed chassis was used for actual testing. With this study, future configurations for several other missions such as object tracking and ramp climbing can be assessed using the same approach so that possible fault occurrence during operations can be prevented, since the developed analysis method is performed prior to the deployment of the system.

  • Arvin H. Fernando, Marielet A. Guillermo, Ronnie S. Concepcion II, Lau ...
    Article type: Research Paper
    2024, Volume 28, Issue 1, pp. 21-28
    Publication date: 2024/01/20
    Release date: 2024/01/20
    Journal, Open Access

    Precise luffing of payload in mobile cranes is a crucial part of material handling and safety in industrial operations. Explosive ordnance disposal uses a bomb disposal robot to safely disable explosive devices from a safe distance. This robot is a mobile jib crane type with a gripper as an end effector instead of the typical suspension cable with a hook. The common challenge of this crane type is the arm/jib movement sensitivity with respect to tip-over stability of the crane body. This is directly influenced by the payload and constrained by the degree of freedom. This paper presents a control strategy of the joint for luffing such that the back wheels remain in contact with the ground surface and avoid bucking at any given instance of weight change in the payload. Fuzzy logic was applied to control the motor torque and luffing angle of the arm in response to the load and gripper opening size to maintain the tip-over stability margin at the highest value possible. The response curves at different configurations of the two input signals were also analyzed based on the rules set to determine their precision with respect to the expected response curve.

  • Niraj Pahari, Kazutaka Shimada
    Article type: Research Paper
    2024, Volume 28, Issue 1, pp. 29-40
    Publication date: 2024/01/20
    Release date: 2024/01/20
    Journal, Open Access

    Multitask learning (MTL) and data augmentation are becoming increasingly popular in natural language processing (NLP). These techniques are particularly useful when data are scarce. In MTL, knowledge learned from one task is applied to another. Data augmentation addresses data scarcity by providing additional synthetic data during model training. In NLP, the bidirectional encoder representations from transformers (BERT) model is the default candidate for various tasks. MTL and data augmentation using BERT have yielded promising results. However, a detailed study regarding the effect of using MTL in different layers of BERT and the benefit of data augmentation in these configurations has not been conducted. In this study, we investigate the use of MTL and data augmentation from generative models, specifically for category classification, sentiment classification, and aspect-opinion sequence-labeling using BERT. The layers of BERT are categorized into top, middle, and bottom layers, which are frozen, shared, or unshared. Experiments are conducted to identify the optimal layer configuration for improved performance compared with that of single-task learning. Generative models are used to generate augmented data, and experiments are performed to reveal their effectiveness. The results indicate the effectiveness of the MTL configuration compared with single-task learning as well as the effectiveness of data augmentation using generative models for classification tasks.

  • Hilario A. Calinao Jr., Reggie C. Gustilo, Elmer P. Dadios, Ronnie S. ...
    Article type: Research Paper
    2024, Volume 28, Issue 1, pp. 41-48
    Publication date: 2024/01/20
    Release date: 2024/01/20
    Journal, Open Access

    This study integrates fuzzy logic-controlled data switching and the radial basis function neural network (RBFNN) for fault detection and classification in grid-tied solar energy systems. The fuzzy logic controller filters out invalid sensor data through a data switch, ensuring that the fault detection and classification system receives reliable input. Training data were prepared through data normalization using the z-score function and principal component analysis, thereby reducing data complexity and standardizing the inputs. The resulting RBFNN model exhibited a low mean squared error with a value of 7.67 × 10⁻⁴, indicating its ability to classify faults based on the actual system scenarios. Performance evaluation metrics, including accuracy, precision, recall, and F1-score, were used to assess the effectiveness of the RBFNN model. The model demonstrated high accuracy (96.4%), precision (98.281%), recall (98.013%), and F1-score (98.147%), indicating the suitability and effectiveness of the RBFNN model to identify and classify faults in grid-tied solar energy systems.
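
    The F1-score is the harmonic mean of precision and recall, so the reported figures can be cross-checked with a few lines of Python. This is only an arithmetic sketch; the function name `f1_score` is illustrative, and the inputs are the percentages quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reported_precision = 98.281  # percent, as quoted in the abstract
reported_recall = 98.013     # percent, as quoted in the abstract
f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 3))
```

    The computed value agrees with the reported F1-score of 98.147% to three decimal places.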

  • Athena Rosz Ann R. Pascua, Dino Dominic F. Ligutan, Marielet A. Guille ...
    Article type: Research Paper
    2024, Volume 28, Issue 1, pp. 49-58
    Publication date: 2024/01/20
    Release date: 2024/01/20
    Journal, Open Access

    This paper aims to solve the nonlinearity in PID control of a force-sensitive resistor on a haptic device and gripper using a fuzzy logic controller. The proposed system will match the force exerted by the haptic device to that applied at the gripper, and will be simulated using the Simulation Open Framework Architecture.

  • Jonnel D. Alejandrino, Ronnie S. Concepcion II, Argel A. Bandala, Edwi ...
    Article type: Research Paper
    2024, Volume 28, Issue 1, pp. 59-66
    Publication date: 2024/01/20
    Release date: 2024/01/20
    Journal, Open Access

    Plant root imaging is crucial for progress in various domains such as plant breeding and crop optimization. Traditionally, root tomography involves invasive methods that disrupt plant systems and yield non-reproducible results. As a result, non-invasive techniques, particularly electrical tomography, have gained significant attention. Despite the advantages, these techniques have limitations in terms of radiation efficiency and directivity due to suboptimal antenna design. This paper presents a comprehensive simulation on antenna design optimization focusing on dimensions, spacing, and integration of advanced algorithms. A micropatch transducer antenna was engineered for an existing in-silico plant root setup operating within a 3–5 MHz frequency range. The optimized dimensions of the antenna are 109.32 mm × 140.67 mm × 2.55 mm, and it resonates effectively within a frequency range of 3.1–5.68 MHz. Using scalar minimization techniques, patch transducers were interconnected into an antenna array with an optimized 3 mm spacing. Utilizing a multi-objective optimization algorithm based on a sperm fertilization procedure and the shuffled frog leaping algorithm, optimal frequencies were obtained at 3,989,796.88 Hz and 3,989,951.83 Hz, respectively. Validated using CADFEKO software, the proposed antenna design demonstrated distinctive voltage distribution, superior directivity of 9.24 dBi, gain of 9.15 dBi, and 98.6% radiation efficiency when compared to the existing silicon-based root tomography antenna setups.

  • Enzhi Zhang, Mohamed Wahib, Rui Zhong, Masaharu Munetomo
    Article type: Research Paper
    2024, Volume 28, Issue 1, pp. 67-78
    Publication date: 2024/01/20
    Release date: 2024/01/20
    Journal, Open Access

    Deep model optimization methods discard the training weights which contain information about the validation loss landscape that can guide further model optimization. In this paper, we first show that a supervisor neural network can be used to predict the validation losses or accuracy of another deep model (student) through its discarded training weights. Then based on this behavior, we propose a weight-loss (accuracy) pair-based training framework called regularization by validation to help decrease overfitting and increase the generalization performance of the student model by predicting the validation losses. We conduct our experiments on the MNIST, CIFAR-10, and CIFAR-100 datasets with the multilayer perceptron and ResNet-56 to show that we can improve the generalization performance with the past training trajectories.

  • Takashi Sugiyama, Masayoshi Kanoh
    Article type: Research Paper
    2024, Volume 28, Issue 1, pp. 79-85
    Publication date: 2024/01/20
    Release date: 2024/01/20
    Journal, Open Access

    Expressing emotions is essential for ensuring smooth communication between people. In the context of human-robot symbiosis, robots are also required to express emotions. Although one method of robot emotion expression involves using LEDs or other forms of light to display colors, considering the possibility of expressing emotions through clothing colors is also necessary. In this study, we developed a simple robot called the “Tilting Robot,” which only performs simple tilting motions to investigate whether changes in the robot’s clothing color would affect the expressed emotions. In the experiment, participants were divided into two groups: motion and posture groups. The motion group was shown videos of the robot’s motion whereas the posture group was shown still images of the robot’s posture. The results showed that the red clothing in the posture group significantly expressed anger, whereas the blue clothing in the motion group significantly expressed sadness. The rating for blue clothing was 4.04 ± 1.30, which was near “undecided.” This suggests that blue clothing does not necessarily intensify the emotion of sadness, but other clothing colors may weaken its expression. The rating for red clothing was 2.86 ± 1.06, which was lower than “undecided.” This suggests that red clothing may not express anger, but could give an impression of vitality.

  • Shota Shimizu, Shun Sakayauchi, Hiroki Shibata, Yasufumi Takama
    Article type: Research Paper
    2024, Volume 28, Issue 1, pp. 86-93
    Publication date: 2024/01/20
    Release date: 2024/01/20
    Journal, Open Access

    This paper proposes a constrained K-means clustering method that dynamically generates subordinate clusters based on the Bayesian information criterion (BIC). COP K-means, which considers pairwise constraints in partition-based clustering, has difficulty in handling cases in which a must-link is given to instances located far away from each other. To address this problem, the proposed method generates subordinate clusters that have a must-link to a master cluster during the clustering process. The final clustering result is obtained by merging the subordinate clusters. The proposed method determines whether to generate subordinate clusters based on the BIC. This paper also introduces an idea for mitigating the sensitivity to the initial position of subordinate clusters. The effectiveness of the proposed method is shown through experiments with two synthetic datasets.

  • Natsuki Yamamura, Junichi Chikazoe, Takaaki Yoshimoto, Koji Jimura, No ...
    Article type: Research Paper
    2024, Volume 28, Issue 1, pp. 94-102
    Publication date: 2024/01/20
    Release date: 2024/01/20
    Journal, Open Access

    In this study, the roles of shape and color features in metaphor generation for abstract images were investigated through simulations using retrained convolutional neural network (CNN) models based on the pretrained CNN model, AlexNet. A computational experiment was conducted using five types of retrained object recognition models: an object recognition model using the cleaned ILSVRC-2012 training dataset, one to recognize more shape features using edge-detected images, one to recognize fewer shape features using blurred images, one to recognize fewer color features using grayscale images, and one to recognize only shape features using Canny edge-detected images. The metaphors generated for abstract images were collected from behavioral data obtained in a psychological experiment aimed at investigating the neural mechanisms of metaphor generation for abstract images. In the computational experiment, the simulation results of the five models for abstract images were compared to examine how well they predicted the objects used in the metaphors generated for abstract images in the psychological experiment. The edge-only model using Canny edge-detected images and the color-inhibited model using grayscale images exhibited better performance in metaphor recognition for abstract images than the control condition. This indicates that shape features play a more important role than color features in metaphor generation for abstract images. Furthermore, because the Canny edge detection technique extracts only object outlines that can be regarded as the caricaturization of objects, the caricatured images, based on the shape features of the abstract images, likely influence object recognition for metaphor generation.

  • Efosa Osagie, Wei Ji, Na Helian
    Article type: Review
    2024, Volume 28, Issue 1, pp. 103-110
    Publication date: 2024/01/20
    Release date: 2024/01/20
    Journal, Open Access

    In recent times, medical imaging has become a significant component of clinical diagnosis and examinations to detect and evaluate various medical conditions. The interpretation of these medical examinations and the patient’s demographics are usually textual data, which are burned in on the pixel content of medical imaging modalities (MIM). Examples of these MIM include ultrasound and X-ray imaging. As artificial intelligence advances for medical applications, there is a high demand for the accessibility of these burned-in textual data for various needs. This article aims to review the significance of burned-in textual data recognition in MIM and recent research regarding the machine learning approach, challenges, and open issues for further investigation on this application. The review describes the significant problems in this study area as low resolution and background interference of textual data. Finally, the review suggests applying more advanced deep learning ensemble algorithms as possible solutions.

  • Kaichi Nihira, Hiroki Shibata, Yasufumi Takama
    Article type: Research Paper
    2024, Volume 28, Issue 1, pp. 111-121
    Publication date: 2024/01/20
    Release date: 2024/01/20
    Journal, Open Access

    This paper proposes a personal values modeling method that does not require attribute ratings. The proposed method is applied to memory-based and model-based collaborative filtering (CF) to demonstrate its effectiveness. A recent trend in CF is to introduce factors other than interaction history. A rate matching rate (RMRate) has been proposed for modeling users’ personal values, and it has been shown to be effective in increasing diversity and recommending niche (long-tail or unpopular) items. However, RMRate needs attribute-level evaluations in addition to ratings (total evaluations) of items, which limits its applicability. To obtain users’ personal values model only from a rating matrix, this paper defines users’ personal values as their tendency to select popular/unpopular items and reputable/unreputable items. Ten attributes are proposed to model users’ personal values, all of which can be calculated from a rating matrix without additional information. Experimental results on four datasets show that the proposed attributes have different characteristics from the RMRate, and can improve precision, recall, and normalized discounted cumulative gain of memory-based CF and factorization machines. It is also shown that the proposed modeling method is useful for mitigating the cold-start problem.

  • Sujata Saini, Hiroki Shibata, Yasufumi Takama
    Article type: Research Paper
    2024, Volume 28, Issue 1, pp. 122-128
    Publication date: 2024/01/20
    Release date: 2024/01/20
    Journal, Open Access

    This paper constructs a dataset of handwritten Indus signs employing a social approach. A writing system called the Indus script was created in the Indus civilization. There have been numerous attempts to decode it over the years, but it has not yet been fully deciphered. Due to a lack of information and the scarcity of evidence, the mystery of the Indus signs has not yet been fully solved. Recently, there has been an increase in demand for huge datasets in order to use cutting-edge machine learning techniques. Considering the restricted availability of images of authentic Indus signs, this paper proposes creating an Indus signs dataset by asking participants to draw the Indus signs while referring to images of the original Indus signs. A web application was developed and used to collect the 44 participants’ handwritten images of ten Indus signs. To show the usefulness of the constructed dataset, it is used to train convolutional neural networks. The experimental result demonstrates that the model can classify the images of the original Indus script with 70% accuracy.

  • Kenshin Moriyoshi, Hiroki Shibata, Yasufumi Takama
    Article type: Research Paper
    2024, Volume 28, Issue 1, pp. 129-140
    Publication date: 2024/01/20
    Release date: 2024/01/20
    Journal, Open Access

    This paper proposes a method to generate a synthetic rating matrix based on user’s rational behavior, with the aim of generating a large-scale rating matrix at low cost. Collaborative filtering is one of the major techniques for recommender systems, which is widely used because it can recommend items using only a history of ratings given to the items by users. However, collaborative filtering has some problems such as the cold-start problem and the sparsity problem, both of which are caused by the shortage of ratings in a database (rating matrix). This problem is particularly serious for services that have just started operation or do not have a large number of users. The proposed method generates a rating matrix without missing values using users’ rating probabilities, which are obtained from the distribution of their actual ratings. The final synthetic rating matrix is generated after adjusting its sparsity by introducing missing values. The validity of the proposed method is evaluated by comparing the synthetic rating matrix in terms of the similarity of the distribution of several statistics with that of the real data. The synthetic rating matrix is also evaluated by applying it to recommendation to actual users. The experimental results show that the proposed method can generate the synthetic rating matrix that has similar statistics to the real data, and recommendation models trained with the synthetic data achieve comparable recall to that trained with the real data when using the real data as test data. Based on the results of these experiments, this paper also tries to generate the synthetic rating matrix that contains richer information than the real data by increasing the number of users or reducing the sparsity of the rating matrix. The results of these experiments show the possibility that increasing the information contained in a rating matrix could improve recall.

  • Chao Wang, Jiahan Dong, Guangxin Guo, Bowen Li, Tianyu Ren
    Article type: Research Paper
    2024, Volume 28, Issue 1, pp. 141-149
    Publication date: 2024/01/20
    Release date: 2024/01/20
    Journal, Open Access

    With the rapid development of Internet technology and its applications, network vulnerabilities have become very common. Attackers may exploit defects in the software, hardware, or system security policy of a network system to access or destroy the system without authorization. Carrying out safety risk assessment and early warning so that threats are stopped before they develop is an urgent problem. Based on an overall assessment of the risk factors in the whole network, the more dangerous nodes are identified and priority measures are taken. The method proposed in this paper can reflect and predict the actions of attackers, and can repair and adjust the previously predicted probability. It is compared with the method that evaluates the uncertainty in the network solely by calculating the static probability. The proposed new ideas and methods better reflect the real-time changes in the actual environment of the Internet, thereby better responding to the actual situation. This method can be well applied to threat detection, threat analysis, and risk assessment of monitoring system networks, enabling monitoring network managers to evaluate and protect the security of real-time power grids. It is of great significance to effectively defend against network attacks, ensure system security, and study the resistance of control systems under network attacks.

  • Ryusei Kasai, Kouki Nagamune
    Article type: Research Paper
    2024, Volume 28, Issue 1, pp. 150-158
    Publication date: 2024/01/20
    Release date: 2024/01/20
    Journal, Open Access

    In total knee arthroplasty (TKA), many surgical instruments are available. Many of these surgical instruments are similar in shape and size. For this reason, there have been accidents due to incorrect selection of implants. Furthermore, a shortage of nurses is expected worldwide. There will also be a shortage of scrub nurses, which will result in an increased burden on each scrub nurse. For these reasons, we have developed a surgical instrument detection system for TKA to reduce the burden on scrub nurses and the number of accidents, such as implant selection errors. This study also focuses on automating the acquisition of data for training. We also develop a method to reduce the additional training time when the number of detection targets increases. In this study, YOLOv5 is used as the object detection method. In experiments, we examine the accuracy of the training data automatically acquired and the accuracy of object detection for surgical instruments. In object detection, several training files are created and compared. The results show that the training data is sufficiently effective, and high accuracy is obtained in object detection. Object detection is performed in several cases, and one of the results shows an IoU of 0.865 and an F-measure of 0.930.
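
    Intersection over union (IoU), one of the two metrics quoted above, measures how well a predicted bounding box overlaps a ground-truth box. A minimal Python sketch for axis-aligned boxes follows; the `(x_min, y_min, x_max, y_max)` box layout and the function name `iou` are assumptions for illustration and are not taken from the paper.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two hypothetical 2x2 boxes offset by one pixel in each direction.
overlap = iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
```

    An IoU of 1.0 means the boxes coincide and 0.0 means they do not overlap, so the reported 0.865 indicates a close but not perfect match.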

  • Yoshie Suzuki, Stephen Raharja, Toshiharu Sugawara
    Article type: Research Paper
    2024, Volume 28, Issue 1, pp. 159-168
    Publication date: 2024/01/20
    Release date: 2024/01/20
    Journal, Open Access

    This study proposes a method to automatically generate paths for multiple autonomous agents to collectively form a sequence of consecutive patterns. Several studies have considered minimizing the total travel distances of all agents for formation transitions in applications with multiple self-driving robots, such as unmanned aerial vehicle shows by drones or group actions in which self-propelled robots synchronously move together, consecutively transforming the patterns without collisions. However, few studies consider fairness in travel distance between agents, the lack of which can lead to battery exhaustion for certain agents and thereafter reduced operating time. Furthermore, because these group actions are usually performed with a large number of agents, the agents can have only small batteries to reduce cost and weight, but their performance time depends on the battery duration. The proposed method, which is based on ant colony optimization (ACO), considers fairness in the distances traveled by agents as well as a smaller total travel distance, and can achieve long transitions in both three- and two-dimensional spaces. Our experiments demonstrate that the proposed method based on ACO allows agents to execute more formation patterns without collisions than the conventional method, which is also based on ACO.

  • Kenji Kato, Tatsuya Yoshimi, Daiki Shimotori, Keita Aimoto, Naoki Itoh ...
    Article type: Research Paper
    2024, Volume 28, Issue 1, pp. 169-178
    Publication date: 2024/01/20
    Release date: 2024/01/20
    Journal, Open Access

    Assistive robots and technologies can play a key role in supporting the independence and social participation of older people, helping them live healthy lives and reducing the burden on caregivers. To support the effective development of assistive robots and technologies, it is important to develop a “living laboratory” to verify and adapt technology in real-life living spaces. The purpose of this study is to validate assistive robots using a living laboratory that simulates typical indoor and outdoor real-life situations. The rationale is to enable evaluation of daily living activities of older people in a simulated living space. To minimize the risk of trauma after falls, a ceiling suspension system was installed in the living laboratory. Six different commercially available mobility and transfer support robots were introduced and tested. We demonstrated that effective scenarios could be implemented using these assistive robots within the living laboratory. We implemented a 3D markerless motion capturing system in the outdoor space and showed that outdoor activities, including walking up and down a ramp, could be verified with sufficient accuracy in three cases: (i) normal use without a robot, (ii) use of the ceiling suspension system, and (iii) use of a mobility support robot on three healthy subjects. These results suggest that the proposed living laboratory can support testing and verification of assistive robots in simulated living environments.

  • Qiao Kang, Jing Kan, Fangyan Dong, Kewei Chen
    Article type: Research Paper
    2024, Volume 28, Issue 1, pp. 179-185
    Publication date: 2024/01/20
    Release date: 2024/01/20
    Journal, Open Access

    Sentences are composed of words, phrases, and clauses. The relationship between them is usually tree-like. In the hierarchical structure of the sentence, the dependency relationships between different components affect the syntactic structure. Syntactic structure is very important for understanding the meaning of the whole sentence. However, gated recurrent unit (GRU) models cannot fully encode hierarchical syntactic dependencies, which leads to their poor performance in various natural language tasks. In this paper, a model called relative syntactic distance bidirectional gated recurrent unit (RSD-BiGRU) is constructed to capture syntactic structure dependencies. The model modifies the gating mechanism in GRU through relative syntactic distance. It also offers a transformation gate to model the syntactic structure more directly, embedding sentence meanings together with sentence structure dependencies into dense vectors. This model is used to conduct semantic similarity experiments on the QQP and SICK datasets. The results show that the sentence representation obtained by the RSD-BiGRU model contains more semantic information. This is helpful for semantic similarity analysis tasks.

  • Kun Mao, Yanni Wang, Weiwei Ma, Jiangang Ye, Wen Zhou
    Article type: Research Paper
    2024, Volume 28, Issue 1, pp. 186-195
    Publication date: 2024/01/20
    Release date: 2024/01/20
    Journal, Open Access

    Evidential reasoning (ER) under uncertainty is essential for various applications such as classification, prediction, and clustering. The effective realization of ER is still an open issue. Reliability plays a decisive role in the final performance as a major parameter of ER, reflecting the evidence’s inner information. This paper proposed ER based on the information volume of the mass function (ER-IVMF), which considers both weight and reliability. Numerical examples were designed to illustrate the effectiveness of the ER-IVMF. Additionally, a sports scoring system experiment was conducted to validate the superiority of the ER-IVMF. Considering the reliability based on high-order evidence information, the output of the proposed method was more accurate than that of the other methods. The experimental results proved that the proposed method was practical for addressing sports-scoring problems.

  • Hao-Yan Zhang, Long-Bo Zhang, Qi-Feng Shi, Zhen-Tao Liu
    Article type: Research Paper
    2024, Volume 28, Issue 1, pp. 196-205
    Publication date: 2024/01/20
    Release date: 2024/01/20
    Journal, Open Access

    Initiative service is a key research direction for the new generation of service robots. It is important to automatically track humans for initiative service in human-robot interaction. To solve the problems of low precision and poor anti-interference capability of only using single-modal (audio or visual) information, a speaker positioning and tracking method based on an audio-visual bimodal combination is proposed. First, the azimuth of the speaker is obtained based on the time difference of arrival using a microphone array, and face detection based on AdaBoost is carried out using the camera. A distance and azimuth calculation model is established to obtain the position of the speaker. Second, a speaker positioning strategy based on an audio-visual bimodal combination is designed to handle different situations. Third, the path is planned by which the azimuth and distance between the robot and the speaker are maintained in a limited range. Different azimuths and distances for speaker tracking are set to perform various simulations. Finally, the mobile robot is driven to follow the path using the STM32 real-time control system. Information from the microphone array and the camera is collected and processed by Raspberry Pi. The tracking accuracy was tested under a single-face situation by setting 20 different target points, and 10 tests were carried out under each point. Under multi-face situations, the audio-visual bimodal information is combined to identify the speaker, and then the Kalman filter is used in face tracking. The experimental results demonstrate that the running trajectory of the mobile robot is close to the ideal trajectory, which ensures effective speaker tracking.
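
    The first step above estimates the speaker's azimuth from the time difference of arrival (TDOA) at a microphone array. As a hedged illustration, the Python sketch below uses the standard far-field relation for a single microphone pair, sin(theta) = c * tau / d. The two-microphone geometry, the speed of sound of 343 m/s, and the function name `azimuth_from_tdoa` are assumptions; the abstract does not give the paper's own calculation model.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def azimuth_from_tdoa(delay_s: float, mic_spacing_m: float,
                      c: float = SPEED_OF_SOUND) -> float:
    """Azimuth in degrees, measured from the broadside of a microphone pair.

    Far-field model: the extra path length c * delay equals
    mic_spacing * sin(azimuth).
    """
    ratio = c * delay_s / mic_spacing_m
    # Measurement noise can push the ratio slightly outside [-1, 1].
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


# Hypothetical pair spaced 0.1 m apart; this delay gives sin(theta) = 0.5.
example_delay = 0.5 * 0.1 / SPEED_OF_SOUND
example_azimuth = azimuth_from_tdoa(example_delay, 0.1)
```

    A zero delay corresponds to a source directly in front of the pair, and the sign of the delay indicates on which side the speaker is located.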

  • Jianqi Li, Jinfei Shen, Keheng Nie, Rui Du, Jiang Zhu, Hongyu Long
    Article type: Research Paper
    2024, Volume 28, Issue 1, pp. 206-215
    Publication date: 2024/01/20
    Release date: 2024/01/20
    Journal, Open Access

    To satisfy the demand for real-time and high-precision recognition of mechanical water meter readings in natural scenes, a reading recognition method for mechanical water meters based on you only look once version 4 (YOLOv4) is proposed in this paper. First, a focus structure is introduced into the feature extraction network to expand the receptive field and reduce the loss of original information. Second, a ghost block cross stage partial module is constructed to improve the feature fusion of the network and enhance the feature representation. Finally, the loss function of YOLOv4 is improved to further enhance the detection accuracy of the network. Experimental results show that the mAP@0.5 and mAP@0.5:.95 of the proposed method are 97.9% and 77.3%, respectively, which are 1.6% and 6.0% higher, respectively, than those of YOLOv4. Additionally, the number of parameters and computation amount of the proposed method are 48.6% and 36.8% lower, respectively, whereas its inference speed is 27% higher. The proposed method is applied to assist meter reading, which significantly reduces the workload of on-site meter-reading personnel and improves work efficiency. The datasets used are available at https://github.com/914284382/Mechanical-water-meter.

  • Shu-Hua Li, Feng-Long Yan, Ying-Qiu Li
    Article type: Research Paper
    2024, Volume 28, Issue 1, pp. 216-223
    Publication date: 2024/01/20
    Release date: 2024/01/20
    Journal, Open Access

    Deep learning is the major technique used to identify objects in images captured by the synthetic aperture radar (SAR). While SAR images can be used to identify ships in general, detecting multiple ships or small vessels in these images in complex contexts remains an outstanding challenge. This study proposes a model of detection based on the improved PP-YOLO deep convolutional neural network that can identify multiple ships as well as small vessels in complex scenarios from SAR images. The histogram equalization algorithm is first used to preprocess the SAR images, and then the initial anchor box is optimized by using the shape similarity distance-based K-means clustering algorithm. Following this, the accuracy of the training network is improved based on the feature pyramid network and an attention mechanism. The experimental results show that the average accuracy (average precision) of the model was 94.25% at 41.63 frames per second on the GF-3 and the Sentinel-1 SAR datasets, superior to those of YOLOv3 (Darknet), YOLOv7, FPN (VGG), SSD, Faster R-CNN, and PP-YOLO (ResNet50-vd). The model also satisfies the demands of real-time detection.
