IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Volume E96.D, Issue 11
Showing 1-26 of 26 articles from the selected issue
Regular Section
  • Pooia LALBAKHSH, Bahram ZAERI, Ali LALBAKHSH
    Article type: PAPER
    Subject area: Fundamentals of Information Systems
    2013, Volume E96.D, Issue 11, pp. 2309-2318
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    The paper introduces a novel pheromone update strategy to improve the functionality of ant colony optimization algorithms. This modification tries to extend the search area by an optimistic reinforcement strategy in which not only the most desirable sub-solution is reinforced in each step, but some of the other partial solutions with acceptable levels of optimality are also favored. Therefore, it increases the chance that these other potential solutions are selected by subsequent artificial ants, making the algorithm more exhaustive by increasing overall exploration. The modifications can be adopted in all ant-based optimization algorithms; however, this paper focuses on two static problems: the travelling salesman problem and classification rule mining. To work on these challenging problems, we considered two ACO algorithms, ACS (Ant Colony System) and AntMiner 3.0, and modified their pheromone update strategy. As shown by simulation experiments, the novel pheromone update method can improve the behavior of both algorithms regarding almost all the performance evaluation metrics.
  • Jinwei WANG, Xirong MA, Yuanping ZHU, Jizhou SUN
    Article type: PAPER
    Subject area: Fundamentals of Information Systems
    2013, Volume E96.D, Issue 11, pp. 2319-2326
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    Modern GPUs have evolved into more general processors capable of executing scientific and engineering computations. They provide a highly parallel computing environment due to their large number of computing cores, which are suitable for numerous data-parallel arithmetic computations, particularly linear algebra operations. Matrix-vector multiplication is one of the most important dense linear algebra operations. It is applied to a diverse set of applications in many fields and must therefore be fully optimized to achieve high performance. In this paper, we propose a novel auto-tuning method for matrix-vector multiplication on GPUs, where the number of threads assigned to compute one element of the result vector can be auto-tuned according to the size of the matrix. On the Nvidia GTX 650 GPU with the most recent Kepler architecture, we developed an auto-tuner that can automatically select the optimal number of assigned threads for calculation. Based on the auto-tuner's result, we developed a versatile generic matrix-vector multiplication kernel with the CUDA programming model. A series of experiments on different shapes and sizes of matrices were conducted to compare the performance of our kernel with that of the kernels from CUBLAS 5.0, MAGMA 1.3 and a warp method. The experimental results show that the performance of our matrix-vector multiplication kernel approaches the optimal behavior as the size of the matrix increases and has very little dependency on the shape of the matrix, which is a significant improvement compared to the other three kernels, which exhibit unstable performance behavior for different shapes of matrices.
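As a rough illustration of the tunable parameter described in this abstract, the sketch below simulates in plain Python how several threads could cooperate on one element of the result vector: each simulated thread accumulates a strided slice of a row, and the partial sums are then reduced. The function name `matvec_threads_per_row`, the parameter `threads_per_row`, and the strided partitioning are assumptions made for illustration; this is not the authors' CUDA kernel or auto-tuner.

```python
def matvec_threads_per_row(matrix, vector, threads_per_row):
    """Compute y = A x, splitting each row among simulated cooperating threads."""
    if threads_per_row < 1:
        raise ValueError("threads_per_row must be at least 1")
    result = []
    for row in matrix:
        if len(row) != len(vector):
            raise ValueError("row length must match vector length")
        # Simulated thread t handles columns t, t + T, t + 2T, ... (T = threads_per_row).
        partial_sums = [
            sum(row[j] * vector[j] for j in range(t, len(row), threads_per_row))
            for t in range(threads_per_row)
        ]
        # Reduction step: combine the per-thread partial sums into one output element.
        result.append(sum(partial_sums))
    return result


# Hypothetical 2 x 4 example; the result does not depend on threads_per_row.
A = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
x = [1, 0, -1, 2]
y = matvec_threads_per_row(A, x, threads_per_row=2)
```

An auto-tuner in this spirit would time the kernel for several candidate values of `threads_per_row` on each matrix size and keep the fastest; the numerical result is the same for every choice, only the work distribution changes.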
  • Asahi TAKAOKA, Satoshi TAYU, Shuichi UENO
    Article type: PAPER
    Subject area: Fundamentals of Information Systems
    2013, Volume E96.D, Issue 11, pp. 2327-2332
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    We consider the minimum feedback vertex set problem for some bipartite graphs and degree-constrained graphs. We show that the problem is linear time solvable for bipartite permutation graphs and NP-hard for grid intersection graphs. We also show that the problem is solvable in O(n^2 log^6 n) time for n-vertex graphs with maximum degree at most three.
  • Yohei HORI, Toshihiro KATASHITA, Hirofumi SAKANE, Kenji TODA, Akashi S ...
    Article type: PAPER
    Subject area: Computer System
    2013, Volume E96.D, Issue 11, pp. 2333-2343
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    Protecting the confidentiality and integrity of a configuration bitstream is essential for the dynamic partial reconfiguration (DPR) of field-programmable gate arrays (FPGAs). This is because erroneous or falsified bitstreams can cause fatal damage to FPGAs. In this paper, we present a high-speed and area-efficient bitstream protection scheme for DPR systems using the Advanced Encryption Standard with Galois/Counter Mode (AES-GCM), which is an authenticated encryption algorithm. Unlike many previous studies, our bitstream protection scheme also provides a mechanism for error recovery and tamper resistance against configuration block deletion, insertion, and disorder. The implementation and evaluation results show that our DPR scheme achieves a higher performance, in terms of speed and area, than previous methods.
  • Chuanyi LIU, Jie LIN, Binxing FANG
    Article type: PAPER
    Subject area: Computer System
    2013, Volume E96.D, Issue 11, pp. 2344-2353
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    Cloud computing is broadly recognized as the prevalent trend in IT. However, in cloud computing, customers lose direct control of their data and applications hosted by cloud providers, which raises the issue of the providers' trustworthiness and hinders the widespread use of cloud computing. This paper proposes a trustworthiness verification and audit mechanism for cloud providers called T-YUN. It introduces a trusted third party to cyclically attest the remote clouds, which are instrumented with a trusted chain covering the whole architecture stack. According to the main operations of the clouds, remote verification protocols are also proposed in T-YUN, with a dedicated key management scheme. This paper also implements a proof-of-concept emulator to validate the effectiveness and performance overhead of T-YUN. The experimental results show that T-YUN is effective and that the extra overhead it incurs is acceptable.
  • Jinghua YAN, Xiaochun YUN, Hao LUO, Zhigang WU, Shuzhuang ZHANG
    Article type: PAPER
    Subject area: Information Network
    2013, Volume E96.D, Issue 11, pp. 2354-2364
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    Traffic classification has recently gained much attention in both academic and industrial research communities. Many machine learning methods have been proposed to tackle this problem and have shown good results. However, when applied to traffic with out-of-sequence packets, the accuracy of existing machine learning approaches decreases dramatically. We observe the main reason is that the out-of-sequence packets change the spatial representation of feature vectors, which means the property of linear mapping relation among features used in machine learning approaches cannot hold any more. To address this problem, this paper proposes an Improved Dynamic Time Warping (IDTW) method, which can align two feature vectors using non-linear alignment. Experimental results on two real traces show that IDTW achieves better classification accuracy in out-of-sequence traffic classification, in comparison to existing machine learning approaches.
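For readers unfamiliar with the underlying technique, the following is a minimal sketch of classic dynamic time warping (DTW), which aligns two sequences non-linearly by allowing elements to be matched one-to-many. It is the textbook algorithm, not the Improved DTW (IDTW) of the paper, whose modifications are not described in the abstract; the function name `dtw_distance` and the packet-size values are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j],      # a[i-1] matched again
                                     cost[i][j - 1],      # b[j-1] matched again
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Hypothetical per-packet sizes: the second flow repeats one packet.
flow_a = [100, 1500, 40]
flow_b = [100, 1500, 1500, 40]
distance = dtw_distance(flow_a, flow_b)
```

Because the repeated element in `flow_b` can be matched to the same element of `flow_a`, the warped alignment costs nothing here, whereas an element-by-element comparison would not even be defined for sequences of different length.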
  • Xiuyan JIANG, Dejian YE, Yiming CHEN, Xuejun TIAN
    Article type: PAPER
    Subject area: Information Network
    2013, Volume E96.D, Issue 11, pp. 2365-2375
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    Smart TVs are expected to play a leading role in the future networked intelligent screen market. Currently, many operators are planning to deploy them on a large scale within a few years. Therefore, it is necessary for smart TVs to provide high-quality services for users. Packet loss is one critical factor that decreases the QoS of smart TVs. Even a very small amount of packet loss (1-2%) can decrease the QoS and seriously affect the user experience. This paper applies stochastic differential equations to analyze the queue in the buffer of access points in smart TV multicast systems, demonstrates the reason for packet loss, and then proposes an end-to-end error recovery scheme (abbreviated as OPRSFEC) whose core algorithm is based on Reed-Solomon theory and which optimizes four aspects in finite fields: 1) using a Cauchy matrix instead of a Vandermonde matrix to code and decode; 2) generating the inverse matrix by table look-up; 3) changing the matrix multiplication into table look-up; 4) originally dividing the matrix multiplication. This paper implements the scheme on the application layer, which screens the heterogeneity of terminals and servers, recovers 100% of lost packets (at loss rates of 1%-2%) in multicast systems, and has very little effect on the real-time user experience. Simulations demonstrate that the proposed scheme performs well, runs successfully on Sigma and Mstar Moca TV terminals, and increases the QoS of smart TVs. Recently, OPRSFEC middleware has become a part of the IPTV2.0 standard at Shanghai Telecom and has been running properly on the Mstar boards of Haier Moca TVs.
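Two of the ingredients named in this abstract, table look-up arithmetic in a finite field and a Cauchy coding matrix, can be sketched generically as follows. The field GF(2^8), the polynomial 0x11D, and all function names are assumptions chosen for illustration; the abstract does not state which field or tables OPRSFEC uses, so this is not the authors' implementation.

```python
PRIMITIVE_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed field polynomial)

# Build exponent and logarithm tables for GF(2^8) with generator 2.
EXP = [0] * 510
LOG = [0] * 256
value = 1
for power in range(255):
    EXP[power] = value
    LOG[value] = power
    value <<= 1
    if value & 0x100:
        value ^= PRIMITIVE_POLY
for power in range(255, 510):
    EXP[power] = EXP[power - 255]  # duplicate so that gf_mul needs no modulo


def gf_mul(a, b):
    """Multiply in GF(2^8) using two log look-ups and one exp look-up."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    """Multiplicative inverse in GF(2^8) by table look-up."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return EXP[255 - LOG[a]]


def cauchy_matrix(xs, ys):
    """Entry (i, j) is 1 / (xs[i] + ys[j]); addition in GF(2^8) is XOR."""
    if len(set(xs)) != len(xs) or len(set(ys)) != len(ys) or set(xs) & set(ys):
        raise ValueError("xs and ys must be distinct and mutually disjoint")
    return [[gf_inv(x ^ y) for y in ys] for x in xs]


C = cauchy_matrix([1, 2], [3, 4])
# Determinant of the 2 x 2 matrix; subtraction in GF(2^8) is also XOR.
det = gf_mul(C[0][0], C[1][1]) ^ gf_mul(C[0][1], C[1][0])
```

Every square submatrix of a Cauchy matrix is invertible, which is the property that makes such matrices usable for erasure coding; the 2 x 2 determinant above is a small instance of that fact.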
  • Dung Duc NGUYEN, Maike ERDMANN, Tomoya TAKEYOSHI, Gen HATTORI, Kazunor ...
    Article type: PAPER
    Subject area: Artificial Intelligence, Data Mining
    2013, Volume E96.D, Issue 11, pp. 2376-2384
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    The abundance of information published on the Internet makes filtering of hazardous Web pages a difficult yet important task. Supervised learning methods such as Support Vector Machines (SVMs) can be used to identify hazardous Web content. However, scalability is a big challenge, especially if we have to train multiple classifiers, since different policies exist on what kind of information is hazardous. We therefore propose two different strategies to train multiple SVMs for personalized Web content filters. The first strategy identifies common data clusters and then performs optimization on these clusters in order to obtain good initial solutions for individual problems. This initialization shortens the path to the optimal solutions and reduces the training time on individual training sets. The second approach is to train all SVMs simultaneously. We introduce an SMO-based kernel-biased heuristic that balances the reduction rate of individual objective functions and the computational cost of the kernel matrix. The heuristic relies primarily on the optimality conditions of all optimization problems and secondarily on the pre-calculated part of the whole kernel matrix. This strategy increases the amount of information sharing among learning tasks, thus reducing the number of kernel calculations and the training time. In our experiments on inconsistently labeled training examples, both strategies were able to predict hazardous Web pages accurately (> 91%) with training times of only 26% and 18% of that of normal sequential training.
  • Kai SHI, Yuichi GOTO, Zhiliang ZHU, Jingde CHENG
    Article type: PAPER
    Subject area: Artificial Intelligence, Data Mining
    2013, Volume E96.D, Issue 11, pp. 2385-2396
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    Avoiding runway incursions is a significant challenge and a top priority in aviation. Because all causes of runway incursions are attributable to human factors, runway incursion prevention systems should remove humans from the system operation loop as much as possible. Although current runway incursion prevention systems have made great progress in obtaining accurate and sufficient information about aircraft and vehicles, they cannot predict and detect runway incursions as early as experienced air traffic controllers using the same surveillance information, nor can they give explicit instructions and/or suggestions to prevent runway incursions as real air traffic controllers do. In short, humans still play an important role in current runway incursion prevention systems. In order to remove human factors from the system operation loop as much as possible, this paper proposes a new type of runway incursion prevention system based on logic-based reasoning. The system predicts and detects runway incursions, then gives explicit instructions and/or suggestions to pilots/drivers to avoid runway incursions/collisions. The features of the system include long-range prediction of incidents, explicit instructions and/or suggestions, and a flexible model for different policies and airports. To evaluate our system, we built a simulation system and evaluated our system using both real historical scenarios and conventional fictional scenarios. The evaluation showed that our system is effective at predicting incidents earlier than current systems, giving explicit instructions and/or suggestions for handling the incidents effectively, and being customized for specific policies and airports using the flexible model.
  • Warin WATTANAPORNPROM, Prabhas CHONGSTITVATANA
    Article type: PAPER
    Subject area: Artificial Intelligence, Data Mining
    2013, Volume E96.D, Issue 11, pp. 2397-2408
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    This article introduces the Coincidence Algorithm (COIN) to solve several multimodal puzzles. COIN is an algorithm in the category of Estimation of Distribution Algorithms (EDAs) that makes use of probabilistic models to generate solutions. The model of COIN is a joint probability table of adjacent events (coincidence) derived from the population of candidate solutions. A unique characteristic of COIN is the ability to learn from a negative sample. Various experiments show that learning from a negative example helps to prevent premature convergence, promotes diversity and preserves good building blocks.
  • Yong-Soo SEOL, Han-Woo KIM
    Article type: PAPER
    Subject area: Human-computer Interaction
    2013, Volume E96.D, Issue 11, pp. 2409-2416
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    To understand human emotion, it is necessary to be aware of the surrounding situation and individual personalities. In most previous studies, however, these important aspects were not considered. Emotion recognition has been considered as a classification problem. In this paper, we attempt new approaches to utilize a person's situational information and personality for use in understanding emotion. We propose a method of extracting situational information and building a personalized emotion model for reflecting the personality of each character in the text. To extract and utilize situational information, we propose a situation model using lexical and syntactic information. In addition, to reflect the personality of an individual, we propose a personalized emotion model using KBANN (Knowledge-based Artificial Neural Network). Our proposed system has the advantage of using a traditional keyword-spotting algorithm. In addition, we also reflect the fact that the strength of emotion decreases over time. Experimental results show that the proposed system can more accurately and intelligently recognize a person's emotion than previous methods.
  • Trung-Nghia PHUNG, Thanh-Son PHAN, Thang Tat VU, Mai Chi LUONG, Masato ...
    Article type: PAPER
    Subject area: Speech and Hearing
    2013, Volume E96.D, Issue 11, pp. 2417-2426
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    The most important advantage of HMM-based TTS is its high intelligibility. However, speech synthesized by HMM-based TTS is muffled and far from natural, especially under limited data conditions, which is mainly caused by over-smoothing. Therefore, the motivation for this paper is to improve the naturalness of HMM-based TTS trained under limited data conditions while preserving its intelligibility. To achieve this goal, a hybrid TTS combining HMM-based TTS and the modified restricted Temporal Decomposition (MRTD), named HTD in this paper, is proposed. Here, TD is an interpolation model that decomposes a spectral or prosodic sequence of speech into sparse event targets and dynamic event functions, and MRTD is a simplified version of TD. With event functions determined in a way close to the concept of co-articulation in speech, MRTD can synthesize smooth speech, and the smoothness of the synthesized speech can be adjusted by manipulating the event targets of MRTD. Previous studies have also found that the event functions of MRTD can represent the linguistic information of speech, which is important for perceived intelligibility, while the sparse event targets can convey the non-linguistic information, which is important for perceived naturalness. Therefore, the prosodic trajectories and the MRTD event functions of the spectral trajectory generated by HMM-based TTS were kept unchanged to preserve the high and stable intelligibility of HMM-based TTS, whereas the MRTD event targets of the spectral trajectory generated by HMM-based TTS were rendered with an original speech database to enhance the naturalness of the synthesized speech. Experimental results with small Vietnamese datasets revealed that the proposed HTD was equivalent to HMM-based TTS in terms of intelligibility but superior to it in terms of naturalness. Further discussion shows that HTD has a small footprint. Therefore, the proposed HTD is highly efficient under limited data conditions.
  • Kazu MISHIBA, Masaaki IKEHARA, Takeshi YOSHITOME
    Article type: PAPER
    Subject area: Image Processing and Video Processing
    2013, Volume E96.D, Issue 11, pp. 2427-2436
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    In this paper, we propose a novel content-aware image resizing method based on grid transformation. Our method focuses on not only keeping important regions unchanged but also keeping the aspect ratio of the main object in an image unchanged. The dual conditions can avoid distortion which often occurs when only using the former condition. Our method first calculates image importance. Next, we extract the main objects on an image by using image importance. Finally, we calculate the optimal grid transformation which suppresses changes in size of important regions and in the aspect ratios of the main objects. Our method uses lower and upper thresholds for transformation to suppress distortion due to extreme shrinking and enlargement. To achieve better resizing results, we introduce a boundary discarding process. This process can assign wider regions to important regions, reducing distortions on important regions. Experimental results demonstrate that our proposed method resizes images with less distortion than other resizing methods.
  • Fuji REN, Bo LI, Qimei CHEN
    Article type: PAPER
    Subject area: Image Processing and Video Processing
    2013, Volume E96.D, Issue 11, pp. 2437-2449
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    Considering the non-linear properties of the human visual system, many non-linear operators and models have been developed, particularly the logarithmic image processing (LIP) model proposed by Jourlin and Pinoli, which has been proved to be physically justified in several laws of the human visual system and has been successfully applied in image processing areas. Recently, several modifications based on this logarithmic mathematical framework have been presented, such as parameterized logarithmic image processing (PLIP), pseudo-logarithmic image processing, homomorphic logarithmic image processing. In this paper, a new single parameter logarithmic model for image processing with an adaptive parameter-based Sobel edge detection algorithm is presented. On the basis of analyzing the distributive law, the subtractive law, and the isomorphic property of the PLIP model, the five parameters in PLIP are replaced by a single parameter to ensure the completeness of the model and physical constancy with the nature of an image, and then an adaptive parameter-based Sobel edge detection algorithm is proposed. By using an image noise estimation method to evaluate the noise level of image, the adaptive parameter in the single parameter LIP model is calculated based on the noise level and grayscale value of a corresponding image area, followed by the single-parameter LIP-based Sobel operation to overcome the noise-sensitive problem of classical LIP-based Sobel edge detection methods, especially in the dark area of an image, while retaining edge sensitivity. Compared with the classical LIP and PLIP model, the given single parameter LIP achieves satisfactory results in noise suppression and edge accuracy.
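For context, the classical LIP model of Jourlin and Pinoli replaces ordinary addition, subtraction and scalar multiplication of gray-tone values with bounded operations that keep results below a maximum M. The sketch below implements only those classical operations as they are commonly stated; it does not reproduce the single-parameter model or the adaptive Sobel detector proposed in the paper. The value M = 256 and the function names are assumptions for illustration.

```python
M = 256.0  # assumed upper bound of the gray-tone range (8-bit images)


def lip_add(f, g, m=M):
    """Classical LIP addition: f (+) g = f + g - f*g/M."""
    return f + g - f * g / m


def lip_subtract(f, g, m=M):
    """Classical LIP subtraction: f (-) g = M * (f - g) / (M - g)."""
    return m * (f - g) / (m - g)


def lip_scalar_mul(lam, f, m=M):
    """Classical LIP scalar multiplication: lam (x) f = M - M * (1 - f/M)**lam."""
    return m - m * (1.0 - f / m) ** lam


# Adding a gray tone to itself agrees with scaling it by 2, and stays below M.
doubled_by_add = lip_add(128.0, 128.0)
doubled_by_scale = lip_scalar_mul(2, 128.0)
recovered = lip_subtract(doubled_by_add, 128.0)
```

Ordinary addition of 128 and 128 would overflow an 8-bit range, whereas the LIP sum stays inside it, which is one reason the model is used for edge detection in dark or saturated image areas.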
  • Wei LI, Yang WU, Masayuki MUKUNOKI, Michihiko MINOH
    Article type: PAPER
    Subject area: Image Recognition, Computer Vision
    2013, Volume E96.D, Issue 11, pp. 2450-2461
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    Multiple-shot person re-identification, which is valuable for application in visual surveillance, tackles the problem of building the correspondence between images of the same person from different cameras. It is challenging because of the large within-class variations due to the changeable body appearance and environment and the small between-class differences arising from the possibly similar body shape and clothes style. A novel method named “Bi-level Relative Information Analysis” is proposed in this paper for the issue by treating it as a set-based ranking problem. It creatively designs a relative dissimilarity using set-level neighborhood information, called “Set-level Common-Near-Neighbor Modeling”, complementary to the sample-level relative feature “Third-Party Collaborative Representation” which has recently been proven to be quite effective for multiple-shot person re-identification. Experiments implemented on several public benchmark datasets show significant improvements over state-of-the-art methods.
  • Chuzo IWAMOTO, Yuta MATSUI
    Article type: LETTER
    Subject area: Fundamentals of Information Systems
    2013, Volume E96.D, Issue 11, pp. 2462-2465
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    Pyramid is a solitaire game, where the object is to remove all cards from both a pyramidal layout and a stock of cards. Two exposed cards can be matched and removed if their values total 13. Any exposed card of value 13 and the top card of the stock can be discarded immediately. We prove that the generalized version of Pyramid is NP-complete.
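The value-based removal rule stated in this abstract can be written as a small predicate. The sketch below covers only that rule (a pair totalling 13, or a single card of value 13); it does not model exposure in the layout or the stock, and the function name `can_remove` and the integer card encoding 1 to 13 are assumptions for illustration.

```python
def can_remove(cards):
    """Return True if the given exposed card values form a legal removal.

    `cards` is a sequence of one or two integer card values (1..13).
    """
    if len(cards) == 1:
        # A single exposed card may be discarded only if its value is 13.
        return cards[0] == 13
    if len(cards) == 2:
        # Two exposed cards may be matched if their values total 13.
        return cards[0] + cards[1] == 13
    return False


queen_and_ace = can_remove([12, 1])
lone_king = can_remove([13])
bad_pair = can_remove([6, 6])
```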
  • Zhong ZHENG, Zhiying WANG, Li SHEN
    Article type: LETTER
    Subject area: Computer System
    2013, Volume E96.D, Issue 11, pp. 2466-2469
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    Power consumption has become a critical factor for embedded systems, especially for battery-powered ones. Caches in these systems consume a large portion of the whole chip power. Embedded systems usually adopt set-associative caches to get better performance. However, accessing cache ways in parallel incurs more energy dissipation. This paper proposes a region-based way-partitioning scheme that reduces cache way accesses, and thus cache power consumption, without sacrificing performance. The stack accesses and non-stack accesses are isolated and redirected to different ways of the L1 data cache. Under way-partitioning, cache way accesses are reduced, as is memory reference interference. Experimental results show that the proposed approach could save around 27.5% of L1 data cache energy on average, without significant performance degradation.
  • Takashi ISHIO, Hiroki WAKISAKA, Yuki MANABE, Katsuro INOUE
    Article type: LETTER
    Subject area: Software System
    2013, Volume E96.D, Issue 11, pp. 2470-2472
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    Logging the execution process of a program is a popular activity for practical program understanding. However, understanding the behavior of a program from a complete execution trace is difficult because a system may generate a substantial number of runtime events. To focus on a small subset of runtime events, a dynamic object process graph (DOPG) has been proposed. Although a DOPG can potentially facilitate program understanding, the logging process has not been adapted for DOPGs. If a developer is interested in the behavior of a particular object, only the runtime events related to the object are necessary to construct a DOPG. The vast majority of runtime events in a complete execution trace are irrelevant to the interesting object. This paper analyzes actual DOPGs and reports that a logging tool can be optimized to record only the runtime events related to a particular object specified by a developer.
  • Jinho AHN
    Article type: LETTER
    Subject area: Dependable Computing
    2013, Volume E96.D, Issue 11, pp. 2473-2477
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    This paper presents a new scalable method to considerably reduce the rollback propagation effect of the conventional optimistic message logging by utilizing positive features of reliable FIFO group communication links. To satisfy this goal, the proposed method forces group members to replicate different receive sequence numbers (RSNs), which they assigned for each identical message to their group respectively, into their volatile memories. As the degree of redundancy of RSNs increases, the possibility of local recovery for each crashed process may significantly be higher. Experimental results show that our method can outperform the previous one in terms of the rollback distance of non-faulty processes with a little normal time overhead.
  • Xin LI, Jielin PAN, Qingwei ZHAO, Yonghong YAN
    Article type: LETTER
    Subject area: Speech and Hearing
    2013, Volume E96.D, Issue 11, pp. 2478-2482
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    Morphemes, which are obtained from morphological parsing, and statistical sub-words, which are derived from data-driven splitting, are commonly used as the recognition units for speech recognition of agglutinative languages. In this letter, we propose a discriminative approach that selects, for each distinct word type, the splitting result that is more likely to improve the recognizer's performance. An objective function involving the unigram language model (LM) probability and the count of misrecognized phones on the acoustic training data is defined and minimized. After determining the splitting result for each word in the text corpus, we select the frequent units to build a hybrid vocabulary including morphemes and statistical sub-words. Compared to a statistical sub-word based system, the hybrid system achieves a 0.8% reduction in letter error rate (LER) on the test set.
  • Jangwon CHOI, Yoonsik CHOE, Yong-Goo KIM
    Article type: LETTER
    Subject area: Image Processing and Video Processing
    2013, Volume E96.D, Issue 11, pp. 2483-2486
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    This letter proposes a novel depth-guided inpainting scheme for high-quality hole-filling in 2D-to-3D video conversion. The proposed scheme detects and removes foreground depth layers in an image patch, enabling appropriate patch formation using only disoccluded background information. This background-only patch formation helps to avoid the propagation of wrong depths over the hole area, and thus improves the overall quality of the converted 3D video experience. Experimental results demonstrate that the proposed scheme provides visually much more pleasing inpainting results with better preserved object edges compared to state-of-the-art depth-guided inpainting schemes.
  • Jin-Ping HE, Kun GAO, Guo-Qiang NI, Guang-Da SU, Jian-Sheng CHEN
    Article type: LETTER
    Subject area: Image Processing and Video Processing
    2013, Volume E96.D, Issue 11, pp. 2487-2491
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    Considering the real existent fact of the ideal edge and the learning style of image analogy without reference parameters, a blind image recovery algorithm using a self-adaptive learning method is proposed in this paper. We show that a specific local image patch with degradation characteristic can be utilized for restoring the whole image. In the training process, a clear counterpart of the local image patch is constructed based on the ideal edge assumption so that identification of the Point Spread Function is no longer needed. Experiments demonstrate the effectiveness of the proposed method on remote sensing images.
  • Sungchan OH, Hyug-Jae LEE, Gyeonghwan KIM
    Article type: LETTER
    Subject area: Image Processing and Video Processing
    2013, Volume E96.D, Issue 11, pp. 2492-2495
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    This letter presents a method of adding a virtual halo effect to an object of interest in video sequences. A modified graph-cut segmentation algorithm extracts object layers. The halo is modeled by the accumulation of gradually changing Gaussians. With a synthesized blooming effect, the experimental results show that the proposed method conveys a realistic halo effect.
  • Kun LU, Xin ZHANG
    Article type: LETTER
    Subject area: Image Recognition, Computer Vision
    2013, Volume E96.D, Issue 11, pp. 2496-2499
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    This letter presents a novel approach for automatic multimodal affect recognition. The audio and visual channels provide complementary information for human affective states recognition, and we utilize Boltzmann zippers as model-level fusion to learn intrinsic correlations between the different modalities. We extract effective audio and visual feature streams with different time scales and feed them to two component Boltzmann chains respectively. Hidden units of the two chains are interconnected to form a Boltzmann zipper which can effectively avoid local energy minima during training. Second-order methods are applied to Boltzmann zippers to speed up learning and pruning process. Experimental results on audio-visual emotion data recorded by ourselves in Wizard of Oz scenarios and collected from the SEMAINE naturalistic database both demonstrate our approach is robust and outperforms the state-of-the-art methods.
  • Guoqi LIU, Zhiheng ZHOU, Shengli XIE, Dongcheng WU
    Article type: LETTER
    Subject area: Image Recognition, Computer Vision
    2013, Volume E96.D, Issue 11, pp. 2500-2503
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    Vector field convolution (VFC) provides a successful external force for an active contour model. However, it fails to extract the complex geometries, especially the deep concavity when the initial contour is set outside the object or the concave region. In this letter, dynamically constrained vector field convolution (DCVFC) external force is proposed to solve this problem. In DCVFC, the indicator function with respect to the evolving contour is introduced to restrain the correlation of external forces generated by different edges, and the forces dynamically generated by complex concave edges gradually make the contour move to the object. On the other hand, traditional vector field, a component of the proposed DCVFC, makes the evolving contour stop at the object boundary. The connections between VFC and DCVFC are also analyzed. DCVFC maintains desirable properties of VFC, such as robustness to initialization. Experimental results demonstrate that DCVFC snake provides a much better segmentation than VFC snake.
  • Jea-Yul YOON, Chai-Jong SONG, Hochong PARK
    Article type: LETTER
    Subject area: Music Information Processing
    2013, Volume E96.D, Issue 11, pp. 2504-2507
    Published: 2013/11/01
    Released: 2013/11/01
    Journal, free access
    A new method for predominant melody extraction from polyphonic music signals based on harmonic structure is proposed. The proposed method first extracts a set of fundamental frequency candidates by analyzing the distance between spectral peaks. Then, the predominant fundamental frequency is selected by pitch tracking according to the harmonic strength of the selected candidates. Finally, the method runs pitch smoothing on a large temporal scale for eliminating pitch doubling error, and conducts voicing frame detection. The proposed method shows the best overall performance for ADC 2004 DB in the MIREX 2011 audio melody extraction task.