Special Section on Enriched Multimedia — Creation of a New Society through Value-added Multimedia Content —
-
Isao ECHIZEN
2016 Volume E99.D Issue 1 Pages 40
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
-
Jorge TREVINO, Shuichi SAKAMOTO, Junfeng LI, Yôiti SUZUKI
Article type: INVITED PAPER
2016 Volume E99.D Issue 1 Pages 41-49
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
There is a strong push towards the ultra-realistic presentation of multimedia contents, made possible by the latest advances in computational and signal processing technologies. Three-dimensional sound presentation is necessary to convey a natural and rich multimedia experience. Promising ways to achieve this include the sound field reproduction technique known as high-order Ambisonics (HOA). While these advanced methods are now within the capabilities of consumer-level processing systems, their adoption is hindered by the lack of contents. Production and coding of the audio components in multimedia focus on traditional formats such as stereophonic sound, and mainstream audio codecs and media such as CDs or DVDs do not support advanced, rich contents such as HOA encodings. To ameliorate this problem and speed up the adoption of spatial sound technologies, this paper proposes a novel way to downmix HOA contents into a stereo signal. The resulting data can be distributed using conventional methods such as audio CDs or as the audio component of an internet video stream, and can be listened to on legacy stereo reproduction systems. However, they include spatial information encoded as inter-channel level and phase differences. The proposed method consists of a downmixing filterbank that independently modulates the inter-channel differences at each frequency bin. The proposal is evaluated using simple test signals and found to outperform conventional methods such as matrix-encoded surround and the Ambisonics UHJ format in terms of spatial resolution. It can be coupled with a previously presented method to recover HOA signals from stereo recordings. The resulting system allows full-surround spatial information in ultra-realistic contents to be preserved when they are transferred as a stereo stream. Simulation results show that a compatible decoder can accurately recover up to five HOA channels from a stereo signal (2nd-order HOA data in the horizontal plane).
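As a loose illustration of the core idea only (this is not the authors' filterbank design; the STFT framing, phase offsets, and parameter values below are invented for the sketch), the following Python maps a multichannel signal to stereo while encoding each channel's identity as a per-bin inter-channel phase difference:
```python
import numpy as np

def stereo_downmix(chans, nfft=1024):
    """Toy STFT downmix: left is a plain sum of channels; right applies a
    distinct constant phase offset to each input channel, so channel
    identity survives as inter-channel phase differences at every bin."""
    n_ch, n = chans.shape
    hop, win = nfft // 2, np.hanning(nfft)
    left, right = np.zeros(n), np.zeros(n)
    offsets = np.linspace(0.0, np.pi, n_ch, endpoint=False)
    for s in range(0, n - nfft, hop):
        spec = np.fft.rfft(chans[:, s:s + nfft] * win, axis=1)
        left[s:s + nfft] += np.fft.irfft(spec.sum(axis=0), nfft)
        right[s:s + nfft] += np.fft.irfft(
            (spec * np.exp(1j * offsets)[:, None]).sum(axis=0), nfft)
    return np.stack([left, right])
```
A matching decoder would exploit the known per-channel offsets to re-separate the components, which is the role of the companion HOA-recovery method cited in the abstract.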
-
Minoru KURIBAYASHI
Article type: PAPER
2016 Volume E99.D Issue 1 Pages 50-59
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
Under Kerckhoffs' principle, illegal users are assumed to know the embedding and detection algorithms, everything except a secret key. It is then possible to access the host signal, which may be selected from the frequency components of a digital content for embedding the watermark signal. Especially for a fingerprinting scheme, which embeds users' information as a watermark, the selected components can easily be found by observing differently watermarked copies of the same content. In this scenario, it has been reported that some non-linear collusion attacks can remove or modify the embedded signal. In this paper, we analyze the security of our previously proposed spread-spectrum (SS) fingerprinting scheme [1], [2] under Kerckhoffs' principle and reveal its drawback when an SS sequence is embedded in a color image. If non-linear collusion attacks are applied only to the components selected for embedding, traceability is greatly degraded while the pirated copy retains high quality after the attacks. We also propose a simple countermeasure to enhance the robustness against non-linear collusion attacks as well as possible signal processing attacks for the underlying watermarking method.
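A toy numpy experiment (not the paper's scheme; signal sizes and the embedding strength are made up) showing why a non-linear collusion confined to the embedding positions is damaging to SS fingerprints:
```python
import numpy as np

rng = np.random.default_rng(0)
n, n_users, strength = 4096, 3, 2.0
host = rng.normal(0.0, 10.0, n)               # stand-in for selected frequency components
seqs = rng.choice([-1.0, 1.0], (n_users, n))  # one SS sequence per user
copies = host + strength * seqs               # differently watermarked copies

# Colluders locate the embedding positions by comparing their copies and
# apply a non-linear (median) attack only there; quality stays high.
pirated = np.median(copies, axis=0)

corr = (pirated - host) @ seqs.T / n          # informed correlation detector
print(corr)
```
With three colluders the per-sample median discards the extreme watermark contributions, so the detector's correlations fall to roughly half the single-copy value of `strength`, and more colluders degrade them further.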
-
Ibuki NAKAMURA, Yoshihide TONOMURA, Hitoshi KIYA
Article type: PAPER
2016 Volume E99.D Issue 1 Pages 60-68
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
We focus on the feature transform approach as one methodology for biometric template protection, where the template consists of the features extracted from the biometric trait. This study considers some properties of unitary (including orthogonal) transform-based template protection in particular. It is known that the Euclidean distance between templates protected by a unitary transform is the same as that between the original (non-protected) ones. In this study, moreover, it is shown that unitary transforms provide the same results in l2-norm minimization problems as the original templates. This means that there is no degradation of recognition performance in authentication systems using l2-norm minimization. Therefore, the protected templates can be reissued multiple times without access to the original templates. In addition, a DFT-based template protection scheme is proposed as a unitary transform-based one. The proposed scheme can generate protected templates efficiently by the FFT, in addition to offering the useful properties above. It is applied to face recognition experiments to evaluate its effectiveness.
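The distance-preservation property is easy to check numerically; a minimal sketch (illustrative, not the paper's DFT-based scheme) using a random orthogonal matrix as the protecting key:
```python
import numpy as np

rng = np.random.default_rng(1)
d = 64
x, y = rng.normal(size=d), rng.normal(size=d)   # two feature templates

Q, _ = np.linalg.qr(rng.normal(size=(d, d)))    # random orthogonal "key"
print(np.linalg.norm(x - y))                    # distance between originals
print(np.linalg.norm(Q @ x - Q @ y))            # identical after protection
```
Reissuing a template amounts to drawing a fresh Q; distances, and hence l2-norm-based matching results, are unchanged.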
-
Kazuto OGAWA, Go OHTAKE
Article type: PAPER
2016 Volume E99.D Issue 1 Pages 69-82
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
Broadcasting and communications networks can be used together to offer hybrid broadcasting services that incorporate a variety of personalized information from communications networks in TV programs. To enable these services, many different applications have to be run on a user terminal, and it is necessary to establish an environment where any service provider can create applications and distribute them to users. The danger is that malicious service providers might distribute applications which may cause user terminals to take undesirable actions. To prevent such applications from being distributed, we propose an application authentication protocol for hybrid broadcasting and communications services. Concretely, we modify a key-insulated signature scheme and apply it to this protocol. In the protocol, a broadcaster distributes a distinct signing key to each service provider that the broadcaster trusts. As a result, users can verify that an application is reliable. If a signed application causes an undesirable action, a broadcaster can revoke the privileges and permissions of the service provider. In addition, the broadcaster can update the signing key. That is, our protocol is secure against leakage of the signing key by the broadcaster and service providers. Moreover, a user terminal uses only one verification key for verifying a signature, so the memory needed for storing the verification key in the user terminal is very small. With our protocol, users can securely receive hybrid services from broadcasting and communications networks.
-
Akira NISHIMURA
Article type: PAPER
2016 Volume E99.D Issue 1 Pages 83-91
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
Reversible data hiding is a technique in which hidden data are embedded in host data such that the host is perfectly restored when the hidden data are extracted. In this paper, a linear prediction technique for reversible data hiding of audio waveforms is improved. The proposed variable expansion method is able to control the payload size by varying the expansion factor. The technique is combined with the prediction error expansion method. Reversible embedding, perfect payload detection, and perfect recovery of the host signal are achieved for a framed audio signal. A smaller expansion factor results in a smaller payload size and less degradation of the stego audio quality. Computer simulations reveal that embedding a random-bit payload of less than 0.4 bits per sample into CD-format music signals provides stego audio with acceptable objective quality. The method is also applied to G.711 µ-law-coded speech signals, where embedding a random-bit payload of less than 0.1 bits per sample provides stego speech with good objective quality.
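A minimal sketch of plain prediction-error expansion with the previous sample as predictor (the expansion factor is fixed at 2 here, whereas the paper's contribution is making it variable; overflow handling, which a real codec needs, is omitted):
```python
import numpy as np

def pee_embed(x, bits):
    """Expand each prediction error e to 2*e + b so it carries one bit."""
    y = x.copy()
    for i, b in enumerate(bits, start=1):
        e = x[i] - x[i - 1]              # prediction error of the host
        y[i] = y[i - 1] + 2 * e + b
    return y

def pee_extract(y, n_bits):
    x = y.copy()
    bits = []
    for i in range(1, n_bits + 1):
        e2 = y[i] - y[i - 1]
        bits.append(int(e2) & 1)
        x[i] = x[i - 1] + (int(e2) >> 1)  # perfect host recovery
    return x, bits

x = np.array([100, 103, 101, 104, 102], dtype=np.int64)
y = pee_embed(x, [1, 0, 1])
xr, b = pee_extract(y, 3)
assert np.array_equal(xr, x) and b == [1, 0, 1]
```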
-
Nhut Minh NGO, Masashi UNOKI
Article type: PAPER
2016 Volume E99.D Issue 1 Pages 92-101
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
This paper proposes a method of watermarking digital audio signals based on adaptive phase modulation. Audio signals are usually non-stationary, i.e., their characteristics are time-variant. Watermarking features are usually selected without accounting for this variability, which affects the performance of the whole watermarking system. The proposed method embeds a watermark into an audio signal by adaptively modulating its phase with the watermark using IIR all-pass filters. The frequency location of the pole-zero pair that characterizes the transfer function of an IIR all-pass filter is adapted on the basis of the signal power distribution over sub-bands in the magnitude spectrum domain. The pole-zero locations are adapted so that the phase modulation produces only slight distortion in the watermarked signal, achieving the best sound quality. The experimental results show that the proposed method can embed inaudible watermarks into various kinds of audio signals and correctly detect watermarks without the aid of the original signals. A reasonable trade-off between inaudibility and robustness can be obtained by balancing the phase modulation scheme. The proposed method can embed a watermark into audio signals at up to 100 bits per second with 99% accuracy in the absence of attacks and at 6 bits per second with 94.3% accuracy under attacks.
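A hedged sketch of the building block only (the bit mapping, frame length, and coefficient value are invented here; the authors adapt the pole location per sub-band power and detect blindly):
```python
import numpy as np
from scipy.signal import lfilter

def allpass(x, a):
    """First-order IIR all-pass H(z) = (a + z^-1)/(1 + a z^-1):
    unit magnitude at all frequencies, so only the phase changes."""
    return lfilter([a, 1.0], [1.0, a], x)

def embed(x, bits, frame=4800, a=0.3):
    """Embed one bit per frame via the sign of the all-pass coefficient."""
    y = x.astype(float)
    for k, b in enumerate(bits):
        s = slice(k * frame, (k + 1) * frame)
        y[s] = allpass(y[s], a if b else -a)
    return y
```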
-
Taichi UENO, Tomoko KAJIYAMA, Noritomo OUCHI
Article type: PAPER
2016 Volume E99.D Issue 1 Pages 102-110
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
Product packaging is a significant factor in a buyer's purchasing decision. We have developed a method for creating package images reflecting consumers' taste impressions that balances the need to provide product information and the need to motivate purchasing. It uses a database showing the correspondence between adjectives and colors as extracted from consumer reviews. This correspondence is used to revise the colors in the original package image. Evaluation was done by having 40 participants drink target beverages and answer questions before and after drinking regarding their impressions of the taste and their desire to drink the beverage. The results revealed that displaying appropriately revised images reduced the gap between the expected taste when viewing the image and the actual taste. Displaying appropriately revised images should motivate purchasing decisions as well as increase product satisfaction.
-
Vanessa BRACAMONTE, Hitoshi OKADA
Article type: PAPER
2016 Volume E99.D Issue 1 Pages 111-119
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
The sense of presence, that is, the sense of the website being psychologically transported to the consumer, has been identified as an important factor for restoring the feeling of sociability and physicality that is lost in online shopping. Previous research has investigated how visual content in the design can influence the sense of presence in a website, but the focus has been limited to the domestic electronic commerce context. In this paper, we conduct an experimental study in a cross-border electronic commerce context to evaluate the effect of country-related pictures on the perception of country presence, visual appeal, and trust in a foreign online store. Two experimental conditions were considered: country-related pictures and generic pictures, each evaluated for Thai and Singaporean websites. It was hypothesized that country-related content in the pictures included in the design of the foreign online store would result in a higher level of country presence, which would in turn result in higher visual appeal and trust in the website. We conducted a survey among Japanese online consumers, with a total of 1991 participants. The subjects were randomly assigned to four groups corresponding to the combinations of website country-of-origin and picture condition. We used structural equation modeling to analyze the proposed hypotheses. The results showed that for both the Thai and Singaporean websites, country-related pictures resulted in higher country presence, and visual appeal was positively influenced by this increase in country presence. However, country presence did not have a direct effect on trust; this effect was completely mediated by visual appeal. We discuss these results and their implications for cross-border electronic commerce.
-
Kenji OZAWA, Shota TSUKAHARA, Yuichiro KINOSHITA, Masanori MORISE
Article type: PAPER
2016 Volume E99.D Issue 1 Pages 120-127
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
The sense of presence is often used to evaluate the performances of audio-visual (AV) content and systems. However, a presence meter has yet to be realized. We consider that the sense of presence can be divided into two aspects: system presence and content presence. In this study we focused on content presence. To estimate the overall presence of a content item, we have developed estimation models for the sense of presence in audio-only and audio-visual content. In this study, the audio-visual model is expanded to estimate the instantaneous presence in an AV content item. Initially, we conducted an evaluation experiment of the presence with 40 content items to investigate the relationship between the features of the AV content and the instantaneous presence. Based on the experimental data, a neural-network-based model was developed by expanding the previous model. To express the variation in instantaneous presence, 6 audio-related features and 14 visual-related features, which are extracted from the content items in 500-ms intervals, are used as inputs for the model. The audio-related features are loudness, sharpness, roughness, dynamic range and standard deviation in sound pressure levels, and movement of sound images. The visual-related features involve hue, lightness, saturation, and movement of visual images. After constructing the model, a generalization test confirmed that the model is sufficiently accurate to estimate the instantaneous presence. Hence, the model should contribute to the development of a presence meter.
-
Yuta OHWATARI, Takahiro KAWAMURA, Yuichi SEI, Yasuyuki TAHARA, Akihiko ...
Article type: PAPER
2016 Volume E99.D Issue 1 Pages 128-137
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
Many movies depict social conditions and an awareness of the issues of their times in some form. Even in fantasy and science fiction works far removed from reality, the character relationships mirror the real world. We therefore try to understand social conditions in the real world by analyzing movies. As a way to analyze movies, we propose a method of estimating the interpersonal relationships of characters from movie script databases on the Web, using a machine learning technique called the Markov Logic Network (MLN). The MLN is a probabilistic logic network that can describe relationships between characters that are not necessarily satisfied on every line. In experiments, we confirmed that our proposed method can estimate favors between the characters in a movie with an F-measure of 58.7%. Finally, by comparing the relationships with social indicators, we discuss the relevance of the movies to the real world.
-
Soyoung CHUNG, Min Gyo CHUNG
Article type: LETTER
2016 Volume E99.D Issue 1 Pages 138-140
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
Chen proposed an image quality assessment method that evaluates image quality by the ratio of noise in an image. However, Chen's method has some drawbacks: unnoticeable noise is reflected in the evaluation, and noise positions are not accurately detected. Therefore, in this paper, we propose a new image quality measurement scheme using the mean-centered WLNI (Weber's Law Noise Identifier) and the saliency map. The experimental results show that the proposed method outperforms Chen's and agrees more consistently with human visual judgment.
-
Xiaojuan LIAO, Hui ZHANG, Miyuki KOSHIMURA
Article type: PAPER
Subject area: Fundamentals of Information Systems
2016 Volume E99.D Issue 1 Pages 141-150
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
Cold boot attack is a side-channel attack that recovers data from memory which persists for a short period after power is lost. In the course of this attack, the memory gradually degrades over time and only a corrupted version of the data may be available to the attacker. Recently, great efforts have been made to reconstruct the original data from a corrupted version of AES key schedules, based on the assumption that all bits in the charged state tend to decay to the ground state while no bit in the ground state ever inverts. In practice, however, a small number of bits flip in the opposite direction, called reverse flipping errors. In this paper, motivated by the latest work that formulates the relations of AES key bits as a Boolean satisfiability problem, we move one step further by taking the reverse flipping errors into consideration and employing off-the-shelf SAT and MaxSAT solvers to recover AES-128 key schedules from decayed memory images. Experimental results show that, in the presence of reverse flipping errors, the MaxSAT approach enables reliable recovery of key schedules in significantly less time, compared with the SAT approach, which relies on brute-force search to find the target errors. Moreover, in order to further enhance the efficiency of key recovery, we simplify the original problem by removing variables and formulas that have relatively weak relations to the whole key schedule. Experimental results demonstrate that the improved MaxSAT approach reduces the scale of the problem and recovers AES key schedules more efficiently when the decay factor is relatively large.
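A toy version of the MaxSAT formulation, shrunk to a single XOR relation of the kind that links AES key-schedule bits, using the python-sat package (assumed available; the real encoding covers the entire schedule and weights observations by the decay model):
```python
# pip install python-sat
from pysat.formula import WCNF
from pysat.examples.rc2 import RC2

wcnf = WCNF()
# Hard clauses: the key schedule forces bit3 = bit1 XOR bit2.
for cl in [[-1, -2, -3], [1, 2, -3], [1, -2, 3], [-1, 2, 3]]:
    wcnf.append(cl)
# Soft clauses: the decayed image reads all three bits as 1, but any
# observation may be a (rare) reverse flipping error.
for lit in (1, 2, 3):
    wcnf.append([lit], weight=5)

with RC2(wcnf) as solver:
    print(solver.compute())  # best model violates exactly one observation
```
The solver returns an assignment satisfying the schedule while contradicting as few memory observations as possible, which is exactly the error-correcting role MaxSAT plays here.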
-
Passakorn PHANNACHITTA, Akito MONDEN, Jacky KEUNG, Kenichi MATSUMOTO
Article type: PAPER
Subject area: Software Engineering
2016 Volume E99.D Issue 1 Pages 151-162
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
Analogy-based software effort estimation has gained a considerable amount of attention in current research and practice. Its excellent estimation accuracy relies on its solution adaptation stage, where an effort estimate is produced from similar past projects. This study proposes a solution adaptation technique named LSA-X that exploits the potential of productivity factors, i.e., project variables highly correlated with software productivity, in the solution adaptation stage. The LSA-X technique tailors the exploitation of the productivity factors with a procedure based on the Linear Size Adaptation (LSA) technique. The results, based on 19 datasets, show that in circumstances where a dataset exhibits a high correlation coefficient between productivity and a related factor (r ≥ 0.30), the proposed LSA-X technique statistically outperformed (95% confidence) the 8 other commonly used techniques compared in this study. In other circumstances, our results suggest using any linear adaptation technique based on software size to compensate for the limitations of the LSA-X technique.
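For orientation, a minimal sketch of the baseline LSA step that LSA-X builds on (LSA-X additionally folds in a productivity-correlated factor; the dataset values below are invented):
```python
import numpy as np

def lsa_estimate(size_new, sizes, efforts, k=3):
    """Plain linear size adaptation: scale each analogue's effort by the
    size ratio, then average over the k most similar projects by size."""
    idx = np.argsort(np.abs(sizes - size_new))[:k]
    return float(np.mean(efforts[idx] * (size_new / sizes[idx])))

sizes = np.array([10.0, 25.0, 40.0, 80.0])       # e.g. function points
efforts = np.array([120.0, 300.0, 500.0, 990.0])  # person-hours
print(lsa_estimate(30.0, sizes, efforts, k=2))
```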
-
Yang CAO, Masatoshi YOSHIKAWA
Article type: PAPER
Subject area: Data Engineering, Web Information Systems
2016 Volume E99.D Issue 1 Pages 163-175
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
Recent emerging mobile and wearable technologies make it easy to collect personal spatiotemporal data such as activity trajectories in daily life. Publishing real-time statistics over trajectory streams produced by crowds of people is expected to be valuable for both academia and business, answering questions such as “How many people are in Kyoto Station now?” However, analyzing these raw data entails the risk of compromising individual privacy. ε-Differential privacy has emerged as a well-known standard for private statistics publishing because its guarantee is rigorous and mathematically provable. However, since user trajectories are generated continuously without bound, it is difficult to protect every trajectory under ε-differential privacy. On the other hand, in real life, not all users require the same level of privacy. To this end, we propose a flexible privacy model, l-trajectory privacy, which protects every trajectory of a desired length under ε-differential privacy. We also design an algorithmic framework to publish l-trajectory private data in real time. Experiments using four real-life datasets show that our proposed algorithms are effective and efficient.
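A deliberately simplified sketch of the budgeting intuition (not the authors' algorithms, which allocate the budget adaptively; a sensitivity of 1 per user per timestamp is assumed):
```python
import numpy as np

rng = np.random.default_rng(0)

def publish_stream(true_counts, epsilon, l):
    """Toy l-window budgeting: give each release epsilon / l so that, by
    sequential composition, any l consecutive releases cost at most epsilon."""
    scale = l / epsilon   # Laplace scale = sensitivity (=1) / (epsilon / l)
    return [c + rng.laplace(0.0, scale) for c in true_counts]

print(publish_stream([120, 135, 128], epsilon=1.0, l=10))
```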
-
Hideko KAWAKUBO, Marthinus Christoffel DU PLESSIS, Masashi SUGIYAMA
Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2016 Volume E99.D Issue 1 Pages 176-186
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
In many real-world classification problems, the class balance often changes between training and test datasets, due to sample selection bias or the non-stationarity of the environment. Naive classifier training under such changes of class balance systematically yields a biased solution. It is known that such a systematic bias can be corrected by weighted training according to the test class balance. However, the test class balance is often unknown in practice. In this paper, we consider a semi-supervised learning setup where labeled training samples and unlabeled test samples are available and propose a class balance estimator based on the energy distance. Through experiments, we demonstrate that the proposed method is computationally much more efficient than existing approaches, with comparable accuracy.
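A sketch of the estimator's core computation under the stated setup (grid search shown for clarity; the feature matrices hold one sample per row):
```python
import numpy as np
from scipy.spatial.distance import cdist

def estimate_class_prior(train1, train2, test):
    """Grid-search the class-1 prior that minimizes the energy distance
    between the weighted class mixture and the unlabeled test sample."""
    a1, a2 = cdist(test, train1).mean(), cdist(test, train2).mean()
    b11, b22 = cdist(train1, train1).mean(), cdist(train2, train2).mean()
    b12 = cdist(train1, train2).mean()
    t = np.linspace(0.0, 1.0, 101)
    d = (2 * (t * a1 + (1 - t) * a2)
         - t**2 * b11 - 2 * t * (1 - t) * b12 - (1 - t)**2 * b22)
    return t[np.argmin(d)]
```
Because the objective is quadratic in the prior, the minimizer also has a closed form; the grid is used here only for transparency.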
-
Truc Hung NGO, Yen-Wei CHEN, Naoki MATSUSHIRO, Masataka SEO
Article type: PAPER
Subject area: Pattern Recognition
2016 Volume E99.D Issue 1 Pages 187-196
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
Facial paralysis is a common clinical condition, occurring in 30 to 40 patients per 100,000 people per year. A quantitative tool to support medical diagnosis is necessary. This paper proposes a simple, visual, and robust method that can objectively measure the degree of facial paralysis through the use of spatiotemporal features. The main contribution of this paper is an effective spatiotemporal feature extraction method based on landmark tracking. Our method overcomes the drawbacks of other techniques, such as the influence of irrelevant regions, noise, and illumination changes, as well as time-consuming processing. In addition, the method is simple and visual: its simplicity reduces processing time, and the visualized landmark movements, which relate to muscle mobility, help reveal regions of serious facial paralysis. Experimental results show that our proposed method achieved a higher recognition rate than the other techniques tested on a dynamic facial expression image database.
-
Yuechan HAO, Bilan ZHU, Masaki NAKAGAWA
Article type: PAPER
Subject area: Pattern Recognition
2016 Volume E99.D Issue 1 Pages 197-207
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
This paper describes a significantly improved recognition system for on-line handwritten Japanese text, free from line-direction and character-orientation constraints. The recognition system separates handwritten text of arbitrary character orientation and line direction into text line elements, estimates and normalizes the character orientation and line direction, applies two-stage over-segmentation, constructs a segmentation-recognition candidate lattice, and evaluates the likelihood of candidate segmentation-recognition paths by combining the scores of character recognition, geometric features, and linguistic context. Enhancements over previous systems are made in line segmentation, over-segmentation, and the context integration model. The results of experiments on text from the HANDS-Kondate_t_bf-2001-11 database demonstrate significant improvements in the character recognition rate compared with the previous systems. Its recognition rate on text of arbitrary character orientation and line direction is now comparable with that possible on horizontal text with normal character orientation. Moreover, its recognition speed and memory requirements do not limit the platforms or applications that can employ the recognition system.
-
Ran LI, Hongbing LIU, Jie CHEN, Zongliang GAN
Article type: PAPER
Subject area: Image Processing and Video Processing
2016 Volume E99.D Issue 1 Pages 208-218
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
The conventional bilateral motion estimation (BME) for motion-compensated frame rate up-conversion (MC-FRUC) avoids the problem of overlapped areas and holes, but it usually produces many inaccurate motion vectors (MVs) because 1) the MV of an object between the previous and following frames often lacks temporal symmetry with respect to the target block of the interpolated frame, and 2) repetitive patterns in video frames lead to mismatches because the interpolated block itself is unavailable. In this paper, a new BME algorithm with low computational complexity is proposed to resolve these problems. The proposed algorithm incorporates multi-resolution search into BME, since it can easily exploit the MV consistency between two adjacent pyramid levels and spatially neighboring MVs to correct the inaccurate MVs resulting from broken temporal symmetry while guaranteeing low computational cost. Moreover, the multi-resolution search uses the fast wavelet transform to construct the wavelet pyramid, which not only guarantees low computational complexity but also preserves the high-frequency components of the image at each level during sub-sampling. The high-frequency components are used to regularize the traditional block matching criterion, reducing the probability of mismatch in BME. Experiments show that the proposed algorithm significantly improves both the objective and subjective quality of the interpolated frame with low computational complexity, and provides better performance than existing BME algorithms.
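A plain single-resolution bilateral search, for reference; the paper's contribution layers the wavelet pyramid and MV-consistency correction on top of this. Block size and search radius below are arbitrary:
```python
import numpy as np

def bilateral_me(prev, nxt, block=8, radius=4):
    """For each block of the missing middle frame, find the symmetric
    vector minimizing SAD(prev shifted by -v, next shifted by +v)."""
    h, w = prev.shape
    mvs = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            best = np.inf
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    y0, x0, y1, x1 = by - dy, bx - dx, by + dy, bx + dx
                    if (min(y0, x0, y1, x1) < 0 or y0 + block > h
                            or x0 + block > w or y1 + block > h
                            or x1 + block > w):
                        continue
                    sad = np.abs(
                        prev[y0:y0 + block, x0:x0 + block].astype(int)
                        - nxt[y1:y1 + block, x1:x1 + block].astype(int)).sum()
                    if sad < best:
                        best, mvs[by // block, bx // block] = sad, (dy, dx)
    return mvs
```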
-
Huimin LU, Yujie LI, Shota NAKASHIMA, Seiichi SERIKAWA
Article type: PAPER
Subject area: Image Processing and Video Processing
2016 Volume E99.D Issue 1 Pages 219-227
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
Absorption, scattering, and color distortion are three major issues in underwater optical imaging. Light rays traveling through water are scattered and absorbed according to their wavelength. Scattering is caused by large suspended particles that degrade underwater optical images. Color distortion occurs because different wavelengths are attenuated to different degrees in water; consequently, images of ambient underwater environments are dominated by a bluish tone. In the present paper, we propose a novel underwater imaging model that compensates for the attenuation discrepancy along the propagation path. In addition, we develop a fast weighted guided normalized convolution domain filtering algorithm for enhancing underwater optical images. The enhanced images are characterized by a reduced noise level, better exposure in dark regions, and improved global contrast, by which the finest details and edges are enhanced significantly.
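As a toy stand-in for the color-distortion step only (this is not the authors' imaging model or filtering algorithm), a gray-world rescaling that counteracts the wavelength-dependent bluish cast:
```python
import numpy as np

def gray_world(img):
    """Rescale each color channel so its mean matches the global mean,
    a crude counter to wavelength-dependent attenuation underwater."""
    img = img.astype(np.float64)
    gain = img.mean() / img.mean(axis=(0, 1))   # per-channel gains
    return np.clip(img * gain, 0, 255).astype(np.uint8)
```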
-
Ryo MATSUOKA, Tomohiro YAMAUCHI, Tatsuya BABA, Masahiro OKUDA
Article type: PAPER
Subject area: Image Processing and Video Processing
2016 Volume E99.D Issue 1 Pages 228-235
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
We propose an image restoration technique that uses multiple-image integration. When acquiring a dark scene, detail in dark areas is often deteriorated by sensor noise. Simple image integration inherently has the capability of reducing random noise, but it is insufficient, especially in scenes that have dark areas. We introduce a novel image integration technique that optimizes the weights for the integration. We find the optimal weight map by solving a convex optimization problem. Additionally, we apply the proposed weight optimization scheme to a single-image super-resolution problem, where we slightly modify the weight optimization problem to estimate the high-resolution image from a single low-resolution one. Our experimental results show that the weight optimization significantly improves the denoising and super-resolution performance.
-
Kazuhiro TASHIRO, Takahiro KAWAMURA, Yuichi SEI, Hiroyuki NAKAGAWA, Ya ...
Article type: PAPER
Subject area: Image Recognition, Computer Vision
2016 Volume E99.D Issue 1 Pages 236-247
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
The objective of this paper is to recognize and classify the poses of idols in still images on the web. The poses found in Japanese idol photos are often complicated, and their classification is highly challenging. Although advances in computer vision research have made huge contributions to image recognition, they are not enough to estimate human poses accurately. We thus propose a method that refines the results of human pose estimation using a Pose Guide Ontology (PGO) and a set of energy functions. The PGO, which we introduce in this paper, contains useful background knowledge, such as semantic hierarchies and constraints on the positional relationships between body parts. The energy functions compute the correct positions of body parts based on knowledge of the human body. Through experiments, we also refine the PGO iteratively for further improvement of classification accuracy. We demonstrate pose classification into 8 classes on a dataset containing 400 idol images from the web. Experimental results show the efficiency of the PGO and the energy functions; the F-measure of classification is 15% higher than for the non-refined results. In addition, we confirm the validity of the energy functions.
-
Norimichi UKITA
Article type: PAPER
Subject area: Image Recognition, Computer Vision
2016 Volume E99.D Issue 1 Pages 248-256
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
We propose part-segment (PS) features for estimating an articulated pose in still images. The PS feature evaluates the image likelihood of each body part (e.g., head, torso, and arms) robustly against background clutter and nuisance textures on the body. While general gradient features (e.g., HOG) may include many nuisance responses, the PS feature represents only the region of the body part through iterative segmentation while updating the shape prior of each part. In contrast to similar segmentation features, part segmentation is improved by part-specific shape priors that are optimized from training images with fully automatically obtained seeds. The shape priors are modeled efficiently based on clustering for fast extraction of PS features. The PS feature is fused complementarily with gradient features using discriminative training and adaptive weighting for robust and accurate evaluation of part similarity. Comparative experiments with public datasets demonstrate the improvement in pose estimation achieved by the PS features.
-
Zhen GUO, Yujie ZHANG, Chen SU, Jinan XU, Hitoshi ISAHARA
Article type: PAPER
Subject area: Natural Language Processing
2016 Volume E99.D Issue 1 Pages 257-264
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
Recent work on joint word segmentation, POS (part-of-speech) tagging, and dependency parsing in Chinese faces two key problems: first, character-based word segmentation and word-based dependency parsing have not been combined well in the transition-based framework; second, the joint model suffers from the insufficiency of annotated corpora. To resolve the first problem, we propose transforming the traditional word-based dependency tree into a character-based dependency tree using the internal structure of words, and we propose a novel character-level joint model for the three tasks. To resolve the second problem, we propose a novel semi-supervised joint model that exploits n-gram features and dependency subtree features from a partially annotated corpus. Experimental results on the Chinese Treebank show that our joint model achieved 98.31%, 94.84%, and 81.71% for Chinese word segmentation, POS tagging, and dependency parsing, respectively, outperforming the pipeline model of the three tasks by 0.92%, 1.77%, and 3.95%. In particular, the F1 values for word segmentation and POS tagging are the best results reported to date.
-
Yan LEI, Min ZHANG, Bixin LI, Jingan REN, Yinhua JIANG
Article type: LETTER
Subject area: Software Engineering
2016 Volume E99.D Issue 1 Pages 265-269
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
Many recent studies have focused on leveraging rich information types to provide more useful information for improving fault localization effectiveness. However, they rarely investigate the impact of information richness on fault localization or give guidance on how to enrich information to improve localization effectiveness. This paper presents the first systematic study to fill this void. Our study chooses four representative information types and investigates the relationship between their richness and localization effectiveness. The results show that information richness related to frequency execution counts involves a high risk of degrading localization effectiveness, whereas backward slices are effective in improving it.
-
Chen CHEN, Chunyan HOU, Jiakun XIAO, Xiaojie YUAN
Article type: LETTER
Subject area: Artificial Intelligence, Data Mining
2016 Volume E99.D Issue 1 Pages 270-274
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
Purchase behavior prediction is one of the most important issues for the precision marketing of e-commerce companies. This Letter presents our solution to the purchase behavior prediction problem in e-commerce, specifically the task of the Big Data Contest of the China Computer Federation in 2014. The goal of this task is to predict which users will make purchases based on users' historical data. Traditional recommendation methods encounter two crucial problems in this scenario. First, the task is to predict which users will exhibit purchase behavior, rather than which items should be recommended to which users. Second, the large-scale dataset poses a big challenge for building the empirical model. Feature engineering and factorization models shed some light on these problems. We propose to use a Factorization Machines model based on multiple classes and high-dimensional engineered features. Experimental results on a real-world dataset demonstrate the advantages of our proposed method.
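For context, the second-order Factorization Machines score with the usual O(k·n) pairwise trick (a generic FM scorer, not the contest pipeline; shapes assumed: x is (n,), w is (n,), V is (n, k)):
```python
import numpy as np

def fm_predict(x, w0, w, V):
    """FM score: w0 + <w, x> + 1/2 * sum_f [(sum_i V[i,f] x_i)^2
    - sum_i V[i,f]^2 x_i^2], i.e. all pairwise interactions in O(k*n)."""
    s = V.T @ x
    return w0 + w @ x + 0.5 * float(np.sum(s ** 2 - (V ** 2).T @ (x ** 2)))
```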
-
Raissa RELATOR, Nozomi NAGANO, Tsuyoshi KATO
Article type: LETTER
Subject area: Artificial Intelligence, Data Mining
2016 Volume E99.D Issue 1 Pages 275-278
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
Although many 3D protein structures have been solved to date, the functions of some proteins remain unknown. To predict protein functions, local structures of proteins are commonly compared with pre-defined model structures whose functions have been elucidated. For this comparison, the root mean square deviation (RMSD) has been used as a conventional index. In this work, an adaptive deviation is incorporated, along with the Bregman Divergence Regularized Machine, to detect local structures analogous to such model structures more effectively than the conventional index.
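For reference, the conventional index the letter improves on is RMSD after optimal rigid superposition (Kabsch algorithm); a short numpy sketch, where P and Q are (n, 3) coordinate arrays:
```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD after centering and optimally rotating P onto Q."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, S, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(U @ Vt))      # avoid improper rotation
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    return float(np.sqrt(np.mean(np.sum((P @ R - Q) ** 2, axis=1))))
```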
-
Yi-Jia ZHANG, Zhong-Jian KANG, Xin-Feng LI, Zhe-Ming LU
Article type: LETTER
Subject area: Artificial Intelligence, Data Mining
2016 Volume E99.D Issue 1 Pages 279-282
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
The controllability of complex networks has attracted increasing attention within various scientific fields. Many power grids are complex networks with some common topological characteristics such as small-world and scale-free features. This Letter investigates the controllability of some real power grids in comparison with classical complex network models with the same number of nodes. Several conclusions are drawn after detailed analyses using several real power grids together with Erdős-Rényi (ER) random networks, Watts-Strogatz (WS) small-world networks, Barabási-Albert (BA) scale-free networks, and configuration model (CM) networks. The main conclusion is that most driver nodes of power grids are hub-free nodes with low nodal degrees of 1 or 2. The controllability of power grids is determined by degree distribution and heterogeneity; power grids are harder to control than WS and CM networks but easier than BA networks. Some power grids are relatively difficult to control because they require a far higher ratio of driver nodes than ER networks, while others are easier to control because they require a driver-node ratio less than or equal to that of ER random networks.
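Driver nodes are conventionally counted via maximum matching on the bipartite out/in splitting of the directed network (the structural-controllability framework of Liu et al.); a compact networkx sketch, assuming a directed edge list is available:
```python
import networkx as nx

def n_driver_nodes(edges, n):
    """Minimum number of driver nodes: N - |maximum matching| (at least 1)."""
    B = nx.Graph()
    B.add_nodes_from((f"out{i}" for i in range(n)), bipartite=0)
    B.add_nodes_from((f"in{i}" for i in range(n)), bipartite=1)
    B.add_edges_from((f"out{u}", f"in{v}") for u, v in edges)
    m = nx.bipartite.maximum_matching(
        B, top_nodes=[f"out{i}" for i in range(n)])
    return max(n - len(m) // 2, 1)   # dict stores both directions

print(n_driver_nodes([(0, 1), (1, 2), (2, 0), (2, 3)], 4))  # -> 1
```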
-
M. Shahidur RAHMAN, Tetsuya SHIMAMURA
Article type: LETTER
Subject area: Speech and Hearing
2016 Volume E99.D Issue 1 Pages 283-287
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
This paper explores the potential of pitch determination from bone-conducted (BC) speech. Pitch determination from the normal air-conducted (AC) speech signal cannot attain the expected level of accuracy for all voice and background conditions. In contrast, since BC speech is caused by vibrations that have traveled through the vocal tract wall, it is robust against ambient conditions. Though an appropriate model of BC speech is not known, it has a regular harmonic structure in the lower spectral region. Due to this lowpass nature, pitch determination from BC speech is not usually affected by the dominant first formant. Experiments conducted on simultaneously recorded AC and BC speech show that BC speech is more reliable for pitch estimation than AC speech. With little human effort, the pitch contour estimated from BC speech can also be used as a pitch reference, an alternative to the contour extracted from laryngograph output, which is sometimes inconsistent with simultaneously recorded AC speech.
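A baseline autocorrelation pitch tracker of the kind such comparisons rely on (a generic sketch, not the paper's exact procedure; the lag range is an assumption):
```python
import numpy as np

def pitch_autocorr(frame, fs, fmin=60.0, fmax=400.0):
    """Pick the autocorrelation peak within the plausible pitch-lag range."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)   # lag bounds in samples
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag
```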
-
Xia WANG, Ruiyu LIANG, Qingyun WANG, Li ZHAO, Cairong ZOU
Article type: LETTER
Subject area: Speech and Hearing
2016 Volume E99.D Issue 1 Pages 288-291
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
In this letter, an effective acoustic feedback cancellation algorithm is proposed based on the normalized sub-band adaptive filter (NSAF). To resolve the conflict between fast convergence and low misalignment in the NSAF algorithm, a variable step size is designed that automatically varies according to the update state of the filter. The update state is detected adaptively via the normalized distance between the long-term average and the short-term average of the tap-weight vector. Simulation results demonstrate that the proposed algorithm has superior performance in terms of convergence rate and misalignment.
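The sketch below conveys the variable-step idea on a fullband NLMS stand-in (the actual algorithm operates on sub-bands, and this step rule is a simplified guess at the described behavior; all constants are invented):
```python
import numpy as np

def vss_nlms(x, d, order=32, mu_max=1.0, mu_min=0.05, eps=1e-8):
    """NLMS whose step stays large while the short- and long-term averages
    of the tap-weight vector disagree (still converging) and shrinks
    toward mu_min once they coincide (converged)."""
    w = np.zeros(order)
    w_short, w_long = np.zeros(order), np.zeros(order)
    e = np.zeros(len(d))
    for n in range(order, len(d)):
        u = x[n - order:n][::-1]
        e[n] = d[n] - w @ u
        dist = np.linalg.norm(w_short - w_long) / (np.linalg.norm(w) + eps)
        mu = mu_min + (mu_max - mu_min) * dist / (dist + 1.0)
        w = w + mu * e[n] * u / (u @ u + eps)
        w_short = 0.90 * w_short + 0.10 * w   # fast tracker
        w_long = 0.99 * w_long + 0.01 * w     # slow tracker
    return e, w
```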
-
Qingyun WANG, Ruiyu LIANG, Li JING, Cairong ZOU, Li ZHAO
Article type: LETTER
Subject area: Speech and Hearing
2016 Volume E99.D Issue 1 Pages 292-295
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
Since digital hearing aids are sensitive to time delay and power consumption, the computational complexity of noise reduction must be reduced as much as possible. Complicated algorithms based on time-frequency analysis are therefore very difficult to implement in digital hearing aids. This paper presents an improved noise reduction algorithm with greatly reduced computational complexity for multi-channel digital hearing aids. First, the sub-band sound pressure level (SPL) is calculated in real time. Then, based on the calculated sub-band SPL, the noise in each sub-band is estimated and the probability of speech presence is computed. Finally, the a posteriori and a priori signal-to-noise ratios are estimated, and the gain function is derived to reduce the noise adaptively. By replacing the FFT and IFFT transforms with the known SPL, the proposed algorithm greatly reduces the computational load. Experiments on a prototype digital hearing aid show that the time delay is decreased to nearly half that of the traditional adaptive Wiener filtering and spectral subtraction algorithms, while the SNR improvement and PESQ score remain satisfactory. Compared with the modulation-frequency-based noise reduction algorithm used in many commercial digital hearing aids, the proposed algorithm achieves not only an SNR improvement of more than 5 dB but also less time delay and power consumption.
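A rough sketch of a gain stage computed directly from sub-band SPLs in dB (the a priori estimate and the gain floor are invented simplifications; the point is that no FFT/IFFT is needed once the SPLs are known):
```python
import numpy as np

def wiener_gains(subband_spl, noise_spl, floor=0.1):
    """Per-sub-band Wiener-style gain from SPL estimates in dB."""
    snr = 10 ** ((subband_spl - noise_spl) / 10.0)   # linear a posteriori SNR
    prio = np.maximum(snr - 1.0, 0.0)                # crude a priori estimate
    return np.maximum(prio / (1.0 + prio), floor)    # Wiener gain with floor
```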
-
Meng SUN, Hugo VAN HAMME, Yimin WANG, Xiongwei ZHANG
Article type: LETTER
Subject area: Speech and Hearing
2016 Volume E99.D Issue 1 Pages 296-299
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
Unsupervised spoken unit discovery, or zero-resource speech recognition, is an emerging research topic which is important for the spoken document analysis of languages or dialects with little human annotation. In this paper, we extend our earlier joint training framework for the unsupervised learning of discrete density HMMs to continuous density HMMs (CDHMMs) and apply it to spoken unit discovery. In the proposed recipe, we first cluster a group of Gaussians which then act as initializations to the joint training framework of nonnegative matrix factorization and a semi-continuous density HMM (SCDHMM). In the SCDHMM, all hidden states share the same group of Gaussians but with different mixture weights. A CDHMM is subsequently constructed by tying the top-N activated Gaussians to each hidden state. Baum-Welch training is finally conducted to update the parameters of the Gaussians, mixture weights, and HMM transition probabilities. Experiments were conducted on word discovery from TIDIGITS and phone discovery from TIMIT. For TIDIGITS, units were modeled by 10 states, which turn out to be strongly related to words; for TIMIT, units were modeled by 3 states, which are likely to be phonemes.
-
Jae-Hee JUN, Ji-Hoon CHOI, Jong-Ok KIM
Article type: LETTER
Subject area: Image Processing and Video Processing
2016 Volume E99.D Issue 1 Pages 300-304
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
This letter proposes a novel post-processing method for self-similarity based super-resolution (SR). Existing back-projection (BP) methods enhance SR images by refining the reconstructed coarse high-frequency (HF) information. However, it causes artifacts due to interpolation and excessively smoothes small HF signals, particularly in texture regions. Motivated by these observations, we propose a novel post-processing method referred to as middle-frequency (MF) based refinement. The proposed method refines the reconstructed HF information in the MF domain rather than in the spatial domain, as in BP. In addition, it does not require an internal interpolation process, so it is free from the side-effects of interpolation. Experimental results show that the proposed algorithm provides superior performance in terms of both the quantity of reproduced HF information and the visual quality.
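For contrast, the classical spatial-domain back-projection loop that the proposed MF-domain refinement replaces (a textbook sketch; interpolation order and step size are arbitrary, and sr.shape must equal scale times lr.shape):
```python
import numpy as np
from scipy.ndimage import zoom

def back_projection(sr, lr, scale=2, iters=10, step=1.0):
    """Iteratively push the low-resolution reconstruction error back
    into the super-resolved estimate."""
    for _ in range(iters):
        err = lr - zoom(sr, 1.0 / scale, order=1)    # simulate LR from SR
        sr = sr + step * zoom(err, scale, order=1)   # up-project the error
    return sr
```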
-
Zifen HE, Yinhui ZHANG
Article type: LETTER
Subject area: Image Processing and Video Processing
2016 Volume E99.D Issue 1 Pages 305-308
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
This work presents an approximate global optimization method for image halftoning that fuses the multi-scale information of a tree model. We employ a Gaussian mixture model and a hidden Markov tree to characterize the intra-scale clustering and inter-scale persistence properties of the detail coefficients, respectively. A multiscale perceived-error metric and its scale-related theory are used to fuse the statistical distributions of the intra-scale clustering and cross-scale persistence error metrics, from which an energy function is generated. Through energy minimization via graph cuts, we obtain the halftone image. In our experiments, we demonstrate the superior performance of the new algorithm compared with several existing algorithms under quantitative evaluation.
-
Keun-Chang KWAK
Article type: LETTER
Subject area: Biocybernetics, Neurocomputing
2016 Volume E99.D Issue 1 Pages 309-312
Published: January 01, 2016
Released on J-STAGE: January 01, 2016
In this paper, a method for designing an Incremental Granular Model (IGM) based on the integration of Linear Regression (LR) and a Linguistic Model (LM) with the aid of fuzzy granulation is proposed. Here, the IGM is designed by the use of information granulation realized via Context-based Interval Type-2 Fuzzy C-Means (CIT2FCM) clustering. This clustering approach is used not only to estimate the cluster centers by preserving the homogeneity between the clustered patterns from linguistic contexts produced in the output space, but also to deal with the uncertainty associated with the fuzzification factor. Furthermore, the IGM is developed by constructing an LR as a global model and refining it through local fuzzy if-then rules that capture the more localized nonlinearities of the system via the LM. The experimental results on two examples reveal that the proposed method shows good performance in comparison with previous works.