IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Volume E99.D , Issue 1
40 articles from the selected issue
Special Section on the Architectures, Protocols, and Applications for the Future Internet
  • Toyokazu AKIYAMA
    2016 Volume E99.D Issue 1 Pages 1
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    Download PDF (75K)
  • Wasin PASSORNPAKORN, Sinchai KAMOLPHIWONG
    Type: INVITED PAPER
    2016 Volume E99.D Issue 1 Pages 2-9
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    Personal e-healthcare services are growing significantly, and a large number of personal e-health measuring and monitoring devices are now on the market. However, achieving better health outcomes requires various devices and services to work together, and this coordination among services remains a challenge due to their variety and complexity. To address this issue, we propose an ontology-based framework for interactive self-assessment with RESTful e-health services. Unlike existing e-health service frameworks, in which services are tightly coupled and data schemas are difficult to change or extend, our work achieves loose coupling among services and flexibility within each service through a design and implementation based on the HYDRA vocabulary and REST principles. We implement clinical knowledge through a combination of OWL-DL and SPARQL rules. All of these services evolve independently; their interfaces follow REST principles, especially the HATEOAS constraint. We demonstrate how to apply our framework to interactive self-assessment in e-health applications and show that it allows medical knowledge to drive the system workflow according to event-driven principles. New data schemas can be introduced at run-time, an essential feature for supporting the arrival of IoT (Internet of Things) medical devices, which have their own data schemas and evolve over time.
    Download PDF (793K)
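The HATEOAS constraint the abstract relies on can be illustrated with a minimal sketch: the client never hard-codes the next workflow step, it follows operations advertised in the hypermedia response. The resource shape, field names, and the `/assessments` target below are all hypothetical; the actual framework uses the HYDRA vocabulary over JSON-LD.

```python
# Hypothetical HYDRA-style response for a measurement resource.  The server,
# driven by clinical knowledge, decides which operations to advertise.
blood_pressure_reading = {
    "@context": "http://www.w3.org/ns/hydra/context.jsonld",
    "@id": "/readings/42",
    "@type": "BloodPressureReading",
    "systolic": 165,
    "diastolic": 95,
    "operation": [
        {"@type": "Operation", "method": "POST",
         "target": "/assessments", "title": "request-self-assessment"},
    ],
}

def next_actions(resource):
    """Return the (method, target) pairs the server currently advertises;
    the client workflow is driven entirely by these."""
    return [(op["method"], op["target"]) for op in resource.get("operation", [])]
```

Because the client only interprets advertised operations, the server can change the workflow (or a device can introduce a new schema) without breaking clients.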
  • Takumi SANADA, Xuejun TIAN, Takashi OKUDA, Tetsuo IDEGUCHI
    Type: PAPER
    2016 Volume E99.D Issue 1 Pages 10-20
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    WLANs have become increasingly popular and widely deployed. The MAC protocol is one of the key WLAN technologies and directly affects communication efficiency. In this paper, focusing on the MAC protocol, we propose a novel protocol in which network nodes dynamically optimize their backoff process to achieve high throughput while maintaining satisfactory QoS. A distributed MAC protocol has the advantage that no infrastructure such as an access point is necessary; on the other hand, total throughput drops heavily and QoS cannot be guaranteed under high traffic load, which needs to be improved. Through theoretical analysis, we find that the average idle interval reflects the current network traffic load and, together with an estimated number of nodes, can be used to set the optimal contention window (CW). Since the necessary indexes can be obtained directly by observing the channel, our scheme adds no extra load to the network, which makes it simpler and more effective. Through simulation comparisons with a conventional method, we show that our scheme greatly enhances throughput and QoS whether the network is saturated or not, while maintaining good fairness.
    Download PDF (1291K)
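The control structure the abstract describes, observing the channel and adapting the contention window from the two indexes it names, can be sketched as follows. The mapping below is a made-up linear rule for illustration; the paper derives the optimal CW analytically.

```python
CW_MIN, CW_MAX = 16, 1024  # typical 802.11-style bounds

def optimal_cw(n_nodes, avg_idle_slots):
    """Illustrative CW adaptation: a short average idle interval signals a
    heavily loaded channel, so the contention window grows with the
    estimated number of nodes and shrinks as the channel sits idle longer.
    The exact mapping in the paper differs; only the control structure is
    shown here."""
    load = n_nodes / max(avg_idle_slots, 1e-6)  # crude load indicator
    cw = CW_MIN * (1 + load)
    return int(min(max(cw, CW_MIN), CW_MAX))
```

Both inputs are passively observable, which is why the scheme adds no signaling load to the network.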
  • Shimin SUN, Li HAN, Sunyoung HAN
    Type: PAPER
    2016 Volume E99.D Issue 1 Pages 21-29
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    Information Centric Networking (ICN) is a promising architecture and an alternative paradigm to traditional IP networking. Its innovative concepts, such as named data, name-based routing, and in-network caching, bring many benefits to Wireless Sensor Networks (WSNs). The simple and robust communication model of ICN, based on interest/data message exchange, is appealing for deployment in WSNs. However, ICN architectures are designed for mains-powered network devices rather than resource-constrained sensor nodes. Introducing an ICN-like architecture to WSNs requires rethinking the naming scheme and forwarding strategy to meet the requirements of energy efficiency and failure recovery. This paper presents a lightweight data-centric routing mechanism (GRMR) for interest dissemination and data delivery in location-aware WSNs. A simple naming scheme assists the routing decisions made by individual nodes. Greedy routing combined with a regional multicast mechanism provides an efficient data-centric routing approach. The performance is evaluated analytically and through simulation in NS-2. The results indicate that GRMR achieves significant energy efficiency under the investigated scenarios.
    Download PDF (693K)
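The greedy half of a greedy-plus-multicast scheme like this is standard geographic forwarding in location-aware WSNs: each node hands the packet to the neighbor that makes the most progress toward the destination. A minimal sketch (the regional multicast part and GRMR's naming scheme are not shown):

```python
import math

def dist(a, b):
    """Euclidean distance between two (x, y) node positions."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def greedy_next_hop(current, neighbors, sink):
    """Forward to the neighbor geographically closest to the sink; return
    None at a local minimum (no neighbor is closer than the current node),
    where a recovery strategy would have to take over."""
    best = min(neighbors, key=lambda n: dist(n, sink), default=None)
    if best is None or dist(best, sink) >= dist(current, sink):
        return None
    return best
```

Greedy forwarding needs only local neighbor positions, which is what keeps the per-node state and energy cost low.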
  • Chao-Wen TSENG, Yu-Chang CHEN, Chua-Huang HUANG
    Type: PAPER
    2016 Volume E99.D Issue 1 Pages 30-39
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    The EPCglobal architecture framework is divided into identify, capture, and share layers and defines a collection of standards. It is not fully adequate for building IoT applications because it lacks transducer capability. IEEE 1451 is a set of standards that defines data exchange formats, communication protocols, and various connection interfaces between sensors/actuators and transducer interface modules. By adding IEEE 1451 transducer capability to the EPCglobal architecture framework, a consistent EPC scheme expression for heterogeneous things can be achieved at the identify layer, which allows the upper layers of the framework to be extended seamlessly. In this paper, we focus on how to leverage the transducer capability at the capture layer. A device cycle, transducer cycle specification, and transducer cycle report are introduced to collect and process sensor/actuator data. The design and implementation of extensions to the GS1 EPCglobal Application Level Events (ALE) modules are presented to explain the design philosophy and verify feasibility. These modules interact with the capture and query services of EPC Information Services (EPCIS) for IoT applications at the share layer. By cooperating and interacting across these layers of the EPCglobal architecture framework, an IoT architecture, EPCglobal+, is built on international standards.
    Download PDF (2500K)
Special Section on Enriched Multimedia — Creation of a New Society through Value-added Multimedia Content —
  • Isao ECHIZEN
    2016 Volume E99.D Issue 1 Pages 40
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    Download PDF (77K)
  • Jorge TREVINO, Shuichi SAKAMOTO, Junfeng LI, Yôiti SUZUKI
    Type: INVITED PAPER
    2016 Volume E99.D Issue 1 Pages 41-49
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    There is a strong push towards the ultra-realistic presentation of multimedia contents made possible by the latest advances in computational and signal processing technologies. Three-dimensional sound presentation is necessary to convey a natural and rich multimedia experience. Promising ways to achieve this include the sound field reproduction technique known as high-order Ambisonics (HOA). While these advanced methods are now within the capabilities of consumer-level processing systems, their adoption is hindered by the lack of contents. Production and coding of the audio components in multimedia focus on traditional formats such as stereophonic sound. Mainstream audio codecs and media such as CDs or DVDs do not support advanced, rich contents such as HOA encodings. To ameliorate this problem and speed up the adoption of spatial sound technologies, this paper proposes a novel way to downmix HOA contents into a stereo signal. The resulting data can be distributed using conventional methods such as audio CDs or as the audio component of an internet video stream. The results can be listened to using legacy stereo reproduction systems. However, they include spatial information encoded as the inter-channel level and phase differences. The proposed method consists of a downmixing filterbank which independently modulates inter-channel differences at each frequency bin. The proposal is evaluated using simple test signals and found to outperform conventional methods such as matrix-encoded surround and the Ambisonics UHJ format in terms of spatial resolution. The proposal can be coupled with a previously presented method to recover HOA signals from stereo recordings. The resulting system allows for the preservation of full-surround spatial information in ultra-realistic contents when they are transferred using a stereo stream. Simulation results show that a compatible decoder can accurately recover up to five HOA channels from a stereo signal (2nd order HOA data in the horizontal plane).
    Download PDF (1066K)
  • Minoru KURIBAYASHI
    Type: PAPER
    2016 Volume E99.D Issue 1 Pages 50-59
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    Under Kerckhoffs' principle, illegal users are assumed to know the embedding and detection algorithms; only the secret key remains hidden. It is then possible to access the host signal, which may be selected from the frequency components of digital content used to embed the watermark signal. In particular, for a fingerprinting scheme that embeds a user's information as a watermark, the selected components can easily be found by observing differently watermarked copies of the same content. In this scenario, it has been reported that some non-linear collusion attacks can remove or modify the embedded signal. In this paper, we analyze the security of our previously proposed spread-spectrum (SS) fingerprinting scheme [1], [2] under Kerckhoffs' principle and reveal its drawback when an SS sequence is embedded in a color image. If non-linear collusion attacks are performed only on the components selected for embedding, traceability is greatly degraded while the pirated copy retains high quality after the attacks. We also propose a simple countermeasure to enhance robustness against non-linear collusion attacks as well as possible signal processing attacks on the underlying watermarking method.
    Download PDF (1174K)
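Why collusion threatens SS fingerprinting can be seen with a toy linear example: averaging k differently fingerprinted copies attenuates each user's sequence by a factor of k, so the detector's correlation score drops accordingly. The paper's attacks are non-linear (e.g. min/max or median combining of the selected components), but they exploit the same observation of differences between copies. All signal parameters below are illustrative.

```python
import random

random.seed(0)
N = 1000
HOST = [random.gauss(128, 20) for _ in range(N)]  # stand-in for selected frequency components

def fingerprint(seq, alpha=2.0):
    """Embed a +/-1 spread-spectrum sequence additively (illustrative embedding)."""
    return [h + alpha * s for h, s in zip(HOST, seq)]

def detect(signal, seq):
    """Informed correlation detector: subtract the known host and correlate
    with one user's sequence; large values implicate that user."""
    return sum((x - h) * s for x, h, s in zip(signal, HOST, seq)) / len(seq)

# Three colluders average their copies: each user's sequence survives at
# roughly 1/3 strength in the pirated copy.
seqs = [[random.choice((-1, 1)) for _ in range(N)] for _ in range(3)]
copies = [fingerprint(s) for s in seqs]
averaged = [sum(v) / 3 for v in zip(*copies)]
```

With an unattacked copy the score equals the embedding strength alpha; after averaging it falls to about alpha/3, which is how traceability degrades while the copy stays close to the host in quality.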
  • Ibuki NAKAMURA, Yoshihide TONOMURA, Hitoshi KIYA
    Type: PAPER
    2016 Volume E99.D Issue 1 Pages 60-68
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    We focus on the feature transform approach as one methodology for biometric template protection, where the template consists of features extracted from the biometric trait. This study considers some properties of unitary (including orthogonal) transform-based template protection in particular. It is known that the Euclidean distance between templates protected by a unitary transform is the same as that between the original (non-protected) ones. In this study, we further show that such protection yields the same results in l2-norm minimization problems as the original templates do. This means there is no degradation of recognition performance in authentication systems based on l2-norm minimization, so protected templates can be reissued multiple times without the original templates. In addition, a DFT-based template protection scheme is proposed as a unitary transform-based one. The proposed scheme enables protected templates to be generated efficiently via the FFT while retaining the useful properties above. It is also applied to face recognition experiments to evaluate its effectiveness.
    Download PDF (1394K)
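The key property, that a unitary transform preserves Euclidean distance (so a DFT-protected template matches exactly as the original would), can be checked numerically. The feature vectors below are toys; a real template would be, e.g., a face feature vector, and a production system would use an FFT rather than this direct DFT.

```python
import cmath
import math

def unitary_dft(x):
    """Direct DFT scaled by 1/sqrt(N); the scaling is what makes the
    transform matrix unitary, so l2 distances are preserved (Parseval)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n)) / math.sqrt(n)
            for j in range(n)]

def l2(a, b):
    """Euclidean distance between two (possibly complex) vectors."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)))

enrolled = [1.0, 2.0, 3.0, 4.0]   # toy enrolled template
query    = [1.5, 1.0, 3.5, 4.2]   # toy query features
```

Matching in the protected domain gives the same distances, and hence the same decisions, as matching the raw templates, which is why recognition performance does not degrade.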
  • Kazuto OGAWA, Go OHTAKE
    Type: PAPER
    2016 Volume E99.D Issue 1 Pages 69-82
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    Broadcasting and communications networks can be used together to offer hybrid broadcasting services that incorporate a variety of personalized information from communications networks in TV programs. To enable these services, many different applications have to be run on a user terminal, and it is necessary to establish an environment where any service provider can create applications and distribute them to users. The danger is that malicious service providers might distribute applications which may cause user terminals to take undesirable actions. To prevent such applications from being distributed, we propose an application authentication protocol for hybrid broadcasting and communications services. Concretely, we modify a key-insulated signature scheme and apply it to this protocol. In the protocol, a broadcaster distributes a distinct signing key to each service provider that the broadcaster trusts. As a result, users can verify that an application is reliable. If a signed application causes an undesirable action, a broadcaster can revoke the privileges and permissions of the service provider. In addition, the broadcaster can update the signing key. That is, our protocol is secure against leakage of the signing key by the broadcaster and service providers. Moreover, a user terminal uses only one verification key for verifying a signature, so the memory needed for storing the verification key in the user terminal is very small. With our protocol, users can securely receive hybrid services from broadcasting and communications networks.
    Download PDF (938K)
  • Akira NISHIMURA
    Type: PAPER
    2016 Volume E99.D Issue 1 Pages 83-91
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    Reversible data hiding is a technique in which hidden data are embedded in host data such that the consistency of the host is perfectly preserved and its data are restored during extraction of the hidden data. In this paper, a linear prediction technique for reversible data hiding of audio waveforms is improved. The proposed variable expansion method is able to control the payload size by varying the expansion factor. The proposed technique is combined with the prediction error expansion method. Reversible embedding, perfect payload detection, and perfect recovery of the host signal are achieved for a framed audio signal. A smaller expansion factor results in a smaller payload size and less degradation in the stego audio quality. Computer simulations reveal that embedding a random-bit payload of less than 0.4 bits per sample into CD-format music signals provides stego audio with acceptable objective quality. The method is also applied to G.711 µ-law-coded speech signals. Computer simulations reveal that embedding a random-bit payload of less than 0.1 bits per sample into speech signals provides stego speech with good objective quality.
    Download PDF (416K)
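The expansion-embedding family this paper builds on can be sketched with its simplest member, classic difference expansion on a sample pair with a fixed expansion factor of 2. The paper's contribution, a variable expansion factor combined with linear prediction, is not shown; this only demonstrates the reversibility mechanism.

```python
def embed_pair(x, y, bit):
    """Classic difference expansion: double the pair's difference and append
    the payload bit in its LSB; the mean is (essentially) preserved."""
    l, h = (x + y) // 2, x - y
    h2 = 2 * h + bit
    return l + (h2 + 1) // 2, l - h2 // 2

def extract_pair(x2, y2):
    """Recover the payload bit and restore the original pair exactly,
    which is what makes the scheme reversible."""
    l, h2 = (x2 + y2) // 2, x2 - y2
    bit, h = h2 & 1, h2 >> 1
    return l + (h + 1) // 2, l - h // 2, bit
```

A larger expansion factor carries more payload bits per pair at the cost of a larger distortion of the difference, which is the payload/quality trade-off the abstract describes.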
  • Nhut Minh NGO, Masashi UNOKI
    Type: PAPER
    2016 Volume E99.D Issue 1 Pages 92-101
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    This paper proposes a watermarking method for digital audio signals based on adaptive phase modulation. Audio signals are usually non-stationary, i.e., their characteristics are time-variant. Watermarking features are usually selected without taking this variability into account, which affects the performance of the whole watermarking system. The proposed method embeds a watermark into an audio signal by adaptively modulating its phase with the watermark using IIR all-pass filters. The frequency location of the pole-zero pair that characterizes the transfer function of each all-pass filter is adapted on the basis of the signal power distribution across sub-bands in the magnitude spectrum domain. The pole-zero locations are adapted so that the phase modulation produces only slight distortion in watermarked signals, achieving the best sound quality. Experimental results show that the proposed method can embed inaudible watermarks into various kinds of audio signals and correctly detect them without the aid of the original signals. A reasonable trade-off between inaudibility and robustness can be obtained by balancing the phase modulation scheme. The proposed method can embed a watermark into audio signals at up to 100 bits per second with 99% accuracy under no attack and at 6 bits per second with 94.3% accuracy under attacks.
    Download PDF (1915K)
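The building block here is the all-pass property: an IIR all-pass filter changes only the phase of the signal, leaving the magnitude spectrum untouched, and moving the pole moves where the phase change is concentrated. This can be verified for the first-order case (the coefficient values below are illustrative, not the paper's):

```python
import cmath

def allpass(a, w):
    """Frequency response at angular frequency w (radians/sample) of the
    first-order all-pass filter H(z) = (z**-1 + a) / (1 + a * z**-1),
    with real coefficient a (|a| < 1 for stability)."""
    z1 = cmath.exp(-1j * w)
    return (z1 + a) / (1 + a * z1)
```

Since |H(e^jw)| = 1 everywhere, modulating the phase this way cannot alter signal energy at any frequency; adapting a (the pole location) per sub-band controls where the phase distortion lands, which is how the method trades inaudibility against robustness.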
  • Taichi UENO, Tomoko KAJIYAMA, Noritomo OUCHI
    Type: PAPER
    2016 Volume E99.D Issue 1 Pages 102-110
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    Product packaging is a significant factor in a buyer's purchasing decision. We have developed a method for creating package images reflecting consumers' taste impressions that balances the need to provide product information and the need to motivate purchasing. It uses a database showing the correspondence between adjectives and colors as extracted from consumer reviews. This correspondence is used to revise the colors in the original package image. Evaluation was done by having 40 participants drink target beverages and answer questions before and after drinking regarding their impressions of the taste and their desire to drink the beverage. The results revealed that displaying appropriately revised images reduced the gap between the expected taste when viewing the image and the actual taste. Displaying appropriately revised images should motivate purchasing decisions as well as increase product satisfaction.
    Download PDF (1576K)
  • Vanessa BRACAMONTE, Hitoshi OKADA
    Type: PAPER
    2016 Volume E99.D Issue 1 Pages 111-119
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    The sense of presence, that is, the sense of the website being psychologically transported to the consumer, has been identified as an important factor for bringing back the feeling of sociability and physicality that is lost in online shopping. Previous research has investigated how visual content in the design can influence the sense of presence in a website, but the focus has been limited to the domestic electronic commerce context. In this paper, we conduct an experimental study in a cross-border electronic commerce context to evaluate the effect of country-related pictures on the perception of country presence, visual appeal and trust in a foreign online store. Two experimental conditions were considered: country-related pictures and generic pictures, each one evaluated for Thai and Singaporean websites. It was hypothesized that country-related content in pictures included in the design of the foreign online store would result in a higher level of country presence, and that this would in turn result in higher visual appeal and trust in the website. We conducted a survey among Japanese online consumers, with a total of 1991 participants obtained. The subjects were randomly assigned into four groups corresponding to the combination of country-of-origin of the website and picture condition. We used structural equation modeling in order to analyze the proposed hypotheses. The results showed that for both the Thai and Singaporean websites, country-related pictures resulted in higher country presence, and visual appeal was positively influenced by this increase in country presence. However, country presence did not have a direct effect on trust; this effect was completely mediated by visual appeal. We discuss these results and their implications for cross-border electronic commerce.
    Download PDF (1166K)
  • Kenji OZAWA, Shota TSUKAHARA, Yuichiro KINOSHITA, Masanori MORISE
    Type: PAPER
    2016 Volume E99.D Issue 1 Pages 120-127
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    The sense of presence is often used to evaluate the performances of audio-visual (AV) content and systems. However, a presence meter has yet to be realized. We consider that the sense of presence can be divided into two aspects: system presence and content presence. In this study we focused on content presence. To estimate the overall presence of a content item, we have developed estimation models for the sense of presence in audio-only and audio-visual content. In this study, the audio-visual model is expanded to estimate the instantaneous presence in an AV content item. Initially, we conducted an evaluation experiment of the presence with 40 content items to investigate the relationship between the features of the AV content and the instantaneous presence. Based on the experimental data, a neural-network-based model was developed by expanding the previous model. To express the variation in instantaneous presence, 6 audio-related features and 14 visual-related features, which are extracted from the content items in 500-ms intervals, are used as inputs for the model. The audio-related features are loudness, sharpness, roughness, dynamic range and standard deviation in sound pressure levels, and movement of sound images. The visual-related features involve hue, lightness, saturation, and movement of visual images. After constructing the model, a generalization test confirmed that the model is sufficiently accurate to estimate the instantaneous presence. Hence, the model should contribute to the development of a presence meter.
    Download PDF (1887K)
  • Yuta OHWATARI, Takahiro KAWAMURA, Yuichi SEI, Yasuyuki TAHARA, Akihiko ...
    Type: PAPER
    2016 Volume E99.D Issue 1 Pages 128-137
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    Many movies depict the social conditions and concerns of their times in some form. Even in fantasy and science fiction, works far removed from reality, character relationships mirror the real world. We therefore try to understand social conditions in the real world by analyzing movies. As a way to analyze them, we propose a method for estimating the interpersonal relationships of characters using a machine learning technique called a Markov Logic Network (MLN), applied to movie script databases on the Web. The MLN is a probabilistic logic network that can describe relationships between characters as formulas that need not be satisfied in every instance. In experiments, we confirmed that our proposed method can estimate favor between the characters in a movie with an F-measure of 58.7%. Finally, by comparing the estimated relationships with social indicators, we discuss the relevance of movies to the real world.
    Download PDF (926K)
  • Soyoung CHUNG, Min Gyo CHUNG
    Type: LETTER
    2016 Volume E99.D Issue 1 Pages 138-140
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    Chen proposed an image quality assessment method that evaluates image quality based on the ratio of noise in an image. However, Chen's method has some drawbacks: unnoticeable noise is reflected in the evaluation, and noise positions are not accurately detected. In this paper, we therefore propose a new image quality measurement scheme using the mean-centered WLNI (Weber's Law Noise Identifier) and a saliency map. Experimental results show that the proposed method outperforms Chen's and agrees more consistently with human visual judgment.
    Download PDF (763K)
Regular Section
  • Xiaojuan LIAO, Hui ZHANG, Miyuki KOSHIMURA
    Type: PAPER
    Subject area: Fundamentals of Information Systems
    2016 Volume E99.D Issue 1 Pages 141-150
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    A cold boot attack is a side-channel attack that recovers data from memory, which persists for a short period after power is lost. In the course of this attack, the memory gradually degrades over time and only a corrupted version of the data may be available to the attacker. Recently, great efforts have been made to reconstruct the original data from a corrupted version of AES key schedules, based on the assumption that bits in the charged state tend to decay to the ground state while no bit in the ground state ever inverts. In practice, however, a small number of bits flip in the opposite direction; these are called reverse flipping errors. In this paper, motivated by the latest work that formulates the relations of AES key bits as a Boolean satisfiability problem, we move one step further by taking reverse flipping errors into consideration and employing off-the-shelf SAT and MaxSAT solvers to recover AES-128 key schedules from decayed memory images. Experimental results show that, in the presence of reverse flipping errors, the MaxSAT approach enables reliable recovery of key schedules in significantly less time than the SAT approach, which relies on brute-force search to find the target errors. Moreover, to further enhance the efficiency of key recovery, we simplify the original problem by removing variables and formulas that are only weakly related to the whole key schedule. Experimental results demonstrate that the improved MaxSAT approach reduces the scale of the problem and recovers AES key schedules more efficiently when the decay factor is relatively large.
    Download PDF (345K)
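The asymmetric decay model that motivates the MaxSAT formulation can be simulated to see why reverse flipping errors are the hard part: decays (1 to 0) are plentiful but predictable in direction, while the rare 0-to-1 flips break the one-sided assumption that earlier reconstruction methods rely on. The flip probabilities below are illustrative, not values from the paper.

```python
import random

random.seed(1)

def cold_boot_decay(bits, p_decay=0.3, p_reverse=0.001):
    """Asymmetric memory decay: charged bits (1) drain to ground (0) with
    probability p_decay, while reverse flips (0 -> 1) are rare but nonzero.
    Probabilities are illustrative only."""
    out = []
    for b in bits:
        r = random.random()
        if b == 1 and r < p_decay:
            b = 0
        elif b == 0 and r < p_reverse:
            b = 1
        out.append(b)
    return out

schedule = [random.randint(0, 1) for _ in range(1408)]  # AES-128 expands to 11 x 128 bits
decayed = cold_boot_decay(schedule)
```

A MaxSAT encoding can treat the observed 1-bits as soft clauses (each may be a reverse flip, at a cost) while the key-schedule relations stay hard, which is why it scales better than exhaustively guessing which bits flipped.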
  • Passakorn PHANNACHITTA, Akito MONDEN, Jacky KEUNG, Kenichi MATSUMOTO
    Type: PAPER
    Subject area: Software Engineering
    2016 Volume E99.D Issue 1 Pages 151-162
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    Analogy-based software effort estimation has gained a considerable amount of attention in current research and practice. Its excellent estimation accuracy relies on its solution adaptation stage, where an effort estimate is produced from similar past projects. This study proposes a solution adaptation technique named LSA-X that introduces an approach to exploit the potential of productivity factors, i.e., project variables with a high correlation with software productivity, in the solution adaptation stage. The LSA-X technique tailors the exploitation of the productivity factors with a procedure based on the Linear Size Adaptation (LSA) technique. The results, based on 19 datasets, show that in circumstances where a dataset exhibits a high correlation coefficient between productivity and a related factor (r≥0.30), the proposed LSA-X technique statistically outperformed (95% confidence) the other 8 commonly used techniques compared in this study. In other circumstances, our results suggest using any linear adaptation technique based on software size to compensate for the limitations of the LSA-X technique.
    Download PDF (481K)
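The baseline Linear Size Adaptation step that LSA-X builds on can be sketched in a few lines: scale each analogue project's effort by the size ratio, then average. LSA-X's additional weighting by a correlated productivity factor is omitted, and the project numbers below are hypothetical.

```python
def lsa_estimate(target_size, analogues):
    """Linear Size Adaptation: adapt each analogue's effort by the ratio of
    the target project's size to the analogue's size, then average the
    adapted efforts.  (LSA-X's productivity-factor step is not shown.)"""
    adapted = [effort * target_size / size for size, effort in analogues]
    return sum(adapted) / len(adapted)

# Hypothetical analogues as (size in function points, effort in person-hours)
estimate = lsa_estimate(100, [(50, 500), (200, 1600)])
```

The assumption baked into this adaptation is roughly constant productivity (effort per unit size), which is exactly why exploiting a productivity-correlated factor, as LSA-X does, helps when that assumption is violated.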
  • Yang CAO, Masatoshi YOSHIKAWA
    Type: PAPER
    Subject area: Data Engineering, Web Information Systems
    2016 Volume E99.D Issue 1 Pages 163-175
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    Recent emerging mobile and wearable technologies make it easy to collect personal spatiotemporal data such as activity trajectories in daily life. Publishing real-time statistics over trajectory streams produced by crowds of people is expected to be valuable for both academia and business, answering questions such as “How many people are in Kyoto Station now?” However, analyzing these raw data will entail risks of compromising individual privacy. ε-Differential Privacy has emerged as a well-known standard for private statistics publishing because of its guarantee of being rigorous and mathematically provable. However, since user trajectories will be generated infinitely, it is difficult to protect every trajectory under ε-differential privacy. On the other hand, in real life, not all users require the same level of privacy. To this end, we propose a flexible privacy model of l-trajectory privacy to ensure every desired length of trajectory under protection of ε-differential privacy. We also design an algorithmic framework to publish l-trajectory private data in real time. Experiments using four real-life datasets show that our proposed algorithms are effective and efficient.
    Download PDF (3278K)
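The ε-differential-privacy building block for publishing a count at one timestamp is the Laplace mechanism; the paper's contribution lies in how the privacy budget is allocated across timestamps so that every length-l window of a trajectory stays protected, which is not shown in this minimal sketch.

```python
import math
import random

random.seed(7)

def laplace_noise(scale):
    """Sample from Laplace(0, scale) via the inverse-CDF method."""
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def private_count(true_count, epsilon, sensitivity=1.0):
    """Publish a count under epsilon-differential privacy with the Laplace
    mechanism: one person entering or leaving changes the count by at most
    `sensitivity`, so noise with scale sensitivity/epsilon suffices."""
    return true_count + laplace_noise(sensitivity / epsilon)
```

Smaller ε means stronger privacy but more noise; a streaming publisher must split ε across releases, which is what makes protecting an unbounded trajectory stream under a fixed budget hard.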
  • Hideko KAWAKUBO, Marthinus Christoffel DU PLESSIS, Masashi SUGIYAMA
    Type: PAPER
    Subject area: Artificial Intelligence, Data Mining
    2016 Volume E99.D Issue 1 Pages 176-186
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    In many real-world classification problems, the class balance often changes between training and test datasets, due to sample selection bias or the non-stationarity of the environment. Naive classifier training under such changes of class balance systematically yields a biased solution. It is known that such a systematic bias can be corrected by weighted training according to the test class balance. However, the test class balance is often unknown in practice. In this paper, we consider a semi-supervised learning setup where labeled training samples and unlabeled test samples are available and propose a class balance estimator based on the energy distance. Through experiments, we demonstrate that the proposed method is computationally much more efficient than existing approaches, with comparable accuracy.
    Download PDF (885K)
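The idea of matching the unlabeled test set to a θ-mixture of the class-conditional training sets can be sketched in one dimension: the sample energy distance is quadratic in the mixing weight θ, so a simple grid search recovers the class prior. The Gaussian toy data and grid resolution below are illustrative; the paper handles the general multivariate case.

```python
import random

random.seed(3)

def mean_abs_dist(xs, ys):
    """Mean pairwise |x - y| between two samples (1-D energy-distance term)."""
    return sum(abs(x - y) for x in xs for y in ys) / (len(xs) * len(ys))

def estimate_class1_prior(test, train1, train2):
    """Grid-search the weight theta minimizing the sample energy distance
    between the test set and theta*train1 + (1-theta)*train2; the term
    that is constant in theta is dropped."""
    a1, a2 = mean_abs_dist(test, train1), mean_abs_dist(test, train2)
    b11 = mean_abs_dist(train1, train1)
    b12 = mean_abs_dist(train1, train2)
    b22 = mean_abs_dist(train2, train2)
    def objective(t):
        cross = t * a1 + (1 - t) * a2
        self_ = t * t * b11 + 2 * t * (1 - t) * b12 + (1 - t) ** 2 * b22
        return 2 * cross - self_
    return min(range(101), key=lambda i: objective(i / 100)) / 100

train1 = [random.gauss(0, 1) for _ in range(300)]
train2 = [random.gauss(4, 1) for _ in range(300)]
true_prior = 0.7
test = [random.gauss(0, 1) if random.random() < true_prior else random.gauss(4, 1)
        for _ in range(300)]
```

Because all five distance terms are computed once and the search is over a 1-D grid, the estimator is cheap, which mirrors the computational-efficiency claim in the abstract.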
  • Truc Hung NGO, Yen-Wei CHEN, Naoki MATSUSHIRO, Masataka SEO
    Type: PAPER
    Subject area: Pattern Recognition
    2016 Volume E99.D Issue 1 Pages 187-196
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    Facial paralysis is a common clinical condition, occurring in 30 to 40 patients per 100,000 people per year. A quantitative tool to support medical diagnosis is necessary. This paper proposes a simple, visual, and robust method that objectively measures the degree of facial paralysis using spatiotemporal features. The main contribution of this paper is an effective spatiotemporal feature extraction method based on landmark tracking. Our method overcomes the drawbacks of other techniques, such as the influence of irrelevant regions, noise, illumination changes, and time-consuming processing. In addition, the method is simple and visual: its simplicity reduces processing time, and the movements of landmarks, which relate to muscle movement ability, are visualized, helping to reveal regions of serious facial paralysis. Experimental results show that, in terms of recognition rate, our proposed method outperformed the other techniques tested on a dynamic facial expression image database.
    Download PDF (2418K)
  • Yuechan HAO, Bilan ZHU, Masaki NAKAGAWA
    Type: PAPER
    Subject area: Pattern Recognition
    2016 Volume E99.D Issue 1 Pages 197-207
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    This paper describes a significantly improved recognition system for on-line handwritten Japanese text free from line direction and character orientation constraints. The recognition system separates handwritten text of arbitrary character orientation and line direction into text line elements, estimates and normalizes character orientation and line direction, applies two-stage over-segmentation, constructs a segmentation-recognition candidate lattice and evaluates the likelihood of candidate segmentation-recognition paths by combining the scores of character recognition, geometric features and linguistic context. Enhancements over previous systems are made in line segmentation, over-segmentation and context integration model. The results of experiments on text from the HANDS-Kondate_t_bf-2001-11 database demonstrate significant improvements in the character recognition rate compared with the previous systems. Its recognition rate on text of arbitrary character orientation and line direction is now comparable with that possible on horizontal text with normal character orientation. Moreover, its recognition speed and memory requirement do not limit the platforms or applications that employ the recognition system.
    Download PDF (3581K)
  • Ran LI, Hongbing LIU, Jie CHEN, Zongliang GAN
    Type: PAPER
    Subject area: Image Processing and Video Processing
    2016 Volume E99.D Issue 1 Pages 208-218
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    The conventional bilateral motion estimation (BME) for motion-compensated frame rate up-conversion (MC-FRUC) can avoid the problems of overlapped areas and holes, but it usually produces many inaccurate motion vectors (MVs), since 1) the MV of an object between the previous and following frames often lacks temporal symmetry with respect to the target block of the interpolated frame, and 2) repetitive patterns in video frames cause mismatches because the interpolated block itself is unavailable. In this paper, a new BME algorithm with low computational complexity is proposed to resolve these problems. The proposed algorithm incorporates multi-resolution search into BME, exploiting the MV consistency between adjacent pyramid levels and spatially neighboring MVs to correct the inaccurate MVs caused by the lack of temporal symmetry while keeping computational cost low. In addition, the multi-resolution search uses the fast wavelet transform to construct the wavelet pyramid, which not only guarantees low computational complexity but also preserves the high-frequency components of the image at each level during sub-sampling. These high-frequency components regularize the traditional block-matching criterion to reduce the probability of mismatch in BME. Experiments show that the proposed algorithm significantly improves both the objective and subjective quality of the interpolated frame with low computational complexity, and provides better performance than existing BME algorithms.
    Download PDF (2836K)
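    The core bilateral matching step described above can be sketched as a single-level exhaustive search; the following is a minimal Python illustration (plain nested lists for frames, hypothetical names), not the authors' wavelet-pyramid multi-resolution method:

```python
def bilateral_me(prev, nxt, block=4, search=2):
    """Bilateral motion estimation: for each block of the frame to be
    interpolated, find the symmetric vector (dy, dx) minimizing the SAD
    between prev sampled at -d and nxt sampled at the mirrored +d."""
    h, w = len(prev), len(prev[0])
    mvs = {}
    for by in range(0, h, block):
        for bx in range(0, w, block):
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    sad = 0
                    for y in range(by, min(by + block, h)):
                        for x in range(bx, min(bx + block, w)):
                            py, px = y - dy, x - dx  # sample in previous frame
                            ny, qx = y + dy, x + dx  # mirrored sample in next frame
                            if 0 <= py < h and 0 <= px < w and 0 <= ny < h and 0 <= qx < w:
                                sad += abs(prev[py][px] - nxt[ny][qx])
                            else:
                                sad += 255  # penalize out-of-frame samples
                    if best is None or sad < best[0]:
                        best = (sad, dy, dx)
            mvs[(by, bx)] = (best[1], best[2])
    return mvs
```

    Because both frames are sampled symmetrically around the target block, no hole or overlap handling is needed; the paper's multi-resolution search and wavelet-based regularization would refine the vectors this brute-force search finds.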
  • Huimin LU, Yujie LI, Shota NAKASHIMA, Seiichi SERIKAWA
    Type: PAPER
    Subject area: Image Processing and Video Processing
    2016 Volume E99.D Issue 1 Pages 219-227
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    Absorption, scattering, and color distortion are three major issues in underwater optical imaging. Light rays traveling through water are scattered and absorbed according to their wavelength. Scattering is caused by large suspended particles that degrade underwater optical images. Color distortion occurs because different wavelengths are attenuated to different degrees in water; consequently, images of ambient underwater environments are dominated by a bluish tone. In the present paper, we propose a novel underwater imaging model that compensates for the attenuation discrepancy along the propagation path. In addition, we develop a fast weighted guided normalized convolution domain filtering algorithm for enhancing underwater optical images. The enhanced images are characterized by a reduced noise level, better exposure in dark regions, and improved global contrast, by which the finest details and edges are enhanced significantly.
    Download PDF (1558K)
  • Ryo MATSUOKA, Tomohiro YAMAUCHI, Tatsuya BABA, Masahiro OKUDA
    Type: PAPER
    Subject area: Image Processing and Video Processing
    2016 Volume E99.D Issue 1 Pages 228-235
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    We propose an image restoration technique based on multiple-image integration. When acquiring a dark scene, detail in dark areas is often degraded by sensor noise. Simple image integration inherently reduces random noise, but it is insufficient for scenes that contain dark areas. We introduce a novel image integration technique that optimizes the weights for the integration, finding the optimal weight map by solving a convex optimization problem. Additionally, we apply the proposed weight optimization scheme to single-image super-resolution, slightly modifying the weight optimization problem to estimate a high-resolution image from a single low-resolution one. Experimental results show that the weight optimization significantly improves denoising and super-resolution performance.
    Download PDF (1752K)
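    The idea of weighting each shot before integration can be illustrated with a classical closed-form special case: if each image carries independent noise of known variance, weights proportional to the inverse variances minimize the output noise variance of a weighted mean. A small Python sketch (not the authors' convex-programming formulation; names are hypothetical):

```python
def fuse(images, noise_vars):
    """Inverse-variance weighted pixel-wise fusion: w_i proportional to
    1/sigma_i^2 is the closed-form minimizer of output noise variance for
    a weighted mean of independently noisy copies of the same scene."""
    inv = [1.0 / v for v in noise_vars]  # noise variances assumed known, > 0
    s = sum(inv)
    weights = [x / s for x in inv]       # normalized so the weights sum to 1
    h, w = len(images[0]), len(images[0][0])
    out = [[sum(wt * img[y][x] for wt, img in zip(weights, images))
            for x in range(w)] for y in range(h)]
    return out, weights
```

    The paper's weight map goes further by making the weights spatially varying and solving for them jointly; this fixed per-image weighting is only the simplest instance of the same principle.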
  • Kazuhiro TASHIRO, Takahiro KAWAMURA, Yuichi SEI, Hiroyuki NAKAGAWA, Ya ...
    Type: PAPER
    Subject area: Image Recognition, Computer Vision
    2016 Volume E99.D Issue 1 Pages 236-247
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    The objective of this paper is to recognize and classify the poses of idols in still images on the web. The poses found in Japanese idol photos are often complicated, and their classification is highly challenging. Although advances in computer vision research have contributed greatly to image recognition, they are not sufficient to estimate human poses accurately. We thus propose a method that refines the results of human pose estimation using a Pose Guide Ontology (PGO) and a set of energy functions. PGO, introduced in this paper, contains useful background knowledge such as semantic hierarchies and constraints on the positional relationships between body parts. The energy functions compute the correct positions of body parts based on knowledge of the human body. Through experiments, we also refine PGO iteratively to further improve classification accuracy. We demonstrate pose classification into 8 classes on a dataset of 400 idol images from the web. Experimental results show the effectiveness of PGO and the energy functions: the F-measure of classification is 15% higher than the non-refined results. In addition, we confirm the validity of the energy functions.
    Download PDF (3318K)
  • Norimichi UKITA
    Type: PAPER
    Subject area: Image Recognition, Computer Vision
    2016 Volume E99.D Issue 1 Pages 248-256
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    We propose part-segment (PS) features for estimating an articulated pose in still images. The PS feature evaluates the image likelihood of each body part (e.g. head, torso, and arms) robustly to background clutter and nuisance textures on the body. While general gradient features (e.g. HOG) might include many nuisance responses, the PS feature represents only the region of the body part by iterative segmentation while updating the shape prior of each part. In contrast to similar segmentation features, part segmentation is improved by part-specific shape priors that are optimized by training images with fully-automatically obtained seeds. The shape priors are modeled efficiently based on clustering for fast extraction of PS features. The PS feature is fused complementarily with gradient features using discriminative training and adaptive weighting for robust and accurate evaluation of part similarity. Comparative experiments with public datasets demonstrate improvement in pose estimation by the PS features.
    Download PDF (2035K)
  • Zhen GUO, Yujie ZHANG, Chen SU, Jinan XU, Hitoshi ISAHARA
    Type: PAPER
    Subject area: Natural Language Processing
    2016 Volume E99.D Issue 1 Pages 257-264
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    Recent work on joint word segmentation, POS (Part Of Speech) tagging, and dependency parsing in Chinese has two key problems: the first is that character-based word segmentation and word-based dependency parsing were not well combined in the transition-based framework, and the second is that the joint model suffers from insufficient annotated corpora. To resolve the first problem, we propose transforming the traditional word-based dependency tree into a character-based dependency tree using the internal structure of words, and then propose a novel character-level joint model for the three tasks. To resolve the second problem, we propose a novel semi-supervised joint model that exploits n-gram features and dependency subtree features from a partially annotated corpus. Experimental results on the Chinese Treebank show that our joint model achieved 98.31%, 94.84% and 81.71% for Chinese word segmentation, POS tagging, and dependency parsing, respectively, outperforming the pipeline model of the three tasks by 0.92%, 1.77% and 3.95%. In particular, the F1 values for word segmentation and POS tagging are the best results reported to date.
    Download PDF (1669K)
  • Yan LEI, Min ZHANG, Bixin LI, Jingan REN, Yinhua JIANG
    Type: LETTER
    Subject area: Software Engineering
    2016 Volume E99.D Issue 1 Pages 265-269
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    Many recent studies have focused on leveraging rich information types to increase the useful information available for fault localization. However, they rarely investigate the impact of information richness on fault localization or give guidance on how to enrich information to improve localization effectiveness. This paper presents the first systematic study to fill this void. Our study chooses four representative information types and investigates the relationship between their richness and localization effectiveness. The results show that information richness related to execution frequency counts involves a high risk of degrading localization effectiveness, whereas backward slices are effective in improving it.
    Download PDF (230K)
  • Chen CHEN, Chunyan HOU, Jiakun XIAO, Xiaojie YUAN
    Type: LETTER
    Subject area: Artificial Intelligence, Data Mining
    2016 Volume E99.D Issue 1 Pages 270-274
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    Purchase behavior prediction is one of the most important issues for the precision marketing of e-commerce companies. This Letter presents our solution to the purchase behavior prediction problem in e-commerce, specifically the task of the Big Data Contest of the China Computer Federation in 2014. The goal of this task is to predict which users will exhibit purchase behavior based on users' historical data. Traditional recommendation methods encounter two crucial problems in this scenario. First, the task only predicts which users will make a purchase, rather than which items should be recommended to which users. Second, the large-scale dataset poses a big challenge for building the empirical model. Feature engineering and factorization models shed some light on these problems. We propose to use the Factorization Machines model on top of multi-class, high-dimensional feature engineering. Experimental results on a real-world dataset demonstrate the advantages of our proposed method.
    Download PDF (909K)
  • Raissa RELATOR, Nozomi NAGANO, Tsuyoshi KATO
    Type: LETTER
    Subject area: Artificial Intelligence, Data Mining
    2016 Volume E99.D Issue 1 Pages 275-278
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    Although many 3D protein structures have been solved to date, the functions of some proteins remain unknown. To predict protein functions, local structures of proteins are widely compared with pre-defined model structures whose functions have been elucidated. The root mean square deviation (RMSD) has been the conventional index for this comparison. In this work, an adaptive deviation was incorporated, together with the Bregman Divergence Regularized Machine, to detect local structures analogous to such model structures more effectively than the conventional index.
    Download PDF (123K)
  • Yi-Jia ZHANG, Zhong-Jian KANG, Xin-Feng LI, Zhe-Ming LU
    Type: LETTER
    Subject area: Artificial Intelligence, Data Mining
    2016 Volume E99.D Issue 1 Pages 279-282
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    The controllability of complex networks has attracted increasing attention within various scientific fields. Many power grids are complex networks with common topological characteristics such as small-world and scale-free features. This Letter investigates the controllability of several real power grids in comparison with classical complex network models with the same number of nodes. Several conclusions are drawn from detailed analyses of real power grids together with Erdős-Rényi (ER) random networks, Watts-Strogatz (WS) small-world networks, Barabási-Albert (BA) scale-free networks and configuration model (CM) networks. The main conclusion is that most driver nodes of power grids are hub-free nodes with low nodal degrees of 1 or 2. The controllability of power grids is determined by degree distribution and heterogeneity; power grids are harder to control than WS and CM networks but easier than BA networks. Some power grids are relatively difficult to control because they require a far higher ratio of driver nodes than ER networks, while others are easier to control, requiring a driver-node ratio less than or equal to that of ER random networks.
    Download PDF (1388K)
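    The Letter does not spell out how driver nodes are computed, but the standard approach for structural controllability (Liu, Slotine and Barabási) takes the number of driver nodes as N_D = max(N - |maximum matching|, 1), with the matching computed on the bipartite graph whose left side holds out-copies and whose right side holds in-copies of the nodes. A small Python sketch using Kuhn's augmenting-path matching (names are hypothetical):

```python
def driver_node_count(n, edges):
    """Minimum driver nodes for structural controllability:
    N_D = max(N - |maximum matching|, 1) over the directed links."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    match = [-1] * n  # match[v] = u means link u -> v is in the matching

    def augment(u, seen):
        # Try to match u, possibly re-routing previously matched nodes.
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if match[v] == -1 or augment(match[v], seen):
                match[v] = u
                return True
        return False

    matched = sum(1 for u in range(n) if augment(u, set()))
    return max(n - matched, 1)
```

    On a directed chain every node but the head is matched, so one driver suffices, while a star needs a driver for every unmatched leaf; this mirrors the Letter's observation that low-degree nodes dominate the driver set.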
  • M. Shahidur RAHMAN, Tetsuya SHIMAMURA
    Type: LETTER
    Subject area: Speech and Hearing
    2016 Volume E99.D Issue 1 Pages 283-287
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    This paper explores the potential of pitch determination from bone-conducted (BC) speech. Pitch determination from normal air-conducted (AC) speech cannot attain the expected accuracy for all voice and background conditions. In contrast, since BC speech arises from vibrations that have traveled through the vocal-tract wall, it is robust against ambient conditions. Although an appropriate model of BC speech is not known, it has a regular harmonic structure in the lower spectral region. Due to this lowpass nature, pitch determination from BC speech is not usually affected by the dominant first formant. Experiments on simultaneously recorded AC and BC speech show that BC speech is more reliable for pitch estimation than AC speech. With little manual effort, the pitch contour estimated from BC speech can also serve as a pitch reference, an alternative to the contour extracted from laryngograph output, which is sometimes inconsistent with simultaneously recorded AC speech.
    Download PDF (700K)
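    The harmonic, lowpass character of BC speech makes even a plain autocorrelation pitch tracker effective; a minimal Python sketch of such a tracker over one analysis frame (not the authors' exact estimator; names are hypothetical):

```python
import math

def estimate_pitch(signal, fs, fmin=60.0, fmax=400.0):
    """Pick the autocorrelation peak within the plausible pitch-lag range
    [fs/fmax, fs/fmin] and return the corresponding frequency in Hz."""
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]        # remove DC offset
    lo, hi = int(fs / fmax), int(fs / fmin)
    best_lag, best_r = lo, float("-inf")
    for lag in range(lo, min(hi, n - 1) + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_r, best_lag = r, lag
    return fs / best_lag
```

    Restricting the lag search to the pitch range keeps first-formant energy, already weak in BC speech, from capturing the peak.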
  • Xia WANG, Ruiyu LIANG, Qingyun WANG, Li ZHAO, Cairong ZOU
    Type: LETTER
    Subject area: Speech and Hearing
    2016 Volume E99.D Issue 1 Pages 288-291
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    In this letter, an effective acoustic feedback cancellation algorithm based on the normalized sub-band adaptive filter (NSAF) is proposed. To resolve the conflict between fast convergence and low misalignment in the NSAF algorithm, a variable step size is designed that automatically adapts to the update state of the filter. The update state is detected via the normalized distance between the long-term and short-term averages of the tap-weight vector. Simulation results demonstrate that the proposed algorithm has superior performance in terms of convergence rate and misalignment.
    Download PDF (310K)
  • Qingyun WANG, Ruiyu LIANG, Li JING, Cairong ZOU, Li ZHAO
    Type: LETTER
    Subject area: Speech and Hearing
    2016 Volume E99.D Issue 1 Pages 292-295
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    Since digital hearing aids are sensitive to time delay and power consumption, the computational complexity of noise reduction must be reduced as much as possible. Complicated algorithms based on time-frequency analysis are therefore very difficult to implement in digital hearing aids. This paper presents a new approach that yields an improved noise reduction algorithm with greatly reduced computational complexity for multi-channel digital hearing aids. First, the sub-band sound pressure level (SPL) is calculated in real time. Then, based on the calculated sub-band SPL, the noise in each sub-band is estimated and the probability of speech presence is computed. Finally, the a posteriori and a priori signal-to-noise ratios are estimated and a gain function is derived to reduce the noise adaptively. By replacing the FFT and IFFT transforms with the known SPL, the proposed algorithm greatly reduces the computational load. Experiments on a prototype digital hearing aid show that the time delay is nearly half that of the traditional adaptive Wiener filtering and spectral subtraction algorithms, while the SNR improvement and PESQ score remain satisfactory. Compared with the modulation-frequency-based noise reduction algorithm used in many commercial digital hearing aids, the proposed algorithm achieves more than 5dB SNR improvement as well as lower time delay and power consumption.
    Download PDF (841K)
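    The per-sub-band SNR and gain step can be sketched as a simplified decision-directed update (in the spirit of Ephraim and Malah, not the authors' exact recursion; all names and constants are hypothetical):

```python
def dd_gain(power, noise_power, xi_prev, alpha=0.98, xi_min=0.05):
    """One decision-directed update for a single sub-band:
    gamma = a posteriori SNR, xi = smoothed a priori SNR,
    gain = Wiener gain xi / (1 + xi)."""
    gamma = power / noise_power
    xi = alpha * xi_prev + (1 - alpha) * max(gamma - 1.0, 0.0)
    xi = max(xi, xi_min)      # floor limits musical noise
    gain = xi / (1.0 + xi)
    return gain, xi
```

    A noise-only sub-band (power near the noise estimate) gets a gain close to the floor, while a speech-dominated sub-band keeps a gain near one; running this per sub-band on the known SPL values is what lets the approach skip the FFT/IFFT entirely.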
  • Meng SUN, Hugo VAN HAMME, Yimin WANG, Xiongwei ZHANG
    Type: LETTER
    Subject area: Speech and Hearing
    2016 Volume E99.D Issue 1 Pages 296-299
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    Unsupervised spoken unit discovery, or zero-resource speech recognition, is an emerging research topic that is important for spoken document analysis of languages or dialects with little human annotation. In this paper, we extend our earlier joint training framework for unsupervised learning of discrete-density HMMs to continuous-density HMMs (CDHMMs) and apply it to spoken unit discovery. In the proposed recipe, we first cluster a group of Gaussians, which then serve as initializations for the joint training framework of nonnegative matrix factorization and a semi-continuous density HMM (SCDHMM). In the SCDHMM, all hidden states share the same group of Gaussians but with different mixture weights. A CDHMM is subsequently constructed by tying the top-N activated Gaussians to each hidden state. Baum-Welch training is finally conducted to update the Gaussian parameters, mixture weights and HMM transition probabilities. Experiments were conducted on word discovery from TIDIGITS and phone discovery from TIMIT. For TIDIGITS, units were modeled by 10 states, which turn out to be strongly related to words; for TIMIT, units were modeled by 3 states, which are likely to be phonemes.
    Download PDF (206K)
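    The CDHMM construction step, tying the top-N activated Gaussians to each state, amounts to keeping each state's N largest mixture weights over the shared Gaussian pool and renormalizing. A minimal Python sketch (names are hypothetical):

```python
def tie_top_n(mixture_weights, n):
    """For each state's weight vector over the shared Gaussian pool, keep
    the n most-activated Gaussians and renormalize their weights to sum
    to one, yielding per-state tied mixtures for the CDHMM."""
    tied = []
    for weights in mixture_weights:
        top = sorted(range(len(weights)), key=lambda i: weights[i],
                     reverse=True)[:n]
        total = sum(weights[i] for i in top)
        tied.append({i: weights[i] / total for i in top})
    return tied
```

    After this tying, Baum-Welch can update the retained Gaussians and weights per state, since the states no longer share a single global mixture.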
  • Jae-Hee JUN, Ji-Hoon CHOI, Jong-Ok KIM
    Type: LETTER
    Subject area: Image Processing and Video Processing
    2016 Volume E99.D Issue 1 Pages 300-304
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    This letter proposes a novel post-processing method for self-similarity based super-resolution (SR). Existing back-projection (BP) methods enhance SR images by refining the reconstructed coarse high-frequency (HF) information. However, BP causes artifacts due to interpolation and excessively smooths small HF signals, particularly in texture regions. Motivated by these observations, we propose a novel post-processing method referred to as middle-frequency (MF) based refinement. The proposed method refines the reconstructed HF information in the MF domain rather than in the spatial domain as BP does. In addition, it requires no internal interpolation process, so it is free from the side effects of interpolation. Experimental results show that the proposed algorithm provides superior performance in terms of both the quantity of reproduced HF information and the visual quality.
    Download PDF (624K)
  • Zifen HE, Yinhui ZHANG
    Type: LETTER
    Subject area: Image Processing and Video Processing
    2016 Volume E99.D Issue 1 Pages 305-308
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    This work presents an approximate global optimization method for image halftoning that fuses multi-scale information from a tree model. We employ a Gaussian mixture model and a hidden Markov tree to characterize the intra-scale clustering and inter-scale persistence properties of the detail coefficients, respectively. The multiscale perceived-error metric model and the theory of scale-related perceived-error metrics are used to fuse the statistical distributions of the error metric across intra-scale clustering and cross-scale persistence, yielding an energy function. Through energy minimization via graph cuts, we obtain the halftone image. Experiments demonstrate the superior performance of the new algorithm compared with several existing algorithms, both visually and in quantitative evaluation.
    Download PDF (782K)
  • Keun-Chang KWAK
    Type: LETTER
    Subject area: Biocybernetics, Neurocomputing
    2016 Volume E99.D Issue 1 Pages 309-312
    Published: January 01, 2016
    Released: January 01, 2016
    JOURNALS FREE ACCESS
    In this paper, a method for designing an Incremental Granular Model (IGM) that integrates Linear Regression (LR) and a Linguistic Model (LM) with the aid of fuzzy granulation is proposed. The IGM is designed using information granulation realized via Context-based Interval Type-2 Fuzzy C-Means (CIT2FCM) clustering. This clustering approach is used not only to estimate cluster centers that preserve the homogeneity of the clustered patterns from linguistic contexts produced in the output space, but also to deal with the uncertainty associated with the fuzzification factor. Furthermore, the IGM is developed by constructing an LR as a global model and refining it through local fuzzy if-then rules that capture the more localized nonlinearities of the system via the LM. Experimental results on two examples show that the proposed method performs well in comparison with previous works.
    Download PDF (602K)