IEICE Transactions on Information and Systems

Special Section on Medical Imaging

FOREWORD

Hiroshi FUJITA

2013 Volume E96.D Issue 4 Pages 771
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.771

JOURNAL FREE ACCESS

Machine Learning in Computer-Aided Diagnosis of the Thorax and Colon in CT: A Survey

Kenji SUZUKI

Article type: INVITED SURVEY PAPER
Subject area: Computer-Aided Diagnosis
2013 Volume E96.D Issue 4 Pages 772-783
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.772

JOURNAL FREE ACCESS

Computer-aided detection (CADe) and diagnosis (CAD) have been a rapidly growing, active area of research in medical imaging. Machine learning (ML) plays an essential role in CAD, because objects such as lesions and organs may not be represented accurately by a simple equation; thus, medical pattern recognition essentially requires “learning from examples.” One of the most popular uses of ML is the classification of objects such as lesion candidates into certain classes (e.g., abnormal or normal, and lesions or non-lesions) based on input features (e.g., contrast and area) obtained from segmented lesion candidates. The task of ML is to determine “optimal” boundaries for separating classes in the multi-dimensional feature space formed by the input features. ML algorithms for classification include linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), multilayer perceptrons, and support vector machines (SVM). Recently, pixel/voxel-based ML (PML) emerged in medical image processing/analysis, which uses pixel/voxel values in images directly, instead of features calculated from segmented lesions, as input information; thus, feature calculation or segmentation is not required. In this paper, ML techniques used in CAD schemes for detection and diagnosis of lung nodules in thoracic CT and for detection of polyps in CT colonography (CTC) are surveyed and reviewed.
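
The feature-based classification described above can be illustrated with a minimal two-class LDA sketch. The two features (contrast, area), the sample values, and all function names are assumptions made for illustration; they are not taken from the surveyed paper, and equal class priors are assumed.

```python
# Two-class LDA on two features, written with the standard library only.
# All data values below are made up for illustration.

def mean_vec(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter(rows, mu):
    # 2x2 within-class scatter matrix
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fit_lda(class0, class1):
    mu0, mu1 = mean_vec(class0), mean_vec(class1)
    s0, s1 = scatter(class0, mu0), scatter(class1, mu1)
    dof = len(class0) + len(class1) - 2
    cov = [[(s0[i][j] + s1[i][j]) / dof for j in range(2)] for i in range(2)]
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    inv = [[cov[1][1] / det, -cov[0][1] / det],
           [-cov[1][0] / det, cov[0][0] / det]]
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    # Weight vector: inverse pooled covariance times the mean difference
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    mid = [(mu0[0] + mu1[0]) / 2, (mu0[1] + mu1[1]) / 2]
    b = -(w[0] * mid[0] + w[1] * mid[1])  # boundary passes through the midpoint
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Hypothetical (contrast, area) features for lesion candidates
non_lesions = [(0.20, 10), (0.30, 12), (0.25, 9), (0.35, 11)]
lesions = [(0.70, 30), (0.80, 28), (0.75, 33), (0.90, 31)]
w, b = fit_lda(non_lesions, lesions)
label_a = predict(w, b, (0.85, 29))  # candidate resembling the lesion class
label_b = predict(w, b, (0.22, 11))  # candidate resembling the non-lesion class
```

The linear boundary here is the simplest of the classifiers the abstract lists; QDA or an SVM would replace `fit_lda` while keeping the same feature inputs.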

A Survey on Statistical Modeling and Machine Learning Approaches to Computer Assisted Medical Intervention: Intraoperative Anatomy Modeling and Optimization of Interventional Procedures

Ken'ichi MOROOKA, Masahiko NAKAMOTO, Yoshinobu SATO

Article type: SURVEY PAPER
Subject area: Computer Assisted Medical Intervention
2013 Volume E96.D Issue 4 Pages 784-797
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.784

JOURNAL FREE ACCESS

This paper reviews methods for computer assisted medical intervention using statistical models and machine learning technologies, which would be particularly useful for representing prior information of anatomical shape, motion, and deformation to extrapolate intraoperative sparse data as well as surgeons' expertise and pathology to optimize interventions. Firstly, we present a review of methods for recovery of static anatomical structures using only intraoperative data without any preoperative patient-specific information. Then, we review methods that recover intraoperative motion and deformation by combining intraoperative sparse data with preoperative patient-specific stationary data, followed by a survey of articles that incorporate biomechanics. Furthermore, we review articles that address the use of statistical models for optimization of interventions. Finally, we conclude the survey by describing future perspectives.

Segmentation of Liver in Low-Contrast Images Using K-Means Clustering and Geodesic Active Contour Algorithms

Amir H. FORUZAN, Yen-Wei CHEN, Reza A. ZOROOFI, Akira FURUKAWA, Yoshin ...

Article type: PAPER
Subject area: Medical Image Processing
2013 Volume E96.D Issue 4 Pages 798-807
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.798

JOURNAL FREE ACCESS

In this paper, we present an algorithm to segment the liver in low-contrast CT images. As the first step of our algorithm, we define a search range for the liver boundary. Then, the EM algorithm is utilized to estimate parameters of a ‘Gaussian Mixture’ model that conforms to the intensity distribution of the liver. Using the statistical parameters of the intensity distribution, we introduce a new thresholding technique to classify image pixels. We assign a distance feature vector to each pixel and segment the liver by a K-means clustering scheme. This initial boundary of the liver is conditioned by the Fourier transform. Then, a Geodesic Active Contour algorithm uses the boundaries to find the final surface. The novelty in our method is the proper selection and combination of sub-algorithms so as to find the border of an object in a low-contrast image. The number of parameters in the proposed method is low and the parameters have a low range of variations. We applied our method to 30 datasets including normal and abnormal cases of low-contrast/high-contrast images, and it was extensively evaluated both quantitatively and qualitatively. The minimum Dice similarity measure of the results is 0.89. Assessment of the results demonstrates the potential of the proposed method for segmentation in low-contrast images.
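
For reference, the Dice similarity measure quoted above can be computed as in the following sketch. Representing each segmentation as a set of pixel coordinates is an assumption made for brevity, and the two tiny masks are made-up values, not data from the paper.

```python
def dice(mask_a, mask_b):
    """Dice similarity: 2|A ∩ B| / (|A| + |B|) for two sets of pixel coordinates."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty masks agree perfectly by convention
    return 2 * len(a & b) / (len(a) + len(b))

# Hypothetical manual and automatic segmentations as (row, col) pixels
manual = {(0, 0), (0, 1), (1, 0), (1, 1)}
automatic = {(0, 1), (1, 1), (1, 2)}
score = dice(manual, automatic)  # 2 * 2 / (4 + 3)
```

A value of 1 means the two regions coincide exactly, and 0 means they do not overlap at all.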

Automated Ulcer Detection Method from CT Images for Computer Aided Diagnosis of Crohn's Disease

Masahiro ODA, Takayuki KITASAKA, Kazuhiro FURUKAWA, Osamu WATANABE, Ta ...

Article type: PAPER
Subject area: Medical Image Processing
2013 Volume E96.D Issue 4 Pages 808-818
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.808

JOURNAL FREE ACCESS

Crohn's disease commonly affects the small and large intestines. Its symptoms include ulcers and intestinal stenosis, and its diagnosis is currently performed using an endoscope. However, because the endoscope cannot pass through the stenosed parts of the intestines, diagnosis of the entire intestine is difficult. A CT image-based method is expected to become an alternative means of diagnosing Crohn's disease because it enables observation of the entire intestine even if stenosis exists. To achieve efficient CT image-based diagnosis, diagnostic aid by computers is required. This paper presents an automated method for detecting the surface of ulcers in the small and large intestines from fecal tagging CT images. Ulcers cause rough surfaces on the intestinal wall and consist of small convex and concave (CC) regions. We detect them by blob and inverse-blob structure enhancement filters. A roughness value is utilized to reduce the false positives of the detection results. Many CC regions are concentrated in ulcers. The roughness value evaluates the concentration ratio of the detected regions. Detected regions with low roughness values are removed by a thresholding process. The thickness of the intestinal lumen and the CT values of the surrounding tissue of the intestinal lumen are also used to reduce false positives. Experimental results using ten cases of CT images showed that our proposed method detects 70.6% of ulcers with 12.7 FPs/case. The proposed method detected most of the ulcers.

A Proposal of Spatio-Temporal Reconstruction Method Based on a Fast Block-Iterative Algorithm

Tatsuya KON, Takashi OBI, Hideaki TASHIMA, Nagaaki OHYAMA

Article type: PAPER
Subject area: Medical Image Processing
2013 Volume E96.D Issue 4 Pages 819-825
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.819

JOURNAL FREE ACCESS

Parametric images can help investigate disease mechanisms and vital functions. To estimate parametric images, it is necessary to obtain the tissue time activity curves (tTACs), which express temporal changes of tracer activity in human tissue. In general, the tTACs are calculated from each voxel's value in the time-sequential PET images estimated from dynamic PET data. Recently, spatio-temporal PET reconstruction methods have been proposed in order to take into account the temporal correlation within each tTAC. Such spatio-temporal algorithms are generally quite computationally intensive. On the other hand, typical algorithms such as the preconditioned conjugate gradient (PCG) method still do not provide good accuracy in estimation. To overcome these problems, we propose a new spatio-temporal reconstruction method based on the dynamic row-action maximum-likelihood algorithm (DRAMA). As the original algorithm does, the proposed method takes into account the noise propagation, but it achieves much faster convergence. Performance of the method is evaluated with digital phantom simulations, and it is shown that the proposed method requires only a few reconstruction processes, thereby remarkably reducing the computational cost required to estimate the tTACs. The results also show that the tTACs and parametric images from the proposed method have better accuracy.

Fast and Robust 3D Correspondence Matching and Its Application to Volume Registration

Yuichiro TAJIMA, Kinya FUDANO, Koichi ITO, Takafumi AOKI

Article type: PAPER
Subject area: Medical Image Processing
2013 Volume E96.D Issue 4 Pages 826-835
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.826

JOURNAL FREE ACCESS

This paper presents a fast and accurate volume correspondence matching method using 3D Phase-Only Correlation (POC). The proposed method employs (i) a coarse-to-fine strategy using multi-scale volume pyramids for correspondence search and (ii) high-accuracy POC-based local block matching for finding dense volume correspondence with sub-voxel displacement accuracy. This paper also proposes a GPU implementation of the method to achieve fast and practical computation of volume registration. Experimental evaluation shows that the proposed approach exhibits higher accuracy and lower computational cost compared with the conventional method. We also demonstrate that the GPU implementation of the proposed method can align two volumes in several seconds, which is suitable for practical use in image-guided radiation therapy.
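
The core idea of Phase-Only Correlation can be shown with a one-dimensional sketch that recovers an integer circular shift. This is a simplification under stated assumptions: the paper works on 3D volumes with sub-voxel accuracy and a coarse-to-fine pyramid, none of which is reproduced here, and the signal and function names are illustrative only.

```python
import cmath

def dft(x, inverse=False):
    # Naive O(n^2) discrete Fourier transform, adequate for a short demo signal
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def poc(f, g):
    # Normalize the cross spectrum to unit magnitude so only phase remains
    F, G = dft(f), dft(g)
    cross = [gk * fk.conjugate() for fk, gk in zip(F, G)]
    normalized = [c / abs(c) if abs(c) > 1e-12 else 0 for c in cross]
    return [v.real for v in dft(normalized, inverse=True)]

def estimate_shift(f, g):
    # The POC surface peaks at the displacement of g relative to f
    r = poc(f, g)
    return max(range(len(r)), key=r.__getitem__)

f = [0, 1, 3, 2, 0, 0, 1, 0]                   # hypothetical 1D signal
d = 3
g = [f[(n - d) % len(f)] for n in range(len(f))]  # f circularly shifted by d samples
r = poc(f, g)
shift = estimate_shift(f, g)
```

Because the magnitude is discarded, the correlation surface is a sharp peak rather than a broad lobe, which is what makes sub-sample peak fitting practical in the full method.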

Classification of Pneumoconiosis on HRCT Images for Computer-Aided Diagnosis

Wei ZHAO, Rui XU, Yasushi HIRANO, Rie TACHIBANA, Shoji KIDO, Narufumi ...

Article type: PAPER
Subject area: Computer-Aided Diagnosis
2013 Volume E96.D Issue 4 Pages 836-844
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.836

JOURNAL FREE ACCESS

This paper describes a computer-aided diagnosis (CAD) method to classify pneumoconiosis on HRCT images. In Japan, pneumoconiosis is divided into 4 types according to the density of nodules: Type 1 (no nodules), Type 2 (few small nodules), Type 3-a (numerous small nodules) and Type 3-b (numerous small nodules and presence of large nodules). Because most pneumoconiotic nodules are small and irregularly shaped, only a few nodules can be detected by conventional nodule extraction methods, which affects the classification of pneumoconiosis. To improve the performance of nodule extraction, we propose a filter based on analysis of the eigenvalues of the Hessian matrix. The classification of pneumoconiosis is performed in the following steps. Firstly, large nodules are extracted and cases of Type 3-b are recognized. Secondly, for the remaining cases, small nodules are detected and false positives are eliminated. Thirdly, a bag-of-features-based method is used to generate input vectors for a support vector machine (SVM) classifier. Finally, cases of Types 1, 2 and 3-a are classified. The proposed method was evaluated on 175 HRCT scans of 112 subjects. The average classification accuracy is 90.6%. The experimental results show that our method would be helpful for classifying pneumoconiosis on HRCT.

A Bag-of-Features Approach to Classify Six Types of Pulmonary Textures on High-Resolution Computed Tomography

Rui XU, Yasushi HIRANO, Rie TACHIBANA, Shoji KIDO

Article type: PAPER
Subject area: Computer-Aided Diagnosis
2013 Volume E96.D Issue 4 Pages 845-855
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.845

JOURNAL FREE ACCESS

Computer-aided diagnosis (CAD) systems for diffuse lung diseases (DLD) are required to help radiologists read high-resolution computed tomography (HRCT) scans. An important task in developing such CAD systems is to make computers automatically recognize typical pulmonary textures of DLD on HRCT. In this work, we propose a bag-of-features based method for the classification of six kinds of DLD patterns: consolidation (CON), ground-glass opacity (GGO), honeycombing (HCM), emphysema (EMP), nodular (NOD) and normal tissue (NOR). In order to successfully apply the bag-of-features based method to this task, we focused on designing suitable local features and the classifier. Considering that the pulmonary textures are characterized not only by CT values but also by shapes, we propose a set of local features based on statistical measures calculated from both CT values and eigenvalues of Hessian matrices. Additionally, we designed a support vector machine (SVM) classifier by optimizing parameters related to both kernels and the soft-margin penalty constant. We collected 117 HRCT scans from 117 subjects for experiments. Three experienced radiologists were asked to review the data, and the regions they agreed on as containing typical textures were used to generate 3009 3D volumes of interest (VOIs) with a size of 32×32×32. These VOIs were separated into two sets. One set was used for training and tuning parameters, and the other set was used for evaluation. The overall recognition accuracy of the proposed method was 93.18%. The precisions/sensitivities for each texture were 96.67%/95.08% (CON), 92.55%/94.02% (GGO), 97.67%/99.21% (HCM), 94.74%/93.99% (EMP), 81.48%/86.03% (NOD) and 94.33%/90.74% (NOR). Additionally, experimental results showed that the proposed method performed better than four kinds of baseline methods, including two state-of-the-art methods for classification of DLD textures.
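
Per-class precision and sensitivity figures like those above are typically derived from a confusion matrix. The sketch below assumes a hypothetical three-class matrix with made-up counts (rows are true classes, columns are predicted classes); it does not reproduce the paper's data.

```python
def per_class_metrics(labels, cm):
    """Return {label: (precision, sensitivity)} and overall accuracy.

    cm[i][j] counts samples of true class i predicted as class j.
    """
    n = len(labels)
    total = sum(sum(row) for row in cm)
    metrics = {}
    for c in range(n):
        predicted_as_c = sum(cm[r][c] for r in range(n))  # column sum
        truly_c = sum(cm[c])                              # row sum
        precision = cm[c][c] / predicted_as_c if predicted_as_c else 0.0
        sensitivity = cm[c][c] / truly_c if truly_c else 0.0
        metrics[labels[c]] = (precision, sensitivity)
    accuracy = sum(cm[c][c] for c in range(n)) / total
    return metrics, accuracy

# Hypothetical counts for three of the six texture classes
labels = ["CON", "GGO", "NOR"]
cm = [[9, 1, 0],
      [1, 8, 1],
      [0, 2, 8]]
metrics, accuracy = per_class_metrics(labels, cm)
```

Precision answers "of the VOIs labeled GGO, how many really were GGO", while sensitivity answers "of the true GGO VOIs, how many were found", which is why the two can differ for the same class.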

Multi-Layer Virtual Slide Scanning System with Multi-Focus Image Fusion for Cytopathology and Image Diagnosis

Hiroyuki NOZAKA, Tomisato MIURA, Zhongxi ZHENG

Article type: PAPER
Subject area: Diagnostic Systems
2013 Volume E96.D Issue 4 Pages 856-863
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.856

JOURNAL FREE ACCESS

Objective: Virtual slides are high-magnification whole digital images of histopathological tissue sections. The existing virtual slide system is optimized for scanning flat and smooth plane slides such as histopathological paraffin-embedded tissue sections, but is unsuitable for scanning irregular plane slides such as cytological smear slides. This study aims to develop a virtual slide system suitable for cytopathology slide scanning and to evaluate the effectiveness of multi-focus image fusion (MF) in cytopathological diagnosis. Study Design: We developed a multi-layer virtual slide scanning system with MF technology. Tumors for this study were collected from 21 patients diagnosed with primary breast cancer. After surgical extraction, smear slides for cytopathological diagnosis were prepared by the conventional stamp method, the fine needle aspiration (FNA) method, and the tissue washing method. The stamp slides were fixed in 95% ethanol. FNA and tissue washing samples were fixed in CytoRich RED Preservative Fluid, which is used for liquid-based cytopathology (LBC). These slides were stained with Papanicolaou stain and scanned by the virtual slide system. To evaluate the suitability of MF technology in cytopathological diagnosis, we compared single focus (SF) virtual slides with MF virtual slides. Cytopathological evaluation was carried out by 5 pathologists and cytotechnologists. Results: The virtual slide system with MF provided better results than the conventional SF virtual slide system with regard to viewing inside cell clusters and image file size. Liquid-based cytology was more suitable than the stamp method for virtual slides with MF. Conclusion: The virtual slide system with MF is a useful technique for digitization in cytopathology, and this technology could be applied to tele-cytology and e-learning by virtual slide system.

Ensemble Learning Based Segmentation of Metastatic Liver Tumours in Contrast-Enhanced Computed Tomography

Akinobu SHIMIZU, Takuya NARIHIRA, Hidefumi KOBATAKE, Daisuke FURUKAWA, ...

Article type: LETTER
Subject area: Medical Image Processing
2013 Volume E96.D Issue 4 Pages 864-868
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.864

JOURNAL FREE ACCESS

This paper presents an ensemble learning algorithm for liver tumour segmentation from a CT volume in the form of U-Boost and extends the loss functions to improve performance. Five segmentation algorithms trained by the ensemble learning algorithm with different loss functions are compared in terms of error rate and Jaccard Index between the extracted regions and the true ones.

Model-Based Approach to Recognize the Rectus Abdominis Muscle in CT Images

Naoki KAMIYA, Xiangrong ZHOU, Huayue CHEN, Chisako MURAMATSU, Takeshi ...

Article type: LETTER
Subject area: Medical Image Processing
2013 Volume E96.D Issue 4 Pages 869-871
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.869

JOURNAL FREE ACCESS

Our purpose in this study is to develop a scheme to segment the rectus abdominis muscle region in X-ray CT images. We propose a new muscle recognition method based on the shape model. In this method, three steps are included in the segmentation process. The first is to generate a shape model for representing the rectus abdominis muscle. The second is to recognize anatomical feature points corresponding to the origin and insertion of the muscle, and the third is to segment the rectus abdominis muscles using the shape model. We generated the shape model from 20 CT cases and tested the model to recognize the muscle in 10 other CT cases. The average value of the Jaccard similarity coefficient (JSC) between the manually and automatically segmented regions was 0.843. The results suggest the validity of the model-based segmentation for the rectus abdominis muscle.

Regular Section

Application of an Artificial Fish Swarm Algorithm in Symbolic Regression

Qing LIU, Tomohiro ODAKA, Jousuke KUROIWA, Hisakazu OGURA

Article type: PAPER
Subject area: Fundamentals of Information Systems
2013 Volume E96.D Issue 4 Pages 872-885
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.872

JOURNAL FREE ACCESS

An artificial fish swarm algorithm (AFSA) for solving symbolic regression problems is introduced in this paper. In the proposed AFSA, artificial fish (AF) individuals represent candidate solutions, which are encoded by the gene expression scheme of gene expression programming (GEP). For evaluating AF individuals, a penalty-based fitness function, in which the node number of the parse tree is considered to be a constraint, was designed in order to obtain a solution expression that not only fits the given data well but is also compact. A number of important concepts are defined, including distance, partners, congestion degree, and feature code. Based on these concepts, we designed four behaviors, namely, randomly moving behavior, preying behavior, following behavior, and avoiding behavior, and present their respective formalized descriptions. Exhaustive simulation results demonstrate that the proposed algorithm not only obtains a high-quality solution expression but also provides remarkable robustness and quick convergence.
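
A penalty-based fitness of the kind described above might look like the following sketch. The exact form used in the paper is not given in the abstract, so the formula here (mean squared error plus a linear penalty on nodes beyond a limit), the tuple-based expression encoding, and every name and constant are assumptions for illustration only.

```python
# Expressions are nested tuples (op, left, right); leaves are "x" or numbers.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def evaluate(expr, x):
    if isinstance(expr, tuple):
        op, left, right = expr
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if expr == "x" else float(expr)

def node_count(expr):
    if isinstance(expr, tuple):
        return 1 + node_count(expr[1]) + node_count(expr[2])
    return 1

def fitness(expr, samples, max_nodes=7, penalty_weight=0.1):
    # Lower is better: data misfit plus a penalty for exceeding the node limit
    mse = sum((evaluate(expr, x) - y) ** 2 for x, y in samples) / len(samples)
    return mse + penalty_weight * max(0, node_count(expr) - max_nodes)

samples = [(x, x * x + 1) for x in (0.0, 1.0, 2.0, 3.0)]  # target: x^2 + 1
compact = ("add", ("mul", "x", "x"), 1.0)
bloated = ("add", ("mul", "x", "x"),
           ("add", ("mul", 1.0, 1.0), ("sub", "x", "x")))
fit_compact = fitness(compact, samples)
fit_bloated = fitness(bloated, samples)
```

Both candidates fit the data exactly, but the bloated one has 11 nodes against a limit of 7, so the penalty alone ranks the compact expression first.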

A Scalable Communication-Induced Checkpointing Algorithm for Distributed Systems

Alberto CALIXTO SIMON, Saul E. POMARES HERNANDEZ, Jose Roberto PEREZ C ...

Article type: PAPER
Subject area: Fundamentals of Information Systems
2013 Volume E96.D Issue 4 Pages 886-896
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.886

JOURNAL FREE ACCESS

Communication-induced checkpointing (CIC) has two main advantages: first, it allows processes in a distributed computation to take asynchronous checkpoints, and second, it avoids the domino effect. To achieve these, CIC algorithms piggyback information on the application messages and take forced local checkpoints when they recognize potentially dangerous patterns. The main disadvantages of CIC algorithms are the amount of overhead per message and the induced storage overhead. In this paper we present a communication-induced checkpointing algorithm called Scalable Fully-Informed (S-FI) that attacks the problem of message overhead. For this, our algorithm modifies the Fully-Informed algorithm by integrating it with the immediate dependency principle. The S-FI algorithm was simulated, and the results show that the algorithm is scalable, since the message overhead grows sublinearly as the number of processes and/or the message density increases.

AspectQuery: A Method for Identification of Crosscutting Concerns in the Requirement Phase

Chengwan HE, Chengmao TU

Article type: PAPER
Subject area: Software Engineering
2013 Volume E96.D Issue 4 Pages 897-905
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.897

JOURNAL FREE ACCESS

Identification of early aspects is a critical problem in aspect-oriented requirement engineering. However, crosscutting concerns are represented in various ways, which makes the identification difficult. To address the problem, this paper proposes the AspectQuery method based on a goal model. We analyze four kinds of goal decomposition models, then summarize the main factors in the identification of crosscutting concerns and derive identification rules based on a goal model. A goal is a crosscutting concern when it satisfies one of the following conditions: i) the goal contributes to realizing one soft-goal; ii) the parent goal of the goal is a candidate crosscutting concern; iii) the goal has at least two parent goals. AspectQuery includes four steps: building the goal model, transforming the goal model, identifying the crosscutting concerns by the identification rules, and composing the crosscutting concerns with the goals affected by them. We illustrate the AspectQuery method through a case study (a ticket booking management system). The results show the effectiveness of AspectQuery in identifying crosscutting concerns in the requirement phase.
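
The three identification rules can be sketched as a small fixed-point computation over a goal graph. The dictionary encoding of the goal model and the example goals below are hypothetical; they loosely echo a ticket booking system but are not taken from the paper's case study.

```python
def find_crosscutting(goals):
    """goals maps a goal name to {"parents": [...], "softgoal": bool}.

    "softgoal" is True when the goal contributes to realizing a soft-goal.
    """
    # Rule i (contributes to a soft-goal) and rule iii (two or more parents)
    found = {name for name, g in goals.items()
             if g["softgoal"] or len(g["parents"]) >= 2}
    # Rule ii: a goal whose parent is a candidate is itself a candidate;
    # repeat until no new goals are added
    changed = True
    while changed:
        changed = False
        for name, g in goals.items():
            if name not in found and any(p in found for p in g["parents"]):
                found.add(name)
                changed = True
    return found

goals = {
    "book ticket":     {"parents": [], "softgoal": False},
    "cancel ticket":   {"parents": [], "softgoal": False},
    "select seat":     {"parents": ["book ticket"], "softgoal": False},
    "pay":             {"parents": ["book ticket"], "softgoal": False},
    "log operation":   {"parents": ["book ticket", "cancel ticket"], "softgoal": False},
    "encrypt payment": {"parents": ["pay"], "softgoal": True},
    "manage keys":     {"parents": ["encrypt payment"], "softgoal": False},
}
crosscutting = find_crosscutting(goals)
```

Here "log operation" qualifies through rule iii, "encrypt payment" through rule i, and "manage keys" through rule ii because its parent is already a candidate.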

Efficient XML Retrieval Service with Complete Path Representation

Hsu-Kuang CHANG, King-Chu HUNG, I-Chang JOU

Article type: PAPER
Subject area: Data Engineering, Web Information Systems
2013 Volume E96.D Issue 4 Pages 906-917
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.906

JOURNAL FREE ACCESS

Compiling documents in extensible markup language (XML) increasingly requires access to data services which provide both rapid response and the precise use of search engines. An efficient data service should be based on a skillful representation that can support low-complexity and high-precision search capabilities. In this paper, a novel complete path representation (CPR) associated with a modified inverted index is presented to provide efficient XML data services, where queries can be versatile in terms of predicates. CPR can completely preserve hierarchical information, and the new index is used to save semantic information. The CPR approach can provide template-based indexing for fast data searches. An experiment is also conducted to evaluate the CPR approach.
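
The general idea of indexing XML by full root-to-leaf paths can be illustrated as follows. This is only a loose sketch: the abstract does not specify the CPR encoding or the modified inverted index, so the path strings, the plain dictionary index, and the sample documents are all assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def leaf_paths(elem, prefix=""):
    # Yield (full path, text) for every leaf element under elem
    path = f"{prefix}/{elem.tag}"
    children = list(elem)
    if not children:
        yield path, (elem.text or "").strip()
    for child in children:
        yield from leaf_paths(child, path)

def build_index(docs):
    # Inverted index: complete path -> set of document ids containing it
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        for path, _ in leaf_paths(ET.fromstring(xml_text)):
            index[path].add(doc_id)
    return index

docs = {
    1: "<book><title>XML</title><author><name>A</name></author></book>",
    2: "<book><title>DB</title></book>",
}
index = build_index(docs)
with_author_name = index["/book/author/name"]
```

Keeping the whole path as the key preserves the hierarchy, so a structural query becomes a dictionary lookup rather than a tree traversal.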

Failure Microscope: Precisely Diagnosing Routing Instability

Hongjun LIU, Baokang ZHAO, Xiaofeng HU, Dan ZHAO, Xicheng LU

Article type: PAPER
Subject area: Information Network
2013 Volume E96.D Issue 4 Pages 918-926
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.918

JOURNAL FREE ACCESS

Root cause analysis of BGP updates is the key to debugging and troubleshooting BGP routing problems. However, it is a challenge to precisely diagnose the cause and the origin of routing instability. In this paper, we are the first to distinguish link failure events from policy change events based on BGP updates from single vantage points, by analyzing the relationship between the closed loops formed by intersecting all the transient paths during instability and the length variation of the stable paths after instability. Once link failure events are recognized, their origins are precisely inferred with 100% accuracy. Simulation shows that our method is effective at distinguishing link failure events from link restoration events and policy-related events, and at reducing the size of the candidate set of origins.

Development of a Robust and Compact On-Line Handwritten Japanese Text Recognizer for Hand-Held Devices

Jinfeng GAO, Bilan ZHU, Masaki NAKAGAWA

Article type: PAPER
Subject area: Pattern Recognition
2013 Volume E96.D Issue 4 Pages 927-938
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.927

JOURNAL FREE ACCESS

The paper describes how a robust and compact on-line handwritten Japanese text recognizer was developed for deployment on hand-held devices by compressing each component of an integrated text recognition system, including an SVM classifier to evaluate segmentation points, an on-line and off-line combined character recognizer, a linguistic context processor, and a geometric context evaluation module. Selecting an elastic-matching-based on-line recognizer and compressing MQDF2 via a combination of LDA, vector quantization, and data type transformation have contributed to building a remarkably small yet robust recognizer. The compact text recognizer, covering 7,097 character classes, requires only about 15 MB of memory while maintaining 93.11% accuracy on horizontal text lines extracted from the TUAT Kondate database. Compared with the original full-scale Japanese text recognizer, the memory size is reduced from 64.1 MB to 14.9 MB, while the accuracy drops by only about 0.5 percentage points, from 93.6% to 93.11%. The method is scalable, so even systems of less than 11 MB or less than 6 MB still retain 92.80% or 90.02% accuracy, respectively.

A Bayesian Framework Using Multiple Model Structures for Speech Recognition

Sayaka SHIOTA, Kei HASHIMOTO, Yoshihiko NANKAKU, Keiichi TOKUDA

Article type: PAPER
Subject area: Speech and Hearing
2013 Volume E96.D Issue 4 Pages 939-948
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.939

JOURNAL FREE ACCESS

This paper proposes an acoustic modeling technique based on a Bayesian framework using multiple model structures for speech recognition. The aim of the Bayesian approach is to obtain good prediction of observations by marginalizing all variables related to generative processes. Although the effectiveness of marginalizing model parameters was recently reported in speech recognition, most of these systems use only “one” model structure, e.g., topologies of HMMs, the number of states and mixtures, types of state output distributions, and parameter tying structures. However, a single structure is insufficient to represent a true model distribution, because a family of such models usually does not include a true distribution in most practical cases. One solution to this problem is to use multiple model structures. Although several approaches using multiple model structures have already been proposed, the consistent integration of multiple model structures based on the Bayesian approach has not been seen in speech recognition. This paper focuses on integrating multiple phonetic decision trees based on the Bayesian framework in HMM-based acoustic modeling. The proposed method is derived from a new marginal likelihood function which includes the model structures as a latent variable in addition to HMM state sequences and model parameters, and the posterior distributions of these latent variables are obtained using the variational Bayesian method. Furthermore, to improve the optimization algorithm, the deterministic annealing EM (DAEM) algorithm is applied to the training process. The proposed method effectively utilizes multiple model structures, especially in the early stage of training, and this leads to better predictive distributions and improvement of recognition performance.

Homomorphic Filtered Spectral Peaks Energy for Automatic Detection of Vowel Onset Point in Continuous Speech

Xian ZANG, Kil To CHONG

Article type: PAPER
Subject area: Speech and Hearing
2013 Volume E96.D Issue 4 Pages 949-956
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.949

JOURNAL FREE ACCESS

During the production of speech signals, the vowel onset point (VOP) is an important event containing important information for many speech processing tasks, such as consonant-vowel unit recognition and speech end-point detection. In order to realize accurate automatic detection of vowel onset points, this paper proposes a reliable method using the energy characteristics of homomorphic filtered spectral peaks. The homomorphic filtering helps to separate the slowly varying vocal tract system characteristics from the rapidly fluctuating excitation characteristics in the cepstral domain. The distinct vocal tract shape related to vowels is obtained, and the peaks in the estimated vocal tract spectrum provide accurate and stable information for VOP detection. Performance of the proposed method is compared with the existing method, which uses the combination of evidence from the excitation source, spectral peaks, and modulation spectrum energies. The detection rate at different time resolutions, together with the missing rate and spurious rate, is used for comprehensive evaluation of the performance on continuous speech taken from the TIMIT database. The detection accuracy of the proposed method is 74.14% for ±10 ms resolution and increases to 96.33% for ±40 ms resolution, with 3.67% missing error and 4.14% spurious error, much better than the results obtained by the combined approach at each specified time resolution, especially at the higher resolutions of ±10 to ±30 ms. In the cases of speech corrupted by white noise, pink noise and F-16 noise, the proposed method also shows significant improvement in performance compared with the existing method.
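
Detection, missing, and spurious rates at a given time resolution can be scored along the lines of the sketch below. The matching strategy (each reference VOP claims the nearest unused detection within the tolerance) and the choice to normalize the spurious rate by the number of reference VOPs are assumptions; the abstract does not state the exact protocol, and the timestamps are made up.

```python
def score_vops(reference_ms, detected_ms, tol_ms):
    unused = sorted(detected_ms)
    hits = 0
    for ref in sorted(reference_ms):
        # Detections within the tolerance window of this reference VOP
        candidates = [d for d in unused if abs(d - ref) <= tol_ms]
        if candidates:
            best = min(candidates, key=lambda d: abs(d - ref))
            unused.remove(best)  # each detection may match only one reference
            hits += 1
    n = len(reference_ms)
    return {
        "detection": hits / n,
        "missing": (n - hits) / n,
        "spurious": len(unused) / n,  # leftover detections, relative to references
    }

reference = [100, 250, 400, 620]      # hypothetical hand-labeled VOPs (ms)
detected = [108, 245, 435, 500, 615]  # hypothetical detector output (ms)
fine = score_vops(reference, detected, tol_ms=10)
coarse = score_vops(reference, detected, tol_ms=40)
```

Widening the tolerance from ±10 ms to ±40 ms turns the detection at 435 ms into a hit for the reference at 400 ms, which mirrors how the reported accuracy rises with coarser resolution.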

A Novel Imaging Method for Cell Phone Camera in Low Ambient Light Conditions Using Flash and No-Flash Image Pairs

Lin-bo LUO, Jun CHEN, Sang-woo AN, Chang-shuai WANG, Jong-joo PARK, Yi ...

Article type: PAPER
Subject area: Image Processing and Video Processing
2013 Volume E96.D Issue 4 Pages 957-962
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.957

JOURNAL FREE ACCESS

In low-light conditions, images taken by phone cameras without a flash usually have too much noise, while images taken with a flash have a high signal-to-noise ratio (SNR) but look unnatural. This paper proposes a novel imaging method using flash/no-flash image pairs. By transferring the natural tone of the no-flash image to the flash image, the resulting image has a high SNR and maintains a natural appearance. For real-time implementation, we use two preview images, taken with and without flash, to estimate the transformation function in advance. We then use this function to adjust the tone of the image captured with flash in real time. Thus, the method does not require a frame memory and is suitable for cell phone cameras.

A Low-Power Packet Memory Architecture with a Latency-Aware Packet Mapping Method

Hyuk-Jun LEE, Seung-Chul KIM, Eui-Young CHUNG

Article type: LETTER
Subject area: Computer System
2013 Volume E96.D Issue 4 Pages 963-966
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.963

JOURNAL FREE ACCESS

A packet memory stores packets in Internet routers and typically requires a buffer space of RTT×C, e.g., several GBytes, where RTT is the average round-trip time of a TCP flow and C is the bandwidth of the router's output link. It is implemented with DRAM parts that are accessed in parallel to achieve the required bandwidth. These parts consume significant power in a router whose scalability is heavily limited by power and heat problems. Previous work shows that the packet memory size can be reduced to $\frac{RTT\times C}{\sqrt{N}}$, where N is the number of long-lived TCP flows. In this paper, we propose a novel packet memory architecture that splits the packet memory into on-chip and off-chip packet memories. We also propose a low-power packet mapping method for this architecture that estimates the latency of packets and maps packets with small latencies to the on-chip memory. The experimental results show that our proposed architecture and mapping method reduce the dynamic power consumption of the off-chip memory by as much as 94.1% with only 50% of the packet buffer size suggested by the previous work in realistic scenarios.
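
The two buffer-sizing rules quoted in this abstract can be worked through numerically. The sketch below only evaluates RTT×C and RTT×C/√N; the function name and the example figures (250 ms RTT, 40 Gbit/s link, 10,000 flows) are illustrative assumptions, not values from the paper.

```python
import math


def packet_buffer_bytes(rtt_seconds, link_bits_per_second, n_flows=1):
    """Buffer size in bytes under the RTT*C / sqrt(N) rule.

    With n_flows=1 this reduces to the classic RTT*C rule.
    """
    if n_flows < 1:
        raise ValueError("n_flows must be at least 1")
    # Divide by 8 to convert bits to bytes.
    return rtt_seconds * link_bits_per_second / 8 / math.sqrt(n_flows)


# Hypothetical router: 250 ms average RTT, 40 Gbit/s output link.
classic = packet_buffer_bytes(0.25, 40e9)
reduced = packet_buffer_bytes(0.25, 40e9, n_flows=10_000)
print(f"{classic / 1e9:.2f} GB -> {reduced / 1e6:.1f} MB")
```

For these assumed figures the classic rule gives 1.25 GB, and 10,000 long-lived flows bring it down by a factor of 100 to 12.5 MB, which is the scale at which splitting the buffer between on-chip and off-chip memory becomes plausible.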

An Improved Face Clustering Method Using Weighted Graph for Matched SIFT Keypoints in Face Region

Ji-Soo KEUM, Hyon-Soo LEE

Article type: LETTER
Subject area: Pattern Recognition
2013 Volume E96.D Issue 4 Pages 967-971
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.967

JOURNAL FREE ACCESS

In this paper, we propose an improved face clustering method using a weighted graph-based approach. We combine two parameters as the weight of a graph to improve clustering performance. One is average similarity, which is calculated with two constraints of geometric and symmetric properties, and the other is a newly proposed parameter called the orientation matching ratio, which is calculated from orientation analysis for matched keypoints in the face region. According to the results of face clustering for several datasets, the proposed method shows improved results compared to the previous method.

Early Decision of Prediction Direction with Hierarchical Correlation for HEVC Compression

Chae Eun RHEE, Hyuk-Jae LEE

Article type: LETTER
Subject area: Image Processing and Video Processing
2013 Volume E96.D Issue 4 Pages 972-975
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.972

JOURNAL FREE ACCESS

The emerging High Efficiency Video Coding (HEVC) standard attempts to improve the coding efficiency by a factor of two over H.264/AVC through the use of new compression tools with high computational complexity. Although multiple-directional prediction is one of the features contributing to the improved compression efficiency, it significantly increases the computational complexity of prediction. This paper presents an early uni-directional prediction decision algorithm. The proposed algorithm takes advantage of the deep quad-tree block structure supported by HEVC. Statistical observation shows that the correlation of prediction direction among different blocks that share the same area is very high. Based on this observation, the mode of the current block is determined early according to the mode of the upper blocks. Bi-directional prediction is not performed when the upper block is encoded in the uni-directional prediction mode. A simulation shows that the algorithm reduces the motion estimation (ME) operation time by about 22.7% with a marginal drop in compression efficiency.
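
The early-decision rule stated in this abstract can be sketched as a small pruning function. The mode labels and the function name are hypothetical, and the sketch covers only the rule as summarized here (skip bi-directional prediction when the upper block is uni-directional), not the full algorithm of the letter.

```python
# Hypothetical labels for the prediction directions of an inter-coded block.
UNI_L0, UNI_L1, BI = "uni_l0", "uni_l1", "bi"


def candidate_directions(upper_block_direction):
    """Return the prediction directions to evaluate for the current block.

    upper_block_direction is the direction chosen for the enclosing block one
    level up in the quad-tree, or None when no such block is available.
    """
    if upper_block_direction in (UNI_L0, UNI_L1):
        # Upper block was uni-directional: skip the bi-directional search.
        return [UNI_L0, UNI_L1]
    # Otherwise fall back to the full search, including bi-prediction.
    return [UNI_L0, UNI_L1, BI]
```

The saving comes from the skipped bi-directional motion estimation, which is consistent with the reduction in ME operation time reported in the abstract.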

Joint Motion-Compensated Interpolation Using Eight-Neighbor Block Motion Vectors

Ran LI, Zong-Liang GAN, Zi-Guan CUI, Xiu-Chang ZHU

Article type: LETTER
Subject area: Image Processing and Video Processing
2013 Volume E96.D Issue 4 Pages 976-979
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.976

JOURNAL FREE ACCESS

A novel joint motion-compensated interpolation method using eight-neighbor block motion vectors (8J-MCI) is presented. The proposed method uses bi-directional motion estimation (BME) to obtain the motion vector field of the interpolated frame and adopts the motion vectors of the interpolated block and its 8-neighbor blocks to jointly predict the target block. Since the smoothness of the motion vector field makes the motion vectors of the 8-neighbor blocks quite close to the true motion vector of the interpolated block, the proposed algorithm has better fault tolerance than traditional ones. Experiments show that the proposed algorithm outperforms the motion-aligned auto-regressive algorithm (MAAR, one of the state-of-the-art frame rate up-conversion (FRUC) schemes) in terms of the average PSNR for the test image sequence and offers better subjective visual quality.

Improved Intra Prediction Coding Scheme Based on Minimum Distance Prediction for H.264/AVC

Qingbo WU, Linfeng XU, Zhengning WANG

Article type: LETTER
Subject area: Image Processing and Video Processing
2013 Volume E96.D Issue 4 Pages 980-983
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.980

JOURNAL FREE ACCESS

In this letter, we propose a novel intra prediction coding scheme for H.264/AVC. Based on our proposed minimum distance prediction (MDP) scheme, the optimal reference samples for predicting the current pixel can be adaptively updated according to the video content. The experimental results show that coding gains of up to 2 dB and 1 dB can be achieved with the proposed method for QCIF and CIF sequences, respectively.

Indoor Scene Classification Based on the Bag-of-Words Model of Local Feature Information Gain

Rong WANG, Zhiliang WANG, Xirong MA

Article type: LETTER
Subject area: Image Recognition, Computer Vision
2013 Volume E96.D Issue 4 Pages 984-987
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.984

JOURNAL FREE ACCESS

For the problem of indoor home scene classification, this paper proposes a bag-of-words (BOW) model of local feature information gain. The experimental results show that the performance is improved and the computation is reduced. Consequently, this method outperforms the state-of-the-art approach.

Real-Time Tracking with Online Constrained Compressive Learning

Bo GUO, Juan LIU

Article type: LETTER
Subject area: Image Recognition, Computer Vision
2013 Volume E96.D Issue 4 Pages 988-992
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.988

JOURNAL FREE ACCESS

In object tracking, a recent trend is the use of the “Tracking by Detection” technique, which trains a discriminative online classifier to separate objects from the background. However, incorrect updating of the online classifier and insufficient features used during online learning often lead to the drift problem. In this work, we propose an online random fern classifier with a simple but effective compressive feature in a framework integrating the online classifier, the optical-flow tracker, and an update model. The compressive feature is a random projection from a high-dimensional multi-scale image feature space to a low-dimensional representation by a sparse measurement matrix, which is expected to contain more information. An update model is proposed to detect tracker failure, correct the tracker result, and constrain the updating of the online classifier, thus reducing the chance of wrong updates in online training. Our method runs in real time, and the experimental results show a performance improvement over other state-of-the-art approaches on several challenging video clips.
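
The compressive feature described in this abstract can be illustrated with a generic sparse random projection. The construction below (entries of ±√s with probability 1/(2s) each and zero otherwise) is a common choice assumed here for illustration; the abstract does not specify the exact matrix, and the function names, sizes, and seed are hypothetical.

```python
import math
import random


def sparse_measurement_matrix(n_out, n_in, s=3, seed=0):
    """Build a sparse random matrix, stored row by row as {column: weight}.

    Each entry is +sqrt(s) or -sqrt(s) with probability 1/(2s) each and zero
    otherwise, so only about n_in/s entries per row are stored.
    """
    rng = random.Random(seed)
    scale = math.sqrt(s)
    rows = []
    for _ in range(n_out):
        row = {}
        for j in range(n_in):
            u = rng.random()
            if u < 1 / (2 * s):
                row[j] = scale
            elif u < 1 / s:
                row[j] = -scale
        rows.append(row)
    return rows


def compress(features, matrix):
    """Project a high-dimensional feature vector to len(matrix) dimensions."""
    return [sum(w * features[j] for j, w in row.items()) for row in matrix]
```

Because most entries are zero, each compressed value touches only a fraction of the input features, which is what makes this kind of projection cheap enough for real-time use.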

Human Attribute Analysis Using a Top-View Camera Based on Two-Stage Classification

Toshihiko YAMASAKI, Tomoaki MATSUNAMI, Tuhan CHEN

Article type: LETTER
Subject area: Image Recognition, Computer Vision
2013 Volume E96.D Issue 4 Pages 993-996
Published: April 01, 2013
Released on J-STAGE: April 01, 2013

DOI: https://doi.org/10.1587/transinf.E96.D.993

JOURNAL FREE ACCESS

This paper presents a technique that analyzes pedestrians' attributes, such as gender and bag-possession status, from surveillance video. One of the technically challenging issues is that we use only top-view camera images to protect privacy. The shape features over the frames are extracted by bag-of-features (BoF) using histogram of oriented gradients (HoG) vectors. To enhance the classification accuracy, a two-stage classification framework is presented. In the first stage, multiple classifiers are trained with different parameters. The outputs from the first stage are then used to train the second-stage classifier, which performs the final classification. Experiments using a 60-minute video captured at Haneda Airport, Japan, show that the accuracies for gender classification and bag-possession classification were 95.8% and 97.2%, respectively, which is a significant improvement over our previous work.
