This paper proposes a stress detection method using word-length-dependent classifiers. Most past studies have focused on finding the stress position of a word without considering the length of that word. However, in a CAPT (computer-assisted pronunciation training) scenario, the prompted word is known in advance, and we can exploit this extra information to greatly improve detection accuracy. In the proposed method, a Bayesian classifier based on GMMs (Gaussian mixture models) is trained for each word length. Experimental results show that the proposed method improves upon existing stress detection methods. A comprehensive dataset for stress detection is also released; to the best of the authors' knowledge, it is the first publicly released stress detection dataset in the community.
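The per-word-length Bayesian classification idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: it fits one diagonal Gaussian per stress-position class (the real method uses full GMMs), and the feature vectors stand in for unspecified prosodic features.

```python
import numpy as np

class WordLengthStressClassifier:
    """Toy Bayesian stress-position classifier for one fixed word length.

    Sketch of the abstract's idea: for words of a given length L
    (number of syllables), fit one diagonal Gaussian per stress
    position and predict the position with the highest posterior.
    """

    def fit(self, word_length, X, y):
        # X: (n_samples, n_features) prosodic features (hypothetical);
        # y: stress positions in 0 .. word_length-1.
        self.models = {}
        for pos in range(word_length):
            Xp = X[y == pos]
            mu, var = Xp.mean(axis=0), Xp.var(axis=0) + 1e-6
            prior = len(Xp) / len(X)
            self.models[pos] = (mu, var, prior)
        return self

    def predict(self, x):
        # Maximum a posteriori over stress positions (log domain).
        def log_post(m):
            mu, var, prior = m
            return np.log(prior) - 0.5 * np.sum(
                np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        return max(self.models, key=lambda p: log_post(self.models[p]))
```

In a CAPT system one such classifier would be trained per word length, and the known prompt length selects which classifier to apply.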
In future communications rich in high-level kansei information, such as presence, verisimilitude, realism and naturalness, the role of sound is extremely important for enhancing the quality and versatility of communications, because sound itself provides rich semantic and emotional information. Moreover, sound (auditory information) has good synergy with pictures (visual information). In this paper, we introduce our recent research results on capturing and synthesizing comprehensive 3D sound-space information, as well as a high-definition 3D audio-visual display that realizes strict audio-visual synchronization. We believe that these systems will help advance universal communications, which require particularly high-quality and versatile communication technologies for all.
The McGurk effect is a typical phenomenon caused by human multi-modal information processing between auditory and visual speech perception. In this paper, we investigated the relation between the degree of the McGurk effect and the impression evoked by speech sounds and moving images of the talker's face. As stimuli, uttered speech sounds were combined with moving images of a different talker's face. These stimuli were presented to observers, who were asked to report what the talker was saying. At the same time, they were asked to report their subjective impressions of the stimuli; the perceived match between the voice and the moving image was used as the index of judgment. Results showed that the match between a voice and a talker's facial movements affected the degree of the McGurk effect, suggesting that audio-visual kansei information affects phoneme perception.
Psychoacoustic research suggests that the human auditory system processes noise and tones differently. For example, narrowband noise has a stronger masking effect than a pure tone at the same sound pressure level. Can this be explained by cochlear mechanics, or is the distinction due to central neural processing? In this paper, we review a computational model that simulates wave propagation in the cochlea. The model incorporates recent findings in cochlear electrophysiology and describes wave propagation in both the forward and the backward direction. Simulations show that the model produces compressive nonlinear responses to tones but quasilinear responses to noise. We may therefore argue that the distinction between tone and noise processing is an epiphenomenon of cochlear mechanics. If this is true, we can simplify the computational organization of audio signal-processing front-ends for applications such as hearing aids and auditory prostheses.
Three-dimensional sound auralization systems, called virtual auditory displays (VADs), have been actively developed in the last few decades. In conventional VADs based on head-related transfer functions (HRTFs), only the sound source position is rendered, disregarding other acoustical phenomena. However, because various sounds surround us in daily life, we usually hear not only a targeted direct sound but also ambient sounds in an actual sound space. The lack of ambient sound often leads to an unnatural perception of the virtual auditory space presented by an HRTF-based VAD. Therefore, ambient sounds should be included in VAD auralization. We investigated an effective method of rendering ambient sound using ordinary colored noise. Furthermore, using subjective evaluations, we discuss the relation between the realism of a sound space with ambient sounds and the listener's head movement.
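The "ordinary colored noise" mentioned above can be generated by spectral shaping; the sketch below assumes the common 1/f^alpha family (alpha = 0 white, 1 pink, 2 brown), which is one plausible reading of the abstract, not its specified method.

```python
import numpy as np

def colored_noise(n, alpha, seed=0):
    """Generate n samples of 1/f^alpha colored noise.

    Sketch: shape white Gaussian noise in the frequency domain by
    f^(-alpha/2), transform back, and normalize to [-1, 1].
    """
    rng = np.random.default_rng(seed)
    white = rng.standard_normal(n)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n)
    freqs[0] = freqs[1]                     # avoid division by zero at DC
    spectrum *= freqs ** (-alpha / 2)       # 1/f^alpha power shaping
    noise = np.fft.irfft(spectrum, n)
    return noise / np.max(np.abs(noise))    # peak-normalize
```

A VAD could loop such noise, spatialized diffusely, behind the HRTF-rendered direct sound.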
This paper describes packet loss concealment methods for MP3 audio. The proposed methods are based on estimating the modified discrete cosine transform (MDCT) coefficients of the lost packets. The estimation of lower-dimension MDCT coefficients is performed by switching between two concealment methods: a sign correction method and a correlation-based method. The concealment methods are switched based on redundant side information, calculated subband by subband, to reduce MDCT prediction errors. Next, a method for improving the estimation of higher-dimension MDCT coefficients is proposed; it estimates the absolute value and the sign of an MDCT coefficient independently. A subjective evaluation experiment showed that both improvement methods, for lower and higher dimensions, effectively improve subjective audio quality.
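The idea of estimating magnitude and sign independently can be illustrated as follows. This is a toy sketch with assumed details (linear magnitude interpolation, sign taken from a reference frame), not the paper's actual estimator.

```python
import numpy as np

def conceal_frame(prev, nxt, sign_ref):
    """Toy concealment of one lost frame of MDCT coefficients.

    Sketch of independent magnitude/sign estimation: the magnitude is
    interpolated from the neighboring received frames, while the sign
    is copied from a reference (e.g. the previous frame, or redundant
    side information sent by the encoder).
    """
    magnitude = 0.5 * (np.abs(prev) + np.abs(nxt))
    sign = np.where(sign_ref >= 0, 1.0, -1.0)
    return sign * magnitude
```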
Through our experiments with the popular SIFT-DoG keypoint detector, we find that its stability in extracting keypoints from rotated images is good, but sometimes not as good as expected. This paper presents our endeavor to improve the stability of the DoG keypoint detector by learning from tens of millions of training samples. The learning problem is formulated in a filtering setting, where training samples are drawn from an oracle instead of a fixed training set. We show that, by increasing the stability of the keypoint detector, we can obtain more discriminative local features for matching. Matching accuracy improves by 10% when the learned decision function is used as a watchdog to block unstable keypoints, with acceptable computational overhead.
A main challenge in deformation estimation for fluorescence imaging is reducing the effects of the unavoidable random photon shot noise of fluorescence. To address this open problem, an efficient second-order minimization (ESM) based non-rigid deformation estimation method for fluorescence imaging of neurons is proposed as a visual aid for understanding the relationship between neuron activities and behaviors. Because local features usually used for deformation estimation, such as corners, lines, and arc segments, can be compromised by fluorescence noise, the global intensity information of all pixels in a region of interest (ROI) is used as a texture pattern to guide parameterized deformation estimation. Following this principle, three ESM-based deformation models, namely affine, homography, and thin-plate spline (TPS), are implemented and evaluated. The experimental results show that the homography model is the best choice for correctly and robustly estimating the parameters of fluorescence deformations under our experimental conditions. Using these parameters, the fluorescence intensity in the restored image can be measured and analyzed without the disturbance of noise and deformation, which is significant for understanding how the nervous system affects and controls animal behavior.
For real-world images, many algorithms for adaptive contour detection exist, and various improvements to contour detection have been proposed. The reason for such diversity is that real-world images contain heterogeneous mixtures of features, and each available algorithm exploits some of these features. Thus, depending on the image, different algorithms show different result quality. In this paper, we propose a method that improves the results of adaptive contour detection by using an algorithm selection approach. Previous methods using algorithm selection have focused only on images with a particular class of features (artificial, cellular) because of the complexity of real-world images. To solve this problem, we first determine a set of distinctive features of each algorithm using machine learning. Then, using these distinctive features, we train an algorithm selector to choose the best algorithm when given a set of features. Finally, we propose a method to split the input image into sub-regions, selected in a manner that improves the quality of the image processing result. The proposed algorithm is verified on a set of benchmarks, and its performance is comparable to, and in many cases better than, the currently best contour detection algorithms.
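The selection step can be sketched as below. This is a deliberately simple stand-in (a nearest-centroid selector over hypothetical feature vectors); the paper's actual selector is trained with machine learning on the distinctive features of each algorithm.

```python
import numpy as np

class AlgorithmSelector:
    """Toy nearest-centroid algorithm selector.

    Sketch: for training regions we know which contour detector
    performed best; at test time we pick the detector whose training
    regions are closest in feature space.
    """

    def fit(self, X, best_algo):
        # X: (n_samples, n_features) region features (hypothetical);
        # best_algo: name of the best-performing detector per sample.
        self.centroids = {a: X[best_algo == a].mean(axis=0)
                          for a in np.unique(best_algo)}
        return self

    def select(self, x):
        return min(self.centroids,
                   key=lambda a: np.linalg.norm(x - self.centroids[a]))
```

Each sub-region of the input image would be routed through `select` before its contours are extracted.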
This paper proposes an automatic color transfer method for images with complex content. When given one or more high-quality reference images, our goal is to determine the set of best reference colors for transferring their color characteristics to the target image. Although several automatic color transfer methods have been proposed, visible and unnatural artifacts usually appear when processing images with complex content and lighting variation. In this paper, we represent each image at the region level and propose to incorporate region attributes, region connectivity, and intrinsic reflectivity to characterize the local organization within an image. We then determine the best-matched reference region for each target region using the proposed graph-theoretic region correspondence estimation. After determining the set of reference regions, we conduct color transfer between the best-matched region pairs in a de-correlated color space. To reduce artifacts across complex regions, we further propose a weighted color transfer in terms of intrinsic components. Both subjective and objective evaluations of our experiments demonstrate that the proposed method outperforms existing methods.
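The per-region transfer in a de-correlated color space typically reduces to channel-wise statistics matching. The sketch below shows only that core step and assumes the inputs have already been converted to a de-correlated space (such as l-alpha-beta); the region matching and weighting described above are omitted.

```python
import numpy as np

def transfer_stats(target, reference):
    """Match per-channel mean and standard deviation of two regions.

    target, reference: (H, W, C) arrays, assumed to already be in a
    de-correlated color space so channels can be handled independently.
    Returns the target region shifted and scaled to the reference's
    channel statistics.
    """
    t_mu = target.mean(axis=(0, 1))
    t_sd = target.std(axis=(0, 1)) + 1e-8   # avoid division by zero
    r_mu = reference.mean(axis=(0, 1))
    r_sd = reference.std(axis=(0, 1))
    return (target - t_mu) / t_sd * r_sd + r_mu
```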
Power optimization has always been an important issue in modern IC design. In this paper, we present a power optimization technique for clock trees that applies multi-bit flip-flops (MBFFs) and reduces total wire length. By merging single-bit flip-flops into MBFFs, we effectively reduce the power consumed by clock buffers. Moreover, by judiciously merging and placing the MBFFs, the total wire length is also significantly reduced. The combined effect of both techniques leads to a strong reduction in the total power consumption of the clock network.
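A drastically simplified version of the merging step might look like the following. This is a hypothetical greedy sketch (pairwise 2-bit merges placed at midpoints), not the paper's algorithm, which must also honor timing and placement constraints.

```python
import math

def greedy_merge(flip_flops, max_dist):
    """Greedily pair nearby 1-bit flip-flops into 2-bit MBFFs (toy).

    flip_flops: list of (x, y) placements. Repeatedly merges the
    closest unpaired pair within max_dist, placing each MBFF at the
    pair's midpoint so the shared clock pin shortens clock wiring.
    Returns (mbff_positions, unmerged_positions).
    """
    ffs = list(flip_flops)
    pairs = sorted((math.dist(a, b), i, j)
                   for i, a in enumerate(ffs)
                   for j, b in enumerate(ffs) if i < j)
    merged, used = [], set()
    for d, i, j in pairs:
        if d <= max_dist and i not in used and j not in used:
            used.update((i, j))
            merged.append(((ffs[i][0] + ffs[j][0]) / 2,
                           (ffs[i][1] + ffs[j][1]) / 2))
    singles = [ffs[i] for i in range(len(ffs)) if i not in used]
    return merged, singles
```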
A threshold gate is a theoretical model of a neuron, and a threshold circuit is a theoretical model of a neural network. The energy e of a threshold circuit C is defined to be the maximum number of gates outputting ones, where the maximum is taken over all input assignments to C. In this paper, we prove that the comparison function of 2n variables is computable by a polynomial-weight threshold circuit C of energy e, size s = O(n/log n), and depth d = O(n/(e log n)) for any e with 3 ≤ e ≤ n/⌈log n⌉. Our result implies that one can construct an energy-efficient circuit computing the comparison function provided that large depth is allowed.
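For contrast with the polynomial-weight construction above, the comparison function is trivially computable by a single threshold gate if exponential weights are allowed; the sketch below shows that baseline, which is what the paper's weight restriction rules out.

```python
def threshold_gate(weights, threshold, inputs):
    """A threshold gate: outputs 1 iff the weighted sum reaches t."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)

def comparison(xs, ys):
    """COMP(x, y) = 1 iff x >= y as binary numbers, via ONE gate.

    xs, ys: equal-length bit lists, most significant bit first. The
    single gate uses exponential weights 2^i; achieving polynomial
    weights is exactly where the energy/depth trade-off of the paper
    comes in.
    """
    n = len(xs)
    weights = [2 ** (n - 1 - i) for i in range(n)]
    return threshold_gate(weights + [-w for w in weights], 0, xs + ys)
```

Note that this one-gate circuit already has energy at most 1; the difficulty the paper addresses is keeping energy small while also keeping all weights polynomially bounded.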
We present a high-performance Elliptic Curve Cryptography (ECC) processor that supports arbitrary prime fields and curve parameters. A novel pipelined architecture for Montgomery multiplication, using the DSP blocks inherent in modern FPGAs, is proposed to speed up point scalar multiplication. In addition, improved operation scheduling is presented to further optimize the operation cycles. On Xilinx Virtex-5 FPGA devices, a 256-bit point scalar multiplication can be performed in 0.86 ms at 263 MHz, about 3.14 to 11.27 times faster than other designs with comparable functionality. Our processor therefore significantly outperforms others in terms of throughput, area, and cost-effectiveness.
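The Montgomery multiplication being pipelined is the standard REDC operation; a software reference version (not the FPGA architecture itself) is sketched below for small parameters.

```python
def montgomery_setup(N, k):
    """Precompute R = 2^k and n' = -N^{-1} mod R for odd modulus N."""
    R = 1 << k
    n_prime = (-pow(N, -1, R)) % R
    return R, n_prime

def mont_mul(a, b, N, R, n_prime):
    """Montgomery multiplication: returns a*b*R^{-1} mod N.

    Requires N odd and a, b < N < R. The division by R is an exact
    shift, which is why hardware implementations avoid a true modular
    division per multiplication.
    """
    T = a * b
    m = (T * n_prime) % R           # m makes T + m*N divisible by R
    t = (T + m * N) >> k_bits(R)    # exact division by R
    return t - N if t >= N else t

def k_bits(R):
    return R.bit_length() - 1
```

In an ECC processor, operands are kept in the Montgomery domain throughout a scalar multiplication, so the R^{-1} factor is only corrected once at the end.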
Heterogeneous multi-core processors are attractive for various types of applications, from low-power media applications to high-performance computing, because of their capability to draw on the strengths of different cores to improve overall performance. However, data transfer bottlenecks between different cores become a serious problem. This paper presents two key methodologies to resolve this data transfer bottleneck: memory allocation under an addressing-function constraint, and task allocation based on algorithm transformation. Moreover, to help explore accelerator architectures suitable for given applications, this paper presents an FPGA-based platform whose circuitry can be reconfigured by users after fabrication.
A Boltzmann machine (BM) is a basic learning model forming a Markov random field, and many approximate learning algorithms have been proposed for it. In this paper, a new strategy for approximate BM learning is proposed by introducing a temperature of the observed data, which controls the smoothness of the empirical distribution. By controlling the temperature, one can obtain better solutions to BM learning with an existing approximate learning algorithm.
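One natural reading of "temperature of the observed data" is tempering the empirical distribution itself, sketched below; the exact smoothing used in the paper is not specified in the abstract, so this is an illustrative assumption.

```python
import numpy as np

def temper(p_emp, T):
    """Smooth an empirical distribution with temperature T >= 1.

    Raises each probability to 1/T and renormalizes. T = 1 returns the
    distribution unchanged; larger T flattens it toward uniform, which
    can make approximate BM learning better conditioned on small or
    spiky datasets.
    """
    q = np.power(p_emp, 1.0 / T)
    return q / q.sum()
```

A learning schedule could start at high T (smooth target) and anneal toward T = 1 (the true empirical distribution).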
Person-to-person information sharing is easily realized by P2P networks, in which servers are not essential. Information leakage caused by malicious access to P2P networks has become a new social issue. To prevent information leakage, it is necessary to detect and block the traffic of P2P software. Since some P2P applications can spoof port numbers, it is difficult to detect their traffic by port number alone. Devising effective detection countermeasures is even more difficult because their protocols are not public. In this paper, we propose a method to identify applications using sequential transition patterns of payload lengths. Through experiments on real traffic, we show that the proposed method can quickly and accurately identify network applications.
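The transition-pattern idea can be sketched as n-gram matching over observed payload-length sequences. This is a toy illustration with assumed details (trigram counting and a simple overlap score), not the paper's identification algorithm.

```python
from collections import Counter

def length_ngrams(payload_lengths, n=3):
    """Count n-grams of consecutive payload lengths as a flow signature."""
    return Counter(tuple(payload_lengths[i:i + n])
                   for i in range(len(payload_lengths) - n + 1))

def identify(flow, profiles, n=3):
    """Assign a flow to the application with the most shared patterns.

    flow: list of payload lengths from one connection.
    profiles: {app_name: Counter of n-grams from known traffic}.
    The score is the multiset intersection of n-gram counts, so spoofed
    port numbers are irrelevant: only the length dynamics matter.
    """
    grams = length_ngrams(flow, n)
    def overlap(profile):
        return sum(min(c, profile[g]) for g, c in grams.items())
    return max(profiles, key=lambda app: overlap(profiles[app]))
```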