Acoustical Science and Technology
Online ISSN : 1347-5177
Print ISSN : 1346-3969
ISSN-L : 0369-4232
Volume 36, Issue 6
Showing 1-8 of 8 articles in the selected issue
INVITED REVIEW
—Special Issue on Applied System—
PAPERS
  • Hiroki Oohashi, Sadao Hiroya, Takemi Mochida
    2015, Volume 36, Issue 6, pp. 478-488
    Published: 2015
    Released online: 2015/11/01
    Journal, free access
    This paper presents a real-time robust formant tracking system for speech that uses a real-time phase equalization-based autoregressive exogenous model (PEAR) with electroglottography (EGG). Although linear predictive coding (LPC) analysis is a popular method for estimating formant frequencies, its estimation accuracy is known to degrade for speech with a high fundamental frequency F0, because the harmonic structure of the glottal source spectrum deviates further from the Gaussian noise assumption in LPC as F0 increases. In contrast, PEAR, which employs phase equalization and LPC with an impulse train as the glottal source signal, estimates formant frequencies robustly even for speech with high F0. However, PEAR has higher computational complexity than LPC. In this study, to reduce this computational complexity, a novel formulation of PEAR was derived, which enabled us to implement PEAR in a real-time robust formant tracking system. In addition, since PEAR requires the timings of glottal closures, a stable detection method using EGG was devised. We developed the real-time system on a digital signal processor and showed that, for both synthesized and natural vowels, the proposed method estimates formant frequencies more robustly than LPC over a wider range of F0.
  • Gabriel Pablo Nava, Hoang Duy Nguyen, Yusuke Hioka, Yutaka Kamamoto, T ...
    2015, Volume 36, Issue 6, pp. 489-499
    Published: 2015
    Released online: 2015/11/01
    Journal, free access
    Recent optical wireless acoustic sensors have demonstrated the possibility of simultaneously sensing massive numbers of audio channels in real time. Although this technology has enabled the deployment of large-scale applications, it raises new challenges from the computational perspective. In this regard, graphics processing units (GPUs) provide significant parallel computational power. However, not all existing algorithms can be implemented on a GPU in a straightforward way. This paper discusses signal processing schemes and implementation strategies to achieve real-time broadband beamforming using a single GPU card. The experiments introduced here show our prototype implementation handling over 120 audio channels in real time. The experimental results further highlight the particular advantages of using a video camera-based approach to improve the beamforming performance.
  • Yusuke Torikai, Dai Kuze, Junko Kurosawa, Yasuhiro Oikawa, Yoshio Yama ...
    2015, Volume 36, Issue 6, pp. 500-506
    Published: 2015
    Released online: 2015/11/01
    Journal, free access
    We investigated a new communication-aid system focused on bone conduction through a tooth, for listening to and recording voices. In this paper, we developed a tooth-conduction microphone (TCM) and evaluated the articulation of tooth-conducted voice (TCV). Because the TCM has the shape of the wearer's dental mold, it is wearable like a mouthpiece. Moreover, it can extract tooth vibration during phonation as TCV. To evaluate the articulation of TCV, we adopted monosyllable articulation for subjective assessment and linear predictive coding cepstral distance for objective assessment. The articulation results show that TCV is not sufficiently clear compared to air-conducted voice. However, it is confirmed that TCV is robust to environmental noise, because the accuracy rate does not decrease when the TCV is recorded under high ambient noise.
  • Osamu Ichikawa, Takashi Fukuda, Ryuki Tachibana
    2015, Volume 36, Issue 6, pp. 507-515
    Published: 2015
    Released online: 2015/11/01
    Journal, free access
    In the financial industry, face-to-face conversation is essential for sales. As with call-center monitoring, there is a significant need to monitor these conversations for compliance checks. In certain business scenarios, there is a need to record an employee's speech while protecting the customers' confidentiality and privacy. In this paper, we propose a small-scale microphone array system specially designed to record only the agent's speech. To suppress the customer's speech, we used CSP-based post-filtering. However, with a small number of microphones, it is difficult to suppress unwanted speech completely, because post-filtering that uses correlations between the multiple channels is often affected by spatial aliasing between speakers. We introduced a weighted CSP to attenuate the frequency bins that are susceptible to the interfering speaker. We also introduced flooring after the post-filtering to mask residuals. This combination helps prevent the customer's speech from being transcribed.
  • Shuichi Sakamoto, Satoshi Hongo, Takuma Okamoto, Yukio Iwaya, Yôi ...
    2015, Volume 36, Issue 6, pp. 516-526
    Published: 2015
    Released online: 2015/11/01
    Journal, free access
    Sensing of high-definition three-dimensional (3D) sound-space information is of crucial importance for realizing total 3D spatial sound technology. We have proposed a sensing method for 3D sound-space information using symmetrically and densely arranged microphones. This method is called SENZI (Symmetrical object with ENchased Zillion microphones). In the SENZI method, signals recorded by the microphones are simply weighted and summed to synthesize a listener's head-related transfer functions (HRTFs), reflecting the direction in which the listener is facing even after recording. The SENZI method is being developed as a real-time system using a spherical microphone array and field-programmable gate arrays (FPGAs). In the SENZI system, 252 electret condenser microphones (ECMs) were distributed almost uniformly on a rigid sphere. The deviations of the microphone frequency responses were compensated for using the transfer function of the rigid sphere. To avoid degradation of the accuracy of the synthesized sound space by microphone internal noise, particularly in the low-frequency region, we analyzed the effect of the signal-to-noise ratio (SNR) of the microphones on the accuracy of synthesized sound-space information by controlling the condition number of a matrix constructed from transfer functions. On the basis of the results of these analyses, a compact SENZI system was implemented. Results of experiments indicated that 3D sound-space information was well expressed using the system.
  • Kohichi Ogata, Kohei Matsumura, Yusuke Matsuda
    2015, Volume 36, Issue 6, pp. 527-536
    Published: 2015
    Released online: 2015/11/01
    Journal, free access
    This paper describes data-glove-driven vocal tract configuration methods. Unlike direct mapping from hand gestures to sounds using a data glove, intuitive manipulation of the data glove was applied to configure the vocal tract shape. Two manipulation methods were proposed and then evaluated in terms of the vocal tract area function, the resulting formant frequencies, and ease of manipulation. Although both methods were capable of producing the resulting formant frequencies with reasonable accuracy for steady vowel production, the method with three fingers enabled users to configure the vocal tract shape easily. Moreover, the effect of training in manipulating the data glove to configure the vocal tract shape for continuous vowels was evaluated in terms of their sound spectrograms and the distribution of the resulting formant frequencies. An experiment to evaluate the effectiveness of training showed that beginners were able to produce continuous vowels within about three training sessions.
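Several abstracts in this issue refer to LPC-based formant estimation as the conventional baseline. As a rough illustration only, the sketch below shows the textbook autocorrelation-method LPC followed by root finding. It is not the PEAR method or any author's implementation; the function names, the test formants, and the sampling rate are all hypothetical choices made for this example, and no analysis window is applied (practical systems usually apply one).

```python
import numpy as np

def all_pole_from_formants(formants, fs):
    """Build A(z) coefficients from hypothetical (frequency, bandwidth) pairs in Hz."""
    a = np.array([1.0])
    for freq, bw in formants:
        radius = np.exp(-np.pi * bw / fs)      # pole radius from bandwidth
        theta = 2.0 * np.pi * freq / fs        # pole angle from frequency
        a = np.convolve(a, [1.0, -2.0 * radius * np.cos(theta), radius ** 2])
    return a

def all_pole_impulse_response(a, n):
    """Impulse response of 1/A(z), computed with a direct recursion."""
    y = np.zeros(n)
    for t in range(n):
        acc = 1.0 if t == 0 else 0.0
        for k in range(1, len(a)):
            if t - k >= 0:
                acc -= a[k] * y[t - k]
        y[t] = acc
    return y

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC; returns [1, a1, ..., a_order]."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    full = np.correlate(frame, frame, mode="full")
    r = full[n - 1 : n + order]                # autocorrelation at lags 0..order
    # Toeplitz normal equations: sum_k a_k r[|i-k|] = -r[i], i = 1..order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1 : order + 1])
    return np.concatenate(([1.0], a))

def estimate_formants(a, fs):
    """Formant frequency estimates (Hz) from the angles of the LPC poles."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]          # keep one pole of each conjugate pair
    return np.sort(np.angle(roots) * fs / (2.0 * np.pi))
```

A typical use would be to call `lpc_coefficients` on a short speech frame with an order of roughly two coefficients per expected formant, then pass the result to `estimate_formants`. For a noiseless impulse response of an all-pole filter, this procedure recovers the pole frequencies; real voiced speech with high F0 is exactly the case where the first abstract reports that plain LPC becomes less accurate.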
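The microphone-array abstract above mentions CSP-based processing. Assuming CSP there refers to cross-power spectrum phase analysis (also widely known as GCC-PHAT), the following minimal sketch shows how the phase-only cross-spectrum of two channels yields an inter-channel delay estimate. It does not reproduce the paper's post-filter, weighted CSP, or flooring; the function name `csp_delay` and its parameters are hypothetical.

```python
import numpy as np

def csp_delay(ref, sig, max_lag):
    """Estimate the delay (in samples) of `sig` relative to `ref`.

    A positive result means `sig` lags `ref`. Only the phase of the
    cross-power spectrum is kept, which sharpens the correlation peak.
    """
    ref = np.asarray(ref, dtype=float)
    sig = np.asarray(sig, dtype=float)
    n = len(ref) + len(sig)                    # zero-pad to avoid circular wrap-around
    REF = np.fft.rfft(ref, n)
    SIG = np.fft.rfft(sig, n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12             # phase transform (magnitude whitening)
    cc = np.fft.irfft(cross, n)
    # Reorder so that the array covers lags -max_lag .. +max_lag
    cc = np.concatenate((cc[-max_lag:], cc[: max_lag + 1]))
    return int(np.argmax(cc)) - max_lag
```

With two microphones a known distance apart, the estimated delay maps to a direction of arrival, which is what allows a small array to tell the nearby agent from the customer. How the per-bin weights and the post-filter gain are derived from these coefficients is specific to the paper and is not sketched here.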
ACOUSTICAL LETTER