In this work, the perception of the position and orientation of a directional acoustic source in a real enclosed environment by blindfolded listeners is investigated and compared with a method that automatically estimates the position and orientation of the source using a T-shaped microphone array. In the subjective experiment using blindfolded listeners, a human speaker acted as an acoustic source and listeners judged the speaker’s facing angle (one out of four possible orientations shifted by 90°) and position after listening to a spoken sentence. This procedure was performed twice, before and after a training phase. In the training, listeners were allowed to remove the blindfold and verify the speaker’s position and orientation. After the training, the correct orientation ratio increased from 75.0 to 76.5% and the average position error decreased from 66.2 to 60.6 cm. In addition, a subjective experiment on orientation estimation with eight orientations shifted by 45° in the same real environment showed that orientation estimation in a real environment was more difficult than that in an anechoic environment. Artificial neural networks (ANNs) were used in the automatic estimation method. A correct orientation ratio of 68.1% and an average position error of 48.0 cm were obtained by the T-shaped microphone array located nearest to the blindfolded listener among an array network consisting of eight T-shaped microphone arrays, enabling a rough comparison between human auditory perception and the automatic estimation method (a correct orientation ratio of 67% and a better average position error of 38.6 cm were the best results obtained by a T-shaped array in the network). It was clarified that the automatic estimation method cannot surpass the auditory system in terms of correct orientation ratio; however, it yielded better results in terms of the average position error.
Previous research has revealed that Japanese native speakers are highly likely to both perceive and produce epenthetic vowels between consonants. The goal of the present study is to investigate the influence of English learning background on the perception and production of consonant clusters by Japanese native speakers. In Experiment 1, a forced-choice AXB task to identify VC(u)CV is administered to 17 highly fluent Japanese-English bilinguals and 22 Japanese monolinguals. Results show that monolinguals made significantly more errors than bilinguals. In Experiment 2, the influence of English proficiency on the production of consonant clusters and the effect of consonant voicing on vowel epenthesis are investigated. The epenthetic vowels are acoustically analyzed and categorized into three degrees: full, partial and no epenthesis. The voicing combinations of the consonant clusters are C[+voice]-C[+voice], C[−voice]-C[+voice], and C[−voice]-C[−voice]. Results show that monolinguals inserted more epenthetic vowels than bilinguals, and that the influence of consonant voicing was stronger in monolinguals than in bilinguals. Furthermore, monolinguals’ epenthetic vowels between C[−voice]-C[+voice] and C[−voice]-C[−voice] tended to be devoiced more often than those of bilinguals. This result suggests a stronger L1 influence on monolinguals. The results of the two experiments thus suggest that English proficiency influences the perception and production of consonant clusters.
A watermarking system based on amplitude modulation is proposed. Sinusoidal amplitude modulations at relatively low modulation frequencies applied to neighboring subband signals in opposite phases are used as the carrier of embedded information. The embedded information is encoded in the form of relative phase differences between the amplitude modulations applied to several groups of subband signals. Robustness against perceptual codecs, additive white noise, reverberation, pitch modifications and time scale modifications was verified by computer simulation using 100 pieces in a music genre database. Listening tests to measure the detection thresholds of watermarking and a subjective assessment of the slight reduction of audio quality due to watermarking were conducted using trained listeners. The results of both the listening tests and the computer simulation showed that the quality degradation caused by watermarking with moderate embedding intensity was perceptible but not annoying, while maintaining sufficient detectability of the watermarks.
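The opposite-phase modulation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sampling rate, the modulation frequency and the depth are all assumptions chosen for illustration, and the toy detector works directly on the gain curves rather than on envelopes extracted from a real subband decomposition.

```python
import math

def am_gains(n_samples, fs, fm, depth, phase):
    """Gain curves for two neighboring subbands, modulated in opposite phase.

    All parameter names and values are illustrative assumptions.
    """
    gains_a, gains_b = [], []
    for n in range(n_samples):
        m = depth * math.sin(2.0 * math.pi * fm * n / fs + phase)
        gains_a.append(1.0 + m)  # one subband is raised by m
        gains_b.append(1.0 - m)  # its neighbor is lowered by m
    return gains_a, gains_b

def estimate_phase(gains_a, gains_b, fs, fm):
    """Recover the modulation phase by correlating the gain difference
    with a sine and a cosine at the modulation frequency.

    Exact only when the signal spans an integer number of modulation periods.
    """
    s = c = 0.0
    for n, (ga, gb) in enumerate(zip(gains_a, gains_b)):
        d = ga - gb  # equals 2 * depth * sin(theta + phase)
        theta = 2.0 * math.pi * fm * n / fs
        s += d * math.cos(theta)  # proportional to sin(phase)
        c += d * math.sin(theta)  # proportional to cos(phase)
    return math.atan2(s, c)
```

Because the two gains always sum to 2, the modulation raises one subband exactly as much as it lowers the other. In this toy setting the embedded information would be carried by the difference between the phases estimated for different groups of subbands.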
In this paper, we propose a sound field creation system based on the simultaneous equations method used for active noise control systems. Conventional studies on such sound field creation mainly focus on the transaural system reproducing recorded signals at the listening points. In contrast, we propose a system that creates the same sound field as that of a target. In the proposed system, inverse filters forming the target sound field are derived by using the simultaneous equations method, which is characterized by the use of auxiliary filters in estimating the error between the created and the target sounds. The auxiliary filters can also be used for detecting an insufficiency in the number of taps of the inverse filters and in the delays to be added to the target sound field. An insufficient delay not only increases the error but also generates echoes preceding the created sound, which are annoying to listeners. Finally, we verify by computer simulations the performance of the proposed system and confirm that these insufficiencies can be detected by monitoring the coefficients of the auxiliary filter.
We previously reported several useful shear modulus reconstruction methods obtained from motion or static equations and reference shear moduli that yield an absolute reconstruction of shear modulus, density or inertia using the finite difference method (FDM) or finite element method (FEM) with strain tensor measurement (i.e., methods referred to as Methods A to C and F). Methods A to C are particularly useful for three-dimensional (3D) shear modulus reconstruction in an incompressible case or in the case of an inhomogeneous Poisson’s ratio by using the mean normal stress as an unknown, whereas Method F using a typical Poisson’s ratio is particularly useful when the reconstruction dimensionality is low or when the Poisson’s ratio is homogeneous. For instance, when using FDM, although Method B permits the absolute reconstruction of a mean normal stress by using a reference mean normal stress, Methods A and C yield biased mean normal stress reconstructions using no reference value and a quasi-reference value (e.g., zero), respectively. In this report, we describe our newly developed Methods D and E obtained from Methods A to C and F, which yield a unique relative shear modulus reconstruction with no geometrical artifact. In Method D, no reference shear modulus is used with Methods A to C and F, whereas in Method E, an arbitrary finite value (e.g., unity) is used with Methods A to C and F as a quasi-reference shear modulus at one or more arbitrary points (i.e., a quasi-reference point) or in an arbitrary region (i.e., a quasi-reference region) in the region of interest. That is, both Method D and Method E permit the visualization of the shear modulus distribution without using an absolute reference value (e.g., no reference material or typical value), which is particularly useful when dealing with deeply located tissues such as the liver and heart tissues. Methods D and E use iterative methods such as the conjugate gradient method to solve algebraic equations. 
As confirmed from simulated phantom deformation data, each version of Method E results in a higher convergence speed and reconstruction accuracy than the corresponding version of Method D. Although several versions of Method D using FEM are practically useful, the corresponding version of Method E, in combination with Methods A to C and F, should be used for such relative reconstructions rather than Method D.
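For readers unfamiliar with the conjugate gradient method mentioned above, the following is a generic textbook sketch for a symmetric positive-definite system Ax = b. It is not the authors' reconstruction code: the function name, tolerance and dense-list matrix representation are assumptions made for illustration, and the algebraic equations obtained from FDM or FEM discretization would be far larger and sparse.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A.

    A is a list of rows and b is a list; both are hypothetical inputs.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)  # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break  # residual norm is small enough
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        # New direction is conjugate to the previous ones with respect to A.
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x
```

For example, `conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` converges to approximately `[1/11, 7/11]`.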
To yield spatially uniform stability in shear modulus reconstruction, we previously reported an effective method for setting appropriate spatially varied regularization parameters using the variances of measured strain tensor components. In the method, the strain variances are evaluated at each position using multiple field measurements or in real time using a single field measurement with the Ziv-Zakai lower bound (ZZLB, i.e., Vz) when using all cross-correlation-based methods. However, a stationary assumption of the displacement or strain measurement error can also be used as an alternative method of real-time reconstruction using an estimated variance (i.e., Vd or Vs). In this study, a comparison of real-time spatially variant regularizations using the above three estimated strain variances performed on an agar phantom reveals that stationary assumptions cannot be used when the target region has a large spatial variation in the shear modulus distribution (e.g., when stress concentration regions are generated in front of and behind a stiff inclusion or tumor), although the ZZLB can be used if sufficient care is taken owing to its low local correlation. When a target has a released tissue boundary, none of the estimated variances should be used directly in such regions, because the shear modulus is not spatially smooth.
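The idea of setting spatially varied regularization parameters from strain variances can be sketched as follows for the multiple-field-measurement case. This is only an illustration under stated assumptions: the function names, the scale factor `alpha` and the simple proportionality between the regularization parameter and the local variance are hypothetical choices, not the exact rule used in the study.

```python
import statistics

def position_variances(measurements):
    """Sample variance of the strain at each position.

    `measurements` is a list of repeated field measurements, each a list
    holding one strain value per position (hypothetical data layout).
    """
    n_positions = len(measurements[0])
    return [
        statistics.variance([field[i] for field in measurements])
        for i in range(n_positions)
    ]

def regularization_parameters(variances, alpha):
    """Assumed rule: regularize more strongly where the strain is noisier."""
    return [alpha * v for v in variances]
```

With this rule, positions where the measured strain fluctuates strongly between fields receive a larger regularization parameter, while positions with stable measurements are regularized only weakly.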