
Introduction

The term ``Auditory Scene Analysis'' (ASA) has recently become widely known through Bregman's book [Bregman1990]. ASA means understanding a real environment from the acoustic events it contains. Although the real environment we experience every day contains speech, noise, and reflections simultaneously, the human auditory system appears to solve the ASA problem. However, when the ASA problem is to be solved from the acoustic signals received in such an environment, a unique solution cannot be derived without constraints on the acoustic sources and on the environment.

Bregman reported that, to solve the ASA problem, the human auditory system uses four psychoacoustic heuristic regularities related to acoustic events: (i) common onset and offset, (ii) gradualness of change, (iii) harmonicity, and (iv) changes occurring in the acoustic event [Bregman1993].

We believe that, by translating these heuristic regularities into physical constraints, it is possible to solve the problem of computational auditory scene analysis. As a first step, consider the acoustic source segregation problem, in which the sounds the listener requires are extracted selectively while all other sounds are rejected. A solution to this problem could be used not only to construct a preprocessor for robust speech recognition systems but also to simulate the cocktail party effect. Moreover, such a solution could serve as a computational model of auditory phenomena such as co-modulation masking release (CMR).

There are two typical types of auditory segregation model that use some of the four regularities, based on either bottom-up or top-down processing. An example of the former is Brown and Cooke's segregation model based on acoustic events [Brown1992,Cooke1993]. Examples of the latter are Ellis' segregation model based on psychoacoustic grouping rules [Ellis1994] and Nakatani et al.'s stream segregation agents [Nakatani et al.1994]. All of these models use regularities (i) and (iii), with an amplitude (or power) spectrum as the acoustic feature. They therefore cannot completely extract the desired signal from a noisy signal when the signal and the noise occupy the same frequency region, and as the power of the background noise increases, they cannot extract the desired signal with high precision. A simple illustration of this limitation follows.
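To see why the amplitude spectrum alone is insufficient, consider the following worked example (ours, for illustration; it is not taken from the cited models). Suppose a desired component and a noise component occupy the same frequency, with amplitudes A and B and phases \theta_1 and \theta_2. The observed amplitude at that frequency is

    |A e^{j\theta_1} + B e^{j\theta_2}| = \sqrt{A^2 + B^2 + 2AB\cos(\theta_1 - \theta_2)},

so infinitely many combinations of (A, B, \theta_1 - \theta_2) yield the same observed amplitude. Without phase information, the decomposition into signal and noise is underdetermined.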

In contrast, we have argued that both the amplitude spectrum and the phase spectrum are needed to completely extract the desired signal from a noisy signal in which the signal and the noise occupy the same frequency region [Unoki et al.1997]. We have proposed a method of segregating a sinusoidal signal from a noise-added signal using physical constraints related to regularities (ii) and (iv). Computer simulations showed that the proposed model can segregate a sinusoidal signal from a noise-added signal. Moreover, when the model's parameters are set according to human auditory properties, it can serve as a computational model of co-modulation masking release [Unoki et al.1997].
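The following minimal numerical sketch (ours; it is not the model of [Unoki et al.1997]) illustrates why the phase spectrum matters. If the complex noise spectrum (amplitude and phase) were known, subtracting it would recover the signal exactly, even where signal and noise overlap in frequency, whereas subtracting noise amplitudes alone leaves a residual error:

    import numpy as np

    # Toy demonstration (illustrative only): a 1-kHz sinusoid buried in
    # broadband noise, so signal and noise share frequency regions.
    fs, n = 8000, 1024
    t = np.arange(n) / fs
    signal = np.sin(2 * np.pi * 1000 * t)
    noise = np.random.default_rng(0).normal(scale=0.5, size=n)
    X = np.fft.rfft(signal + noise)   # complex spectrum of the mixture
    N = np.fft.rfft(noise)            # complex spectrum of the noise

    # Complex subtraction (uses amplitude AND phase): exact recovery.
    s_complex = np.fft.irfft(X - N, n)

    # Amplitude-only subtraction (mixture phase kept): residual error.
    amp = np.maximum(np.abs(X) - np.abs(N), 0.0)
    s_amp = np.fft.irfft(amp * np.exp(1j * np.angle(X)), n)

    print(np.max(np.abs(s_complex - signal)))  # ~1e-15, machine precision
    print(np.max(np.abs(s_amp - signal)))      # clearly nonzero

In practice, of course, the complex noise spectrum is unknown; this is precisely why constraints derived from regularities (i)-(iv) are needed to make the segregation problem well-posed.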

In this paper, we present a method of extracting the desired signal from a noisy signal using physical constraints related to regularities (i)-(iv), as an auditory segregation model. In particular, we consider the problem of extracting the desired signal from the following signals: (a) a noise-added AM complex tone, (b) mixed AM complex tones, and (c) a noisy synthetic vowel.

