
Introduction

Developments in recent years have led the auditory system to be regarded as an active scene analysis system, stimulating the study of acoustic source segregation based on auditory scene analysis (ASA) [1,2]. If the problem of acoustic source segregation can be solved, it will become possible not only to extract the sounds a listener requires while rejecting others, but also to apply the solution in robust speech recognition systems [4]. Constructing a computational theory of audition, by analogy with the computational theory of vision proposed by Marr [5], will take time to complete; however, we feel that modeling based on ASA suggests a new approach to constructing such a theory [6,7,8], since ASA indicates a direction in which it can be built.

Bregman reported that, to solve the problem of ASA in understanding an environment through acoustic events, the human auditory system uses four psychoacoustic heuristic regularities related to acoustic events [2,3]:

(1) common onset and offset,
(2) gradualness of change,
(3) harmonicity,
(4) changes in an acoustic event.
Several ASA-based segregation models utilizing these four regularities already exist: Brown and Cooke's segregation model based on acoustic events [9,10,11], Ellis's segregation model based on psychoacoustic grouping rules [12], and Nakatani et al.'s segregation model implemented as a multi-agent system [13,14]. Another is the model of Kashino et al., a computational model of the quantitative relationships between multiple features on the spectrogram and the auditory segregation of two frequency components [15,16]. All of these computational segregation models use regularities (1) and (3), together with the amplitude or power spectrum as the acoustic feature. As a result, they cannot completely extract the signal from a noise-added signal when the signal and noise occupy the same frequency region.

We stress the need to consider not only the amplitude spectrum but also the phase spectrum when attempting to completely extract a signal from a noise-added signal in which both occupy the same frequency region [17]. From this standpoint, we seek to solve the problem of segregating two acoustic sources -- the basic problem of acoustic source segregation -- using regularities (2) and (4) as proposed by Bregman [18,24]. This paper proposes a method of extracting a signal from a noise-added signal as a solution to this problem. The method uses the amplitude and phase spectra calculated from the noise-added signal by the wavelet transform. We also show that if the parameters of the proposed model are set to human auditory properties, the model can serve as a computational model of co-modulation masking release (CMR) [19].
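To make the role of the two spectra concrete, the following is a minimal sketch of how amplitude and phase spectra can be obtained from a noise-added signal with a bank of complex analytic (Gabor-type) analysis filters; it is an illustration of the general idea, not the filterbank designed in Section 3, and the function name, center frequencies, and bandwidth are all assumptions introduced here.

```python
import numpy as np

def gabor_filterbank_spectra(x, fs, center_freqs, bandwidth=50.0):
    """Illustrative amplitude/phase analysis of signal x (not the
    paper's wavelet filterbank): convolve x with complex Gaussian-
    windowed exponentials and return, per channel, the amplitude
    |X(f, t)| and phase arg X(f, t) as (n_channels, len(x)) arrays."""
    n = len(x)
    amp = np.empty((len(center_freqs), n))
    phase = np.empty((len(center_freqs), n))
    # Gaussian window length (in samples) sets the analysis bandwidth.
    sigma = fs / (2.0 * np.pi * bandwidth)
    half = int(4 * sigma)
    m = np.arange(-half, half + 1)          # sample indices
    window = np.exp(-0.5 * (m / sigma) ** 2)
    for k, fc in enumerate(center_freqs):
        # Complex analytic kernel centered at fc, unit gain at fc.
        kernel = window * np.exp(2j * np.pi * fc * m / fs)
        kernel /= window.sum()
        y = np.convolve(x, kernel, mode="same")
        amp[k] = np.abs(y)                  # amplitude spectrum (envelope)
        phase[k] = np.angle(y)              # phase spectrum
    return amp, phase
```

For example, analyzing a 440 Hz tone buried in white noise with channels at 220, 440, and 880 Hz, the 440 Hz channel carries both the dominant envelope and a phase trajectory advancing at the tone's frequency, which is the information the segregation method exploits.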

The paper is organized as follows. Section 2 illustrates the proposed model and then formulates the problem of segregating two acoustic sources. Section 3 presents the design of the wavelet filterbank and its characteristics. Section 4 describes the calculation of the physical parameters and the segregation algorithm. Section 5 presents computer simulations of segregating two acoustic sources to show the advantages of the proposed method. Section 6 shows that the proposed model can serve as a computational model of co-modulation masking release when its parameters are set to human auditory properties. Section 7 contains our conclusions.


Masashi Unoki
2000-10-26