
Introduction

The problem of segregating a desired signal from a noisy signal is an important issue not only in robust speech recognition but also in many other areas of signal processing, and it has been studied by many researchers, who have proposed a variety of methods. For example, in robust speech recognition [Furui and Sondhi1991], there are noise reduction or suppression methods [Boll1979] and speech enhancement methods [Junqua and Haton1996]. In signal processing more generally, there are signal estimation using linear systems [Papoulis1977,Shamsunder and Giannakis1997] and signal estimation based on stochastic models of the signal and noise [Papoulis1991].

In practice, however, it is difficult to segregate each original signal from a mixed signal, because this is an ill-posed inverse problem and the signals occupy the same time-frequency region. It is therefore difficult to solve this problem without imposing constraints.
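The ill-posedness can be made concrete with a toy sketch (a hypothetical illustration, not from this paper): given only a mixture y = f1 + f2, any candidate g yields a valid-looking decomposition (g, y - g), so the observation alone cannot select the true components.

```python
# Hypothetical sketch (not the paper's method): recovering f1 and f2
# from only their mixture y = f1 + f2 is underdetermined, because for
# ANY candidate g the pair (g, y - g) reproduces y exactly.

# A toy observed mixture (dyadic values, so float arithmetic is exact):
y = [0.5, -1.25, 0.75, 0.0]

# Two different candidate decompositions of the same mixture:
g1 = [0.25, 0.0, 0.5, -0.5]
g2 = [-2.0, 1.0, 0.0, 4.0]

decompositions = [(g, [yi - gi for yi, gi in zip(y, g)]) for g in (g1, g2)]

# Both pairs reconstruct y perfectly, so y alone cannot choose between
# them; additional constraints are needed to make the answer unique.
for f1, f2 in decompositions:
    assert [a + b for a, b in zip(f1, f2)] == y
```

This is why the paper turns to constraints drawn from auditory regularities: they restrict the space of admissible decompositions.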

On the other hand, the human auditory system can easily segregate a desired signal in a noisy environment that simultaneously contains speech, noise, and reflections. This ability of the auditory system has come to be regarded as the function of an active scene analysis system, called ``Auditory Scene Analysis (ASA)'', which became widely known through Bregman's book [Bregman1990]. Bregman reported that, to solve the ASA problem, the human auditory system uses four psychoacoustically heuristic regularities related to acoustic events:

(i) common onset and offset,
(ii) gradualness of change,
(iii) harmonicity, and
(iv) changes occurring in the acoustic event [Bregman1993].

Some ASA-based investigations have shown that the segregation problem can be solved by applying constraints on the sounds and the environment. These approaches are called ``Computational Auditory Scene Analysis (CASA).'' Several CASA-based segregation models already exist; they fall into two main types, based on either bottom-up or top-down processing.

Typical bottom-up models include an auditory segregation model based on acoustic events [Brown1992,Cooke1993], a concurrent harmonic sounds segregation model based on the fundamental frequency [de Cheveigne1993,de Cheveigne1997], and a sound source separation system with the ability of automatic tone modeling [Kashino and Tanaka1993]. Typical top-down models include a segregation model based on psychoacoustic grouping rules [Ellis1994,Ellis1996] and a computational model of sound segregation agents [Nakatani et al.1994,Nakatani et al.1995a,Nakatani et al.1995b]. All of these segregation models use regularities (i) and (iii) and rely on the amplitude (or power) spectrum as the acoustic feature; hence, they cannot completely extract the desired signal from a noisy signal when the signal and noise exist in the same frequency region.

We believe that, using the same approach as in CASA, the signal segregation problem (an ill-posed problem) can be solved uniquely by applying constraints related to the four regularities. In addition, we have argued that not only the amplitude spectrum but also the phase spectrum must be used in order to completely extract the desired signal from a noisy signal in which the signal and noise exist in the same frequency region [Unoki and Akagi1997a]. We have reported two investigations based on this idea.
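Why the amplitude spectrum alone is insufficient can be illustrated with a hypothetical numeric sketch (not taken from the cited work): two concurrent sinusoids at the same frequency add as phasors, and different component pairs can yield identical amplitudes while their phases still differ.

```python
import cmath
import math

# Hypothetical illustration (not from the cited work): at a single
# frequency, components A*exp(j*phi) add as phasors, so a frame's
# amplitude spectrum alone cannot distinguish different component pairs.

def mix(components):
    """Sum of phasors A*exp(j*phi) for (amplitude, phase) pairs."""
    return sum(a * cmath.exp(1j * p) for a, p in components)

# Two different decompositions at one frequency bin:
pair_a = [(1.0, 0.0), (1.0, math.pi / 2)]   # phases 0 deg and +90 deg
pair_b = [(1.0, 0.0), (1.0, -math.pi / 2)]  # phases 0 deg and -90 deg

za, zb = mix(pair_a), mix(pair_b)

# Identical mixture amplitudes (sqrt(2)) but opposite mixture phases
# (+45 deg vs -45 deg): only the phase spectrum separates the cases.
assert abs(abs(za) - abs(zb)) < 1e-12
assert cmath.phase(za) > 0 > cmath.phase(zb)
```

In this sketch the two mixtures are indistinguishable by amplitude alone, which motivates using the phase spectrum as an additional cue.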

As a first step, we showed that the problem of segregating a sinusoidal signal from a noise-added sinusoidal signal can be solved using constraints related to two of the four regularities, (ii) and (iv) [Unoki and Akagi1997a]. We then showed that the problem of segregating an amplitude-modulated (AM) complex tone from noise-added or concurrent AM complex tones can be solved using all four regularities [Unoki and Akagi1997b].

This paper first formulates the general problem of segregating two acoustic sources, summarizing the above results. It then proposes a method of extracting the desired signal (a harmonic tone) from a noisy signal (a noisy harmonic tone) based on auditory scene analysis.


Masashi Unoki
2000-11-07