
Auditory sound segregation model


 
Table 1: Constraints corresponding to Bregman's psychoacoustical heuristic regularities.

Each entry gives the regularity (Bregman, 1993) followed by the corresponding constraint (Unoki and Akagi, 1999).

(i) Regularity: Unrelated sounds seldom start or stop at exactly the same time (common onset/offset).
    Constraint: Synchronism of onset/offset
      $\vert T_{\rm{S}}-T_{k,\rm{on}}\vert \leq \Delta T_{\rm{S}}$
      $\vert T_{\rm{E}}-T_{k,\rm{off}}\vert \leq \Delta T_{\rm{E}}$

(ii) Regularity: Gradualness of change
      (a) A single sound tends to change its properties smoothly and slowly.
      (b) A sequence of sounds from the same source tends to change its properties slowly.
    Constraint:
      (a) Slowness (piecewise-differentiable polynomial approximation)
        $dA_k(t)/dt=C_{k,R}(t)$
        $d\theta_{1k}(t)/dt=D_{k,R}(t)$
        $dF_0(t)/dt=E_{0,R}(t)$
      (b) Smoothness (spline interpolation)
        $\sigma_A=\int_{t_a}^{t_b} [A_k^{(R+1)}(t)]^2dt \Rightarrow \min$
        $\sigma_\theta=\int_{t_a}^{t_b} [\theta_{1k}^{(R+1)}(t)]^2dt \Rightarrow \min$

(iii) Regularity: When a body vibrates with a repetitive period, these vibrations give rise to an acoustic pattern in which the frequency components are multiples of a common fundamental (harmonicity).
    Constraint: Multiples of the repetitive fundamental frequency
      $n\times F_0(t), \qquad n=1,2,\cdots, N_{F_0}$

(iv) Regularity: Many changes that take place in an acoustic event will affect all the components of the resulting sound in the same way and at the same time.
    Constraint: Correlation between the instantaneous amplitudes
      $\frac{A_k(t)}{\Vert A_k(t)\Vert} \approx \frac{A_{\ell}(t)}{\Vert A_\ell(t)\Vert}, \qquad k\not=\ell$
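
As a rough illustration of how constraints (i), (iii), and (iv) might be evaluated on sampled data, the following Python sketch may help. It is not the implementation of Unoki and Akagi (1999): the function names are hypothetical, the envelopes are assumed to be equal-length lists of samples, and the norm in constraint (iv) is assumed to be the Euclidean norm over those samples.

```python
import math


def onset_is_synchronous(t_s, t_k_on, delta_t_s):
    """Constraint (i): |T_S - T_k,on| <= Delta T_S (all values in seconds)."""
    return abs(t_s - t_k_on) <= delta_t_s


def harmonic_frequencies(f0, n_harmonics):
    """Constraint (iii): the multiples n * F0 for n = 1, ..., N_F0."""
    return [n * f0 for n in range(1, n_harmonics + 1)]


def normalize(envelope):
    """Scale a sampled envelope to unit Euclidean norm (assumed norm)."""
    norm = math.sqrt(sum(a * a for a in envelope))
    if norm == 0.0:
        raise ValueError("envelope has zero norm")
    return [a / norm for a in envelope]


def envelope_similarity(a_k, a_l):
    """Constraint (iv): inner product of the two normalized envelopes.

    For non-negative envelopes the value lies in [0, 1]; it approaches 1
    when A_k / ||A_k|| is close to A_l / ||A_l||.
    """
    if len(a_k) != len(a_l):
        raise ValueError("envelopes must have the same length")
    return sum(x * y for x, y in zip(normalize(a_k), normalize(a_l)))


# Example: two envelopes that differ only by a scale factor.
a_k = [1.0, 2.0, 3.0, 2.0]
a_l = [0.5 * a for a in a_k]
similarity = envelope_similarity(a_k, a_l)      # close to 1
harmonics = harmonic_frequencies(100.0, 3)      # -> [100.0, 200.0, 300.0]
```

Normalizing before comparing makes the similarity insensitive to the overall level of each component, which is the point of dividing by the norm in constraint (iv).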


  
Figure: Auditory sound segregation model (block diagram; source image FIGURE/BLOCK.eps).

In this paper, the desired signal $f_1(t)$ is assumed to be a harmonic complex tone with fundamental frequency $F_0(t)$. The proposed model segregates the desired signal from the mixed signal by constraining the time derivatives of the instantaneous amplitude $A_k(t)$, the phase $\theta_{1k}(t)$, and the fundamental frequency $F_0(t)$.
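
To make the notion of a harmonic complex tone concrete, the following Python sketch synthesizes one from a time-varying fundamental frequency. It is only an illustration under simplifying assumptions that are not taken from the paper: the function name and parameters are hypothetical, each harmonic has a constant amplitude and zero initial phase, and components are indexed by harmonic number rather than by filterbank channel.

```python
import math


def harmonic_complex_tone(f0_of_t, amplitudes, duration, sample_rate=8000):
    """Return samples of sum_n a_n * sin(n * phi(t)), n = 1, 2, ...

    f0_of_t    -- function mapping time in seconds to F0 in Hz
    amplitudes -- constant amplitude of each harmonic (assumption)
    phi(t)     -- running phase of the fundamental, accumulated sample
                  by sample so that F0 may vary over time
    """
    n_samples = int(round(duration * sample_rate))
    phase = 0.0  # phase of the fundamental in radians
    signal = []
    for i in range(n_samples):
        # Harmonic n (1-based) runs at n times the fundamental phase.
        sample = sum(a * math.sin((n + 1) * phase)
                     for n, a in enumerate(amplitudes))
        signal.append(sample)
        # Advance the phase by 2*pi*F0(t)/fs for the next sample.
        phase += 2.0 * math.pi * f0_of_t(i / sample_rate) / sample_rate
    return signal


# Example: 10 ms of a two-harmonic tone with a constant 100 Hz fundamental.
tone = harmonic_complex_tone(lambda t: 100.0, [1.0, 0.5], 0.01)
```

Accumulating the phase, rather than evaluating $\sin(2\pi n F_0(t)\,t)$ directly, keeps every component at an exact multiple of the instantaneous fundamental even when $F_0(t)$ changes over time.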

The proposed model is composed of four blocks: an auditory-motivated filterbank, an $F_0$ estimation block, a separation block, and a grouping block, as shown in the figure of the auditory sound segregation model. The constraints used in this model are listed in Table 1.



 
Masashi Unoki
2000-10-26