
Formulation of the problem of segregating two acoustic sources

First, only the mixed signal $f(t)=f_1(t)+f_2(t)$ can be observed in the proposed model, where $f_1(t)$ is the desired signal and $f_2(t)$ is noise or another signal. The observed signal $f(t)$ is decomposed into its frequency components by an auditory-motivated filterbank with $K$ channels. The output of the $k$-th channel, $X_k(t)$, is represented by

 
 \begin{displaymath}X_k(t)=X_{1,k}(t)+X_{2,k}(t)
\end{displaymath} (1)

 \begin{displaymath}\hphantom{X_k(t)}=S_k(t)\exp(j\omega_k t + j\phi_k(t)),
\end{displaymath} (2)

where $X_{1,k}(t)$ and $X_{2,k}(t)$ are the components of $f_1(t)$ and $f_2(t)$, respectively, that have passed through the filterbank.

Second, the outputs of the $k$-th channel that correspond to $f_1(t)$ and $f_2(t)$ are assumed to be

 \begin{displaymath}X_{1,k}(t)=A_k(t)\exp(j\omega_k t + j\theta_{1k}(t))
\end{displaymath} (3)

and

 \begin{displaymath}X_{2,k}(t)=B_k(t)\exp(j\omega_k t + j\theta_{2k}(t)).
\end{displaymath} (4)

Here, $\omega_k$ is the center frequency of the $k$-th channel (the auditory filter), and $\theta_{1k}(t)$ and $\theta_{2k}(t)$ are the instantaneous input phases of $f_1(t)$ and $f_2(t)$, respectively. Under this assumption, the instantaneous amplitude $S_k(t)$ and the instantaneous output phase $\phi_k(t)$ are represented by

 \begin{displaymath}S_k(t)=\sqrt{A_k^2(t)+2A_k(t)B_k(t)\cos\theta_k(t)+B_k^2(t)}
\end{displaymath} (5)

and

 \begin{displaymath}\phi_k(t)=\arctan\left(\frac{A_k(t)\sin\theta_{1k}(t)+B_k(t)\sin\theta_{2k}(t)}{A_k(t)\cos\theta_{1k}(t)+B_k(t)\cos\theta_{2k}(t)}\right).
\end{displaymath} (6)
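
As a quick numerical sanity check of Eqs. (5) and (6), the following Python sketch builds the two channel components at a single time instant from Eqs. (3) and (4) and compares the closed-form amplitude and phase with the modulus and argument of their complex sum. The function name `mix_channel`, the test values, and the use of `atan2` (which resolves the quadrant that a plain arctangent leaves ambiguous) are assumptions made for illustration and are not part of the original model. The common carrier factor $\exp(j\omega_k t)$ is omitted because it appears on both sides of Eq. (2) and cancels.

```python
import cmath
import math


def mix_channel(a, b, theta1, theta2):
    """Return (S, phi) for one channel at one instant via Eqs. (5) and (6).

    a, b           -- instantaneous amplitudes A_k(t), B_k(t)
    theta1, theta2 -- instantaneous input phases theta_1k(t), theta_2k(t)
    """
    theta = theta2 - theta1  # phase difference theta_2k(t) - theta_1k(t)
    s = math.sqrt(a * a + 2.0 * a * b * math.cos(theta) + b * b)  # Eq. (5)
    # Eq. (6); atan2 keeps the correct quadrant (illustrative choice).
    phi = math.atan2(a * math.sin(theta1) + b * math.sin(theta2),
                     a * math.cos(theta1) + b * math.cos(theta2))
    return s, phi


# Hypothetical values for a single channel and time instant.
a, b, theta1, theta2 = 1.0, 0.5, 0.3, 1.2
s, phi = mix_channel(a, b, theta1, theta2)

# Direct complex sum of Eqs. (3) and (4) with the carrier term removed.
x = a * cmath.exp(1j * theta1) + b * cmath.exp(1j * theta2)
print(abs(s - abs(x)) < 1e-12, abs(phi - cmath.phase(x)) < 1e-12)
```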

Therefore, the instantaneous amplitudes of the two signals, $A_k(t)$ and $B_k(t)$, can be determined by

 \begin{displaymath}A_k(t)=\frac{S_k(t)\sin(\theta_{2k}(t)-\phi_k(t))}{\sin\theta_k(t)}
\end{displaymath} (7)

and

 \begin{displaymath}B_k(t)=\frac{S_k(t)\sin(\phi_k(t)-\theta_{1k}(t))}{\sin\theta_k(t)},
\end{displaymath} (8)

where $\theta_k(t)=\theta_{2k}(t)-\theta_{1k}(t)$ and $\theta_k(t)\not= n\pi, n\in{\bf {Z}}$. For the output of the $k$-th channel at time $t$, the relationships among the instantaneous amplitudes and the instantaneous phases are shown in Fig. 2.
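
Equations (7) and (8) follow from Eqs. (1) to (4) in two steps. The short derivation below is added as a reading aid and uses only symbols already defined above. Dropping the common factor $\exp(j\omega_k t)$ gives

 \begin{displaymath}S_k(t)\exp(j\phi_k(t))=A_k(t)\exp(j\theta_{1k}(t))+B_k(t)\exp(j\theta_{2k}(t)).
\end{displaymath}

Multiplying both sides by $\exp(-j\theta_{2k}(t))$ and taking the imaginary part eliminates $B_k(t)$:

 \begin{displaymath}S_k(t)\sin(\phi_k(t)-\theta_{2k}(t))=-A_k(t)\sin\theta_k(t),
\end{displaymath}

which rearranges to Eq. (7). Multiplying instead by $\exp(-j\theta_{1k}(t))$ and taking the imaginary part eliminates $A_k(t)$:

 \begin{displaymath}S_k(t)\sin(\phi_k(t)-\theta_{1k}(t))=B_k(t)\sin\theta_k(t),
\end{displaymath}

which rearranges to Eq. (8). Both results require division by $\sin\theta_k(t)$, which is why $\theta_k(t)\not= n\pi$ must hold.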

Hence, because the instantaneous amplitude $S_k(t)$ and the instantaneous output phase $\phi_k(t)$ are observable (see Sec. 3.1.1), $A_k(t)$ and $B_k(t)$ can be determined from the above equations once the instantaneous input phases $\theta_{1k}(t)$ and $\theta_{2k}(t)$ are known.
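
The amplitude recovery of Eqs. (7) and (8) can be sketched in a few lines of Python. This is a minimal illustration for a single channel and a single time instant, not the implementation used in the paper: the function name `segregate_amplitudes`, the tolerance `eps` used to guard the excluded case $\theta_k(t)=n\pi$, and the test values are all hypothetical. The round trip first synthesizes $S_k(t)$ and $\phi_k(t)$ from known amplitudes and phases, then checks that the amplitudes are recovered.

```python
import math


def segregate_amplitudes(s, phi, theta1, theta2, eps=1e-9):
    """Recover (A_k, B_k) at one instant from Eqs. (7) and (8).

    s, phi         -- observed instantaneous amplitude and output phase
    theta1, theta2 -- instantaneous input phases (assumed known here)
    eps            -- hypothetical tolerance guarding theta_k = n*pi
    """
    sin_theta = math.sin(theta2 - theta1)
    if abs(sin_theta) < eps:
        raise ValueError("theta_k is (nearly) a multiple of pi; "
                         "Eqs. (7) and (8) are undefined")
    a = s * math.sin(theta2 - phi) / sin_theta  # Eq. (7)
    b = s * math.sin(phi - theta1) / sin_theta  # Eq. (8)
    return a, b


# Round trip with hypothetical values: build S_k and phi_k from known
# amplitudes and phases, then recover the amplitudes.
a_true, b_true, theta1, theta2 = 1.0, 0.5, 0.3, 1.2
re = a_true * math.cos(theta1) + b_true * math.cos(theta2)
im = a_true * math.sin(theta1) + b_true * math.sin(theta2)
s, phi = math.hypot(re, im), math.atan2(im, re)

a_est, b_est = segregate_amplitudes(s, phi, theta1, theta2)
print(abs(a_est - a_true) < 1e-12, abs(b_est - b_true) < 1e-12)
```

Raising an error near $\theta_k(t)=n\pi$ is one possible design choice; the two components are then in phase or in antiphase, and their amplitudes cannot be separated from $S_k(t)$ and $\phi_k(t)$ alone.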

Finally, $f_1(t)$ and $f_2(t)$ can be reconstructed by grouping the instantaneous amplitudes and the instantaneous phases across all channels. The reconstructed signals are denoted by $\hat{f}_1(t)$ and $\hat{f}_2(t)$, respectively.

However, in the above formulation, it is difficult to determine the instantaneous amplitudes ($A_k(t)$ and $B_k(t)$) and the instantaneous phases ($\theta_{1k}(t)$ and $\theta_{2k}(t)$) uniquely and simultaneously from $S_k(t)$ and $\phi_k(t)$: no equations are currently available for determining the two instantaneous phases, and the segregation of two acoustic sources is an ill-posed inverse problem. Therefore, in this paper, we attempt to solve the problem of segregating two acoustic sources by constraining the desired signal using the four regularities.
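
The non-uniqueness can be illustrated numerically. In the Python sketch below, a single observation $(S_k(t), \phi_k(t))$ is decomposed under two different guesses for the input phases. Each guess yields a different amplitude pair through Eqs. (7) and (8), yet both pairs re-synthesize exactly the same observation. The observation, the candidate phase pairs, and the helper name `amplitudes_for` are hypothetical values chosen for illustration only.

```python
import cmath
import math


def amplitudes_for(s, phi, theta1, theta2):
    """Amplitudes (A_k, B_k) implied by Eqs. (7) and (8) for given phases."""
    sin_theta = math.sin(theta2 - theta1)
    return (s * math.sin(theta2 - phi) / sin_theta,
            s * math.sin(phi - theta1) / sin_theta)


# One hypothetical observation (S_k, phi_k) at a single instant.
s_obs, phi_obs = 1.0, 0.6

# Two different, equally admissible guesses for the input phases.
candidates = [(0.3, 1.2), (0.1, 0.9)]

solutions = []
for theta1, theta2 in candidates:
    a, b = amplitudes_for(s_obs, phi_obs, theta1, theta2)
    # Re-synthesize the channel output from this candidate decomposition.
    x = a * cmath.exp(1j * theta1) + b * cmath.exp(1j * theta2)
    solutions.append((a, b, abs(x), cmath.phase(x)))
    print(f"theta=({theta1}, {theta2}) -> A={a:.4f}, B={b:.4f}, "
          f"S={abs(x):.4f}, phi={cmath.phase(x):.4f}")
```

Because four unknowns are constrained by only two observed quantities per channel and instant, additional constraints on the phases are needed before Eqs. (7) and (8) give a unique answer.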

\fbox{Fig. 2}


Masashi Unoki
2000-11-07