Akagi Lab. works on speech signal processing and on modeling the human speech perception mechanism. Humans show superior speech recognition abilities, such as recovery from co-articulation, recognition of speech in noise, and normalization of talker individuality. We believe that modeling these compensation mechanisms will produce tools applicable to co-articulation recovery and other problems in speech recognition, analysis, and synthesis. Constructing models that mimic human speech perception requires not only engineering but also knowledge of physiology and psychology. Our approach is therefore to (1) collect psychoacoustic experimental results to investigate human speech perception, (2) model hearing mechanisms based on these results using algebraic formulation and digital signal processing techniques, and (3) improve automatic speech recognition and speech analysis/synthesis systems by applying the models. Some research topics in Akagi Lab. are as follows.
(1) Recovery from Co-articulation:
Co-articulation causes phoneme neutralization and transitional sound intervals because of the limitations of the human articulatory organs. This is one of the most serious problems in automatic speech recognition. Using the results of psychoacoustical experiments, we are constructing models of contextual effects that can cope with co-articulation problems, especially vowel neutralization.
(2) Auditory Scene Analysis:
Our lab. is investigating noise reduction methods based on knowledge from Auditory Scene Analysis (ASA), such as the cocktail-party effect. The system is designed to separate a speech wave from background noise.
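The source does not specify the lab's separation algorithm; as one hedged illustration of frame-based noise reduction, the sketch below implements classical spectral subtraction (a standard baseline, not the ASA method described above). The function name, frame length, and parameters are assumptions for the example.

```python
import numpy as np

def spectral_subtract(noisy, noise_est, frame_len=256, over_sub=1.0, floor=0.01):
    """Illustrative spectral subtraction: estimate a noise magnitude spectrum
    from a noise-only excerpt, then subtract it from the magnitude spectrum
    of each frame of the noisy signal (no overlap-add refinements)."""
    n_frames = len(noisy) // frame_len
    out = np.zeros(n_frames * frame_len)
    # Noise magnitude spectrum estimated from one noise-only frame (assumed available)
    noise_mag = np.abs(np.fft.rfft(noise_est[:frame_len]))
    for i in range(n_frames):
        frame = noisy[i * frame_len:(i + 1) * frame_len]
        spec = np.fft.rfft(frame)
        # Subtract the noise magnitude, keeping a small spectral floor
        mag = np.abs(spec) - over_sub * noise_mag
        mag = np.maximum(mag, floor * np.abs(spec))
        # Resynthesize with the noisy phase
        out[i * frame_len:(i + 1) * frame_len] = np.fft.irfft(mag * np.exp(1j * np.angle(spec)))
    return out
```

Real ASA-based systems go much further (grouping by harmonicity, onset, and common modulation); this block only shows the basic subtract-and-resynthesize loop.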
(3) Normalization of talker individuality:
It is generally believed that the main physical characteristics we use to perceive who is speaking are the pitch frequency and the spectrum envelope. We are studying methods for manipulating the characteristics related to talker individuality in the pitch frequency and the spectrum envelope, with applications to multi-speaker speech recognition, speech synthesis with speaker individuality, and speaker verification/identification.
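To make the two characteristics concrete, here is a minimal sketch (not the lab's actual method) that separates a speech frame into pitch and spectrum envelope using the real cepstrum, a textbook technique. The lifter cutoff and pitch search band are assumed values for illustration.

```python
import numpy as np

def cepstral_analysis(frame, fs, lifter=30):
    """Split a frame into pitch frequency and a log-magnitude spectrum
    envelope via the real cepstrum (standard, simplified approach)."""
    spec = np.fft.rfft(frame * np.hanning(len(frame)))
    log_mag = np.log(np.abs(spec) + 1e-10)
    ceps = np.fft.irfft(log_mag)
    # Envelope: keep only the low-quefrency part of the cepstrum
    env_ceps = ceps.copy()
    env_ceps[lifter:len(ceps) - lifter] = 0.0
    envelope = np.fft.rfft(env_ceps).real  # log-magnitude envelope
    # Pitch: strongest cepstral peak in a plausible 60-400 Hz band
    lo, hi = int(fs / 400), int(fs / 60)
    peak = lo + np.argmax(ceps[lo:hi])
    pitch_hz = fs / peak
    return pitch_hz, envelope
```

Manipulating talker individuality then amounts to modifying `pitch_hz` and `envelope` independently before resynthesis; the decomposition above is only the analysis half.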
(4) Extraction of Phoneme Characteristics:
Akagi Lab. is developing new low-bit-rate coding methods for multimedia communication. These methods are based on the extraction of key-point spectra and on spectrum interpolation, guided by experimental evidence about phoneme perception.
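The coding idea can be sketched as follows: the encoder transmits only spectra at selected key points in time, and the decoder reconstructs the intermediate frames by interpolation. The linear interpolation below is an assumed simplification for illustration; the actual key-point selection and interpolation scheme are not specified in the text.

```python
import numpy as np

def interpolate_spectra(key_specs, key_times, frame_times):
    """Decoder-side sketch: rebuild a spectrogram by linearly interpolating
    between transmitted key-point spectra (one spectrum per key time)."""
    key_specs = np.asarray(key_specs, dtype=float)
    key_times = np.asarray(key_times, dtype=float)
    out = np.empty((len(frame_times), key_specs.shape[1]))
    for i, t in enumerate(frame_times):
        j = np.searchsorted(key_times, t)
        if j == 0:
            out[i] = key_specs[0]          # before the first key point
        elif j >= len(key_times):
            out[i] = key_specs[-1]         # after the last key point
        else:
            # Linear blend of the two surrounding key-point spectra
            w = (t - key_times[j - 1]) / (key_times[j] - key_times[j - 1])
            out[i] = (1 - w) * key_specs[j - 1] + w * key_specs[j]
    return out
```

The bit-rate saving comes from transmitting only the key-point spectra and their times; everything between them is synthesized at the decoder.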