PAPERS (01/Jul/2017)
1. Non-linguistic Information
1-1 Singing Voice
1-2 Speaker Individuality
1-3 Emotional Speech
1-4 Voice Conversion
1-5 Speech Analysis and Coding
1-6 Announce System
2. Speech Recovery
2-1 Noise Reduction
2-2 F0 Extraction
2-3 De-reverberation
2-4 Bone-conducted Speech
2-5 Speech Recognition
2-6 DOA
3. Cocktail-party Effect Modeling
3-1 Sound Segregation
3-2 Privacy Protection
3-3 Noisy Sound Perception
4. Psychoacoustics
4-1 Auditory Model
4-2 Contextual Effect
4-3 Auditory Filter
4-4 Phase Perception
4-5 Speech Perception
4-6 Noise Evaluation
4-7 Pitch Perception
4-8 Direction Perception
5. Physiological Auditory Modeling
6. Abnormal Speech
6-1 Abnormal Speech Perception
6-2 3D Vocal Tract Modeling
7. Interaction between Perception and Production
8. Signal Analysis
9. Others
9-1 NTT & ATR
9-2 Tokyo Institute of Technology
If you are interested in my research topics and want to see my publications,
please visit the JAIST Repository and download my papers.
1. Non-linguistic Information
[1] Akagi, M. (2010/11/29). “Rule-based voice conversion derived from expressive
speech perception model: How do computers sing a song joyfully?” Tutorial,
ISCSLP2010, Tainan, Taiwan.
[2] Akagi, M. (2009/10/6). "Analysis of production and perception
characteristics of non-linguistic information in speech and its application
to inter-language communications," Proc. APSIPA2009, Sapporo, 513-519.
[3] Akagi, M. (2009/08/14). “Multi-layer model for expressive speech perception and its application to expressive speech synthesis,” Plenary lecture, NCMMSC2009, Lanzhou, China.
[4] Akagi, M. (2009/02/20). "Introduction of SCOPE project: Analysis of production and perception characteristics of non-linguistic information in speech and its application to inter-language communications," International symposium on biomechanical and physiological modeling and speech science, 51-62.
1-1 Singing Voice
[1] Motoda, H. and Akagi, M. (2013/03/5). “A singing voices synthesis system
to characterize vocal registers using ARX-LF model,” Proc. NCSP2013, Hawaii,
USA, 93-96.
[2] Saitou, T., Goto, M., Unoki, M., and Akagi, M. (2009/08/15). “Speech-to-Singing
Synthesis System: Vocal Conversion from Speaking Voices to Singing Voices
by Controlling Acoustic Features Unique to Singing Voices,” NCMMSC2009,
Lanzhou, China.
[3] Nakamura, T., Kitamura, T. and Akagi, M. (2009/03/01). "A study
on nonlinguistic feature in singing and speaking voices by brain activity
measurement," Proc. NCSP'09, 217-220.
[4] Saitou, T., Goto, M., Unoki, M., and Akagi, M. (2007). "Speech-to-singing synthesis: converting speaking voices to singing voices by controlling acoustic features unique to singing voices," Proc. WASPAA2007, New Paltz, NY, 215-218.
[5] Saitou, T., Goto, M., Unoki, M., and Akagi, M. (2007). "Vocal
conversion from speaking voice to singing voice using STRAIGHT," Proc.
Interspeech2007, Singing Challenge.
[6] Saitou, T., Unoki, M., and Akagi, M. (2006). "Analysis of acoustic
features affecting singing-voice perception and its application to singing-voice
synthesis from speaking-voice using STRAIGHT," J. Acoust. Soc. Am.,
120, 5, Pt. 2, 3029.
[7] Saitou, T., Unoki, M. and Akagi, M. (2005). "Development of an F0 control model based on F0 dynamic characteristics for singing-voice synthesis," Speech Communication 46, 405-417.
[8] Saitou, T., Tsuji, N., Unoki, M. and Akagi, M. (2004). “Analysis of
acoustic features affecting “singing-ness” and its application to singing-voice
synthesis from speaking-voice,” Proc. ICSLP2004, Cheju, Korea.
[9] Saitou, T., Unoki, M., and Akagi, M. (2004). “Control methods of acoustic
parameters for singing-voice synthesis,” Proc. ICA2004, 501-504.
[10] Saitou, T., Unoki, M., and Akagi, M. (2004). “Development of the F0 control method for singing-voices synthesis,” Proc. SP2004, Nara, 491-494.
[11] Akagi, M. (2002). "Perception of fundamental frequency fluctuation,"
HEA-02-003-IP, Forum Acousticum Sevilla 2002 (Invited).
[12] Saitou, T., Unoki, M., and Akagi, M. (2002). "Extraction of F0
dynamic characteristics and development of F0 control model in singing
voice," Proc. ICAD2002, Kyoto.
[13] Unoki, M., Saitou, T., and Akagi, M. (2002). "Effect of F0 fluctuations and development of F0 control model in singing voice perception," NATO Advanced Study Institute 2002 Dynamics of Speech Production and Perception.
[14] Akagi, M. and Kitakaze, H. (2000). "Perception of synthesized singing voices with fine fluctuations in their fundamental frequency contours," Proc. ICSLP2000, Beijing, III-458-461.
1-2 Speaker Individuality
[1] Izumida, T. and Akagi, M. (2012/03/05). “Study on hearing impression
of speaker identification focusing on dynamic features,” Proc. NCSP2012,
Honolulu, HW, 401-404.
[2] Akagi, M., Iwaki, M. and Minakawa, T. (1998). “Fundamental frequency fluctuation in continuous vowel utterance and its perception,” ICSLP98, Sydney, Vol.4, 1519-1522.
[3] Akagi, M. and Ienaga, T. (1997). "Speaker individuality in fundamental
frequency contours and its control", J. Acoust. Soc. Jpn. (E), 18,
2, 73-80.
[4] Kitamura, T. and Akagi, M. (1996). "Relationship between physical
characteristics and speaker individualities in speech spectral envelopes",
Proc. ASA-ASJ Joint Meeting, 833-838.
[5] Akagi, M. and Ienaga, T. (1995). "Speaker individualities in fundamental frequency contours and its control", Proc. EUROSPEECH95, 439-442.
[6] Kitamura, T. and Akagi, M. (1995). "Speaker individualities in
speech spectral envelopes", J. Acoust. Soc. Jpn. (E), 16, 5, 283-289.
[7] Kitamura, T. and Akagi, M. (1994). "Speaker Individualities in
speech spectral envelopes", Proc. Int. Conf. Spoken Lang. Process.
94, 1183-1186.
1-3 Emotional Speech
[1] Asai, T., Suemitsu, A., and Akagi, M. (2017/03/02). “Articulatory Characteristics
of Expressive Speech in Activation-Evaluation Space,” Proc. NCSP2017, Guam,
USA, 313-316.
[2] Xue, Y., Hamada, Y., and Akagi, M. (2016/12/15). “Voice Conversion
to Emotional Speech based on Three-layered Model in Dimensional Approach
and Parameterization of Dynamic Features in Prosody,” Proc. APSIPA2016,
Cheju, Korea.
[3] Akagi, M. (2016/12/13). “Toward Affective Speech-to-Speech Translation,” Keynote Speech, International Conference on Advances in Information and Communication Technology 2016, Thai Nguyen, Vietnam, DOI 10.1007/978-3-319-49073-1_3.
[4] Li, Y., Morikawa, D., and Akagi, M. (2016/11/28). “A method to estimate
glottal source waves and vocal tract shapes for widely pronounced types
using ARX-LF model,” 2016 ASA-ASJ Joint meeting, Honolulu, Hawaii, JASA
140, 2963. doi: http://dx.doi.org/10.1121/1.4969159
[5] Xue, Y., Hamada, Y., and Akagi, M. (2016/11/28). “Emotional voice conversion
system for multiple languages based on three-layered model in dimensional
space,” 2016 ASA-ASJ Joint meeting, Honolulu, Hawaii, JASA 140, 2960. doi:
http://dx.doi.org/10.1121/1.4969141
[6] Xue, Y., Hamada, Y., Elbarougy, R., and Akagi, M. (2016/10/27). “Voice conversion system to emotional speech in multiple languages based on three-layered model for dimensional space,” O-COCOSDA2016, Bali, Indonesia, 122-127.
[7] Elbarougy, R. and Akagi, M. (2016/10/24). “Optimizing Fuzzy Inference
Systems for Improving Speech Emotion Recognition,” The 2nd International
Conference on Advanced Intelligent Systems and Informatics (AISI2016),
Cairo, Egypt, 85-95.
[8] Li, X. and Akagi, M. (2016/09/12). “Multilingual Speech Emotion Recognition
System Based on a Three-Layer Model,” Proc. InterSpeech2016, San Francisco,
3608-3612.
[9] Xue, Y. and Akagi, M. (2016/03/07). “A study on applying target prediction model to parameterize power envelope of emotional speech,” Proc. NCSP2016, Honolulu, HW, USA, 157-160.
[10] Li, X. and Akagi, M. (2016/03/07). “Automatic Speech Emotion Recognition
in Chinese Using a Three-layered Model in Dimensional Approach,” Proc.
NCSP2016, Honolulu, HW, USA, 17-20.
[11] Xue, Y., Hamada, Y., and Akagi, M. (2015/12/19). “Emotional speech
synthesis system based on a three-layered model using a dimensional approach,”
Proc. APSIPA2015, Hong Kong, 505-514.
[12] Hamada, Y., Elbarougy, R., Xue, Y., and Akagi, M. (2015/12/9). “Study on method to control fundamental frequency contour related to a position on Valence-Activation space,” Proc. WESPAC2015, Singapore, 519-522.
[13] Li, X. and Akagi, M. (2015/10/28). “Toward Improving Estimation Accuracy
of Emotion Dimensions in Bilingual Scenario Based on Three-layered Model,”
Proc. O-COCOSDA2015, Shanghai, 21-26.
[14] Han, X., Elbarougy, R., Akagi, M., Li, J., Ngo, T. D., and Bui, T. D.
(2015/02/28). “A study on perception of emotional states in multiple languages
on Valence-Activation approach,” Proc. NCSP2015, Kuala Lumpur, Malaysia.
[15] Hamada, Y., Elbarougy, R., and Akagi, M. (2014/12/12). “A Method for Emotional Speech Synthesis Based on the Position of Emotional State in Valence-Activation Space,” Proc. APSIPA2014, Siem Reap, Cambodia.
[16] Akagi, M., Han, X., Elbarougy, R., Hamada, Y., and Li, J. (2014/12/10).
“Toward Affective Speech-to-Speech Translation: Strategy for Emotional Speech
Recognition and Synthesis in Multiple Languages,” Proc. APSIPA2014, Siem Reap,
Cambodia.
[17] Elbarougy, R., Han, X., Akagi, M., and Li, J. (2014/09/10). “Toward
relaying an affective speech-to-speech translator: Cross-language perception
of emotional state represented by emotion dimensions,” Proc. O-COCOSDA2014,
Phuket, Thailand, 48-53.
[18] Akagi, M., Han, X., Elbarougy, R., Hamada, Y., and Li, J. (2014/08/27). “Emotional Speech Recognition and Synthesis in Multiple Languages toward Affective Speech-to-Speech Translation System,” Proc. IIHMSP2014, Kitakyushu, Japan, 574-577.
[19] Akagi, M. and Elbarougy, R. (2014/07/16). “Toward Relaying Emotional
State for Speech-To-Speech Translator: Estimation of Emotional State for
Synthesizing Speech with Emotion,” Proc. ICSV2014, Beijing.
[20] Li, Y. and Akagi, M. (2014/03/02). “Glottal source analysis of emotional
speech,” Proc. NCSP2014, Hawaii, USA, 513-516.
[21] Elbarougy, R. and Akagi, M. (2014/03/01). “Improving Speech Emotion Dimensions Estimation Using a Three-Layer Model for Human Perception,” Acoustical Science and Technology, 35, 2, 86-98.
[22] Elbarougy, R. and Akagi, M. (2013/11/01). “Cross-lingual speech emotion
recognition system based on a three-layer model for human perception,”
Proc. APSIPA2013, Kaohsiung, Taiwan.
[23] Elbarougy, R. and Akagi, M. (2012/12/04). “Speech Emotion Recognition
System Based on a Dimensional Approach Using a Three-Layered Model,” Proc.
APSIPA2012, Hollywood, USA.
[24] Dang, J., Li, A., Erickson, D., Suemitsu, A., Akagi, M., Sakuraba, K., Minematsu, N., and Hirose, K. (2010/11/01). “Comparison of emotion perception among different cultures,” Acoust. Sci. & Tech., 31, 6, 394-402.
[25] Zhou, Y., Li, J., Sun, Y., Zhang, J., Yan, Y., and Akagi, M. (2010/10).
“A hybrid speech emotion recognition system based on spectral and prosodic
features,” IEICE Trans. Info. & Sys., E93-D, 10, 2813-2821.
[26] Hamada, Y., Kitamura, T., and Akagi, M. (2010/08/24). “A study on
brain activities elicited by emotional voices with various F0 contours,”
Proc. ICA2010, Sydney, Australia.
[27] Hamada, Y., Kitamura, T., and Akagi, M. (2010/07/01). “A study of brain activities elicited by synthesized emotional voices controlled with prosodic features,” Journal of Signal Processing, 14, 4, 265-268.
[28] Hamada, Y., Kitamura, T., and Akagi, M. (2010/03/04). "A study
on brain activities elicited by synthesized emotional voices controlled
with prosodic features," Proc. NCSP10, Hawaii, USA, 472-475.
[29] Dang, J., Li, A., Erickson, D., Suemitsu, A., Akagi, M., Sakuraba,
K., Minematsu, N., and Hirose, K. (2009/10/6). "Comparison of emotion
perception among different cultures," Proc. APSIPA2009, Sapporo, 538-544.
[30] Aoki, Y., Huang, C-F., and Akagi, M. (2009/03/01). "An emotional speech recognition system based on multi-layer emotional speech perception model," Proc. NCSP'09, 133-136.
[31] Huang, C. F. and Akagi, M. (2008/10) "A three-layered model for
expressive speech perception," Speech Communication 50, 810-828.
[32] Huang, C. F., Erickson, D., and Akagi, M. (2008/07/01). "Comparison
of Japanese expressive speech perception by Japanese and Taiwanese listeners,"
Acoustics2008, Paris, 2317-2322.
[33] Huang, C. F. and Akagi, M. (2007). "A rule-based speech morphing for verifying an expressive speech perception model," Proc. Interspeech2007, 2661-2664.
[34] Sawamura, K., Dang, J., Akagi, M., Erickson, D., Li, A., Sakuraba, K.,
Minematsu, N., and Hirose, K. (2007). "Common factors in emotion perception
among different cultures," Proc. ICPhS2007, 2113-2116.
[35] Huang, C. F. and Akagi, M. (2007). "The building and verification
of a three-layered model for expressive speech perception," Proc.
JCA2007, CD-ROM.
[36] Huang, C. F. and Akagi, M. (2005). "Toward a rule-based synthesis of emotional speech on linguistic description of perception," Affective Computing and Intelligent Interaction, Springer LNCS 3784, 366-373.
[37] Huang, C. F. and Akagi, M. (2005). "A Multi-Layer fuzzy logical
model for emotional speech Perception," Proc. EuroSpeech2005, Lisbon,
Portugal, 417-420.
[38] Ito, S., Dang, J., and Akagi, M. (2004). “Investigation of the acoustic
features of emotional speech using physiological articulatory model,” Proc.
ICA2004, 2225-2226.
1-4 Voice Conversion
[1] Dinh, A. T., Phan, T. S., and Akagi, M. (2016/12/12). “Quality Improvement
of Vietnamese HMM-Based Speech Synthesis System Based on Decomposition
of Naturalness and Intelligibility Using Non-negative Matrix Factorization,”
International Conference on Advances in Information and Communication Technology
2016, Thai Nguyen, Vietnam, 490-499, DOI 10.1007/978-3-319-49073-1_53.
[2] Dinh, A. T. and Akagi, M. (2016/10/26). “Quality Improvement of HMM-based
Synthesized Speech Based on Decomposition of Naturalness and Intelligibility
using Non-Negative Matrix Factorization,” O-COCOSDA2016, Bali, Indonesia,
62-67.
[3] Dinh, T. A., Morikawa, D., and Akagi, M. (2016/07/01). “Study on quality
improvement of HMM-based synthesized voices using asymmetric bilinear model,”
Journal of Signal Processing, 20, 4, 205-208.
[4] Dinh, T. A., Morikawa, D., and Akagi, M. (2016/03/07). “A study on quality improvement of HMM-based synthesized voices using asymmetric bilinear model,” Proc. NCSP2016, Honolulu, HW, USA, 13-16.
[5] Phung, T. N., Luong, M. C., and Akagi, M. (2015). “Improving the
naturalness of concatenative Vietnamese speech synthesis under limited
data conditions,” Journal of Computer Science and Cybernetics, V.31, N.1,
1-16.
[6] Phung, T. N., Phan, T. S., Vu, T. T., Luong, M. C., and Akagi, M. (2013/11/01).
“Improving naturalness of HMM-based TTS trained with limited data by temporal
decomposition,” IEICE Trans. Inf. & Syst., E96-D, 11, 2417-2426.
[7] Phung, T. N., Luong, M. C., and Akagi, M. (2013/09/02). “A Hybrid TTS between Unit Selection and HMM-based TTS under limited data conditions,” Proc. 8th ISCA Speech Synthesis Workshop, Barcelona, Spain, 281-284.
[8] Phung, T. N., Luong, M. C., and Akagi, M. (2012/12/12). “Transformation
of F0 contours for lexical tones in concatenative speech synthesis of tonal
languages,” Proc. O-COCOSDA2012, Macau, 129-134.
[9] Phung, T. N., Luong, M. C., and Akagi, M. (2012/12/06). “A concatenative
speech synthesis for monosyllabic languages with limited data,” Proc. APSIPA2012,
Hollywood, USA.
[10] Phung, T. N., Luong, M. C., and Akagi, M. (2012/08). “On the stability of spectral targets under effects of coarticulation,” International Journal of Computer and Electrical Engineering, Vol. 4, No. 4, 537-541.
[11] Phung, T. N., Luong, M. C., and Akagi, M. (2012/08). “An investigation
on speech perception under effects of coarticulation,” International Journal
of Computer and Electrical Engineering, Vol. 4, No. 4, 532-536.
[12] Phung, T. N., Luong, M. C., and Akagi, M. (2011/02). “An
investigation on perceptual line spectral frequency (PLP-LSF) target stability
against the vowel neutralization phenomenon,” Proc. ICSAP2011, VI, 512-514.
[13] Phung, T. N., Luong, M. C., and Akagi, M. (2011/02). “An investigation on speech perception over coarticulation,” Proc. ICSAP2011, VI, 507-511.
[14] Nguyen, B. P. and Akagi, M. (2009/9/9). “Efficient modeling of temporal
structure of speech for applications in voice transformation,” Proc. InterSpeech2009,
Brighton, 1631-1634.
[15] Nguyen, B. P. and Akagi, M. (2009/5/1) "A flexible spectral modification
method based on temporal decomposition and Gaussian mixture model,"
Acoust. Sci. & Tech., 30, 3, 170-179.
[16] Nguyen, B. P. and Akagi, M. (2009/02/20). "Applications of Temporal Decomposition to Voice Transformation," International symposium on biomechanical and physiological modeling and speech science, 19-24.
[17] Nguyen, B. P., Shibata, T., and Akagi, M. (2008/09/24). "High-quality
analysis/synthesis method based on Temporal decomposition for speech modification,"
Proc. InterSpeech2008, Brisbane, 662-665.
[18] Nguyen, B. P. and Akagi, M. (2008/6/6). "Phoneme-based spectral
voice conversion using temporal decomposition and Gaussian mixture model,"
Proc. ICCE2008, 224-229.
[19] Nguyen, B. P. and Akagi, M. (2008/3/7). "Control of spectral dynamics using temporal decomposition in voice conversion and concatenative speech synthesis," Proc. NCSP08, 279-282.
[20] Shibata, T. and Akagi, M. (2008/3/6). "A study on voice conversion
method for synthesizing stimuli to perform gender perception experiments
of speech," Proc. NCSP08, 180-183.
[21] Nguyen, B. P. and Akagi, M. (2007). "A flexible spectral modification
method based on temporal decomposition and Gaussian mixture model,"
Proc. Interspeech2007, 538-541.
[22] Nguyen, B. P. and Akagi, M. (2007). "Spectral Modification for Voice Gender Conversion using Temporal Decomposition," Journal of Signal Processing, 11, 4, 333-336.
[23] Akagi, M., Saitou, T., and Huang, C. F. (2007). "Voice conversion
to add non-linguistic information into speaking voices," Proc. JCA2007,
CD-ROM.
[24] Nguyen, B. P. and Akagi, M. (2007). "Spectral Modification for
Voice Gender Conversion using Temporal Decomposition," Proc. NCSP2007,
481-484.
[25] Takeyama, Y., Unoki, M., Akagi, M., and Kaminuma, A. (2006). "Synthesis
of mimic speech sounds uttered in noisy car environments," Proc. NCSP2006,
118-121.
1-5 Speech Analysis and Coding
[1] Kosugi, T., Haniu, A., Miyauchi, R., Unoki, M., and Akagi, M. (2011/03/01).
“Study on suitable-architecture of IIR all-pass filter for digital-audio
watermarking technique based on cochlear-delay characteristics,” Proc. NCSP2011,
Tianjin, China, 135-138.
[2] Tomoike, S. and Akagi, M. (2008). "Estimation of local peaks based
on particle filter in adverse environments," Journal of Signal Processing,
12, 4, 303-306.
[3] Tomoike, S. and Akagi, M. (2008/3/7). "Estimation of local peaks based on particle filter in adverse environments," Proc. NCSP08, 391-394.
[4] Nguyen, P. C., Akagi, M., and Nguyen, P. B. (2007). "Limited error
based event localizing temporal decomposition and its application to variable-rate
speech coding," Speech Communication, 49, 292-304.
[5] Akagi, M., Nguyen, P. C., Saitou, T., Tsuji, N., and Unoki, M. (2004).
“Temporal decomposition of speech and its application to speech coding
and modification,” Proc. KEST2004, 280-288.
[6] Akagi, M. and Nguyen, P. C. (2004). “Temporal decomposition of speech and its application to speech coding and modification,” Proc. Special Workshop in MAUI (SWIM), 1-4.
[7] Nguyen, P. C. and Akagi, M. (2003). “Efficient quantization of speech
excitation parameters using temporal decomposition,” Proc. EUROSPEECH2003,
Geneva, 449-452.
[8] Nguyen, P. C., Akagi, M., and Ho, T. B. (2003). "Temporal decomposition:
A promising approach to VQ-based speaker identification," Proc. ICME2003,
Baltimore, V.III, 617-620.
[9] Nguyen, P. C., Akagi, M., and Ho, T. B. (2003). "Temporal decomposition: A promising approach to VQ-based speaker identification," Proc. ICASSP2003, Hong Kong, I-184-187.
[10] Nguyen, P. C., Ochi, T., and Akagi, M. (2003). “Modified Restricted
Temporal Decomposition and its Application to Low Rate Speech Coding,”
IEICE Trans. Inf. & Syst., E86-D, 3, 397-405.
[11] Nguyen, P. C. and Akagi, M. (2002). "Variable rate speech coding
using STRAIGHT and temporal decomposition," Proc. SCW2002, Tsukuba,
26-28.
[12] Nguyen, P. C. and Akagi, M. (2002). "Coding speech at very low rates using STRAIGHT and temporal decomposition," Proc. ICSLP2002, Denver, 1849-1852.
[13] Nguyen, P. C. and Akagi, M. (2002). "Limited error based event
localizing temporal decomposition," Proc. EUSIPCO2002, Toulouse, 190.
[14] Nguyen, P. C. and Akagi, M. (2002). "Improvement of the restricted
temporal decomposition method for line spectral frequency parameters,"
Proc. ICASSP2002, Orlando, I-265-268.
[15] Nandasena, A. C. R., Nguyen, P. C. and Akagi, M. (2001). "Spectral stability based event localizing temporal decomposition", Computer Speech & Language, Vol. 15, No. 4, 381-401.
[16] Nandasena, A. C. R. and Akagi, M. (1998). “Spectral stability based event localizing temporal decomposition,” Proc. ICASSP98, II, 957-960.
1-6 Announce System
[1] Ngo, T. V., Kubo, R., Morikawa, D., and Akagi, M. (2017/03/02). “Acoustical
analyses of Lombard speech by different background noise levels for tendencies
of intelligibility,” Proc. NCSP2017, Guam, USA, 309-312.
[2] Kubo, R., Morikawa, D., and Akagi, M. (2016/08/22). “Effects of speaker’s
and listener’s acoustic environments on speech intelligibility and annoyance”,
Proc. Inter-Noise2016, Hamburg, Germany, 171-176.
2. Speech Recovery
2-1 Noise Reduction
[1] Li, J., Xia, R., Ying, D., Yan, Y., and Akagi, M.
(2014/12/05). “Investigation of objective measures for intelligibility
prediction of noise-reduced speech for Chinese, Japanese, and English,”
J. Acoust. Soc. Am. 136 (6), 3301-3312.
[2] Li, J., Chen, F., Akagi, M., and Yan, Y. (2013/08/27). “Comparative
investigation of objective speech intelligibility prediction measures for
noise-reduced signals in Mandarin and Japanese,” Proc. InterSpeech2013,
Lyon, 1184-1187.
[3] Li, J., Akagi, M., and Yan, Y. (2013/07/09). “Objective Japanese intelligibility prediction for noisy speech signals before and after noise-reduction processing,” Proc. ChinaSIP2013, Beijing, 352-355.
[4] Xia, R., Li, J., Akagi, M., and Yan, Y. (2012/03/28). “Evaluation of
objective intelligibility prediction measures for noise-reduced signals
in Mandarin,” Proc. ICASSP2012, Kyoto, 4465-4468.
[5] Li, J., Sakamoto, S., Hongo, S., Akagi, M., and Suzuki, Y. (2011/06).
“Two-stage binaural speech enhancement with Wiener filter for high-quality
speech communication,” Speech Communication, 53, 677-689.
[6] Li, J., Yang, L., Zhang, J., Yan, Y., Hu, Y., Akagi, M., and Loizou, P. C. (2011/05). “Comparative intelligibility investigation of single-channel noise-reduction algorithms for Chinese, Japanese, and English,” J. Acoust. Soc. Am., 129, 3291-3301.
[7] Li, J., Chau, D. T., Akagi, M., Yang, L., Zhang, J., and Yan, Y. (2010/11/30).
“Intelligibility Investigation of Single-Channel Noise Reduction Algorithms
for Chinese and Japanese,” Proc. ISCSLP2010, Tainan, Taiwan.
[8] Li, J., Sakamoto, S., Hongo, S., Akagi, M., and Suzuki, Y. (2009/10/20).
"Two-stage binaural speech enhancement with Wiener filter based on
equalization-cancellation model," Proc. WASPAA, New Paltz, NY, 133-136.
[9] Li, J., Sakamoto, S., Hongo, S., Akagi, M., and Suzuki, Y. (2009/9/21). "Advancement of two-stage binaural speech enhancement (TS-BASE) for high-quality speech communication," Proc. WESPAC2009, Beijing, CD-ROM.
[10] Yang, L., Li, J., Zhang, J., Yan, Y., and Akagi, M. (2009/6/26). "Effects
of single-channel enhancement algorithms on Mandarin speech intelligibility,"
IEICE Tech. Report, EA2009-32.
[11] Li, J., Fu, Q-J., Jiang, H., and Akagi, M. (2009/04/24). "Psychoacoustically-motivated
adaptive β-order generalized spectral subtraction for cochlear implant
patients," Proc ICASSP2009, 4665-4668.
[12] Li, J., Jiang, H., and Akagi, M. (2008/09/23). "Psychoacoustically-motivated adaptive β-order generalized spectral subtraction based on data-driven optimization," Proc. InterSpeech2008, Brisbane, 171-174.
[13] Li, J., Sakamoto, S., Hongo, S., Akagi, M., and Suzuki, Y. (2008).
“Adaptive β-order generalized spectral subtraction for speech enhancement,”
Signal Processing, 88, 11, 2764-2776.
[14] Li, J., Sakamoto, S., Hongo, S., Akagi, M., and Suzuki, Y. (2008/08/16).
"Improved two-stage binaural speech enhancement based on accurate
interference estimation for hearing aids," IHCON2008.
[15] Li, J., Sakamoto, S., Hongo, S., Akagi, M., and Suzuki, Y. (2008/06/30). "A two-stage binaural speech enhancement approach for hearing aids with preserving binaural benefits in noisy environments," Acoustics2008, Paris, 723-727.
[16] Li, J., Akagi, M., and Suzuki, Y. (2008). "A two-microphone noise
reduction method in highly non-stationary multiple-noise-source environments,"
IEICE Trans. Fundamentals, E91-A, 6, 1337-1346.
[17] Li, J. and Akagi, M. (2008). "A hybrid microphone array post-filter
in a diffuse noise field," Applied Acoustics 69, 546-557.
[18] Li, J., Sakamoto, S., Hongo, S., Akagi, M., and Suzuki, Y. (2007). "A speech enhancement approach for binaural hearing aids," Proc. 22nd SIP Symposium, Sendai, 263-268.
[19] Li, J., Sakamoto, S., Hongo, S., Akagi, M., and Suzuki, Y. (2007).
"Noise reduction based on adaptive beta-order generalized spectral
subtraction for speech enhancement," Proc. Interspeech2007, 802-805.
[20] Li, J., Akagi, M., and Suzuki, Y. (2006). "Multi-channel noise
reduction in noisy environments," Chinese Spoken Language Processing,
Proc. ISCSLP2006, Springer LNCS 4274, 258-269.
[21] Li, J., Akagi, M., and Suzuki, Y. (2006). "Noise reduction based on microphone array and post-filtering for robust speech recognition," Proc. ICSP, Guilin.
[22] Li, J. and Akagi, M. (2006). "Noise reduction method based on
generalized subtractive beamformer," Acoust. Sci. & Tech., 27,
4, 206-215.
[23] Li, J., Akagi, M., and Suzuki, Y. (2006). "Improved hybrid microphone
array post-filter by integrating a robust speech absence probability estimator
for speech enhancement," Proc. ICSLP2006, Pittsburgh, USA, 2130-2133.
[24] Li, J. and Akagi, M. (2006). "A noise reduction system based on hybrid noise estimation technique and post-filtering in arbitrary noise environments," Speech Communication, 48, 111-126.
[25] Li, J., Akagi, M., and Suzuki, Y. (2006). "Noise reduction based
on generalized subtractive beamformer for speech enhancement," WESPAC2006,
Seoul.
[26] Li, J. and Akagi, M. (2005). "Theoretical analysis of microphone
arrays with postfiltering for coherent and incoherent noise suppression
in noisy environments," Proc. IWAENC2005, Eindhoven, The Netherlands,
85-88.
[27] Li, J. and Akagi, M. (2005). "A hybrid microphone array post-filter in a diffuse noise field," Proc. EuroSpeech2005, Lisbon, Portugal, 2313-2316.
[28] Li, J., Lu, X., and Akagi, M. (2005). "Noise reduction based
on microphone array and post-filtering for robust speech recognition in
car environments," Proc. Workshop DSPinCar2005, S2-9.
[29] Li, J., Lu, X., and Akagi, M. (2005). “A noise reduction system in
arbitrary noise environments and its application to speech enhancement
and speech recognition,” Proc. ICASSP2005, Philadelphia, III-277-280.
[30] Li, J. and Akagi, M. (2005). “Suppressing localized and non-localized noises in arbitrary noise environments,” Proc. HSCMA2005, Piscataway.
[31] Li, J. and Akagi, M. (2004). “Noise reduction using hybrid noise estimation
technique and post-filtering,” Proc. ICSLP2004, Cheju, Korea.
[32] Akagi, M. and Kago, T. (2002). "Noise reduction using a small-scale
microphone array in multi noise source environment," Proc. ICASSP2002,
Orlando, I-909-912.
[33] Mizumachi, M., Akagi, M. and Nakamura, S. (2000). "Design of robust subtractive beamformer for noisy speech recognition," Proc. ICSLP2000, Beijing, IV-57-60.
[34] Mizumachi, M. and Akagi, M. (2000). "Noise reduction using a
small-scale microphone array under non-stationary signal conditions,"
Proc. WESTPRAC7, 421-424.
[35] Mizumachi, M. and Akagi, M. (1999). "Noise reduction method that
is equipped for robust direction finder in adverse environments,"
Proc. Workshop on Robust Methods for Speech Recognition in Adverse Conditions,
Tampere, Finland, 179-182.
[36] Mizumachi, M. and Akagi, M. (1998). “Noise reduction by paired-microphones using spectral subtraction,” Proc. ICASSP98, II, 1001-1004.
[37] Akagi, M. and Mizumachi, M. (1997). "Noise Reduction by Paired Microphones", Proc. EUROSPEECH97, 335-338.
2-2 F0 Extraction
[1] Ishimoto, Y., Akagi, M., Ishizuka, K., and Aikawa, K. (2004). “Fundamental
frequency estimation for noisy speech using entropy-weighted periodic and
harmonic features,” IEICE Trans. Inf. & Syst., E87-D, 1, 205-214.
[2] Ishimoto, Y., Unoki, M., and Akagi, M. (2001). "A fundamental
frequency estimation method for noisy speech based on instantaneous amplitude
and frequency", Proc. EUROSPEECH2001, Aalborg, 2439-2442.
[3] Ishimoto, Y., Unoki, M., and Akagi, M. (2001). "A fundamental frequency estimation method for noisy speech based on instantaneous amplitude and frequency", Proc. CRAC, Aalborg.
[4] Ishimoto, Y., Unoki, M., and Akagi, M. (2001). "A fundamental
frequency estimation method for noisy speech based on periodicity and harmonicity",
Proc. ICASSP2001, SPEECH-SF3, Salt Lake City.
[5] Ishimoto, Y. and Akagi, M. (2000). "A fundamental frequency estimation
method for noisy speech," Proc. WESTPRAC7, 161-164.
2-3 De-reverberation
[1] Morita, S., Unoki, M., Lu, X., and Akagi, M. (2015/06/11). “Robust
Voice Activity Detection Based on Concept of Modulation Transfer Function
in Noisy Reverberant Environments,” Journal of Signal Processing Systems,
DOI 10.1007/s11265-015-1014-4
[2] Morita, S., Unoki, M., Lu, X., and Akagi, M. (2014/09/13). “Robust
Voice Activity Detection Based on Concept of Modulation Transfer Function
in Noisy Reverberant Environments,” Proc. ISCSLP2014, Singapore, 108-112.
[3] Unoki, M., Ikeda, T., Sasaki, K., Miyauchi, R., Akagi, M., and Kim, N-S. (2013/07/08). “Blind method of estimating speech transmission index in room acoustics based on concept of modulation transfer function,” Proc. ChinaSIP2013, Beijing, 308-312.
[4] Sasaki, Y. and Akagi, M. (2012/03/05). “Speech enhancement technique
in noisy reverberant environment using two microphone arrays,” Proc. NCSP2012,
Honolulu, HW, 333-336.
[5] Unoki, M., Lu, X., Petrick, R., Morita, S., Akagi, M., and Hoffmann
R. (2011/08/30). “Voice activity detection in MTF-based power envelope
restoration,” Proc. INTERSPEECH 2011, Florence, Italy, 2609-2612.
[6] Unoki, M., Ikeda, T., and Akagi, M. (2011/06/27). “Blind estimation method of speech transmission index in room acoustics,” Proc. Forum Acousticum 2011, Aalborg, Denmark, 1973-1978.
[7] Morita, S., Lu, X., Unoki, M., and Akagi, M. (2011/03/02). “Study
on MTF-based power envelope restoration in noisy reverberant environments,”
Proc. NCSP2011, Tianjin, China, 247-250.
[8] Ikeda, T., Unoki, M., and Akagi, M. (2011/03/02). “Study on blind estimation
of Speech Transmission Index in room acoustics,” Proc. NCSP2011, Tianjin,
China, 235-238.
[9] Morita, S., Unoki, M., and Akagi, M. (2010/07/01). “A study on the IMTF-based filtering on the modulation spectrum of reverberant signal,” Journal of Signal Processing, 14, 4, 269-272.
[10] Li, J., Sasaki, Y., Akagi, M., and Yan, Y. (2010/03/04). "Experimental
evaluations of TS-BASE/WF in reverberant conditions," Proc. NCSP10,
Hawaii, USA, 269-272.
[11] Morita, S., Unoki, M., and Akagi, M. (2010/03/04). "A study on
the MTF-based inverse filtering for the modulation spectrum of reverberant
speech," Proc. NCSP10, Hawaii, USA, 265-268.
[12] Unoki, M., Yamasaki, Y., and Akagi, M. (2009/08/25). “MTF-based power envelope restoration in noisy reverberant environments,” Proc. EUSIPCO2009, Glasgow, Scotland, 228-232.
[13] Petrick, R., Lu, X., Unoki, M., Akagi, M., and Hoffmann, R. (2008/09/24).
"Robust front end processing for speech recognition in reverberant
environments: Utilization of speech characteristics," Proc. InterSpeech2008,
Brisbane, 658-661.
[14] Unoki, M., Toi, M., Shibano, Y., and Akagi, M. (2006). "Suppression
of speech intelligibility loss through a modulation transfer function-based
speech dereverberation method," J. Acoust. Soc. Am., 120, 5, Pt. 2,
3360.
[15] Unoki, M., Toi, M., and Akagi, M. (2006). "Refinement of an MTF-based speech dereverberation method using an optimal inverse-MTF filter," SPECOM2006, St. Petersburg, 323-326.
[16] Unoki, M., Toi, M., and Akagi, M. (2005). “Development of the MTF-based
speech dereverberation method using adaptive time-frequency division,”
Proc. Forum Acousticum 2005, 51-56.
[17] Toi, M., Unoki, M. and Akagi, M. (2005). “Development of adaptive
time-frequency divisions and a carrier reconstruction in the MTF-based
speech dereverberation method,” Proc. NCSP05, Hawaii, 355-358.
[18] Unoki, M., Sakata, K., Furukawa, M., and Akagi, M. (2004). “A speech dereverberation method based on the MTF concept in power envelope restoration,” Acoust. Sci. & Tech., 25, 4, 243-254.
[19] Unoki, M., Furukawa, M., Sakata, K. and Akagi, M. (2004). “An improved
method based on the MTF concept for restoring the power envelope from a
reverberant signal,” Acoust. Sci. & Tech., 25, 4, 232-242.
[20] Unoki, M., Toi, M., and Akagi, M. (2004). “A speech dereverberation
method based on the MTF concept using adaptive time-frequency divisions,”
Proc. EUSIPCO2004, 1689-1692.
[21] Unoki, M., Sakata, K., Toi, M., and Akagi, M. (2004). “Speech dereverberation based on the concept of the modulation transfer function,” Proc. NCSP2004, Hawaii, 423-426.
[22] Unoki, M., Sakata, K. and Akagi, M. (2003). “A speech dereverberation
method based on the MTF concept,” Proc. EUROSPEECH2003, Geneva, 1417-1420.
[23] Unoki, M., Furukawa, M., Sakata, K., and Akagi, M. (2003). "A
method based on the MTF concept for dereverberating the power envelope
from the reverberant signal," Proc. ICASSP2003, Hong Kong, I-840-843.
[24] Unoki, M., Furukawa, M., and Akagi, M. (2002). "A method for
recovering the power envelope from reverberant speech," SPA-Gen-002,
Forum Acousticum Sevilla 2002.
2-4 Bone-conducted Speech
[1] Phung, T. N., Unoki, M., and Akagi, M. (2012/09/01). “A study on restoration
of bone-conducted speech in noisy environments with LP-based model and
Gaussian mixture model,” Journal of Signal Processing, 16, 5, 409-417.
[2] Phung, T. N., Unoki, M. and Akagi, M. (2010/12/14). “Improving Bone-Conducted
Speech Restoration in noisy environment based on LP scheme,” Proc. APSIPA2010,
Student Symposium, 12.
[3] Kinugasa, K., Unoki, M., and Akagi, M. (2009/07/01). "An MTF-based method for Blind Restoration for Improving Intelligibility of Bone-conducted Speech," Journal of Signal Processing, 13, 4, 339-342.
[4] Kinugasa, K., Unoki, M., and Akagi, M. (2009/6/24). "An MTF-based
method for blindly restoring bone-conducted speech," Proc. SPECOM2009,
St. Petersburg, Russia, 199-204.
[5] Kinugasa, K., Unoki, M., and Akagi, M. (2009/03/01). "An MTF-based
Blind Restoration Method for Improving Intelligibility of Bone-conducted
Speech," Proc. NCSP'09, 105-108.
[6] Vu, T. T., Unoki, M., and Akagi, M. (2008/6/5). "An LP-based blind model for restoring bone-conducted speech," Proc. ICCE2008, 212-217.
[7] Vu, T. T., Unoki, M., and Akagi, M. (2008/3/7). "A study of blind
model for restoring bone-conducted speech based on linear prediction scheme,"
Proc. NCSP08, 287-290.
[8] Vu, T. T., Unoki, M., and Akagi, M. (2007). “The Construction of Large-scale
Bone-conducted and Air-conducted Speech Databases for Speech Intelligibility
Tests,” Proc. Oriental COCOSDA2007, 88-91.
[9] Vu, T. T., Unoki, M., and Akagi, M. (2007). "A blind restoration model for bone-conducted speech based on a linear prediction scheme," Proc. NOLTA2007, Vancouver, 449-452.
[10] Vu, T. T., Seide, G., Unoki, M., and Akagi, M. (2007). "Method
of LP-based blind restoration for improving intelligibility of bone-conducted
speech," Proc. Interspeech2007, 966-969.
[11] Vu, T., Unoki, M., and Akagi, M. (2006). "A Study on Restoration
of Bone-Conducted Speech with MTF-Based and LP-based Models," Journal
of Signal Processing, 10, 6, 407-417.
[12] Vu, T., Unoki, M., and Akagi, M. (2006). "A study on an LP-based model for restoring bone-conducted speech," Proc. HUT-ICCE2006, Hanoi.
[13] Vu, T. T., Unoki, M., and Akagi, M. (2006). "A study on an LPC-based
restoration model for improving the voice-quality of bone-conducted speech,"
Proc. NCSP2006, 110-113.
[14] Kimura, K., Unoki, M. and Akagi, M. (2005). “A study on a bone-conducted
speech restoration method with the modulation filterbank,” Proc. NCSP05,
Hawaii, 411-414.
2-5 Speech Recognition
[1] Du, Y. and Akagi, M. (2014/03/02). “Speech recognition in noisy conditions
based on speech separation using Non-negative Matrix Factorization,” Proc.
NCSP2014, Hawaii, USA, 429-432.
[2] Haniu, A., Unoki, M., and Akagi, M. (2009/9/21). "A psychoacoustically-motivated
conceptual model for automatic speech recognition," Proc. WESPAC2009,
Beijing, CD-ROM.
[3] Lu, X., Unoki, M., and Akagi, M. (2008/11/1). "Comparative evaluation of modulation-transfer-function-based blind restoration of sub-band power envelopes of speech as a front-end processor for automatic speech recognition systems," Acoustical Science and Technology, 29, 6, 351-361.
[4] Lu, X., Unoki, M., and Akagi, M. (2008/07/01). "An MTF-based blind
restoration for temporal power envelopes as a front-end processor for automatic
speech recognition systems in reverberant environments," Acoustics2008,
Paris, 1419-1424.
[5] Haniu, A., Unoki, M., and Akagi, M. (2008/3/6). "A speech recognition
method based on the selective sound segregation in various noisy environments,"
Proc. NCSP08, 168-171.
[6] Haniu, A., Unoki, M. and Akagi, M. (2007). "A study on a speech recognition method based on the selective sound segregation in various noisy environments," Proc. NOLTA2007, Vancouver, 445-448.
[7] Haniu, A., Unoki, M. and Akagi, M. (2007). "A study on a speech
recognition method based on the selective sound segregation in noisy environment,"
Proc. JCA2007, CD-ROM.
[8] Lu, X., Unoki, M., and Akagi, M. (2006). "A robust feature extraction
based on the MTF concept for speech recognition in reverberant environment,"
Proc. ICSLP2006, Pittsburgh, USA, 2546-2549.
[9] Lu, X., Unoki, M., and Akagi, M. (2006). "MTF-based sub-band power envelope restoration in reverberant environment for robust speech recognition," Proc. NCSP2006, 162-165.
[10] Haniu, A., Unoki, M. and Akagi, M. (2005). “A study on a speech recognition method based on the selective sound segregation in noisy environment,” Proc. NCSP05, Hawaii, 403-406.
2-6 DOA
[1] Chau, D. T., Li, J., and Akagi, M. (2014/10/01). “Binaural sound source
localization in noisy reverberant environments based on Equalization-Cancellation
Theory,” IEICE Trans. Fund., E97-A, 10, 2011-2020.
[2] Chau, D. T., Li, J., and Akagi, M. (2013/07/08). “Improve equalization-cancellation-based
sound localization in noisy reverberant environments using direct-to-reverberant
energy ratio,” Proc. ChinaSIP2013, Beijing, 322-326.
[3] Chau, D. T., Li, J., and Akagi, M. (2011/07/01). “Towards intelligent binaural speech enhancement by meaningful sound extraction,” Journal of Signal Processing, 15, 4, 291-294.
[4] Chau, D. T., Li, J., and Akagi, M. (2011/03/03). “Towards an intelligent
binaural speech enhancement system by integrating meaningful signal extraction,” Proc.
NCSP2011, Tianjin, China, 344-347.
[5] Chau, D. T., Li, J., and Akagi, M. (2010/09/30). "A DOA estimation
algorithm based on equalization-cancellation theory," Proc. INTERSPEECH2010,
Makuhari, 2770-2773.
3. Cocktail-party Effect Modeling
3-1 Sound Segregation
[1] Unoki, M., Kubo, M., Haniu, A., and Akagi, M. (2006). "A Model-Concept
of the Selective Sound Segregation: — A Prototype Model for Selective Segregation
of Target Instrument Sound from the Mixed Sound of Various Instruments
—," Journal of Signal Processing, 10, 6, 419-431.
[2] Unoki, M., Kubo, M., Haniu, A., and Akagi, M. (2005). "A model
for selective segregation of a target instrument sound from the mixed sound
of various instruments," Proc. EuroSpeech2005, Lisbon, Portugal, 2097-2100.
[3] Unoki, M., Kubo, M., and Akagi, M. (2003). “A model for selective segregation of a target instrument sound from the mixed sound of various instruments,” Proc. ICMC2003, Singapore, 295-298.
[4] Akagi, M., Mizumachi, M., Ishimoto, Y., and Unoki, M. (2002). "Speech
enhancement and segregation based on human auditory mechanisms", in
Enabling Society with Information Technology, Q. Jin, J. Li, N. Zhang,
J. Cheng, C. Yu, and S. Noguchi (Eds.), Springer Tokyo, 186-196.
[5] Akagi, M., Mizumachi, M., Ishimoto, Y., and Unoki, M. (2000). "Speech
enhancement and segregation based on human auditory mechanisms", Proc.
IS2000, Aizu, 246-253.
[6] Unoki, M. and Akagi, M. (1999). "Segregation of vowel in background noise using the model of segregating two acoustic sources based on auditory scene analysis", Proc. EUROSPEECH99, 2575-2578.
[7] Unoki, M. and Akagi, M. (1999). "Segregation of vowel in background
noise using the model of segregating two acoustic sources based on auditory
scene analysis", Proc. CASA99, IJCAI-99, Stockholm, 51-60.
[8] Akagi, M., Iwaki, M. and Sakaguchi, N. (1998). “Spectral sequence compensation
based on continuity of spectral sequence,” Proc. ICSLP98, Sydney, Vol.4,
1407-1410.
[9] Unoki, M. and Akagi, M. (1998). “Signal extraction from noisy signal based on auditory scene analysis,” ICSLP98, Sydney, Vol.5, 2115-2118.
[10] Unoki, M. and Akagi, M. (1998). “A method of signal extraction from
noisy signal based on auditory scene analysis,” Speech Communication, 27,
3-4, 261-279.
[11] Unoki, M. and Akagi, M. (1998). “A method of signal extraction from
noisy signal based on auditory scene analysis,” JAIST Tech. Report, IS-RR-98-0005P.
[12] Unoki, M. and Akagi, M. (1997). "A method of signal extraction from noisy signal", Proc. EUROSPEECH97, 2587-2590.
[13] Unoki, M. and Akagi, M. (1997). "A method of signal extraction
from noisy signal based on auditory scene analysis", Proc. CASA97,
IJCAI-97, Nagoya, 93-102.
[14] Unoki, M. and Akagi, M. (1997). “A method for signal extraction from
noise-added signals”, Electronics and Communications in Japan, Part 3,
80, 11, 1-11.
3-2 Privacy Protection
[1] Akagi, M. and Irie, Y. (2012/08/22). “Privacy protection for speech based on concepts of auditory scene analysis,” Proc. INTERNOISE2012, New York, 485.
[2] Tezuka, T. and Akagi, M. (2008/3/6). "Influence of spectrum envelope
on phoneme perception," Proc. NCSP08, 176-179.
[3] Minowa, A., Unoki, M., and Akagi, M. (2007). "A study on physical
conditions for auditory segregation/integration of speech signals based
on auditory scene analysis," Proc. NCSP2007, 313-316.
3-3 Noisy Sound Perception
[1] Yano, Y., Miyauchi, R., Unoki, M., and Akagi, M. (2012/03/06). “Study
on detectability of signals by utilizing differences in their amplitude
modulation,” Proc. NCSP2012, Honolulu, HW, 611-614.
[2] Kuroda, N., Li, J., Iwaya, Y., Unoki, M., and Akagi, M. (2011). “Effects
of spatial cues on detectability of alarm signals in noisy environments,”
In Principles and applications of spatial hearing (Eds. Suzuki, Y., Brungart,
D., Iwaya, Y., Iida, K., Cabrera, D., and Kato, H.), World Scientific,
484-493.
[3] Yano, Y., Miyauchi, R., Unoki, M., and Akagi, M. (2011/03/02). “Study on detectability of target signal by utilizing differences between movements in temporal envelopes of target and background signals,” Proc. NCSP2011, Tianjin, China, 231-234.
[4] Mizukawa, S. and Akagi, M. (2011/03/02). “A binaural model accounting
for spatial masking release,” Proc. NCSP2011, Tianjin, China, 179-182.
[5] Kuroda, N., Li, J., Iwaya, Y., Unoki, M., and Akagi, M. (2009/11). “Effects
of spatial cues on detectability of alarm signals in noisy environments,”
Proc. IWPASH2009, P7, Zaou, Japan (CD-ROM).
[6] Kuroda, N., Li, J., Iwaya, Y., Unoki, M., and Akagi, M. (2009/03/01). "Effects from Spatial Cues on Detectability of Alarm Signals in Car Environments," Proc. NCSP'09, 45-48.
[7] Kusaba, M., Unoki, M., and Akagi, M. (2008/3/6). "A study on detectability
of target signal in background noise by utilizing similarity of temporal
envelopes in auditory search," Proc. NCSP08, 13-16.
[8] Uchiyama, H., Unoki, M., and Akagi, M. (2007). "Improvement in
detectability of alarm signals in noisy environments by utilizing spatial
cues," Proc. WASPAA2007, New Paltz, NY, pp.74-77.
[9] Uchiyama, H., Unoki, M., and Akagi, M. (2007). "A study on perception of alarm signal in car environments," Proc. NCSP2007, 389-392.
[10] Nakanishi, J., Unoki, M., and Akagi, M. (2006). "Effect of ITD
and component frequencies on perception of alarm signals in noisy environments,"
Journal of Signal Processing, 10, 4, 231-234.
[11] Nakanishi, J., Unoki, M., and Akagi, M. (2006). "Effect of ITD
and component frequencies on perception of alarm signals in noisy environments,"
Proc. NCSP2006, 37-40.
4. Psychoacoustics
4-1 Auditory Model
[1] Unoki, M. and Akagi, M. (2001). "A computational model of co-modulation
masking release," in Computational Models of Auditory Function, (Eds.
Greenberg, S. and Slaney, M.), NATO ASI Series, IOS Press, Amsterdam, 221-232.
[2] Unoki, M. and Akagi, M. (1998). “A computational model of co-modulation
masking release,” Computational Hearing, Italy, 129-134.
[3] Unoki, M. and Akagi, M. (1998). “A computational model of co-modulation
masking release,” JAIST Tech. Report, IS-RR-98-0006P.
4-2 Contextual Effect
[1] Yonezawa, Y. and Akagi, M. (1996). "Modeling of contextual effects
and its application to word spotting", Proc. Int. Conf. Spoken Lang.
Process. 96, 2063-2066.
[2] Akagi, M., van Wieringen, A. and Pols, L. C. W. (1994). "Perception
of central vowel with pre- and post-anchors", Proc. Int. Conf. Spoken
Lang. Process. 94, 503-506.
4-3 Auditory Filter
4-4 Phase Perception
[1] Akagi, M., and Nishizawa, M. (2001). "Detectability of phase change and its computational modeling," J. Acoust. Soc. Am., 110, 5, Pt. 2, 2680.
4-5 Speech Perception
[1] Kubo, R., Akagi, M., and Akahane-Yamada, R. (2015/9/1). “Dependence on age of interference with phoneme perception by first- and second-language speech maskers,” Acoustical Science and Technology, 36, 5, 397-407.
[2] Kubo, R., Akagi, M., and Akahane-Yamada, R. (2014/05/09). “Perception
of second language phoneme masked by first- or second-language speech in
20–60 years old listeners,” 167th ASA, Providence, RI.
[3] Kubo, R. and Akagi, M. (2013/06/04). “Exploring auditory aging can
exclusively explain Japanese adults' age-related decrease in training effects
of American English /r/-/l/,” Proc. ICA2013, 2aSC34, Montreal.
4-6 Noise Evaluation
[1] Akagi, M., Kakehi, M., Kawaguchi, M., Nishinuma, M., and Ishigami,
A. (2001). "Noisiness estimation of machine working noise using human
auditory model", Proc. Internoise2001, 2451-2454.
[2] Mizumachi, M. and Akagi, M. (2000). "The auditory-oriented spectral
distortion for evaluating speech signals distorted by additive noises,"
J. Acoust. Soc. Jpn. (E), 21, 5, 251-258.
[3] Mizumachi, M. and Akagi, M. (1999). "An objective distortion estimator
for hearing aids and its application to noise reduction," Proc. EUROSPEECH99,
2619-2622.
4-7 Pitch Perception
[1] Ishida, M. and Akagi, M. (2010/03/04). "Pitch perception of complex sounds with varied fundamental frequency and spectral tilt," Proc. NCSP10, Hawaii, USA, 480-483.
4-8 Direction Perception
[1] Akagi, M. and Hisatsune, H. (2013/10/17). “Admissible range for individualization
of head-related transfer function in median plane,” Proc. IIHMSP2013, Beijing.
[2] Hisatsune, H. and Akagi, M. (2013/03/6). “A Study on individualization
of Head-Related Transfer Function in the median plane,” Proc. NCSP2013,
Hawaii, USA, 161-164.
5. Physiological Auditory Modeling
[1] Ito, K. and Akagi, M. (2005). "Study on improving regularity of
neural phase locking in single neurons of AVCN via a computational model,"
In Auditory Signal Processing, Springer, 91-99.
[2] Maki, K. and Akagi, M. (2005). "A computational model of cochlear nucleus neurons," In Auditory Signal Processing, Springer, 84-90.
[3] Ito, K. and Akagi, M. (2003). “Study on improving regularity of neural
phase locking in single neuron of AVCN via computational model,” Proc.
ISH2003, 77-83.
[4] Maki, K. and Akagi, M. (2003). “A computational model of cochlear nucleus
neurons,” Proc. ISH2003, 70-76.
[5] Itoh, K. and Akagi, M. (2001). “A computational model of auditory sound localization,” in Computational Models of Auditory Function (Eds. Greenberg, S. and Slaney, M.), NATO ASI Series, IOS Press, Amsterdam, 97-111.
[6] Ito, K. and Akagi, M. (2000). "A computational model of binaural
coincidence detection using impulses based on synchronization index,"
Proc. ISA2000 (BIS2000), Wollongong, Australia.
[7] Maki, K., Akagi, M. and Hirota, K. (2000). "Effect of the basilar
membrane nonlinearities on rate-place representation of vowel in the cochlear
nucleus: A modeling approach," In Recent Developments in Auditory
Mechanics, World Scientific Publishing, 490-496.
[8] Ito, K. and Akagi, M. (2000). "A computational model of auditory sound localization based on ITD," In Recent Developments in Auditory Mechanics, World Scientific Publishing, 483-489.
[9] Ito, K. and Akagi, M. (2000). "A study on temporal information
based on the synchronization index using a computational model," Proc.
WESTPRAC7, 263-266.
[10] Maki, K., Akagi, M. and Hirota, K. (1999). "Effect of the basilar
membrane nonlinearities on rate-place representation of vowel in the cochlear
nucleus: A modeling approach," Abstracts of Symposium on Recent Developments
in Auditory Mechanics, Sendai, Japan, 29P06, 166-167.
[11] Ito, K. and Akagi, M. (1999). "A computational model of auditory sound localization based on ITD," Abstracts of Symposium on Recent Developments in Auditory Mechanics, Sendai, Japan, 29P01, 156-157.
[12] Maki, K., Hirota, K. and Akagi, M. (1998). “A functional model of
the auditory peripheral system: Responses to simple and complex stimuli,”
Computational Hearing, Italy, 13-18.
[13] Itoh, K. and Akagi, M. (1998). “A computational model of auditory
sound localization,” Computational Hearing, Italy, 67-72.
[14] Maki, K. and Akagi, M. (1997). "A functional model of the auditory
peripheral system", Proc. ASVA97, Tokyo, 703-710.
6. Abnormal Speech
6-1 Abnormal Speech Perception
[1] Kozaki-Yamaguchi, Y., Suzuki, N., Fujita, Y., Yoshimasu, H., Akagi, M., and Amagasa, T. (2005). "Perception of hypernasality and its physical correlates," Oral Science International, 2, 1, 21-35.
[2] Kozaki, Y., Suzuki, N., Amagasa, T., and Akagi, M. (2004). “Perception
of hypernasality and its physical correlates,” Proc. ICA2004, 3313-3316.
[3] Akagi, M., Suzuki, N., Hayashi, K., Saito, H., and Michi, K. (2001).
" Perception of Lateral Misarticulation and Its Physical Correlates",
Folia Phoniatrica et Logopaedica, 53, 6, 291-307
[4] Akagi, M., Kitamura, T., Suzuki, N. and Michi, K. (1996). "Perception
of lateral misarticulation and its physical correlates", Proc. ASA-ASJ
Joint Meeting, 933-936.
6-2 3D Vocal Tract Modeling
[1] Nishimoto, H. and Akagi, M. (2006). "Effects of complicated vocal
tract shapes on vocal tract transfer functions," Journal of Signal
Processing, 10, 4, 267-270.
[2] Nishimoto, H. and Akagi, M. (2006). "Effects of complicated vocal
tract shapes on vocal tract transfer functions," Proc. NCSP2006, 114-117.
[3] Nishimoto, H., Akagi, M., Kitamura, T. and Suzuki, N. (2004). “Estimation of transfer function of vocal tract extracted from MRI data by FEM,” Proc. ICA2004, 1473-1476.
[4] Nishimoto, H., Akagi, M., Kitamura, T., Suzuki, N., and Saito, H. (2001).
"FEM analysisof three-dimensional vocal tract models after tongue
and mouth floor resection," J. Acoust. Soc. Am., 110, 5, Pt. 2, 2761.
[5] Nishimoto, H., Akagi, M., Kitamura, T., and Suzuki, N. (2002). "FEM
analyses of three dimensional vocal tract models after tongue and mouth
floor resection," NATO Advanced Study Institute 2002 Dynamics of Speech
Production and Perception.
7. Interaction between Perception and Production
[1] Shih, T., Suemitsu, A., and Akagi, M. (2011/03/03). “Influences of transformed
auditory feedback with first three formant frequencies,” Proc. NCSP2011,
Tianjin, China, 340-343.
[2] Shih, T., Suemitsu, A., and Akagi, M. (2010/10/16). "Influences
of real-time auditory feedback on formant perturbations," Proc. Auditory
Research Meeting, ASJ, 40, 8, H-2010-121.
[3] Akagi, M., Dang, J., Lu, X., and Uchiyamada, T. (2006). "Investigation of interaction between speech perception and production using auditory feedback," J. Acoust. Soc. Am., 120, 5, Pt. 2, 3253.
[4] Dang, J., Akagi, M., and Honda, K. (2006). "Communication between
speech production and perception within the brain - Observation and simulation,"
J. Comp. Sci. & Tech., 21, 1, 95-105.
[5] Matsuoka, R., Lu, X., Dang, J., and Akagi, M. (2004). “Investigation
of interaction between speech perception and speech production,” Proc.
KIT Int. Sympo. Brain and Language 2004, 27-28.
8. Signal Analysis
[1] Kobayashi, K., Morikawa, D., and Akagi, M. (2014/03/01). “Study on
Analyzing Individuality of Instrument Sounds Using Non-negative Matrix
Factorization,” Proc. NCSP2014, Hawaii, USA, 37-40.
[2] Nishie, S. and Akagi, M. (2013/09/11). “Acoustic sound source tracking
for a moving object using precise Doppler-shift measurement,” Proc. EUSIPCO2013,
Marrakesh, Morocco.
9. Others
9-1 NTT & ATR
[1] Akagi, M. (1993). "Modeling of contextual effects based on spectral
peak interaction", J. of Acoust. Society of America, 93, 2, 1076-1086.
[2] Akagi, M. (1992). "Psychoacoustic evidence for contextual effect models", Speech Perception, Production and Linguistic Structure, IOS Press, Amsterdam, 63-78.
[3] Akagi, M. (1990). "Contextual effect models and psychoacoustic
evidence for the models", Proc. Int. Conf. Spoken Lang. Process. 90,
569-572.
[4] Akagi, M. and Tohkura, Y. (1990). "Spectrum target prediction
model and its application to speech recognition", Computer Speech
and Language, 4, Academic Press, 325-344.
[5] Akagi, M. (1990). "Psychoacoustic evidence for a contextual effect model", J. Acoust. Soc. Am., Spl 1, 87, MMM2 (119th Meeting of ASA).
[6] Akagi, M. (1990). "Evaluation of a spectrum target prediction
model in speech perception", J. of Acoust. Society of America, 87,
2, 858-865.
[7] Ueda, K. and Akagi, M. (1990). "Sharpness and amplitude envelopes
of broadband noise", J. of Acoust. Society of America, 87, 2, 814-819.
[8] Akagi, M. (1989). "Modeling of contextual effect based on spectral peak interaction", J. Acoust. Soc. Am., Spl 1, 85, II8 (117th Meeting of ASA).
[9] Akagi, M. and Tohkura, Y. (1988). "On the application of spectrum
target prediction model to speech recognition", Proc. Int. Conf. Acoustics
Speech and Signal Process., New York, 139-142.
[10] Akagi, M. (1987). "Evaluation of a spectrum target prediction
model in speech perception", J. Acoust. Soc. Am., Spl 1, 81, G8 (113th
Meeting of ASA).
[11] Furui, S. and Akagi, M. (1985). "On the role of spectral transition
in phoneme perception and its modeling", Proc. 12th Int. Conf. Acoustics,
A2-6.
9-2 Tokyo Institute of Technology
[1] Akagi, M., and Iijima, T. (1984). "A construction of pole-deviation
tracking filter," Electronics and Communications in Japan, 67-A, 5,
28-36.
[2] Akagi, M., and Iijima, T. (1982). "Speech Recognition by polarized
linear predictive error coding – POLPEC method," Electronics and Communications
in Japan, 65-A, 8, 9-18.