Takashi Fukuda

Title

Senior Technical Staff Member, Master Inventor - Audio, Speech, and Language Processing

Bio

He has been with the Speech Technology group, AI Technologies, at IBM Research - Tokyo since joining IBM in April 2005, after receiving a Ph.D. in Engineering from Toyohashi University of Technology. His research interests include automatic speech recognition, deep learning, and auditory signal processing. His research contributions span the entire area of speech recognition and have resulted in numerous scientific papers, patents, invited talks, and technology development awards, as well as major accomplishments in enterprise-grade AI speech technology solutions. He holds multiple technical positions, including team lead of a global speech research project, Senior Technical Staff Member at IBM Research, and IBM Global Master Inventor. He is also an elected member of the IEEE Speech and Language Technical Committee (SLTC) and a Senior Member of the IEEE, the Information Processing Society of Japan (IPSJ), and the Institute of Electronics, Information and Communication Engineers (IEICE).

International Conference Papers

Sashi Novitasari, Takashi Fukuda, Gakuto Kurata, 'Voice Activity-based Text Segmentation for ASR Text Denormalization', Proc. of 26th Annual Conference on the International Speech Communication Association (Interspeech 2025), September 2025, Rotterdam, Netherlands.
Sashi Novitasari, Takashi Fukuda, Gakuto Kurata, 'Improving End-to-end Mixed-case ASR with Knowledge Distillation and Integration of Voice Activity Cues', Proc. of 26th Annual Conference on the International Speech Communication Association (Interspeech 2025), September 2025, Rotterdam, Netherlands.
Takashi Fukuda, Gakuto Kurata, and George Saon, 'Knowledge Distillation Based Training of Unified Conformer CTC Models for Multi-form ASR', Proc. of 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025), April 2025, Hyderabad, India.
Takashi Fukuda and Samuel Thomas, 'Effective Training of RNN Transducer Models on Diverse Sources of Speech and Text Data', Proc. of 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023), June 2023, Rhodes, Greece.
Takashi Fukuda, Samuel Thomas, Masayuki Suzuki, Gakuto Kurata, George Saon, Brian Kingsbury, 'Global RNN Transducer Models For Multi-dialect Speech Recognition', Proc. of 23rd Annual Conference on the International Speech Communication Association (Interspeech 2022), September 2022, Incheon, Korea, (a hybrid form conference).
Sashi Novitasari, Takashi Fukuda, Gakuto Kurata, 'Improving ASR Robustness in Noisy Condition Through VAD Integration', Proc. of 23rd Annual Conference on the International Speech Communication Association (Interspeech 2022), September 2022, Incheon, Korea, (a hybrid form conference).
Xiaodong Cui, George Saon, Tohru Nagano, Masayuki Suzuki, Takashi Fukuda, Brian Kingsbury, Gakuto Kurata, "Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing", Proc. of 23rd Annual Conference on the International Speech Communication Association (Interspeech 2022), September 2022, Incheon, Korea, (a hybrid form conference).
Takashi Fukuda and Samuel Thomas, 'Knowledge Distillation Based Training of Universal ASR Source Models for Cross-lingual Transfer', Proc. of 22nd Annual Conference on the International Speech Communication Association (Interspeech 2021), September 2021, Brno, Czech Republic, (a hybrid form conference).
Takashi Fukuda and Gakuto Kurata, 'Generalized Knowledge Distillation from an Ensemble of Specialized Teachers Leveraging Unsupervised Neural Clustering', Proc. of 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021), June 2021, Toronto, Canada, (a fully virtual conference).
Takashi Fukuda and Samuel Thomas, 'Implicit Transfer of Privileged Acoustic Information in a Generalized Knowledge Distillation Framework', Proc. of 21st Annual Conference on the International Speech Communication Association (Interspeech 2020), October 2020, Shanghai, China, (a fully virtual conference).
Takashi Fukuda and Samuel Thomas, 'Mixed bandwidth acoustic modeling leveraging knowledge distillation', Proc. of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2019), December 2019, Sentosa, Singapore.
Tohru Nagano, Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata, 'Data augmentation based on vowel stretch for improving children’s speech recognition', Proc. of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2019), December 2019, Sentosa, Singapore.
Takashi Fukuda, Masayuki Suzuki, and Gakuto Kurata, 'Direct Neuron-wise Fusion of Cognate Neural Networks,' Proc. of 20th Annual Conference on the International Speech Communication Association (Interspeech 2019), September 2019, Graz, Austria.
Futoshi Iwama and Takashi Fukuda, 'Automated Testing of Basic Recognition Capability for Speech Recognition Systems,' Proc. IEEE International Conference on Software Testing, Verification and Validation (ICST), April 2019, Xi'an, China.
Takashi Fukuda, Raul Fernandez, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Alexander Sorin, and Gakuto Kurata, 'Data Augmentation Improves Recognition of Foreign Accented Speech,' Proc. of 19th Annual Conference on the International Speech Communication Association (Interspeech 2018), pp. 2409-2413, September 2018, Hyderabad, India.
Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata, Samuel Thomas, Jia Cui, and Bhuvana Ramabhadran, 'Efficient Knowledge Distillation from an Ensemble of Teachers,' Proc. of 18th Annual Conference on the International Speech Communication Association (Interspeech 2017), pp. 3697-3701, August 2017, Stockholm, Sweden.
Michael Heck, Masayuki Suzuki, Takashi Fukuda, Gakuto Kurata, and Satoshi Nakamura, 'Ensembles of Multi-scale VGG Acoustic Models,' Proc. of 18th Annual Conference on the International Speech Communication Association (Interspeech 2017), pp. 1616-1620, August 2017, Stockholm, Sweden.
Osamu Ichikawa, Takashi Fukuda, Gakuto Kurata, and Steven J. Rennie, 'Factorial Modeling for Effective Suppression of Directional Noise,' Proc. of 18th Annual Conference on the International Speech Communication Association (Interspeech 2017), pp. 389-393, August 2017, Stockholm, Sweden.
Takashi Fukuda, Osamu Ichikawa, Gakuto Kurata, Ryuki Tachibana, Samuel Thomas, and Bhuvana Ramabhadran, 'Effective joint training of denoising feature space transforms and neural network based acoustic models,' Proc. of 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2017), pp. 5190-5194, March 2017, New Orleans, Louisiana, USA.
Osamu Ichikawa, Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata, and Bhuvana Ramabhadran, 'Harmonic feature fusion for robust neural network-based acoustic modeling,' Proc. of 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2017), pp. 5195-5199, March 2017, New Orleans, Louisiana, USA.
Takashi Fukuda, Osamu Ichikawa, and Ryuki Tachibana, 'Convolutional Neural Network Pre-trained with Projection Matrices on Linear Discriminant Analysis,' Proc. of 2016 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016), pp. 5345-5349, March 2016, Shanghai, China.
Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura, Steven J. Rennie, and Vaibhava Goel, 'Regularized Feature-space Discriminative Adaptation for Robust ASR,' Proc. of 15th Annual Conference on the International Speech Communication Association (Interspeech 2014), pp.2185-2188, September 2014, Singapore.
Osamu Ichikawa, Steven J. Rennie, Takashi Fukuda, and Masafumi Nishimura, 'Channel-mapping for speech corpus recycling,' Proc. of 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013), pp.7160-7164, May 2013, Vancouver, Canada.
Takashi Fukuda, Ryuki Tachibana, Upendra Chaudhari, Bhuvana Ramabhadran, and Puming Zhan, 'Constructing Ensembles of Dissimilar Acoustic Models using Hidden Attributes of Training Data,' Proc. of 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), pp.4141-4144, March 2012, Kyoto, Japan.
Osamu Ichikawa, Steven Rennie, Takashi Fukuda, and Masafumi Nishimura, 'Model-based Noise Reduction Leveraging Frequency-wise Confidence Metric for In-car Speech Recognition,' Proc. of 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), pp.4921-4924, March 2012, Kyoto, Japan.
Ryuki Tachibana, Takashi Fukuda, Upendra Chaudhari, Bhuvana Ramabhadran, and Puming Zhan, 'Frame-level AnyBoost for LVCSR with the MMI Criterion,' Proc. of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2011), pp.12-17, December 2011, Hawaii, USA.
Takashi Fukuda, Osamu Ichikawa, and Masafumi Nishimura, 'Combining Feature Space Discriminative Training with Long-term Spectro-temporal Features for Noise-robust Speech Recognition,' Proc. of 12th Annual Conference on the International Speech Communication Association (Interspeech 2011), pp.229-232, August 2011, Florence, Italy.
Takashi Fukuda, Osamu Ichikawa, and Masafumi Nishimura, 'Breath-detection-based Telephony Speech Phrasing,' Proc. of 12th Annual Conference on the International Speech Communication Association (Interspeech 2011), pp.2625-2628, August 2011, Florence, Italy.
Takashi Fukuda, Osamu Ichikawa, and Masafumi Nishimura, 'Improved Voice Activity Detection Using Static Harmonic Features,' Proc. of 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2010), pp.4482-4485, March 2010, Dallas, Texas, USA.
Osamu Ichikawa, Takashi Fukuda, and Masafumi Nishimura, 'Dynamic Features in the Linear Domain for Robust Automatic Speech Recognition in a Reverberant Environment,' Proc. of 11th European Conference on Speech Communication and Technology (Eurospeech 2009 / Interspeech 2009), pp.44-47, September 2009, Brighton, U.K.
Takashi Fukuda, Osamu Ichikawa, and Masafumi Nishimura, 'Short- and Long-term Dynamic Features for Robust Speech Recognition,' Proc. of 10th International Conference on Spoken Language Processing (ICSLP 2008 / Interspeech 2008), pp.2262-2265, September 2008, Brisbane, Australia.
Takashi Fukuda, Osamu Ichikawa, and Masafumi Nishimura, 'Phone-duration-dependent Long-term Dynamic Features for Stochastic Model-based Voice Activity Detection,' Proc. of 10th International Conference on Spoken Language Processing (ICSLP 2008 / Interspeech 2008), pp.1293-1296, September 2008, Brisbane, Australia.
Osamu Ichikawa, Takashi Fukuda, and Masafumi Nishimura, 'Local Peak Enhancement Combined with Noise Reduction Algorithms for Robust Automatic Speech Recognition in Automobiles,' Proc. of 2008 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2008), pp.4865-4868, April 2008, Las Vegas, Nevada, USA.
Takashi Fukuda and Tsuneo Nitta, 'Designing Multiple Distinctive Phonetic Feature Extractors for Canonicalization by Using Clustering Technique,' Proc. of 9th European Conference on Speech Communication and Technology (Eurospeech 2005 / Interspeech 2005), pp.3141-3144, September 2005, Lisbon, Portugal.
Muhammad Ghulam, Takashi Fukuda, Junsei Horikawa, and Tsuneo Nitta, 'Pitch-Synchronous ZCPA (PS-ZCPA)-Based Feature Extraction with Auditory Masking,' Proc. 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2005), Vol. I, pp.517-520, March 2005, Philadelphia, Pennsylvania, USA.
Takashi Fukuda and Tsuneo Nitta, 'Canonicalization of Feature Parameters for Automatic Speech Recognition,' Proc. of 8th International Conference on Spoken Language Processing (ICSLP 2004 / Interspeech 2004), Vol.IV, pp.2537-2540, October 2004, Jeju, Korea.
Muhammad Ghulam, Takashi Fukuda, Junsei Horikawa, and T. Nitta, 'A Noise-Robust Feature Extraction Method Based on Pitch-Synchronous ZCPA for ASR,' Proc. 8th International Conference on Spoken Language Processing (ICSLP 2004 / Interspeech 2004), Vol.I, pp.133-136, October 2004, Jeju, Korea.
Takashi Fukuda and Tsuneo Nitta, 'Noise-robust Automatic Speech Recognition Using Orthogonalized Distinctive Phonetic Feature Vectors,' Proc. of 8th European Conference on Speech Communication and Technology (Eurospeech 2003 / Interspeech 2003), Vol.III, pp.2189-2192, September 2003, Geneva, Switzerland.
Takashi Fukuda and Tsuneo Nitta, 'Noise-robust ASR by Using Distinctive Phonetic Features Approximated with Logarithmic Normal Distribution of HMM,' Proc. of 8th European Conference on Speech Communication and Technology (Eurospeech 2003 / Interspeech 2003), Vol.III, pp.2185-2188, September 2003, Geneva, Switzerland.
Muhammad Ghulam, Takashi Fukuda, and Tsuneo Nitta, 'Voice Quality Normalization in an Utterance for Robust ASR,' Proc. 8th European Conference on Speech Communication and Technology (Eurospeech 2003 / Interspeech 2003), Vol.III, pp.2173-2176, September 2003, Geneva, Switzerland.
Tsuneo Nitta, Shingo Iseji, Takashi Fukuda, Hirobumi Yamada, and Kouichi Katsurada, 'Key-word Spotting Using Phonetic Distinctive Features Extracted from Output of an LVCSR Engine,' Proc. ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition (SSPR 2003), pp.99-102, April 2003, Tokyo, Japan.
Takashi Fukuda, Wataru Yamamoto and Tsuneo Nitta, 'Distinctive Phonetic Feature Extraction for Robust Speech Recognition,' Proc. of 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2003), Vol.II, pp.25-28, April 2003, Hong Kong, China.
Muhammad Ghulam, Takaharu Sato, Takashi Fukuda, and Tsuneo Nitta, 'Improving Performance of an HMM-based ASR System By Using Monophone-Level Normalized Confidence Measure,' Proc. 7th International Conference on Spoken Language Processing (ICSLP 2002 / Interspeech 2002), Vol.IV, pp.2453-2456, September 2002, Denver, Colorado, USA.
Takaharu Sato, Muhammad Ghulam, Takashi Fukuda, and Tsuneo Nitta, 'Confidence Scoring for Accurate HMM-based Word Recognition By Using SM-based Monophone Score Normalization,' Proc. 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2002), Vol.I, pp.217-220, May 2002, Orlando, Florida, USA.
Takashi Fukuda, Masashi Takigawa and Tsuneo Nitta, 'Peripheral Features for HMM-based Speech Recognition,' Proc. of 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2001), Vol.I, pp.129-132, May 2001, Salt Lake City, Utah, USA.
Tsuneo Nitta, Masashi Takigawa, and Takashi Fukuda, 'A Novel Feature Extraction Using Multiple Acoustic Feature Planes for HMM-based Speech Recognition,' Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000 / Interspeech 2000), Vol.I, pp.385-388, October 2000, Beijing, China.

Journal Papers

Takashi Fukuda, Osamu Ichikawa, and Masafumi Nishimura, 'Detecting Breathing Sounds in Realistic Japanese Telephone Conversations and Its Application to Automatic Speech Recognition,' Speech Communication, Vol. 98, pp.95-103, April 2018.
Takashi Fukuda, Osamu Ichikawa, and Ryuki Tachibana, 'Improving Feature-space Discriminative Training and Adaptation Using Regularization Process,' Journal of Information Processing Society of Japan (IPSJ), Vol.58, No.1, pp.288-296, In Japanese, January 2017.
Osamu Ichikawa, Steven J. Rennie, Takashi Fukuda, and Daniel Willett, 'Speech corpus recycling for acoustically cross-domain environments for automatic speech recognition,' Acoustical Science and Technology, Vol.37, No.2, pp.55-65, In Japanese, March 2016.
Osamu Ichikawa, Takashi Fukuda, and Ryuki Tachibana, 'Effective speech suppression using a two-channel microphone array for privacy protection in face-to-face sales monitoring,' Acoustical Science and Technology, Vol.36, No.6, pp.507-515, November 2015.
Takashi Fukuda, Ryuki Tachibana, Daniel Willett, and Zhan Puming, 'Ensembles of Dissimilar Acoustic Models Based on Big Data for Large Vocabulary Continuous Speech Recognition,' The Institute of Electronics, Information and Communication Engineers (IEICE) Transactions on Information and Systems, Vol.J98-D, No.8, pp.1162-1170, In Japanese, August 2015.
Takashi Fukuda, Osamu Ichikawa, and Masafumi Nishimura, 'Long-term Spectro-temporal and Static Harmonic Features for Voice Activity Detection,' IEEE Journal of Selected Topics in Signal Processing, Vol.4, No.5, pp.834-844, 2010.
Osamu Ichikawa, Takashi Fukuda, and Masafumi Nishimura, 'Dynamic Features in the Linear-Logarithmic Hybrid Domain for Automatic Speech Recognition in a Reverberant Environment,' IEEE Journal of Selected Topics in Signal Processing, Vol.4, No.5, pp.816-823, 2010.
Osamu Ichikawa, Takashi Fukuda, and Masafumi Nishimura, 'DOA Estimation with Local-Peak-Weighted CSP,' EURASIP Journal on Advances in Signal Processing, Volume 2010, Article ID 358729, 9 pages, 2010.
Osamu Ichikawa, Takashi Fukuda, and Masafumi Nishimura, 'Local Peak Enhancement for In-Car Speech Recognition in Noisy Environment,' The Institute of Electronics, Information and Communication Engineers (IEICE) Transactions on Information and Systems, Vol. E91-D, No.3, pp.635-639, March 2008.
Mohammad Nurul Huda, Muhammad Ghulam, Takashi Fukuda, Kouichi Katsurada, and Tsuneo Nitta, 'Canonicalization of Feature Parameters for Robust Speech Recognition Based on Distinctive Phonetic Feature (DPF) Vectors,' The Institute of Electronics, Information and Communication Engineers (IEICE) Transactions on Information and Systems, Vol. E91-D, No.3, pp.488-498, March 2008.
Muhammad Ghulam, Takashi Fukuda, Kohichi Katsurada, Junsei Horikawa, and Tsuneo Nitta, 'PS-ZCPA based features extraction with auditory masking, modulation enhancement and noise reduction for robust ASR,' The Institute of Electronics, Information and Communication Engineers (IEICE) Transactions on Information and Systems, Vol.E89-D, No.3, pp.1015-1023, March 2005.
Takashi Fukuda and Tsuneo Nitta, 'Orthogonalized Distinctive Phonetic Feature Extraction for Noise-robust Automatic Speech Recognition,' The Institute of Electronics, Information and Communication Engineers (IEICE) Transactions on Information and Systems, Vol.E87-D, No.5, pp.1110-1118, May 2004.
Muhammad Ghulam, Takaharu Sato, Takashi Fukuda, and Tsuneo Nitta, 'Confidence Scoring for Accurate HMM-based Speech Recognition by Using Monophone-Level Normalization Based on Subspace Method,' The Institute of Electronics, Information and Communication Engineers (IEICE) Transactions on Information and Systems, Vol.E86-D, No.3, pp.430-437, March 2003.
Takashi Fukuda and Tsuneo Nitta, 'Improvement in both Tasks of LVCSR and ISWR by using Peripheral Feature Extraction and CMN Control,' Journal of Information Processing Society of Japan (IPSJ), Vol.43, No.7, pp.2022-2029, In Japanese, July 2002.

IBM Internal Papers

Takashi Fukuda, Masafumi Nishimura, 'Speech Phrasing Based on Breath Detection for Telephone Conversations in Call Center', IBM Professional paper, In Japanese, December 2010. (Best paper award of the year)
Takashi Fukuda, Masafumi Nishimura, 'Speech Phrasing Based on Breath Detection for Telephone Conversations in Call Center', IBM PROVISION, No.68, pp.80-87, In Japanese, February 2011.

External Awards

IPSJ Industrial Achievement Award 2020, Information Processing Society of Japan, 2021
The 26th Technology Development Award, Acoustical Society of Japan, 2018
FOSE Full paper category Contribution Award, Japan Society for Software Science and Technology, 2015
IPSJ Yamashita SIG Research Award, Information Processing Society of Japan, 2013
IEICE ISS Young Researcher's Award in Speech Field, The Institute of Electronics, Information and Communication Engineers (IEICE), 2012
IBM professionals' papers, Best paper award, 2010
The 28th Awaya Kiyoshi Academic Encouraging Award, The Acoustical Society of Japan (ASJ), 2010
Research Grant, The Hori Sciences and Arts Foundation, 2003
Overseas Travel Grant, The Telecommunications Advancement Foundation, 2003
Overseas Travel Grant, Inoue Foundation for Science, 2003
Overseas Travel Grant, International Information Science Foundation, 2001

Professional Activities/Membership

Senior Member, Institute of Electrical and Electronics Engineers (IEEE)
Member, International Speech Communication Association (ISCA)
Senior Member, Information Processing Society of Japan (IPSJ)
Senior Member, Institute of Electronics, Information, and Communication Engineers of Japan (IEICE)
Member, Acoustical Society of Japan (ASJ)

Committee Member

Committee Member, IEEE Speech and Language Technical Committee (SLTC), Jan 2025 -
Editorial Committee Member, Intelligence Group, Journal of Information Processing Society of Japan / Journal of Information Processing, Jun 2019 - May 2023
Board Member, SIG Spoken Language Processing, Information Processing Society of Japan, Apr.2017 - Mar.2019.
Committee Member (reappointed), SIG Spoken Language Processing, Information Processing Society of Japan, Apr.2015 - Mar.2017, Apr.2019 - Mar. 2023
Committee Member, Speech Processing, Institute of Electronics, Information and Communication Engineers, Jun.2015 - May 2019, Jun.2021 - May 2023
Part-time Lecturer, Toyohashi University of Technology, Jul.2015
Professional Investigator, National Institute of Science and Technology Policy (NISTEP), Ministry of Education, Culture, Sports, Science and Technology, Apr.2014 -
Committee Member, SIG Spoken Language Processing, Information Processing Society of Japan, Apr.2008 - Mar.2012

Reviewer

IEEE/ACM Transactions on Audio, Speech, and Language Processing
IEEE Journal of Selected Topics in Signal Processing
IEEE Journal of Biomedical and Health Informatics
IEEE Access
IEEE Signal Processing Letters
Speech Communication
Computer Speech and Language
Digital Signal Processing
Expert Systems With Applications
EURASIP Journal on Advances in Signal Processing
Electronics and Telecommunications Research Institute (ETRI) Journal
Institute of Electronics, Information and Communication Engineers (IEICE) Transactions
Journal of Information Processing Society of Japan
Journal of Acoustical Society of Japan
Journal of Signal Processing
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
IEEE Spoken Language Technology (SLT) Workshop
ISCA Interspeech
European Signal Processing Conference (EUSIPCO)
Asia-Pacific Signal and Information Processing Association (APSIPA) ASC

Japanese Conference Papers

福田隆，“会話音声を対象とした実環境向けニューラル音声認識技術，” 情報処理学会研究報告（音声言語情報処理），Vol.2020-SLP-132 (7)，pp.1-6, June 2020.（招待講演）
高木信二，安藤厚志，越智景子，沢田慶，塩田さやか，鈴木雅之，玉森聡，俵直弘，福田隆，増村亮，“国際会議Interspeech2018参加報告，”　情報処理学会研究報告（音声言語情報処理），Vol.2019-SLP-126 (7)，pp.1-6, February 2019.
福田隆，ラウルフェルナンデス，サミュエルトーマス，アレキサンダーソリン，倉田岳人，“データ拡張処理の非ネイティブ英語音声認識への効果，” 情報処理学会研究報告（音声言語情報処理），Vol.2018-SLP-124 (2)，pp.1-6, October 2018.
福田隆，Samuel Thomas，“異なる決定木に基づくニューラルネットワーク音響モデルからの知識蒸留，” 電子情報通信学会技術研究報告（音声），SP2018-20，pp. 21-24，June 2018.
福田隆，鈴木雅之，倉田岳人，Samuel Thomas，Bhuvana Ramabhadran，“広帯域用ニューラルネットワーク音響モデル群から狭帯域用音響モデルへの知識蒸留，”　情報処理学会研究報告（音声言語情報処理），Vol.2018-SLP-120 (15)，pp.1-6，February 2018．
Michael Heck, Masayuki Suzuki, Takashi Fukuda, Gakuto Kurata, Satoshi Nakamura, 'Distilling Knowledge from a Multi-scale Deep CNN Ensemble for Robust and Light-weight Acoustic Modeling', 情報処理学会研究報告（音声言語情報処理），Vol.2018-SLP-120 (1)，pp. 1-4，February 2018．
高木信二，倉田岳人，郡山知樹，塩田さやか，鈴木雅之，玉森聡，俵直弘，中鹿亘，福田隆，増村亮，森勢将雅，山岸順一，山本克彦，“国際会議Interspeech2017参加報告，”　情報処理学会研究報告（音声言語情報処理），Vol.2018-SLP-120 (14)，pp.1-9, February 2018.
Michael Heck, Masayuki Suzuki, Takashi Fukuda, Gakuto Kurata, Satoshi Nakamura, 'Knowledge Distillation from a Multi-scale VGG Ensemble for Acoustic Modeling', 日本音響学会2017年秋季研究発表会講演論文集，1-10-5, September 2017.
市川治，福田隆，鈴木雅之，倉田岳人，'狭帯域と広帯域の両方をサポートする CNN ベースのミックスバンド音響モデル'，日本音響学会2017年秋季研究発表会講演論文集，1-R-10，September 2017.
浅見太一，大谷大和，岡本拓磨，小川哲司，落合翼，亀岡弘和，駒谷和範，高木信二，高道慎之介，俵直弘，南條浩輝，橋本佳，福田隆，増村亮，松田繁樹，李晃伸，渡部晋治，“国際会議ICASSP2017参加報告，”　情報処理学会研究報告（音声言語情報処理），Vol.2017-SLP-117 (3)，pp.1-8, July 2017.
市川治，福田隆，鈴木雅之，倉田岳人，'ロバスト音声認識のための調波構造情報を取り込んだニューラルネットワーク音響モデル'，日本音響学会2017年春季研究発表会講演論文集，1-5-3，March 2017.
峯松信明，秋田祐哉，浅見太一，伊藤信貴，落合翼，郡山知樹，齋藤大輔，塩田さやか，篠崎隆宏，鈴木雅之，高木信二，俵直弘，橋本佳，樋口卓哉，福田隆，“国際会議ICASSP2016参加報告，”　情報処理学会研究報告（音声言語情報処理），Vol.2016-SLP-112 (5)，pp.1-6, July 2016.
市川治，福田隆，立花隆輝，“ファクトリアルモデルによるビームフォーマのエイリアシング対策，”　日本音響学会2016年春季研究発表会講演論文集，1-P-2，March 2016.
岩間太，福田隆，“音声認識システムの語彙列受理可能性テスト自動化，” 日本ソフトウェア科学会, 第22回ソフトウェア工学の基礎ワークショップ (FOSE 2015), ソフトウェア工学の基礎XXII (近代科学社出版), pp.87-96, November 2015. （フルペーパーセッション採録，貢献賞フルペーパー部門受賞）
市川治，福田隆，立花隆輝，“エイリアシング重み付きポストフィルタとバイナリマスクによる指向性雑音抑圧，”　日本音響学会2015年秋季研究発表会講演論文集，1-P-30，September 2015.
福田隆，市川治，立花隆輝，“正則化処理を有する識別的特徴変換行列の音響環境適応，”　日本音響学会2015年春季研究発表会講演論文集，1-1-6，pp.15-18，March 2015.
市川治，福田隆，立花隆輝，“対面販売モニタリングのための小規模マイクロフォンアレイによる効果的なスピーチサプレッション，”　日本音響学会2015年春季研究発表会講演論文集，2-10-4，pp.587-590，March 2015.
浅見太一，岩野公司，小川哲司，駒谷和範，齋藤大輔，篠田浩一，太刀岡勇気，東中竜一郎，福田隆，増村亮，渡部晋治，“国際会議INTERSPEECH2014, SLT2014参加報告，”　情報処理学会研究報告（音声言語情報処理），Vol.2014-SLP-105 (7)，pp.1-6, February 2015.
市川治，福田隆，立花隆輝，“大規模音声データを異なる音響環境向けの音響モデル学習データに変換するオーディオマッピング技術，” 日本音響学会2014年秋季研究発表会講演論文集，1-8-4, pp.11-14, September 2014．
福田隆，立花隆輝，西村雅史, Upendra Chaudhari, Bhuvana Ramabhadran, Puming Zhan，“音声データの隠れ属性を利用した異種音響モデル群の構築，” 情報処理学会研究報告（音声言語情報処理），Vol.2012-SLP-93 (3)，pp.1-6, October 2012. （音声言語情報処理研究会 SIG-SLP 山下記念研究賞）
市川治，福田隆，西村雅史，“メルバンドごとの信頼性指標を組み込んだ因子モデルに基づくモデルベース雑音補正，” 日本音響学会2012年春季研究発表会講演論文集，1-7-14, pp.33-36, March 2012．
福田隆，市川治，西村雅史，“息継ぎ音を利用した電話音声の発話分割，” 電子情報通信学会技術研究報告（音声），SP2011-153, pp.243-248, February 2012. （電子情報通信学会／日本音響学会音声研究会研究奨励賞）
福田隆，市川治，西村雅史，“特徴空間における長時間スペクトル変動成分の識別学習，” 情報処理学会研究報告（音声言語情報処理），Vol.2012-SLP-90 (21)，pp.1-6, February 2012.
福田隆，“音声特徴抽出の基礎と最近の研究動向，” 電子情報通信学会技術研究報告（音声），SP2011-30，pp.1-6，June 2011．（招待講演）
福田隆，市川治，西村雅史，“音声認識のための長時間変動量と線形判別分析の比較検討，” 日本音響学会2010年秋季研究発表会講演論文集，1-9-2，pp.3-6，September 2010.
市川治，福田隆，西村雅史，“音声認識における母音区間の位相の安定性の利用，” 日本音響学会2010年秋季研究発表会講演論文集，1-Q-7，pp.127-130, September 2010.
福田隆，市川治，西村雅史，“頑健な音声認識のための線形－対数ハイブリッド領域における長時間動的特徴量，” 日本音響学会2010年春季研究発表会講演論文集，1-6-2，pp.5-8，March 2010．（日本音響学会粟屋潔学術奨励賞）
市川治，福田隆，西村雅史，“残響にロバストな音声認識のための動的特徴量と調波構造重み付けメルフィルタバンク，” 日本音響学会2010年春季研究発表会講演論文集，1-6-1，pp.1-4, March 2010．
福田隆，市川治，西村雅史，“長時間スペクトル変動と調波構造に基づく発話区間検出法の音声認識による評価，” 情報処理学会研究報告（音声言語情報処理），2009-SLP-78 (1)，pp.1-6，October 2009.
福田隆，市川治，西村雅史，“長時間スペクトル変動情報と調波構造特徴量を併用した発話区間検出法の評価と考察，” 日本音響学会2009年秋季研究発表会講演論文集，1-1-13，pp.39-42，September 2009．
市川治，福田隆，西村雅史，“残響にロバストな音声認識のための動的特徴量，” 日本音響学会2009年秋季研究発表会講演論文集，1-1-9, pp.27-30，September 2009．
福田隆，市川治，西村雅史，“短・長スペクトル変動を考慮した雑音に頑健な音声認識，” 日本音響学会2009年春季研究発表会講演論文集，1-5-3，pp.7-10，March 2009．
福田隆，市川治，西村雅史，“長時間スペクトル変動情報と調波構造特徴量を併用した発話区間検出法，” 情報処理学会研究報告（音声言語情報処理），2008-SLP-73 (1)，pp.1-6，October 2008．
福田隆，市川治，西村雅史，“耐雑音性の高い発話区間検出のための調波構造に基づく音声特徴量，” 日本音響学会2008年秋季研究発表会講演論文集，1-1-11，pp.25-26，September 2008．
市川治，福田隆，西村雅史，“Local Peak Weighted CSP による方向推定の改善，” 日本音響学会2008年秋季研究発表会講演論文集，3-P-26, pp.821-822，September 2008．
福田隆，市川治，西村雅史，“長時間スペクトル変動を考慮した低S/N環境下における発話区間検出法，” 日本音響学会2008年春季研究発表会講演論文集，1-10-6，pp. 19-20，March 2008．
市川治，福田隆，西村雅史，“調波構造のローカルピーク強調によるF0抽出不要な音声強調法，” 日本音響学会2007年秋季研究発表会講演論文集，1-P-24，pp.185-186, September 2007．
福田隆，市川治，西村雅史，“長時間スペクトル変動を考慮した音声特徴量の検討，” 日本音響学会2007年春季研究発表会講演論文集，1-P-1，pp.125-126, March 2007．
福田隆，市川治，西村雅史，“発話末尾残響区間推定に基づく低コストなフィルタ係数決定法，” 日本音響学会2006年秋季研究発表会講演論文集，2-P-1，pp.95-96, September 2006．
福田隆，新田恒雄，“声質差と背景雑音に起因する音声パターン変動の正準化方式,” 日本音響学会2005年春季研究発表会講演論文集，Vol. I，1-5-12，pp.23-24，March 2005．
毛呂良寛，池谷春生，福田隆，山田博文，桂田浩一，新田恒雄，“キーワード検出に基づく対話音声認識用言語モデルの比較，” 日本音響学会2005年春季研究発表会講演論文集，Vol. I，2-5-7，pp.67-68，March 2005．
福田隆，新田恒雄，“背景雑音を対象とした特徴パラメータ正準化法，” 電子情報通信学会技術研究報告（音声），SP2004-118, pp.133-138, December 2004．
池谷春生，福田隆，山田博文，桂田浩一，新田恒雄，“意味属性を利用したクラスN-gram言語モデルの評価，” 電子情報通信学会技術研究報告（音声），SP2004-101，pp.31-36，December 2004．
Muhammad Ghulam, Takashi Fukuda, Junsei Horikawa, Tsuneo Nitta, “Embedding Auditory Masking into the Pitch-Synchronous ZCPA (PS-ZCPA)-based Feature Extractor,” 電子情報通信学会技術研究報告（音声），SP2004-80，pp.53-58，November 2004.
福田隆，新田恒雄，“音声認識のための特徴パラメータ正準化法” 日本音響学会2004年秋季研究発表会講演論文集，Vol. I，2-1-14，pp.63-64，September 2004.
池谷春生，福田隆，山田博文，桂田浩一，新田恒雄，“意味属性を利用したクラスN-gram言語モデルの検討，” 日本音響学会2004年秋季研究発表会講演論文集，Vol. I，2-1-6，pp.47-48，September 2004．
福田隆，新田恒雄，“音声認識のための特徴パラメータ正準化法の検討” 電子情報通信学会技術研究報告（音声），SP2004-13，pp.19-24，May 2004.
福田隆，新田恒雄，“直交化音素弁別特徴のAURORA-2Jによる評価，” 日本音響学会2004年春季研究発表会講演論文集，Vol. I，2-8-16，pp.91-92，March 2004．
福田隆，新田恒雄，“音声認識のための音素弁別特徴抽出器の改良，” 日本音響学会2004年春季研究発表会講演論文集，Vol. I，1-8-5，pp.9-10，March 2004．
伊勢路真吾，福田隆，山田博文，桂田浩一，新田恒雄，“日本語短・長音節単位の認識結果を用いた対話音声中のキーワード検出，” 日本音響学会2004年春季研究発表会講演論文集，3-Q-34，pp.211-212，March 2004．
福田隆，新田恒雄，“頑健な音声認識のための音素弁別特徴ベクトル直交化方式の検討，” 電子情報通信学会技術研究報告（音声），SP2003-133，pp.121-126，December 2003．
伊勢路真吾，福田隆，山田博文，桂田浩一，新田恒雄，“音素弁別特徴間距離に基づくキーワード検出におけるモーラ単位サブワード言語モデルの検討，” 電子情報通信学会技術研究報告（音声），SP2003-140，pp.163-168，December 2003．
福田隆，新田恒雄，“頑健な音声認識のためのバランスを考慮した日本語音素弁別特徴セットの検討，” 日本音響学会 2003年秋季研究発表会講演論文集，1-6-5, pp.9-10，September 2003．
伊勢路真吾，福田隆，山田博文，桂田浩一，新田恒雄，“N-best出力と音素弁別特徴を利用した対話音声認識の検討，” 日本音響学会2003年秋季研究発表会講演論文集，Vol. I，1-6-27，pp.53-54，September 2003．
福田隆，新田恒雄，“直交化音素弁別特徴ベクトルを利用した雑音に頑健な音声認識，” 情報処理学会研究報告（音声言語情報処理），2003-SLP-47 (15)，pp.77-82，July 2003．
福田隆，新田恒雄，“音素弁別特徴ベクトルの対数正規分布近似を利用した雑音環境下音声認識，” 電子情報通信学会技術研究報告（音声），SP2003-23，pp.19-24，May 2003．
伊勢路真吾，福田隆，山田博文，桂田浩一，新田恒雄，“音素弁別特徴を用いた頑健な対話音声認識－　モーラ単位サブワードモデルの検討，” 電子情報通信学会技術研究報告（音声），SP2003-24，pp.25-30，May 2003．
福田隆，山本航，新田恒雄，“音素弁別特徴ベクトルの対数正規分布を利用した頑健な音声認識の検討，” 日本音響学会2003年春季研究発表会講演論文集，Vol. I，3-Q-2，pp.155-156，March 2003．
伊勢路真吾，福田隆，山田博文，桂田浩一，新田恒雄，“音素弁別特徴ベクトルを利用した自由発話音声認識における距離補正の役割，” 日本音響学会2003年春季研究発表会講演論文集，Vol. I，2-4-12，pp.81-82，March 2003．
福田隆，山本航，新田恒雄，“音素弁別特徴ベクトルを用いた頑健な音声認識に関する検討，” 電子情報通信学会技術研究報告（音声），SP2002-121，pp.1-6，December 2002．
伊勢路真吾，福田隆，桂田浩一，新田恒雄，“0-gram汎用LVCSRと音素弁別特徴ベクトルを利用した対話音声認識の検討，” 電子情報通信学会技術研究報告（音声），SP2002-156，pp.49-54，December 2002．
Muhammad Ghulam，Takashi Fukuda，Tsuneo Nitta，“Normalizing acoustic qualities of mono-phones in an utterance，” 電子情報通信学会技術研究報告（音声），SP2002-122，pp.7-12，December 2002．
福田隆，山本航，新田恒雄，“弁別的特徴ベクトルを用いた音声認識に関する検討，” 日本音響学会2002年秋季研究発表会講演論文集，Vol. I，1-9-1，pp.1-2，September 2002．
伊勢路真吾，福田隆，桂田浩一，新田恒雄，“0-gram汎用LVCSRと音素弁別特徴ベクトルを利用した対話音声認識の検討，” 日本音響学会2002年秋季研究発表会講演論文集，Vol. I，2-9-11，pp.83-84，September 2002．
Muhammad Ghulam, Takashi Fukuda, Tsuneo Nitta, “An HMM-SM Based Speaker-Independent Connected Digit Recognition System by Using Normalized Confidence Measure,” 日本音響学会2002年秋季研究発表会講演論文集，Vol. I, 1-9-31, pp.61-62, September 2002.
福田隆，新田恒雄，“音声認識の前処理としてのCMNと修正CMNの性能比較，” 電子情報通信学会技術研究報告（音声），SP2002-43，pp.7-12，June 2002．
Muhammad Ghulam, Takaharu Sato, Takashi Fukuda, Tsuneo Nitta, “Confidence Scoring for Accurate HMM-based Speech Recognition by Using Monophone-Level Normalization based on Subspace Method,” 電子情報通信学会技術研究報告（音声），SP2002-41, pp.31-36, June 2002.
新田恒雄，浅見弘道，伊勢路真吾，福田隆，桂田浩一，“汎用LVCSRを用いた対話音声の認識，” 情報処理学会研究報告（音声言語情報処理），2002-SLP-41，pp.69-74，May 2002．
福田隆，新田恒雄，“音韻的偏りに対する推定信頼度を用いたCMN制御，” 日本音響学会2002年春季研究発表会講演論文集，Vol. I，1-5-1，pp.1-2，March 2002．
福田隆，石川恵美子，新田恒雄，“交差する複数話者音声分離に関する検討，” 日本音響学会2002年春季研究発表会講演論文集，Vol. I，2-2-8，pp.71-72，March 2002．
佐藤隆治，Muhammad Ghulam，福田隆，新田恒雄，“尤度正規化手法を用いたHMM－SMハイブリッド音声認識の検討，” 日本音響学会2002年春季研究発表会講演論文集，Vol. I，2-5-7，pp.87-88，March 2002．
浅見弘道，福田隆，桂田浩一，新田恒雄，“汎用LVCSRを用いた対話音声の認識について，” 日本音響学会2002年春季研究発表会講演論文集，Vol. I，1-5-25，pp.49-50，March 2002．
福田隆，新田恒雄，“単語・文音声双方に高い認識性能を持つ周辺特徴抽出方式，” 電子情報通信学会技術研究報告（音声），SP2001-85，pp.7-12，December 2001．
福田隆，新田恒雄，“単語および文音声認識における周辺特徴の適用比較，” 日本音響学会2001年秋季研究発表会講演論文集，Vol. I，1-1-1，pp.1-2，October 2001．
福田隆，新田恒雄，“周辺特徴と音声認識における役割，” 日本音響学会2001年春季研究発表会講演論文集，Vol. I，3-3-18，pp.129-130，March 2001．
新田恒雄，福田隆，“音声認識のための局所特徴とケプストラム領域表現について，” 日本音響学会2001年春季研究発表会講演論文集，Vol. I，3-3-19，pp.131-132，March 2001．
福田隆，瀧川正史，新田恒雄，“音声認識のための周辺特徴の検討，” 電子情報通信学会技術研究報告（音声），SP2000-76，pp.7-12，December 2000．
瀧川正史，福田隆，新田恒雄，“音声認識用周辺特徴パラメータの検討，” 日本音響学会2000年秋季研究発表会講演論文集，Vol. I，2-5-5，pp.59-60，October 2000．

Publications

Improving End-to-end Mixed-case ASR with Knowledge Distillation and Integration of Voice Activity Cues
- Sashi Novitasari, Takashi Fukuda, et al.
- 2025
- INTERSPEECH 2025
Voice Activity-based Text Segmentation for ASR Text Denormalization
- Sashi Novitasari, Takashi Fukuda, et al.
- 2025
- INTERSPEECH 2025
Knowledge Distillation Based Training of Unified Conformer CTC Models for Multi-form ASR
- Takashi Fukuda, Gakuto Kurata, et al.
- 2025
- ICASSP 2025
Effective Training of RNN Transducer Models on Diverse Sources of Speech and Text Data
- Takashi Fukuda, Samuel Thomas
- 2023
- ICASSP 2023
Improving ASR Robustness in Noisy Condition Through VAD Integration
- Sashi Novitasari, Takashi Fukuda, et al.
- 2022
- INTERSPEECH 2022
Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing
- Xiaodong Cui, George Saon, et al.
- 2022
- INTERSPEECH 2022
Global RNN Transducer Models For Multi-dialect Speech Recognition
- Takashi Fukuda, Samuel Thomas, et al.
- 2022
- INTERSPEECH 2022
Knowledge distillation based training of universal ASR source models for cross-lingual transfer
- Takashi Fukuda, Samuel Thomas
- 2021
- INTERSPEECH 2021
Generalized Knowledge Distillation from An Ensemble of Specialized Teachers Leveraging Unsupervised Neural Clustering
- Takashi Fukuda, Gakuto Kurata
- 2021
- ICASSP 2021
Implicit transfer of privileged acoustic information in a generalized knowledge distillation framework
- Takashi Fukuda, Samuel Thomas
- 2020
- INTERSPEECH 2020

Patents

Speech Collecting Method, System And Program Product

Feature Extractor For Robust Automatic Speech Recognition In Reverberant And Noisy Environment

Speech Feature Extractor Apparatus, Speech Feature Extraction Method, And Speech Feature Extraction Program

Feature Extractor For Robust Automatic Speech Recognition In Reverberant And Noisy Environment

Virtual Space System, Method And Program

System, Method And Program For Speech Processing

Virtual Space System, Method And Program

Method For Synchronizing Data Stream Of Content With Meta Data And Apparatus For The Same

Recording System With Improved Suppression Of Interfering Talker

An Automatic Method To Synchronize The Time-line Of Video With Its Audio Feature Quantity

Projects

AI in Tokyo

Speech Technologies

Top collaborators

Gakuto Kurata

Samuel Thomas

Xiaodong Cui

George Saon