Abstract

Accents are an integral aspect of speech that reflects cultural, regional, and linguistic diversity. However, differences in accent can hinder spoken communication, particularly in global contexts such as language learning and human-computer interaction. Accent conversion addresses these problems by converting a speaker's accent to that of a target accent while preserving the original linguistic content and speaker identity. Accent conversion involves several steps, including speech analysis, feature extraction, feature mapping, and speech synthesis. In this study, we present a comprehensive overview of state-of-the-art accent conversion technologies, from traditional statistical models to modern deep learning approaches. Additionally, we propose a deep learning approach to accent conversion based on Mel-Frequency Cepstral Coefficients and a Generative Adversarial Network vocoder (MFCCGAN). Using MFCCs as features, a feedforward neural network converts source MFCCs into target MFCCs, and a GAN-based vocoder (MFCCGAN) reconstructs the speech waveform. The accent conversion task is applied to Indian English and Scottish English speech, with American English speech as the target. The evaluation results show that the converted speech closely resembles the target American English speech while preserving the identity of the source speaker. The Indian-to-US English conversion performed slightly better than the Scottish-to-US English conversion. Future work will focus on using multiple speakers with different accents, incorporating additional speech features (pitch, energy, formants), exploring more advanced neural network architectures, and developing a real-time accent conversion system.
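
To illustrate the feature-mapping stage described above, the following is a minimal sketch (not the thesis code) of a feedforward MFCC-to-MFCC mapping network in PyTorch. The MFCC dimensionality, layer sizes, loss, and training details are illustrative assumptions, and the GAN-based MFCCGAN vocoder used for waveform reconstruction is not shown.

```python
# Minimal sketch of the MFCC-to-MFCC feature-mapping stage (illustrative only).
# Assumptions: 13-dimensional MFCC frames, a simple fully connected network,
# and an L1 loss between predicted and time-aligned target MFCCs. Waveform
# reconstruction with the GAN-based vocoder is a separate step, not shown here.
import torch
import torch.nn as nn

class MFCCMapper(nn.Module):
    def __init__(self, n_mfcc: int = 13, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_mfcc, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_mfcc),  # predict target-accent MFCCs
        )

    def forward(self, source_mfcc: torch.Tensor) -> torch.Tensor:
        # source_mfcc: (batch, frames, n_mfcc) -> predicted target MFCCs, same shape
        return self.net(source_mfcc)

# One training step: map source-accent frames toward time-aligned target frames.
model = MFCCMapper()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()

source = torch.randn(8, 200, 13)   # batch of source-accent MFCC sequences (dummy data)
target = torch.randn(8, 200, 13)   # aligned target-accent MFCC sequences (dummy data)

pred = model(source)
loss = loss_fn(pred, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```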

Date of publication

Fall 2025

Document Type

Thesis

Language

English

Persistent identifier

http://hdl.handle.net/10950/4910

Available for download on Friday, December 17, 2027
