Montesinos Silva, Luis Arturo
Villicaña Ibargüengoyti, José Rubén
2025-01-07
2024-12-03
Villicaña Ibargüengoyti, J. R. (2024). Voice fraud mitigation: developing a deep learning system for detecting cloned voices in telephonic communications [Master's thesis]. Instituto Tecnológico y de Estudios Superiores de Monterrey. Retrieved from https://hdl.handle.net/11285/702987
https://doi.org/10.60473/ritec.63
https://orcid.org/0000-0003-3976-4190

This study addresses the growing threat of voice fraud committed with cloned voices in phone calls, a problem that can compromise personal security in many ways. The primary goal of this work is to develop a deep learning-based detection system that distinguishes between real and cloned voices in Spanish, focusing on calls made over telephone lines. To achieve this, a dataset of real and cloned Spanish audio samples was generated, and the captured audio was processed to simulate various telephone codecs and noise levels. Two deep learning models were trained on spectrograms derived from the audio data: a convolutional neural network (referred to in this project as Vanilla CNN) and a transfer learning approach based on MobileNetV2. The models identified real and cloned voices with high accuracy, reaching up to 99.97%. They were also validated under the different noise types and codecs included in the dataset. These findings highlight the effectiveness of the proposed architectures. Additionally, an ESP32 audio kit was integrated with Amazon Web Services to perform voice detection during phone calls.
This study contributes to voice fraud detection research focused on the Spanish language.
Text
eng
openAccess
http://creativecommons.org/licenses/by/4.0
PHYSICAL-MATHEMATICAL SCIENCES AND EARTH SCIENCES::MATHEMATICS::COMPUTER SCIENCE::PROGRAMMING LANGUAGES
Technology
Voice fraud mitigation: developing a deep learning system for detecting cloned voices in telephonic communications
Master's thesis
https://orcid.org/0009-0007-1061-075X
Voice Cloning
Deep Learning
Telephonic Communications
Fraud Detection
Audio Codification
1276822
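The abstract says the audio was simulated under various telephone codecs and noise levels and then converted to spectrograms. This record does not state which codecs, noise types, or spectrogram parameters were used.

The Python sketch below illustrates one plausible version of that preprocessing chain under assumptions that are not taken from the thesis:
- an 8 kHz sampling rate;
- a G.711-style µ-law channel, approximated here with the continuous µ-law formula and 8-bit quantization rather than the standard's segmented tables;
- additive white Gaussian noise at a target SNR;
- a Hann-windowed STFT with hypothetical `n_fft=256` and `hop=128`.

All function names are hypothetical. This is not the author's implementation.

```python
import numpy as np

SAMPLE_RATE = 8000  # assumption: narrowband telephone audio (G.711 operates at 8 kHz)


def mu_law_roundtrip(x: np.ndarray, mu: int = 255) -> np.ndarray:
    """Approximate a G.711 mu-law channel: compand, quantize to 8 bits, expand.

    Hypothetical helper: uses the continuous mu-law formula, not the segmented
    tables of the standard. Input is expected to be a float signal in [-1, 1].
    """
    x = np.clip(x, -1.0, 1.0)
    # Compress: y = sign(x) * ln(1 + mu|x|) / ln(1 + mu)
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    # Quantize the companded value to mu + 1 = 256 levels (8 bits per sample)
    q = np.round((y + 1.0) / 2.0 * mu)
    y_hat = q / mu * 2.0 - 1.0
    # Expand: x = sign(y) * ((1 + mu)^|y| - 1) / mu
    return np.sign(y_hat) * ((1.0 + mu) ** np.abs(y_hat) - 1.0) / mu


def add_noise_at_snr(x: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Add white Gaussian noise scaled so signal power / noise power = 10^(snr_db / 10)."""
    noise = rng.standard_normal(x.shape)
    p_signal = np.mean(x ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return x + scale * noise


def log_spectrogram(x: np.ndarray, n_fft: int = 256, hop: int = 128) -> np.ndarray:
    """Hann-windowed STFT magnitude in dB, shaped (frequency bins, frames)."""
    if len(x) < n_fft:
        raise ValueError("signal shorter than one analysis frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Small epsilon avoids log10(0) on silent frames
    return (20.0 * np.log10(magnitude + 1e-10)).T


if __name__ == "__main__":
    t = np.arange(SAMPLE_RATE) / SAMPLE_RATE  # one second of audio
    clean = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # stand-in for a speech recording
    rng = np.random.default_rng(0)
    degraded = add_noise_at_snr(mu_law_roundtrip(clean), snr_db=20.0, rng=rng)
    spec = log_spectrogram(degraded)
    print(spec.shape)  # (129, 61)
```

The noise is scaled from measured signal and noise power, so the SNR of the output matches `snr_db` exactly for synthetic noise. A fuller pipeline would presumably substitute recorded noise and real codec implementations for these stand-ins.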