Voice fraud mitigation: developing a deep learning system for detecting cloned voices in telephonic communications
Abstract
This study addresses the growing threat of voice fraud carried out with cloned voices in phone calls, a problem that can compromise personal security in many ways. The primary goal of this work is to develop a deep learning-based system that distinguishes real from cloned voices in Spanish, focusing on calls made over telephone lines. To this end, a dataset was built from real and cloned audio samples in Spanish, and the recordings were processed to simulate various telephone codecs and noise levels. Two deep learning models were trained on spectrograms derived from the audio data: a convolutional neural network (referred to in this project as Vanilla CNN) and a transfer learning approach based on MobileNetV2. The results show high accuracy in identifying real and cloned voices, reaching up to 99.97%. The models were also validated under the different noise types and codecs included in the dataset. These findings highlight the effectiveness of the proposed architectures. Additionally, an ESP32 audio kit was integrated with Amazon Web Services to implement voice detection during phone calls. This study contributes to voice fraud detection research focused on the Spanish language.
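The data preparation summarized in the abstract (telephone codec simulation, added noise, spectrogram extraction) can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not name the codecs, noise types, or spectrogram settings, so the continuous mu-law companding (a simplification of G.711-style coding), the white Gaussian noise, the 8 kHz sample rate, the frame sizes, and all function names below are assumptions chosen only for illustration.

```python
import numpy as np

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def mu_law_roundtrip(signal, mu=255, levels=256):
    """Compress with mu-law, quantize to `levels` steps, then expand again.

    Assumption: a continuous mu-law curve stands in for a telephone codec.
    """
    x = np.clip(signal, -1.0, 1.0)
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    # Uniform quantization of the compressed signal in [-1, 1].
    quantized = np.round((compressed + 1) / 2 * (levels - 1)) / (levels - 1) * 2 - 1
    return np.sign(quantized) * np.expm1(np.abs(quantized) * np.log1p(mu)) / mu

def log_spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram in dB from a Hann-windowed short-time FFT."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Small offset avoids log(0); result has shape (freq_bins, n_frames).
    return 20 * np.log10(magnitude + 1e-10).T

# Hypothetical input: one second of a 440 Hz tone standing in for speech.
rng = np.random.default_rng(0)
sample_rate = 8000  # assumed narrowband telephone rate
t = np.arange(sample_rate) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * 440 * t)

noisy = add_noise(tone, snr_db=20, rng=rng)
coded = mu_law_roundtrip(noisy)
spec = log_spectrogram(coded)
print(spec.shape)  # (129, 61)
```

An array like `spec` could then be resized and passed as an image to a classifier such as a small CNN or a pretrained MobileNetV2, which is the general approach the abstract describes.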
Description
https://orcid.org/0000-0003-3976-4190