Ciencias Exactas y Ciencias de la Salud

Permanent URI for this collectionhttps://hdl.handle.net/11285/551014

Pertenecen a esta colección Tesis y Trabajos de grado de los Doctorados correspondientes a las Escuelas de Ingeniería y Ciencias así como a Medicina y Ciencias de la Salud.

Browse

Search Results

Now showing 1 - 1 of 1
  • Tesis de doctorado
    Unsupervised Deep Learning Recurrent Model for Audio Fingerprinting
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2020-04-16) Báez Suárez, Abraham; BAEZ SUAREZ, ABRAHAM; 328083; Nolazco Flores, Juan Arturo; Vargas Rosales, César Vargas; Gutiérrez Rodríguez, Andrés Eduardo; Rodríguez Dagnino, Ramón Martín; Loyola González, Octavio; Escuela de Ingeniería y Ciencias; Campus Monterrey
    Audio fingerprinting techniques were developed to index and retrieve audio samples by comparing a content-based compact signature of the audio instead of the entire audio sample, thereby reducing memory and computational expense. Different techniques have been applied to create audio fingerprints, however, with the introduction of deep learning, new data-driven unsupervised approaches are available. This doctoral dissertation presents a Sequence-to-Sequence Autoencoder Model for Audio Fingerprinting (SAMAF) which improved hash generation through a novel loss function composed of terms: Mean Square Error, minimizing the reconstruction error; Hash Loss, minimizing the distance between similar hashes and encouraging clustering; and Bitwise Entropy Loss, minimizing the variation inside the clusters. The performance of the model was assessed with a subset of VoxCeleb1 dataset, a "speech in-the-wild" dataset. Furthermore, the model was compared against three baselines: Dejavu, a Shazam-like algorithm; Robust Audio Fingerprinting System (RAFS), a Bit Error Rate (BER) methodology robust to time-frequency distortions and coding/decoding transformations; and Panako, a constellation algorithm-based adding time-frequency distortion resilience. Extensive empirical evidence showed that our approach outperformed all the baselines in the audio identification task and other classification tasks related to the attributes of the audio signal with an economical hash size of either 128 or 256 bits for one second of audio. Additionally, the developed technology was deployed into two 9-1-1 Emergency Operation Centers (EOCs), located in Palm Beach County (PBC) and Greater Harris County (GH), allowing us to evaluate the performance in real-time in an industrial environment.
En caso de no especificar algo distinto, estos materiales son compartidos bajo los siguientes términos: Atribución-No comercial-No derivadas CC BY-NC-ND http://www.creativecommons.mx/#licencias
logo

El usuario tiene la obligación de utilizar los servicios y contenidos proporcionados por la Universidad, en particular, los impresos y recursos electrónicos, de conformidad con la legislación vigente y los principios de buena fe y en general usos aceptados, sin contravenir con su realización el orden público, especialmente, en el caso en que, para el adecuado desempeño de su actividad, necesita reproducir, distribuir, comunicar y/o poner a disposición, fragmentos de obras impresas o susceptibles de estar en formato analógico o digital, ya sea en soporte papel o electrónico. Ley 23/2006, de 7 de julio, por la que se modifica el texto revisado de la Ley de Propiedad Intelectual, aprobado

DSpace software copyright © 2002-2026

Licencia