Unsupervised Deep Learning Recurrent Model for Audio Fingerprinting

dc.audience.educationlevel: Researchers
dc.contributor.advisor: Nolazco Flores, Juan Arturo
dc.contributor.author: Báez Suárez, Abraham
dc.contributor.committeemember: Vargas Rosales, César
dc.contributor.committeemember: Gutiérrez Rodríguez, Andrés Eduardo
dc.contributor.committeemember: Rodríguez Dagnino, Ramón Martín
dc.contributor.committeemember: Loyola González, Octavio
dc.contributor.department: Escuela de Ingeniería y Ciencias
dc.contributor.institution: Campus Monterrey
dc.creator: BAEZ SUAREZ, ABRAHAM; 328083
dc.date.accessioned: 2020-04-17T16:43:34Z
dc.date.available: 2020-04-17T16:43:34Z
dc.date.created: 2020-04-16
dc.date.issued: 2020-04-16
dc.description.abstract: Audio fingerprinting techniques were developed to index and retrieve audio samples by comparing a content-based compact signature of the audio instead of the entire audio sample, thereby reducing memory and computational expense. Different techniques have been applied to create audio fingerprints; however, with the introduction of deep learning, new data-driven unsupervised approaches have become available. This doctoral dissertation presents a Sequence-to-Sequence Autoencoder Model for Audio Fingerprinting (SAMAF), which improves hash generation through a novel loss function composed of three terms: Mean Square Error, minimizing the reconstruction error; Hash Loss, minimizing the distance between similar hashes and encouraging clustering; and Bitwise Entropy Loss, minimizing the variation inside the clusters. The performance of the model was assessed on a subset of the VoxCeleb1 dataset, a "speech in-the-wild" dataset. Furthermore, the model was compared against three baselines: Dejavu, a Shazam-like algorithm; Robust Audio Fingerprinting System (RAFS), a Bit Error Rate (BER) methodology robust to time-frequency distortions and coding/decoding transformations; and Panako, a constellation-algorithm-based approach adding time-frequency distortion resilience. Extensive empirical evidence showed that our approach outperformed all the baselines in the audio identification task and in other classification tasks related to the attributes of the audio signal, with an economical hash size of either 128 or 256 bits for one second of audio. Additionally, the developed technology was deployed in two 9-1-1 Emergency Operation Centers (EOCs), located in Palm Beach County (PBC) and Greater Harris County (GH), allowing us to evaluate its performance in real time in an industrial environment.
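The three-term loss described in the abstract can be sketched as follows. This is only a minimal illustration of the structure, not the dissertation's implementation: the function name `samaf_loss`, the loss weights, and the pairwise form of the hash term are assumptions; the exact formulation is given in the dissertation.

```python
import numpy as np

def samaf_loss(x, x_hat, hashes, labels, w_mse=1.0, w_hash=1.0, w_ent=1.0):
    """Sketch of a three-term fingerprinting loss (hypothetical weights)."""
    # Mean Square Error term: reconstruction error of the autoencoder.
    mse = np.mean((x - x_hat) ** 2)

    # Hash Loss term: mean squared distance between hashes of audio
    # segments with the same label, encouraging similar inputs to cluster.
    hash_loss, pairs = 0.0, 0
    for i in range(len(hashes)):
        for j in range(i + 1, len(hashes)):
            if labels[i] == labels[j]:
                hash_loss += np.mean((hashes[i] - hashes[j]) ** 2)
                pairs += 1
    hash_loss = hash_loss / pairs if pairs else 0.0

    # Bitwise Entropy Loss term: per-bit binary entropy of the soft hash,
    # penalizing bits that are far from saturation (0 or 1) and thereby
    # reducing variation inside a cluster.
    p = np.clip(hashes, 1e-7, 1 - 1e-7)
    entropy = -np.mean(p * np.log(p) + (1 - p) * np.log(1 - p))

    return w_mse * mse + w_hash * hash_loss + w_ent * entropy
```

With perfect reconstruction, identical near-saturated hashes, and matching labels, all three terms are at or near zero, so the loss approaches its minimum; degrading any of the three raises it.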
dc.description.degree: Doctorado en Tecnologías de la Información y Comunicaciones
dc.format.medium: Text
dc.identificator: 7||33||3325
dc.identifier.citation: Báez Suárez, A. (2020). Unsupervised Deep Learning Recurrent Model for Audio Fingerprinting (Doctoral Dissertation). Instituto Tecnológico y de Estudios Superiores de Monterrey (ITESM), Monterrey, México. https://hdl.handle.net/11285/636319
dc.identifier.cvu: 328083
dc.identifier.doi: https://dl.acm.org/doi/10.1145/3380828
dc.identifier.orcid: https://orcid.org/0000-0001-8729-0781
dc.identifier.uri: https://hdl.handle.net/11285/636319
dc.language.iso: eng
dc.publisher: Instituto Tecnológico y de Estudios Superiores de Monterrey
dc.relation: Department of Homeland Security (DHS) D15PC00185
dc.relation: Consejo Nacional de Ciencia y Tecnología (CONACYT) 328083
dc.relation: North Atlantic Treaty Organization (NATO) G4919
dc.relation.impreso: 2020-04-15
dc.relation.isFormatOf: published version
dc.relation.isreferencedby: REPOSITORIO NACIONAL CONACYT
dc.rights: openAccess
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0
dc.subject.classification: INGENIERÍA Y TECNOLOGÍA::CIENCIAS TECNOLÓGICAS::TECNOLOGÍA DE LAS TELECOMUNICACIONES
dc.subject.keyword: Artificial Intelligence
dc.subject.keyword: Machine Learning
dc.subject.keyword: Deep Learning
dc.subject.keyword: Unsupervised Learning
dc.subject.keyword: Sequence-to-Sequence Autoencoder
dc.subject.keyword: Audio Fingerprinting
dc.subject.keyword: Audio Identification
dc.subject.keyword: Music Information Retrieval
dc.subject.lcsh: Technology
dc.title: Unsupervised Deep Learning Recurrent Model for Audio Fingerprinting
dc.type: Doctoral dissertation

Files

Original bundle

Name: DecAcuerdoUsoObra.pdf
Size: 64.41 KB
Format: Adobe Portable Document Format
Description: Declaration of Agreement for Use of Work

Name: BaezSuarezTesisDoctoradoPDFA.pdf
Size: 1.93 MB
Format: Adobe Portable Document Format
Description: Doctoral thesis

Name: BaezSuarez_HojaDeFirmasPDFA.pdf
Size: 182.2 KB
Format: Adobe Portable Document Format
Description: Signature sheet

Name: BaezSuarezDeclaracionAutoriaPDFA.pdf
Size: 82.55 KB
Format: Adobe Portable Document Format
Description: Declaration of authorship

License bundle

Name: license.txt
Size: 1.3 KB
Format: Item-specific license agreed upon to submission
Description: logo

The user is obliged to use the services and contents provided by the University, in particular the printed and electronic resources, in accordance with current legislation, the principles of good faith, and generally accepted usage, without contravening public order, especially in cases where, for the proper performance of their activity, they need to reproduce, distribute, communicate, and/or make available fragments of printed works or works susceptible of existing in analog or digital format, whether on paper or electronic media. Law 23/2006, of 7 July, amending the revised text of the Intellectual Property Law, approved

