An explainable autoencoder integrating regression and classification trees for anomaly detection

Caballero Dominguez, Zoe

dc.audience.educationlevel	Estudiantes/Students
dc.contributor.advisor	Monroy Borja, Raúl
dc.contributor.author	Caballero Dominguez, Zoe
dc.contributor.cataloger	mtyahinojosa, emipsanchez
dc.contributor.committeemember	Graff Guerrero, Mario
dc.contributor.committeemember	García Ceja, Enrique Alejandro
dc.contributor.committeemember	González Mendoza, Miguel
dc.contributor.department	Escuela de Ingeniería y Ciencias
dc.contributor.institution	Campus Estado de México
dc.contributor.mentor	Medina Pérez, Miguel Angel
dc.date.accepted	2025-09-05
dc.date.accessioned	2025-12-14T02:24:46Z
dc.date.embargoenddate	2026-12-30
dc.date.issued	2025
dc.description	https://orcid.org/0000-0002-3465-995X
dc.description.abstract	Anomaly detection, or outlier detection, is a critical field because anomalies are data points that deviate from normal patterns and often represent critical information, such as fraud, diseases, or cyber-attacks. These applications are high-risk scenarios that involve high-stakes decision-making, so understanding the reasoning behind the machine learning models used in this area has become an essential requirement. Despite its growing importance, explainable outlier detection remains a challenge because improving model accuracy while maintaining explainability involves a significant trade-off. Furthermore, most anomaly detection models are designed for a single type of data, either numerical or categorical. This is a disadvantage when a dataset's attributes include both data types, as is often the case in real-world applications, because transforming categorical values into numerical ones, or vice versa, can cause information loss and reduce performance. In this thesis, we address both challenges by proposing a novel explainable semi-supervised anomaly detection model that integrates classification and regression trees into an autoencoder architecture. We name our proposal the Explainable Outlier Tree-based Encoder (EOTE). EOTE detects anomalies by reconstructing the input instance based on the relationships between attributes learned from normal samples. The harder it is for EOTE to reconstruct an instance correctly, the higher the outlier probability it assigns to that instance. We evaluate EOTE against 12 anomaly detectors and one-class classifiers across 110 datasets containing attributes of a single data type (numerical or nominal) or a mix of both. Our experiments reveal that EOTE is one of the top-performing algorithms at detecting outliers in datasets with only numerical or only nominal attributes, as well as in datasets with mixed attributes.
EOTE therefore produces interpretable outputs for its classifications without sacrificing performance. This combination makes EOTE a suitable classifier for anomaly detection in high-risk applications.
dc.description.degree	Master of Science in Computer Science
dc.format.medium	Text
dc.identificator	120304||120306||120312||120903||331101
dc.identifier.cvu	1157038
dc.identifier.orcid	https://orcid.org/0009-0003-5472-1516
dc.identifier.uri	https://hdl.handle.net/11285/705236
dc.language.iso	eng
dc.publisher	Instituto Tecnológico y de Estudios Superiores de Monterrey
dc.relation.isFormatOf	acceptedVersion
dc.rights	openAccess
dc.rights.embargoreason	By policy, theses in the Exact Sciences and Health Sciences are under embargo for 1 year
dc.rights.uri	http://creativecommons.org/licenses/by/4.0
dc.subject.classification	ENGINEERING AND TECHNOLOGY::TECHNOLOGICAL SCIENCES::COMPUTER TECHNOLOGY::ARTIFICIAL INTELLIGENCE
dc.subject.classification	PHYSICAL-MATHEMATICAL SCIENCES AND EARTH SCIENCES::MATHEMATICS::STATISTICS::DATA ANALYSIS
dc.subject.classification	ENGINEERING AND TECHNOLOGY::TECHNOLOGICAL SCIENCES::COMPUTER TECHNOLOGY
dc.subject.keyword	Anomaly detection
dc.subject.keyword	Outlier detection
dc.subject.keyword	One-class classification
dc.subject.keyword	Explainable artificial intelligence
dc.subject.lcsh	Technology
dc.title	An explainable autoencoder integrating regression and classification trees for anomaly detection
dc.type	Master's thesis
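
Illustrative note on the abstract: the abstract describes detecting anomalies by reconstructing each instance from relationships between attributes learned on normal samples, with classification trees for nominal attributes and regression trees for numerical ones. The sketch below is one possible reading of that idea and is not the thesis's EOTE implementation. It assumes one small hand-written tree per attribute, predicted from the remaining attributes, and a score equal to the mean per-attribute reconstruction error. All names (`build_tree`, `fit`, `outlier_score`), the depth limit, the range-based error scaling, and the toy data are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean


def is_num(v):
    return isinstance(v, (int, float))


def impurity(ys):
    """Sum of squared errors (numeric target) or n * Gini (nominal target)."""
    if is_num(ys[0]):
        m = mean(ys)
        return sum((y - m) ** 2 for y in ys)
    n = len(ys)
    return n * (1.0 - sum((c / n) ** 2 for c in Counter(ys).values()))


def leaf(ys):
    """Mean for a regression leaf, majority class for a classification leaf."""
    return mean(ys) if is_num(ys[0]) else Counter(ys).most_common(1)[0][0]


def goes_left(value, split):
    """Numeric features use a threshold test, nominal features an equality test."""
    return value <= split if is_num(split) else value == split


def build_tree(rows, ys, depth):
    """Greedy tree; internal nodes are tuples, leaves are plain values."""
    best = None
    if depth > 0 and len(set(ys)) > 1:
        for j in range(len(rows[0])):
            for split in sorted(set(r[j] for r in rows)):
                mask = [goes_left(r[j], split) for r in rows]
                left = [y for y, m in zip(ys, mask) if m]
                right = [y for y, m in zip(ys, mask) if not m]
                if left and right:
                    cost = impurity(left) + impurity(right)
                    if best is None or cost < best[0]:
                        best = (cost, j, split, mask)
    if best is None:
        return leaf(ys)
    _, j, split, mask = best
    l_rows = [r for r, m in zip(rows, mask) if m]
    r_rows = [r for r, m in zip(rows, mask) if not m]
    l_ys = [y for y, m in zip(ys, mask) if m]
    r_ys = [y for y, m in zip(ys, mask) if not m]
    return (j, split, build_tree(l_rows, l_ys, depth - 1),
            build_tree(r_rows, r_ys, depth - 1))


def predict(tree, row):
    while isinstance(tree, tuple):
        j, split, left, right = tree
        tree = left if goes_left(row[j], split) else right
    return tree


def fit(normal_rows, depth=2):
    """Train one tree per attribute, using only normal samples."""
    models = []
    for t in range(len(normal_rows[0])):
        X = [list(r[:t]) + list(r[t + 1:]) for r in normal_rows]
        y = [r[t] for r in normal_rows]
        scale = 1.0
        if is_num(y[0]):
            scale = (max(y) - min(y)) or 1.0  # assumed normalisation by range
        models.append((build_tree(X, y, depth), scale))
    return models


def outlier_score(models, row):
    """Mean reconstruction error in [0, 1]; higher means harder to reconstruct."""
    errors = []
    for t, (tree, scale) in enumerate(models):
        guess = predict(tree, list(row[:t]) + list(row[t + 1:]))
        if is_num(row[t]):
            errors.append(min(1.0, abs(guess - row[t]) / scale))
        else:
            errors.append(0.0 if guess == row[t] else 1.0)
    return mean(errors)


# Toy mixed-type data (hypothetical): weight is numeric, size is nominal.
normal = [[1.0, "small"], [1.2, "small"], [0.9, "small"], [1.1, "small"],
          [10.0, "large"], [9.8, "large"], [10.3, "large"], [10.1, "large"]]
models = fit(normal)
print(outlier_score(models, [1.0, "small"]))   # consistent attributes
print(outlier_score(models, [10.0, "small"]))  # attributes contradict each other
```

In this toy setting the second instance scores much higher than the first, because neither attribute can be recovered from the other. Each tree is also a short chain of readable tests, which is the kind of property that makes a tree-based reconstruction explainable; how EOTE itself aggregates errors and produces explanations is described in the thesis, not here.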