Master's thesis

An explainable autoencoder integrating regression and classification trees for anomaly detection

Abstract

Anomaly detection, or outlier detection, is a critical field because anomalies are data points that deviate from normal patterns and often carry critical information, such as evidence of fraud, disease, or cyber-attacks. These applications are high-risk scenarios that involve high-stakes decision-making, so understanding the reasoning behind the machine learning models used in this area has become an essential requirement. Despite its growing importance, explainable outlier detection remains a challenge because improving model accuracy while maintaining explainability involves a significant trade-off. Furthermore, most anomaly detection models are designed for a single type of data, either numerical or categorical. This is a disadvantage when a dataset's attributes include both types, as is often the case in real-world applications, since transforming categorical values into numerical ones, or vice versa, can cause information loss and reduced performance. In this thesis, we address both challenges by proposing a novel explainable semi-supervised anomaly detection model that integrates classification and regression trees into an autoencoder architecture. We name our proposal the Explainable Outlier Tree-based Encoder (EOTE). EOTE detects anomalies by reconstructing the input instance from the relationships between attributes learned from normal samples. The harder it is for EOTE to reconstruct an instance correctly, the higher the outlier probability it assigns to that instance. We evaluate EOTE against 12 anomaly detection algorithms and one-class classifiers across 110 datasets whose attributes are of a single data type (numerical or nominal) or a mix of both. Our experiments show that EOTE is among the top-performing algorithms at detecting outliers in datasets with only numerical or only nominal attributes, as well as in datasets with mixed attributes. Therefore, without sacrificing performance, EOTE produces interpretable outputs for its classifications. This combination makes EOTE a suitable classifier for anomaly detection in high-risk applications.
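The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes, not EOTE itself. It assumes that each attribute is predicted from the others by a tree fitted on normal samples, and that the reconstruction error is used as the outlier score. To stay short it uses depth-1 trees (stumps), which is a simplification, and every name in it (`fit_stump`, `attribute_errors`, `score`, the toy data) is hypothetical.

```python
from collections import Counter

def is_num(v):
    return isinstance(v, (int, float))

def impurity(values):
    """Sum of squared errors (numeric target) or Gini mass (nominal target)."""
    n = len(values)
    if is_num(values[0]):
        mean = sum(values) / n
        return sum((v - mean) ** 2 for v in values)
    return n * (1.0 - sum((c / n) ** 2 for c in Counter(values).values()))

def leaf(values):
    """A leaf stores the mean (numeric) or the class frequencies (nominal)."""
    if is_num(values[0]):
        return sum(values) / len(values)
    return {k: c / len(values) for k, c in Counter(values).items()}

def fit_stump(rows, target):
    """Depth-1 tree predicting column `target` from the best single other column."""
    y = [r[target] for r in rows]
    best = (impurity(y), None, None, leaf(y), leaf(y))  # fallback: no split
    for j in range(len(rows[0])):
        if j == target:
            continue
        for cut in set(r[j] for r in rows):
            # Numeric columns split on "<= cut", nominal columns on "== cut".
            test = (lambda v, c=cut: v <= c) if is_num(cut) else (lambda v, c=cut: v == c)
            left = [r[target] for r in rows if test(r[j])]
            right = [r[target] for r in rows if not test(r[j])]
            if not left or not right:
                continue
            total = impurity(left) + impurity(right)
            if total < best[0]:
                best = (total, j, cut, leaf(left), leaf(right))
    return best[1:]

def predict(stump, row):
    j, cut, left, right = stump
    if j is None:
        return left
    goes_left = row[j] <= cut if is_num(cut) else row[j] == cut
    return left if goes_left else right

def fit(rows):
    """Fit one stump per attribute, using normal samples only."""
    stumps = [fit_stump(rows, t) for t in range(len(rows[0]))]
    spans = []  # value range of each numeric column, used to normalize errors
    for t in range(len(rows[0])):
        col = [r[t] for r in rows]
        spans.append(((max(col) - min(col)) or 1.0) if is_num(col[0]) else None)
    return stumps, spans

def attribute_errors(model, row):
    """Per-attribute reconstruction error in [0, 1]; doubles as a crude explanation."""
    stumps, spans = model
    errors = []
    for t, stump in enumerate(stumps):
        pred = predict(stump, row)
        if spans[t] is not None:   # numeric: normalized absolute error
            errors.append(min(1.0, abs(row[t] - pred) / spans[t]))
        else:                      # nominal: 1 - predicted probability of the observed value
            errors.append(1.0 - pred.get(row[t], 0.0))
    return errors

def score(model, row):
    """Outlier score: mean reconstruction error over all attributes."""
    errors = attribute_errors(model, row)
    return sum(errors) / len(errors)

# Toy mixed-type "normal" data: (load, mode). Low load goes with "idle", high with "busy".
normal = [(1.0, "idle"), (1.2, "idle"), (0.9, "idle"), (1.1, "idle"),
          (9.0, "busy"), (9.3, "busy"), (8.8, "busy"), (9.1, "busy")]
model = fit(normal)
consistent = score(model, (1.0, "idle"))   # matches the learned relationship
suspicious = score(model, (9.0, "idle"))   # high load while "idle" breaks it
```

In this toy setting the row `(9.0, "idle")` scores far higher than `(1.0, "idle")` because neither attribute can be reconstructed from the other. The per-attribute errors and the split that produced each prediction indicate which relationship was violated, which is the kind of interpretability that tree-based reconstruction is meant to offer.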

Description

https://orcid.org/0000-0002-3465-995X

The user is obliged to use the services and content provided by the University, in particular printed materials and electronic resources, in accordance with current legislation, the principles of good faith, and generally accepted practices, without contravening public order in doing so, especially where, for the proper performance of their activity, they need to reproduce, distribute, communicate, and/or make available fragments of works that are printed or may exist in analog or digital format, whether on paper or in electronic form. Law 23/2006, of 7 July, amending the revised text of the Intellectual Property Law, approved

License