Abstract
Currently, there is growing interest in developing classifiers based on contrast patterns (CPs), partly because such classifiers can explain a classification result in a language that is easy for an expert to understand. Thorough experiments show that CP-based classifiers, when using contrast patterns extracted by decision tree-based miners, attain accuracies comparable to those of state-of-the-art classifiers such as SVM, k-NN, C4.5, Bagging and Boosting. Existing decision tree-based miners use Univariate Decision Trees (UDTs) to extract CPs. In tree-based classification, classifiers based on Multivariate Decision Trees (MDTs) achieve better accuracy than those based on UDTs. This result might be attributable to the fact that MDTs use multivariate relations (e.g., 2·height + 3·weight > 40), which in some cases separate the classes better than the univariate relations (e.g., age > 40) used by UDTs. Our hypothesis is the analogue for CP-based classification: using CPs extracted by MDT-based miners, which we call multivariate contrast patterns, a CP-based classifier will significantly outperform CP-based classifiers that use CPs extracted by UDT-based miners. We propose an algorithm to extract, simplify and filter multivariate CPs, and we conduct an empirical study of it on 112 datasets, half of which are used to tune the algorithm's parameters. To validate our hypothesis, we use the other half as a testing set to compare our algorithm against other state-of-the-art CP miners in terms of pattern quality, and against other state-of-the-art classifiers in terms of classification performance. The results on the testing set show that the quality of multivariate CPs, measured by the Jaccard measure, is significantly higher than that of CPs extracted through UDTs (univariate CPs).
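To make the univariate/multivariate distinction and the Jaccard-based quality comparison concrete, the following Python sketch scores one pattern of each kind on a tiny hypothetical dataset. It assumes that a contrast pattern is a conjunction of relations and that its Jaccard quality is the Jaccard index between the set of objects the pattern covers and the set of objects of its target class. The data, the function names (`covers`, `jaccard_quality`, `best_single_threshold`) and this exact quality definition are illustrative assumptions, not the miner or evaluation code described in this work.

```python
from typing import Callable, Dict, List

# Assumed representation: an object is a dict of numeric features, and a
# pattern is a list of relations (predicates) that must all hold.
Obj = Dict[str, float]
Pattern = List[Callable[[Obj], bool]]


def covers(pattern: Pattern, obj: Obj) -> bool:
    """A pattern covers an object when every relation in it holds."""
    return all(rel(obj) for rel in pattern)


def jaccard_quality(pattern: Pattern, data: List[Obj], labels: List[str], target: str) -> float:
    """Assumed quality measure: Jaccard index between the objects covered
    by the pattern and the objects belonging to the target class."""
    covered = {i for i, obj in enumerate(data) if covers(pattern, obj)}
    in_class = {i for i, y in enumerate(labels) if y == target}
    union = covered | in_class
    return len(covered & in_class) / len(union) if union else 0.0


def best_single_threshold(data: List[Obj], labels: List[str], target: str) -> float:
    """Best Jaccard over every univariate relation `f > t` or `f <= t`,
    with thresholds t taken from the observed feature values."""
    best = 0.0
    for f in data[0]:
        for t in {obj[f] for obj in data}:
            # Default arguments bind f and t at definition time.
            for rel in (lambda o, f=f, t=t: o[f] > t,
                        lambda o, f=f, t=t: o[f] <= t):
                best = max(best, jaccard_quality([rel], data, labels, target))
    return best


# Hypothetical toy data: class "A" lies above the line 2*height + 3*weight = 40.
data = [
    {"height": 5, "weight": 12}, {"height": 14, "weight": 5},
    {"height": 10, "weight": 8}, {"height": 8, "weight": 6},
    {"height": 4, "weight": 9}, {"height": 15, "weight": 3},
]
labels = ["A", "A", "A", "B", "B", "B"]

univariate: Pattern = [lambda o: o["height"] > 9]
multivariate: Pattern = [lambda o: 2 * o["height"] + 3 * o["weight"] > 40]

print("univariate height > 9:", jaccard_quality(univariate, data, labels, "A"))         # 0.5
print("multivariate 2h + 3w > 40:", jaccard_quality(multivariate, data, labels, "A"))   # 1.0
print("best single univariate relation:", best_single_threshold(data, labels, "A"))    # 0.6
```

On this data, no single univariate relation scores above 0.6, whereas the oblique relation separates the classes exactly. Conjunctions of univariate thresholds, as found along a UDT branch, can raise the score (height > 4 and height ≤ 14 and weight > 3 reaches 0.75) but cannot reach 1.0 here, because any axis-parallel box that covers all three class-A objects also contains the class-B object (8, 6).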
We also show that CP-based classifiers achieve significantly better classification results when using multivariate CPs than when using univariate CPs, which could be explained by the higher quality of multivariate CPs. The classification results of multivariate CP-based classifiers are also competitive with those of non-pattern-based state-of-the-art classifiers. Moreover, multivariate CP-based classifiers have the added benefit of providing contrast patterns, which are abstract-level explanations that could help an expert gain insights into the problem under investigation.
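The explanatory role of contrast patterns in classification can be sketched as follows. This is a hypothetical, minimal voting scheme, not the classifier evaluated in this work: each class receives the summed quality of its patterns that cover the query object, the highest-scoring class wins, and the matching patterns are returned as the explanation. The `ContrastPattern` class, the `classify` function and the quality values are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Obj = Dict[str, float]


@dataclass
class ContrastPattern:
    description: str                        # human-readable form shown to the expert
    relations: List[Callable[[Obj], bool]]  # conjunction of relations
    target: str                             # class the pattern characterizes
    quality: float                          # illustrative quality score

    def covers(self, obj: Obj) -> bool:
        return all(rel(obj) for rel in self.relations)


def classify(patterns: List[ContrastPattern], obj: Obj) -> Tuple[str, List[str]]:
    """Hypothetical voting: each class scores the summed quality of its
    covering patterns; the winner's matching patterns form the explanation.
    Returns ("unknown", []) when no pattern covers obj."""
    scores: Dict[str, float] = {}
    matched: Dict[str, List[str]] = {}
    for p in patterns:
        if p.covers(obj):
            scores[p.target] = scores.get(p.target, 0.0) + p.quality
            matched.setdefault(p.target, []).append(p.description)
    if not scores:
        return "unknown", []
    winner = max(scores, key=scores.get)
    return winner, matched[winner]


patterns = [
    ContrastPattern("2*height + 3*weight > 40",
                    [lambda o: 2 * o["height"] + 3 * o["weight"] > 40], "A", 1.0),
    ContrastPattern("2*height + 3*weight <= 40",
                    [lambda o: 2 * o["height"] + 3 * o["weight"] <= 40], "B", 1.0),
    ContrastPattern("height > 9", [lambda o: o["height"] > 9], "A", 0.5),
]

label, explanation = classify(patterns, {"height": 12, "weight": 7})
print(label, explanation)  # A ['2*height + 3*weight > 40', 'height > 9']
```

For the query (height = 12, weight = 7), both class-A patterns match, so the prediction comes with two relations an expert can inspect. Practical CP-based classifiers may also normalize the per-class scores; this sketch omits any such step.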
Description
https://orcid.org/0000-0002-3465-995X