Publications

2020
Rodríguez, Juan José; Díez-Pastor, José Francisco; Arnaiz-González, Álvar; Kuncheva, Ludmila I: Random Balance ensembles for multiclass imbalance learning. Journal Article. In: Knowledge-Based Systems, 2020, ISSN: 0950-7051.
DOI: 10.1016/j.knosys.2019.105434. URL: https://www.sciencedirect.com/science/article/pii/S0950705119306598.
Tags: Classifier ensembles, Imbalanced data, Multiclass classification.

Abstract: The Random Balance strategy (RandBal) has recently been proposed for constructing classifier ensembles for imbalanced two-class data sets. In RandBal, each base classifier is trained with a sample of the data with a random class prevalence, independent of the a priori distribution. Hence, for each sample, one of the classes is undersampled while the other is oversampled. RandBal can be applied on its own or combined with any other ensemble method. One particularly successful variant is RandBalBoost, which integrates Random Balance and boosting. Encouraged by the success of RandBal, this work proposes two approaches that extend RandBal to multiclass imbalance problems. Multiclass imbalance implies that at least two classes have substantially different proportions of instances. In the first approach proposed here, termed Multiple Random Balance (MultiRandBal), we deal with all classes simultaneously: the training data for each base classifier are sampled with random class proportions. The second approach decomposes the multiclass problem into two-class problems using one-vs-one or one-vs-all and builds an ensemble of RandBal ensembles; we call the two versions OVO-RandBal and OVA-RandBal, respectively. These two approaches were chosen because they are the most straightforward extensions of RandBal to multiple classes. Our main objective is to evaluate both approaches on multiclass imbalanced problems. To this end, an experiment was carried out with 52 multiclass data sets. The results suggest that both MultiRandBal and OVO/OVA-RandBal are viable extensions of the original two-class RandBal, and that collectively they consistently outperform acclaimed state-of-the-art methods for multiclass imbalanced problems.
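The sampling step at the heart of MultiRandBal is easy to sketch. Below is a minimal, illustrative Python version; the function name multi_random_balance_sample, the flat Dirichlet draw for the random class proportions, and resampling with replacement are assumptions made for this sketch, not details taken from the paper.

```python
# Minimal sketch of the MultiRandBal sampling step: each base classifier
# is trained on a sample whose class proportions are drawn at random,
# independent of the original priors. The Dirichlet draw and sampling
# with replacement are assumptions of this sketch.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def multi_random_balance_sample(X, y, rng):
    """Resample (X, y) to a randomly chosen class distribution."""
    classes = np.unique(y)
    proportions = rng.dirichlet(np.ones(len(classes)))  # random priors
    counts = np.maximum(1, np.round(proportions * len(y)).astype(int))
    idx = []
    for c, k in zip(classes, counts):
        members = np.flatnonzero(y == c)
        # With replacement, so a class can be over- or undersampled.
        idx.extend(rng.choice(members, size=k, replace=True))
    idx = np.asarray(idx)
    return X[idx], y[idx]

# Toy imbalanced multiclass data; each member sees its own random balance.
X, y = make_classification(n_samples=600, n_classes=3, n_informative=4,
                           weights=[0.7, 0.2, 0.1], random_state=0)
rng = np.random.default_rng(0)
ensemble = [DecisionTreeClassifier(random_state=i).fit(
                *multi_random_balance_sample(X, y, rng))
            for i in range(10)]
```

Predictions would then be aggregated by voting across the members, as in any bagging-style ensemble.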
2015
Díez-Pastor, José Francisco; Rodríguez, Juan José; García-Osorio, César; Kuncheva, Ludmila I: Random Balance: Ensembles of variable priors classifiers for imbalanced data. Journal Article. In: Knowledge-Based Systems, 85, pp. 96–111, 2015, ISSN: 0950-7051.
DOI: 10.1016/j.knosys.2015.04.022. URL: http://www.sciencedirect.com/science/article/pii/S0950705115001720.
Tags: AdaBoost, Bagging, Class-imbalanced problems, Classifier ensembles, Data Mining, Ensemble methods, SMOTE, Undersampling.

Abstract: In Machine Learning, a data set is imbalanced when the class proportions are highly skewed. Class-imbalanced problems arise routinely in many application domains and pose a challenge to traditional classifiers. We propose a new approach to building ensembles of classifiers for two-class imbalanced data sets, called Random Balance. Each member of the Random Balance ensemble is trained with data sampled from the training set and augmented by artificial instances obtained using SMOTE. The novelty of the approach is that the class proportions for each ensemble member are chosen randomly. The intuition behind the method is that this diversity heuristic ensures that the ensemble contains classifiers specialized for different operating points in ROC space, thereby leading to a larger AUC than other classifier ensembles. Experiments have been carried out to test the Random Balance approach by itself and in combination with standard ensemble methods. As a result, we propose a new ensemble creation method called RB-Boost, which combines Random Balance with AdaBoost.M2. This combination enforces random class proportions in addition to instance re-weighting. Experiments with 86 imbalanced data sets from two well-known repositories demonstrate the advantage of the Random Balance approach.
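For the two-class case, the abstract gives enough detail for a rough sketch: draw a random size for one class, undersample the class that receives fewer slots than it has instances, and grow the other with SMOTE-style interpolated points. The helper names (smote_like, random_balance) and the uniform draw below are our assumptions; the paper's exact procedure may differ in details.

```python
# Rough sketch of one Random Balance draw for a two-class problem:
# a random class prevalence is enforced by undersampling one class and
# adding SMOTE-style synthetic instances to the other. Helper names and
# the uniform draw for the class size are assumptions of this sketch.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_like(Xc, n_new, k, rng):
    """Create n_new points by interpolating between neighbors in Xc."""
    k = min(k, len(Xc) - 1)                 # assumes len(Xc) >= 2
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(Xc)
    _, nn = nbrs.kneighbors(Xc)             # column 0 is the point itself
    seeds = rng.integers(0, len(Xc), n_new)
    partners = Xc[nn[seeds, rng.integers(1, k + 1, n_new)]]
    gaps = rng.random((n_new, 1))
    return Xc[seeds] + gaps * (partners - Xc[seeds])

def random_balance(X, y, rng, k=5):
    """Return a same-size sample of (X, y) with random class prevalence."""
    n = len(y)
    n_pos = int(rng.integers(2, n - 1))     # random size for class 1
    out_X, out_y = [], []
    for label, target in ((1, n_pos), (0, n - n_pos)):
        members = np.flatnonzero(y == label)
        if target <= len(members):          # undersample, no replacement
            Xc = X[rng.choice(members, size=target, replace=False)]
        else:                               # keep all, add synthetic points
            Xc = np.vstack([X[members],
                            smote_like(X[members], target - len(members),
                                       k, rng)])
        out_X.append(Xc)
        out_y.append(np.full(len(Xc), label))
    return np.vstack(out_X), np.concatenate(out_y)
```

RB-Boost would plug a resampling step like this into each AdaBoost.M2 round, on top of the usual instance re-weighting.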
Díez-Pastor, José Francisco; Rodríguez, Juan José; García-Osorio, César; Kuncheva, Ludmila I: Diversity techniques improve the performance of the best imbalance learning ensembles. Journal Article. In: Information Sciences, 325, pp. 98–117, 2015, ISSN: 0020-0255.
DOI: 10.1016/j.ins.2015.07.025. URL: http://www.sciencedirect.com/science/article/pii/S0020025515005186.
Tags: Class-imbalanced problems, Classifier ensembles, Data Mining, Diversity, Ensemble methods, Rotation forest, SMOTE, Undersampling.

Abstract: Many real-life problems can be described as unbalanced: the number of instances belonging to one of the classes is much larger than the number in the other classes. Examples are spam detection, credit card fraud detection and medical diagnosis. Ensembles of classifiers have acquired popularity for this kind of problem owing to their ability to obtain better results than individual classifiers. The techniques most commonly used by ensembles especially designed to deal with imbalanced problems are, for example, re-weighting, oversampling and undersampling. Other techniques, originally intended to increase ensemble diversity, have not been systematically studied for their effect on imbalanced problems; among these are Random Oracles, Disturbing Neighbors, Random Feature Weights and Rotation Forest. This paper presents an overview and an experimental study of various ensemble-based methods for imbalanced problems. The methods have been tested in their original form and in conjunction with several diversity-increasing techniques, using 84 imbalanced data sets from two well-known repositories. The study shows that these diversity-increasing techniques significantly improve the performance of ensemble methods for imbalanced problems and provides some guidance on when it is most convenient to use them.
2014
Díez-Pastor, José Francisco; García-Osorio, César; Rodríguez, Juan José: Tree ensemble construction using a GRASP-based heuristic and annealed randomness. Journal Article. In: Information Fusion, 20, pp. 189–202, 2014, ISSN: 1566-2535.
DOI: 10.1016/j.inffus.2014.01.009. URL: http://www.sciencedirect.com/science/article/pii/S1566253514000141.
Tags: Boosting, Classifier ensembles, Data Mining, Decision trees, Ensemble methods, GRASP metaheuristic, Random forest.

Abstract: Two new methods for tree ensemble construction are presented: G-Forest and GAR-Forest. As in Random Forest, the tree construction process entails a degree of randomness. The same strategy used in the GRASP metaheuristic for generating random and adaptive solutions is applied at each node of the trees, and the randomness of this solution generation method is the ensemble's source of diversity. A further key feature of the tree construction method for GAR-Forest is a decreasing level of randomness during the construction of the tree: maximum randomness at the root and minimum randomness at the leaves. The method is therefore named "GAR", GRASP with annealed randomness. The results conclusively demonstrate that G-Forest and GAR-Forest outperform Bagging, AdaBoost, MultiBoost, Random Forest and Random Subspaces. The results are even more convincing in the presence of noise, demonstrating the robustness of the method. The relationship between the accuracy of the base classifiers and their diversity is analysed by application of kappa-error diagrams and a variant of these called kappa-error relative movement diagrams.
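The node-level behaviour described above, a GRASP-style choice among near-best splits with randomness annealed from root to leaves, can be sketched in a few lines. The linear annealing schedule and the (score, split) representation are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of a GRASP-style split choice with annealed randomness: build a
# restricted candidate list (RCL) of near-best splits and pick one at
# random. The linear schedule for alpha is an assumption of this sketch.
import random

def choose_split(candidates, depth, max_depth):
    """candidates: list of (score, split) pairs; higher score is better."""
    best = max(score for score, _ in candidates)
    worst = min(score for score, _ in candidates)
    # alpha = 1 at the root (fully random), alpha = 0 at max depth (greedy).
    alpha = max(0.0, 1.0 - depth / max_depth)
    threshold = best - alpha * (best - worst)
    rcl = [split for score, split in candidates if score >= threshold]
    return random.choice(rcl)
```

At alpha = 1 every candidate split is eligible (maximum randomness at the root); at alpha = 0 only the best-scoring splits survive, recovering greedy tree induction at the leaves.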
2012
Maudes, Jesús; Rodríguez, Juan José; García-Osorio, César; García-Pedrajas, Nicolás: Random Feature Weights for Decision Tree Ensemble Construction. Journal Article. In: Information Fusion, 13(1), pp. 20–30, 2012, ISSN: 1566-2535.
DOI: 10.1016/j.inffus.2010.11.004.
Tags: Bagging, Boosting, Classifier ensembles, Data Mining, Decision trees, Ensemble methods, Random forest.
2009
Maudes, Jesús; Rodríguez, Juan José; García-Osorio, César: Disturbing Neighbors Ensembles for Linear SVM. Inproceedings. In: Benediktsson, Jon Atli; Kittler, Josef; Roli, Fabio (Eds.): 8th International Workshop on Multiple Classifier Systems (MCS 2009), Lecture Notes in Computer Science, vol. 5519, pp. 191–200, Springer-Verlag, Reykjavik, Iceland, 2009, ISBN: 978-3-642-02325-5.
DOI: 10.1007/978-3-642-02326-2_20.
Tags: Classifier ensembles, Data Mining, Decision trees, Disturbing neighbors, Ensemble methods, Support vector machines.
2008
García-Osorio, César; García-Pedrajas, Nicolás: Constructing ensembles of classifiers using linear projections based on misclassified instances. Inproceedings. In: Verleysen, Michel (Ed.): 16th European Symposium on Artificial Neural Networks (ESANN 2008), pp. 283–288, d-side publications, Bruges, Belgium, 2008, ISBN: 2-930307-08-0.
Tags: Classifier ensembles, Data Mining, Ensemble methods, Linear projections.
Maudes-Raedo, Jesús; Rodríguez, Juan José; García-Osorio, César: Disturbing Neighbors Diversity for Decision Forest. Inproceedings. In: Valentini, Giorgio; Okun, Oleg (Eds.): Workshop on Supervised and Unsupervised Ensemble Methods and Their Applications (SUEMA 2008), pp. 67–71, Patras, Greece, 2008, ISBN: 978-84-612-4475-1.
Tags: Classifier ensembles, Data Mining, Decision trees, Disturbing neighbors, Ensemble methods.
2007
García-Pedrajas, Nicolás; García-Osorio, César; Fyfe, Colin: Nonlinear "boosting" projections for ensemble construction. Journal Article. In: Journal of Machine Learning Research, 8, pp. 1–33, 2007, ISSN: 1532-4435.
URL: http://jmlr.csail.mit.edu/papers/volume8/garcia-pedrajas07a/garcia-pedrajas07a.pdf.
Tags: Boosting, Classifier ensembles, Data Mining, Ensemble methods, Neural networks, Nonlinear projections.

Abstract: In this paper we propose a novel approach to ensemble construction based on the use of nonlinear projections to achieve both accuracy and diversity of the individual classifiers. The proposed approach combines the philosophy of boosting, putting more effort into difficult instances, with the basis of the random subspace method. Our main contribution is that, instead of using a random subspace, we construct a projection taking into account the instances that have posed most difficulty to previous classifiers. In this way, consecutive nonlinear projections are created by a neural network trained using only incorrectly classified instances. The feature subspace induced by the hidden layer of this network is used as the input space for a new classifier. The method is compared with bagging and boosting techniques, showing improved performance on a large set of 44 problems from the UCI Machine Learning Repository. An additional study showed that the proposed approach is less sensitive to noise in the data than boosting methods.
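The construction loop described in the abstract translates into a short sketch: fit a classifier, train a small network only on the instances it misclassified, and use that network's hidden layer as the input space for the next member. The network size, the shallow-tree base classifier, and the ReLU projection below are our assumptions for illustration, not the paper's exact configuration.

```python
# Sketch of the nonlinear "boosting" projection loop: a neural network is
# trained only on the instances the latest member misclassified, and its
# hidden layer becomes the input space for the next classifier. Network
# size and the shallow-tree base learner are assumptions of this sketch.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

def boosting_projection_ensemble(X, y, n_members=10, n_hidden=10, seed=0):
    # Shallow trees so some training instances remain misclassified.
    members = [(None, DecisionTreeClassifier(max_depth=3,
                                             random_state=seed).fit(X, y))]
    current_X = X                            # first member: original space
    for t in range(1, n_members):
        wrong = members[-1][1].predict(current_X) != y
        if wrong.sum() == 0 or len(np.unique(y[wrong])) < 2:
            break                            # nothing left to learn from
        # Train the projection network on misclassified instances only.
        net = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=500,
                            random_state=seed + t).fit(X[wrong], y[wrong])
        # ReLU hidden-layer activations define the new feature space.
        current_X = np.maximum(0.0, X @ net.coefs_[0] + net.intercepts_[0])
        members.append((net, DecisionTreeClassifier(
            max_depth=3, random_state=seed + t).fit(current_X, y)))
    return members
```

To classify a new instance, each member's stored network (if any) would re-project it the same way before the members vote.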
García-Pedrajas, Nicolás; Romero-del-Castillo, Juan; García-Osorio, César: Boosting k-nearest neighbors classifiers by weighted evolutionary instance selection. Inproceedings. In: Actas de las I Jornadas sobre Algoritmos Evolutivos y Metaheurísticas (JAEM 2007), pp. 301–308, Zaragoza, 2007.
Tags: Classifier ensembles, Instance selection.