2020
Díez-Pastor, José Francisco; Latorre-Carmona, Pedro; Arnaiz-González, Álvar; Ruiz-Pérez, Javier; Zurro, Débora
“You Are Not My Type”: An Evaluation of Classification Methods for Automatic Phytolith Identification
Journal Article. In: Microscopy and Microanalysis, 26, pp. 1158–1167, 2020, ISSN: 1431-9276.
Tags: Feature extraction, Machine learning, Microfossils, Morphometry, Proxy

@article{Díez-Pastor2020,
  title = {“You Are Not My Type”: An Evaluation of Classification Methods for Automatic Phytolith Identification},
  author = {José Francisco Díez-Pastor and Pedro Latorre-Carmona and Álvar Arnaiz-González and Javier Ruiz-Pérez and Débora Zurro},
  url = {https://www.cambridge.org/core/journals/microscopy-and-microanalysis/article/you-are-not-my-type-an-evaluation-of-classification-methods-for-automatic-phytolith-identification/48F88E9407086B797BBE383B8BC15904},
  doi = {10.1017/S1431927620024629},
  issn = {1431-9276},
  year = {2020},
  date = {2020-11-10},
  journal = {Microscopy and Microanalysis},
  volume = {26},
  pages = {1158--1167},
  abstract = {Phytoliths can be an important source of information related to environmental and climatic change, as well as to ancient plant use by humans, particularly within the disciplines of paleoecology and archaeology. Currently, phytolith identification and categorization is performed manually by researchers, a time-consuming task liable to misclassifications. The automated classification of phytoliths would allow the standardization of identification processes, avoiding possible biases related to the classification capability of researchers. This paper presents a comparative analysis of six classification methods, using digitized microscopic images to examine the efficacy of different quantitative approaches for characterizing phytoliths. A comprehensive experiment performed on images of 429 phytoliths demonstrated that the automatic phytolith classification is a promising area of research that will help researchers to invest time more efficiently and improve their recognition accuracy rate.},
  keywords = {Feature extraction, Machine learning, Microfossils, Morphometry, Proxy},
  pubstate = {published},
  tppubtype = {article}
}

Bustillo, Andrés; Reis, Roberto; Machado, Alisson R; Pimenov, Danil Yurievich
Improving the accuracy of machine-learning models with data from machine test repetitions
Journal Article. In: Journal of Intelligent Manufacturing, 2020, ISSN: 0956-5515.
Tags: Artificial intelligence, Brandsma facing tests, Ensembles, Machine learning, Tool geometry, Turning

@article{Bustillo2020,
  title = {Improving the accuracy of machine-learning models with data from machine test repetitions},
  author = {Andrés Bustillo and Roberto Reis and Alisson R. Machado and Danil Yurievich Pimenov},
  url = {https://link.springer.com/article/10.1007%2Fs10845-020-01661-3},
  doi = {10.1007/s10845-020-01661-3},
  issn = {0956-5515},
  year = {2020},
  date = {2020-09-17},
  journal = {Journal of Intelligent Manufacturing},
  abstract = {The modelling of machining processes by means of machine-learning algorithms is still based on principles that are especially adapted to mechanical approaches, in which very few inputs are varied with little repetition of experimental conditions. These principles might not be ideal to achieve accurate machine-learning models and they are certainly not aligned with the practicalities of industrial machining in factories. In this research the effect of a new strategy to improve machine-learning model accuracy is studied: experimental repetition. Tool-life prediction in the face-turning operations of AISI 1045 steel discs, depending on different cooling systems and tool geometries, is selected as a case study. Both the side rake and the relief angles of HSS tools are optimized using the Brandsma facing test under dry, MQL, and flooding conditions. Different machine-learning algorithms, such as regression trees, kNNs, artificial neural networks, and ensembles (bagging and Random Forest) are tested. On the one hand, the results of the study showed that artificial neural networks of Radial Basis Functions presented the highest model accuracy (11.4 mm RMSE), but required a very sensitive and complex tuning process. On the other hand, they demonstrated that ensembles, especially Random Forest, provided models with accuracy in the same range, but with no tuning procedure (12.8 mm RMSE). Secondly, the effect of an increased dataset size, by means of experimental repetition, is evaluated and compared with traditional experimental modelling that used average values. The results showed that some machine-learning techniques, including both ensemble types, significantly improved their accuracy with this strategy, by up to 23\%. The results therefore suggested that the use of raw experimental data, rather than their averaged values, can achieve machine-learning models of higher accuracy for tool-wear processes.},
  keywords = {Artificial intelligence, Brandsma facing tests, Ensembles, Machine learning, Tool geometry, Turning},
  pubstate = {published},
  tppubtype = {article}
}

2018
Gunn, Iain A D; Arnaiz-González, Álvar; Kuncheva, Ludmila I
A taxonomic look at instance-based stream classifiers
Journal Article. In: Neurocomputing, 286, pp. 167–178, 2018, ISSN: 0925-2312.
Tags: Concept drift, Instance selection, Machine learning, Prototype generation, Stream classification

@article{Gunn2018,
  title = {A taxonomic look at instance-based stream classifiers},
  author = {Iain A D Gunn and Álvar Arnaiz-González and Ludmila I Kuncheva},
  url = {https://www.sciencedirect.com/science/article/pii/S092523121830095X},
  doi = {10.1016/j.neucom.2018.01.062},
  issn = {0925-2312},
  year = {2018},
  date = {2018-04-19},
  journal = {Neurocomputing},
  volume = {286},
  pages = {167--178},
  abstract = {Large numbers of data streams are today generated in many fields. A key challenge when learning from such streams is the problem of concept drift. Many methods, including many prototype methods, have been proposed in recent years to address this problem. This paper presents a refined taxonomy of instance selection and generation methods for the classification of data streams subject to concept drift. The taxonomy allows discrimination among a large number of methods which pre-existing taxonomies for offline instance selection methods did not distinguish. This makes possible a valuable new perspective on experimental results, and provides a framework for discussion of the concepts behind different algorithm-design approaches. We review a selection of modern algorithms for the purpose of illustrating the distinctions made by the taxonomy. We present the results of a numerical experiment which examined the performance of a number of representative methods on both synthetic and real-world data sets with and without concept drift, and discuss the implications for the directions of future research in light of the taxonomy. On the basis of the experimental results, we are able to give recommendations for the experimental evaluation of algorithms which may be proposed in the future.},
  keywords = {Concept drift, Instance selection, Machine learning, Prototype generation, Stream classification},
  pubstate = {published},
  tppubtype = {article}
}

Güemes-Peña, Diego; López-Nozal, Carlos; Marticorena-Sánchez, Raúl; Maudes-Raedo, Jesús
Emerging topics in mining software repositories
Journal Article. In: Progress in Artificial Intelligence, pp. 1–11, 2018, ISSN: 2192-6360.
Tags: Data Mining, Machine learning, Software engineering, Software process, Software repository

@article{Güemes-Peña2018,
  title = {Emerging topics in mining software repositories},
  author = {Diego Güemes-Peña and Carlos López-Nozal and Raúl Marticorena-Sánchez and Jesús Maudes-Raedo},
  url = {https://link.springer.com/content/pdf/10.1007/s13748-018-0147-7.pdf},
  doi = {10.1007/s13748-018-0147-7},
  issn = {2192-6360},
  year = {2018},
  date = {2018-01-01},
  journal = {Progress in Artificial Intelligence},
  pages = {1--11},
  abstract = {A software process is a set of related activities that culminates in the production of a software package: specification, design, implementation, testing, evolution into new versions, and maintenance. There are also other supporting activities such as configuration and change management, quality assurance, project management, evaluation of user experience, etc. Software repositories are infrastructures to support all these activities. They can be composed with several systems that include code change management, bug tracking, code review, build system, release binaries, wikis, forums, etc. This position paper on mining software repositories presents a review and a discussion of research in this field over the past decade. We also identify applied machine learning strategies, current working topics, and future challenges for the improvement of company decision-making systems. Machine learning is defined as the process of discovering patterns in data. It can be applied to software repositories, since every change is recorded as data. Companies can then use these patterns as the basis for their decision-making systems and for knowledge discovery.},
  keywords = {Data Mining, Machine learning, Software engineering, Software process, Software repository},
  pubstate = {published},
  tppubtype = {article}
}