November 18, 2024

Impact of the learning set’s size

Author:
Go to Source

Abstract

Learning techniques have proven their capacity to treat large amount of data. Most statistical learning approaches use specific size learning sets and create static models. Withal, in certain some situations such as incremental or active learning the learning process can work with only a smal amount of data. In this case, the search for algorithms capable of producing models with only a few examples begin to be necessary. Generally, the literature relative to classifiers are evaluated according to criteria such as their classification performance, their ability to sort data. But this taxonomy of classifiers can singularly evolve if one is interested in their capabilities in the presence of some few examples. From our point of view, few studies have been carried out on this issue. It is in sense that this paper seeks to study a wider range of learning algorithms as well as data sets in order to show the power of every chosen algorithm that manipulates data. It also appears from this study, problem of algorithm’s choice to process small or large amount of data. And in order to resolve this, we will show that there are algorithms able of generating models with little data. In this case we look to select the smallest amount of data allowing the best learning to be achieved. We also wanted to show that some algorithms are capable of making good predictions with little data that is therefore necessary in order to have the least costly labeling procedure possible. And to concretize this, we will talk first about learning speed and typology of the tested algorithms to know the ability of a classifier to obtain an “interesting” solution to a classification problem using a minimum of examples present in learning, and we will know some various families of classification models based on parameter learning. After that, we will test all the classifiers mentioned previously such as linear and Non-linear classifiers. Then, we will seek to study the behavior these algorithms as a function of learning set’s size trough the experimental protocol in which various datasets will be Splited, manipulated and evaluated from the classification field in order to give results that merge from our experimental protocol. After that, we will discuss the obtained results through the global analysis section, and then conclude with recommendations.

Read more