TY - JOUR
T1 - Comparative analysis of instance selection algorithms for instance-based classifiers in the context of medical decision support
AU - Mazurowski, MacIej A.
AU - Malof, Jordan M.
AU - Tourassi, Georgia D.
PY - 2011/1/21
Y1 - 2011/1/21
N2 - When constructing a pattern classifier, it is important to make best use of the instances (a.k.a. cases, examples, patterns or prototypes) available for its development. In this paper we present an extensive comparative analysis of algorithms that, given a pool of previously acquired instances, attempt to select those that will be the most effective to construct an instance-based classifier in terms of classification performance, time efficiency and storage requirements. We evaluate seven previously proposed instance selection algorithms and compare their performance to simple random selection of instances. We perform the evaluation using κ-nearest neighbor classifier and three classification problems: one with simulated Gaussian data and two based on clinical databases for breast cancer detection and diagnosis, respectively. Finally, we evaluate the impact of the number of instances available for selection on the performance of the selection algorithms and conduct initial analysis of the selected instances. The experiments show that for all investigated classification problems, it was possible to reduce the size of the original development dataset to less than 3% of its initial size while maintaining or improving the classification performance. Random mutation hill climbing emerges as the superior selection algorithm. Furthermore, we show that some previously proposed algorithms perform worse than random selection. Regarding the impact of the number of instances available for the classifier development on the performance of the selection algorithms, we confirm that the selection algorithms are generally more effective as the pool of available instances increases. In conclusion, instance selection is generally beneficial for instance-based classifiers as it can improve their performance,
AB - When constructing a pattern classifier, it is important to make best use of the instances (a.k.a. cases, examples, patterns or prototypes) available for its development. In this paper we present an extensive comparative analysis of algorithms that, given a pool of previously acquired instances, attempt to select those that will be the most effective to construct an instance-based classifier in terms of classification performance, time efficiency and storage requirements. We evaluate seven previously proposed instance selection algorithms and compare their performance to simple random selection of instances. We perform the evaluation using κ-nearest neighbor classifier and three classification problems: one with simulated Gaussian data and two based on clinical databases for breast cancer detection and diagnosis, respectively. Finally, we evaluate the impact of the number of instances available for selection on the performance of the selection algorithms and conduct initial analysis of the selected instances. The experiments show that for all investigated classification problems, it was possible to reduce the size of the original development dataset to less than 3% of its initial size while maintaining or improving the classification performance. Random mutation hill climbing emerges as the superior selection algorithm. Furthermore, we show that some previously proposed algorithms perform worse than random selection. Regarding the impact of the number of instances available for the classifier development on the performance of the selection algorithms, we confirm that the selection algorithms are generally more effective as the pool of available instances increases. In conclusion, instance selection is generally beneficial for instance-based classifiers as it can improve their performance,
UR - http://www.scopus.com/inward/record.url?scp=79551657178&partnerID=8YFLogxK
U2 - 10.1088/0031-9155/56/2/012
DO - 10.1088/0031-9155/56/2/012
M3 - Article
C2 - 21191152
AN - SCOPUS:79551657178
SN - 0031-9155
VL - 56
SP - 473
EP - 489
JO - Physics in Medicine and Biology
JF - Physics in Medicine and Biology
IS - 2
ER -