Rouzbahan University College, Sari, Mazandaran, Iran
This paper describes how to classify a data set using an optimum set of exemplars that determine the label of an instance, in order to reduce classification run time on large data sets.
In this paper, we use these exemplars to classify positive and negative bags in a synthetic data set.
Multi-instance learning (MIL) can be implemented with several methods, such as SVMs, CNNs, and diverse density. In this paper, an optimum set of classifier exemplars (OSCE) is used to recognize positive bags, that is, bags that contain tumor patches.
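To make the bag-level decision concrete, the sketch below assumes the standard MIL rule that a bag is positive if and only if at least one of its instances (patches) is positive; the source does not state which bag rule OSCE uses. The instance classifier `toy_classifier` and the toy bags are hypothetical placeholders, not the authors' model or data.

```python
import numpy as np


def bag_label(instance_labels):
    """Standard MIL assumption: a bag is positive (+1) iff any instance is positive."""
    return 1 if np.any(np.asarray(instance_labels) == 1) else -1


def classify_bags(bags, instance_classifier):
    """Label every bag by applying an instance-level classifier to its patches."""
    return [bag_label(instance_classifier(np.asarray(bag))) for bag in bags]


def toy_classifier(patches):
    """Hypothetical instance classifier: a patch is 'tumor' (+1) if feature 0 > 0.5."""
    return np.where(patches[:, 0] > 0.5, 1, -1)


bags = [
    [[0.1, 0.2], [0.3, 0.9]],              # no tumor patch
    [[0.2, 0.1], [0.8, 0.4], [0.0, 0.0]],  # one tumor patch
]
print(classify_bags(bags, toy_classifier))  # [-1, 1]
```

Any instance-level model, such as an exemplar-based classifier, could take the place of `toy_classifier`; the bag rule itself stays the same.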
The goal of this paper is to speed up classifier run time by choosing a set of exemplars. We use linear programming (LP) to optimize a hinge-loss cost function that compares the estimated label with the actual label when training the classifier. The estimated label of a query point is computed from its Euclidean distances to its k nearest neighbors and their actual labels. Two approaches are suggested for selecting exemplars with nonzero weights. The first chooses the k closest neighbors. The second uses LP followed by thresholding, keeping only the largest values of the optimized unknown variables, since these are the most significant for forming the exemplar set. There is a trade-off between classifier run time and accuracy. On large data sets, the OSCE classifier performs better than ANN and K-NN clustering, and it is faster than the NN classifier. After describing the OSCE method, we use it to recognize cancer in a synthetic data set. In effect, we define OSCE as an MIL method for cancer detection.
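The LP-plus-thresholding selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact formulation: each training point gets a nonnegative weight `w_j`; the estimated label of point `i` is a kernel-weighted vote of its k nearest neighbors, `f(x_i) = sum_j w_j y_j K(d_ij)`, with a Gaussian kernel `K(d) = exp(-d^2 / (2 sigma^2))` chosen here for illustration; an L1 penalty `lam * sum_j w_j` is added to the hinge loss to favor few nonzero weights. The names `fit_exemplars`, `predict`, `sigma`, `lam`, and the relative threshold `tau` are all hypothetical. The LP is solved with SciPy's `linprog`.

```python
import numpy as np
from scipy.optimize import linprog


def fit_exemplars(X, y, k=5, sigma=1.0, lam=0.1, tau=0.01):
    """Hypothetical OSCE-style fit. Variables z = [w (n), xi (n)], all >= 0.

    minimise   sum_i xi_i + lam * sum_j w_j
    subject to xi_i >= 1 - y_i * sum_{j in kNN(i)} w_j * y_j * K(d_ij)
    """
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)             # a point is not its own neighbour
    nbrs = np.argsort(D, axis=1)[:, :k]     # k nearest neighbours of each point
    A_ub = np.zeros((n, 2 * n))
    for i in range(n):
        for j in nbrs[i]:
            K = np.exp(-D[i, j] ** 2 / (2 * sigma ** 2))
            A_ub[i, j] = -y[i] * y[j] * K   # hinge constraint rewritten as <=
        A_ub[i, n + i] = -1.0               # slack xi_i
    b_ub = -np.ones(n)
    c = np.concatenate([lam * np.ones(n), np.ones(n)])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (2 * n), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    w = res.x[:n]
    keep = w > tau * w.max()                # thresholding: keep the largest weights
    return X[keep], y[keep], w[keep]


def predict(Xq, Xe, ye, we, k=5, sigma=1.0):
    """Classify queries by comparing them only with the selected exemplars."""
    D = np.linalg.norm(Xq[:, None, :] - Xe[None, :, :], axis=-1)
    kk = min(k, len(Xe))
    idx = np.argsort(D, axis=1)[:, :kk]
    d = np.take_along_axis(D, idx, axis=1)
    scores = np.sum(we[idx] * ye[idx] * np.exp(-d ** 2 / (2 * sigma ** 2)), axis=1)
    return np.where(scores >= 0, 1, -1)


# Usage on two synthetic Gaussian blobs (illustrative data only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (40, 2)), rng.normal(2, 0.5, (40, 2))])
y = np.array([-1] * 40 + [1] * 40)
Xe, ye, we = fit_exemplars(X, y)
acc = np.mean(predict(X, Xe, ye, we) == y)
print(len(Xe), "exemplars, training accuracy", acc)
```

The run-time saving in this sketch comes from `predict` computing distances only to the retained exemplars instead of to every training point. Raising `lam` or `tau` tends to keep fewer exemplars at some risk to accuracy, which mirrors the run-time/accuracy trade-off noted above.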