April 23, 2021
Abstract:
One of the standard approaches to the classification problem in machine learning relies on the assumption that classes are linearly separable in the considered feature space.
In this work, we apply this approach to semi-supervised transductive learning, which aims to infer the correct labels for given unlabeled data.
In particular, we prove a new fundamental fact: a linearly separable class can be uniquely identified by its mean. We prove that a class is linearly separable if and only if
it is maximal by probability among all sets with the same mean. We also show that for any element that is the mean of some set, there exists a unique linearly separable set
whose mean is that element. Finally, we prove the continuity of the mapping from means to the corresponding linearly separable classes. We use this theoretical grounding
to design an MPSM (Maximal by Probability with the Same Mean) algorithm for transductive inference of a linearly separable class based on the value of its mean.
We also propose a modification of our algorithm (OC-MPSM) for the one-class semi-supervised transductive learning problem. We test our approach on the USPS
digit image dataset. Our results show that the developed theory and proposed methods work well and confirm that a class can be successfully identified from its mean value.
Our analysis demonstrates the advantage of OC-MPSM over the baseline for the cases where only a few labels are available for training.
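
The "maximal by probability among all sets with the same mean" property can be checked by brute force on a toy example. The sketch below is only an illustration under simplifying assumptions that are not taken from the paper: the feature space is one-dimensional (so a linearly separable class is a threshold set), the probability measure is uniform over a small finite set of distinct points (so probability is proportional to set size), and all function names are hypothetical. It is not the MPSM algorithm itself.

```python
from fractions import Fraction
from itertools import combinations


def mean(subset):
    """Exact mean of a non-empty tuple of integers."""
    return Fraction(sum(subset), len(subset))


def largest_subsets_by_mean(points):
    """Map each attainable mean to the list of largest subsets attaining it."""
    best = {}
    ordered = sorted(points)
    for k in range(1, len(ordered) + 1):
        for subset in combinations(ordered, k):
            m = mean(subset)
            if m not in best or len(best[m][0]) < k:
                best[m] = [subset]          # strictly larger subset found
            elif len(best[m][0]) == k:
                best[m].append(subset)      # tie in size
    return best


def threshold_sets(points):
    """All 1-D 'linearly separable' subsets: points below or above a threshold."""
    ordered = sorted(points)
    lower = [tuple(ordered[:k]) for k in range(1, len(ordered) + 1)]
    upper = [tuple(ordered[-k:]) for k in range(1, len(ordered))]
    return lower + upper


# Toy sample of distinct points (an assumption made for this illustration).
points = (1, 2, 4, 7, 11, 16)
best = largest_subsets_by_mean(points)

# For every threshold set, it should be the unique largest subset with its mean.
results = [best[mean(s)] == [s] for s in threshold_sets(points)]
print(all(results))
```

In this toy setting, each threshold set is the only largest subset with its mean, which mirrors the statement that a linearly separable class is determined by its mean.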