I have a pretty basic question about how feature sparseness affects
the SVM. I'm a beginner...
Setup: 100,000 data points (50,000 class 0, 10,000 class 1), 50
features, some continuous, some discrete (binary). Classifier: SVM
with an RBF kernel.
If I have a binary feature f_i that is set for only, say, 50 points,
but most of these (say 45) are class 1 points, then won't the SVM
"learn" that if f_i is set, the class should be predicted as 1?
As a simple test, I tried a simpler problem with two binary features.
I define the following points (third number is label):
170 x (0,0,-1)
30 x (0,0,+1)
170 x (1,0,+1)
30 x (1,0,-1)
4 x (0,1,+1)
1 x (0,1,-1)
4 x (1,1,+1)
1 x (1,1,-1)
Idea: The first feature is reasonably discriminative between the two
classes. The second feature is very sparse in that it is only set for
10 points. However, it can also discriminate between the two classes
within this subset of 10 points.
I'm using an RBF kernel and I've tried several settings of gamma and
the complexity parameter, but I always end up with the (0,0) points
classified as false and everything else classified as true.
Intuitively, I believe the (0,1) points should all be classified as
false as well, since the second feature is only rarely set.
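
For reference, here is a minimal sketch of the toy set in plain Python (no SVM involved). It only tabulates the majority label for each of the four feature combinations, to compare against the SVM output described above. The names ROWS and majority_labels are made up for this illustration.

```python
from collections import Counter

# (f1, f2, label, count) rows copied from the toy set in this post.
ROWS = [
    (0, 0, -1, 170),
    (0, 0, +1, 30),
    (1, 0, +1, 170),
    (1, 0, -1, 30),
    (0, 1, +1, 4),
    (0, 1, -1, 1),
    (1, 1, +1, 4),
    (1, 1, -1, 1),
]


def majority_labels(rows):
    """Return {(f1, f2): majority label} for each feature combination."""
    votes = {}
    for f1, f2, label, count in rows:
        votes.setdefault((f1, f2), Counter())[label] += count
    # most_common(1) gives the (label, count) pair with the largest count.
    return {cell: c.most_common(1)[0][0] for cell, c in votes.items()}


majority = majority_labels(ROWS)
for cell in sorted(majority):
    print(cell, majority[cell])
```

On this data the majority label is -1 only for (0,0) and +1 for the other three combinations, which is the same labeling I keep getting from the SVM.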
In a naive Bayes type classifier, I would have P(Y=1|f_i=1)
= ...P(f_i=1|Y=1)P(Y=1)... The P(f_i=1|Y=1) term would have a
relatively low probability, which seems to account for the sparseness
of the feature. I don't understand how the SVM accounts for this, or
if it does. If not, how can I incorporate this into the classification?
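
To make the naive Bayes terms concrete, here is a minimal sketch that evaluates them on the toy set using only the second feature. Restricting to a single feature is a simplification assumed for illustration, and the name bayes_terms_f2 is hypothetical. Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

# (f1, f2, label, count) rows copied from the toy set in this post.
ROWS = [
    (0, 0, -1, 170),
    (0, 0, +1, 30),
    (1, 0, +1, 170),
    (1, 0, -1, 30),
    (0, 1, +1, 4),
    (0, 1, -1, 1),
    (1, 1, +1, 4),
    (1, 1, -1, 1),
]


def bayes_terms_f2(rows, label=+1):
    """Return (likelihood, prior, posterior) for the event f2 = 1."""
    n_total = sum(c for _, _, _, c in rows)
    n_label = sum(c for _, _, y, c in rows if y == label)
    n_f2 = sum(c for _, f2, _, c in rows if f2 == 1)
    n_f2_label = sum(c for _, f2, y, c in rows if f2 == 1 and y == label)

    likelihood = Fraction(n_f2_label, n_label)  # P(f2=1 | Y=label)
    prior = Fraction(n_label, n_total)          # P(Y=label)
    evidence = Fraction(n_f2, n_total)          # P(f2=1)
    posterior = likelihood * prior / evidence   # P(Y=label | f2=1)
    return likelihood, prior, posterior


likelihood, prior, posterior = bayes_terms_f2(ROWS)
print("P(f2=1|Y=+1) =", likelihood)
print("P(Y=+1)      =", prior)
print("P(Y=+1|f2=1) =", posterior)
```

On the toy data the likelihood P(f_2=1|Y=+1) is indeed small (8/208), but the evidence term P(f_2=1) in the denominator is small as well (10/410), so this single-feature posterior still works out to 4/5.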

