

Xu Cui » SVM (support vector machine) with libsvm

I have been learning SVM lately and tried libsvm. It’s a good package.

Linear kernel example (support vectors are in circles):


Nonlinear example (radial basis)

Nonlinear, circle

Nonlinear, two circles

Nonlinear, quadrant

3-class example

Linear, 3 classes

Basic procedure to use libsvm:

  1. Preprocess your data. This includes normalization (scaling all values to between 0 and 1) and converting non-numeric values to numeric ones. You can use the following code to normalize (from the libsvm webpage):
    (data - repmat(min(data,[],1),size(data,1),1))*spdiags(1./(max(data,[],1)-min(data,[],1))',0,size(data,2),size(data,2))
  2. Find optimal parameter values. A linear kernel has one parameter, C (the penalty parameter). The commonly used radial basis kernel has two parameters (C and gamma). Different parameter values yield different accuracy rates. To avoid overfitting, use n-fold cross-validation. For example, 5-fold cross-validation splits the data into five parts and, for each part in turn, trains the SVM model on the other 4/5 of the data and tests on the remaining 1/5. The options -c, -g, and -v control parameter C, gamma, and n-fold cross-validation. A piece of code from the libsvm website is:
    bestcv = 0;
    for log2c = -1:3,
      for log2g = -4:1,
        cmd = ['-v 5 -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)];
        cv = svmtrain(heart_scale_label, heart_scale_inst, cmd);
        if (cv >= bestcv),
          bestcv = cv; bestc = 2^log2c; bestg = 2^log2g;
        end
        fprintf('%g %g %g (best c=%g, g=%g, rate=%g)\n', log2c, log2g, cv, bestc, bestg, bestcv);
      end
    end
  3. You may have to run the above code several times with different ranges of parameter values to find the optimal values. For example, you might start with a wide range at coarse resolution, then fine-tune within smaller regions at higher resolution.
  4. After finding the optimal parameter values, train your model on all the data using those values.
    cmd = ['-t 2 -c ', num2str(bestc), ' -g ', num2str(bestg)];
    model = svmtrain(l, d, cmd);
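  5. If you have new data, you can use this model to classify it. The true labels are unknown here, so a vector of zeros is passed as a placeholder and the reported accuracy is meaningless.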
    [predicted_label, accuracy, decision_values] = svmpredict(zeros(size(dd,1),1), dd, model);
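
One detail the steps above leave implicit: new data passed to svmpredict should be scaled with the same minimum and range that were computed from the training data, not with its own. Below is a minimal sketch of that idea, not code from the libsvm webpage. The variable names (train, newdata, lo, span) and the toy values are made up for illustration, and bsxfun is used in place of the repmat/spdiags expression from step 1.

```matlab
% Toy data (hypothetical values): rows are samples, columns are features.
train   = [1 10; 3 20; 5 40];
newdata = [2 25; 6 10];

% Column-wise minimum and range, computed from the training data only.
lo   = min(train, [], 1);
span = max(train, [], 1) - lo;
span(span == 0) = 1;   % guard: a constant column would otherwise divide by zero

% Apply the same shift and scale to both data sets.
train_scaled = bsxfun(@rdivide, bsxfun(@minus, train,   lo), span);
new_scaled   = bsxfun(@rdivide, bsxfun(@minus, newdata, lo), span);

disp(train_scaled);
disp(new_scaled);
```

Scaled new data can fall slightly outside the 0 to 1 interval when it contains values beyond the training range; that is expected.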

Commonly used options

  • -v n: n-fold cross validation
  • -t 0: linear kernel
  • -t 2: radial basis (default)
  • -s 0: SVC type = C-SVC
  • -c: C parameter value, default 1
  • -g: gamma parameter value
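
These options are passed to svmtrain as a single string. As a small sketch, the string can be assembled with sprintf instead of the concatenation with num2str used earlier; the variable names and parameter values here are arbitrary examples, not recommendations.

```matlab
% Example values (arbitrary, for illustration only).
kernel_type = 2;      % -t 2: radial basis kernel
c           = 4;      % -c: C parameter
gamma       = 0.25;   % -g: gamma parameter
nfold       = 5;      % -v: number of cross-validation folds

% Option string for training a model.
train_opts = sprintf('-s 0 -t %d -c %g -g %g', kernel_type, c, gamma);

% The same options plus -v, for cross-validation.
cv_opts = [train_opts, sprintf(' -v %d', nfold)];

disp(train_opts);
disp(cv_opts);
```

With -v in the option string, svmtrain returns the cross-validation accuracy instead of a model, which is why the grid search code assigns its result to cv.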

libsvm performance

I tested on different data sizes and recorded the time spent (in seconds).

Computer: 2×2.66 GHz processor, 12 GB memory, Windows XP running in VMware on Mac OS X 10.5

data size    # features    svmtrain (s)    svmpredict (s)
100          2             0.00            0.00
100          6             0.00            0.00
100          10            0.00            0.00
100          20            0.00            0.00
100          50            0.01            0.00
100          100           0.02            0.01
500          2             0.02            0.01
500          6             0.03            0.02
500          10            0.05            0.03
500          20            0.08            0.03
500          50            0.46            0.07
500          100           0.56            0.12
1000         2             0.07            0.04
1000         6             0.10            0.06
1000         10            0.15            0.10
1000         20            0.36            0.14
1000         50            1.09            0.30
1000         100           3.07            0.50

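It’s fairly fast: even with 1000 samples and 100 features, training takes about 3 seconds and prediction about half a second.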
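
A timing like this can be measured with tic and toc. The sketch below shows one way to time a single row of the table; it is not the script that produced the numbers above. It assumes the libsvm MATLAB interface (svmtrain, svmpredict) is on the path, and the random data and labels are an assumption made only for illustration.

```matlab
% Assumption: libsvm's MATLAB interface (svmtrain, svmpredict) is on the path.
n = 500;                          % data size
d = 20;                           % number of features
X = rand(n, d);                   % hypothetical data: uniform random features in [0, 1]
y = 2 * (rand(n, 1) > 0.5) - 1;   % hypothetical random labels, +1 or -1

% Time training with the radial basis kernel and default C and gamma.
tic;
model = svmtrain(y, X, '-t 2');
t_train = toc;

% Time prediction on the same data.
tic;
[predicted_label, accuracy, decision_values] = svmpredict(y, X, model);
t_predict = toc;

fprintf('%d\t%d\t%.2f\t%.2f\n', n, d, t_train, t_predict);
```

Timings depend on the data as well as its size, so numbers from random labels will not necessarily match the table.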


MATLAB code to generate the plots above: cuixu_test_svm1

SVM basics: http://en.wikipedia.org/wiki/Support_vector_machine

Download libsvm for matlab at: http://www.csie.ntu.edu.tw/~cjlin/libsvm/#matlab

The meaning of libsvm output is at: http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#f804

