Tutorial
========

A Simple Example
----------------

In this example, the performance of an SVM classifier is evaluated
using a stratified k-fold resampling scheme.

First, import the NumPy and mlpy modules:

.. code-block:: python
   
   >>> import numpy as np
   >>> import mlpy


Then, load a data file (*data.dat*) containing 30 samples described by
100 features (*x*) and labels (*y*):

.. code-block:: python

   >>> x, y = mlpy.data_fromfile('data.dat') # import data file
   >>> x.shape
   (30, 100)
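
The file *data.dat* is not shipped with this tutorial. As a minimal
sketch, a stand-in file of the same shape can be written and read back
with NumPy alone. The layout used here (tab-separated, one sample per
row, label in the last column) and the names ``x_demo`` and ``y_demo``
are assumptions made for illustration; they are not taken from the
mlpy documentation:

.. code-block:: python

   import os
   import tempfile

   import numpy as np

   # Stand-in data: 30 samples, 100 features, two classes of 15 samples.
   rng = np.random.RandomState(0)
   x_demo = rng.normal(size=(30, 100))
   y_demo = np.repeat([-1, 1], 15)

   # ASSUMED layout: tab-separated, features first, label in the last column.
   table = np.column_stack([x_demo, y_demo])
   path = os.path.join(tempfile.mkdtemp(), "data.dat")
   np.savetxt(path, table, delimiter="\t")

   # Read the file back and split it into features and labels.
   loaded = np.loadtxt(path, delimiter="\t")
   x, y = loaded[:, :-1], loaded[:, -1].astype(int)
   print(x.shape)  # (30, 100)
   print(y.shape)  # (30,)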

Initialize the SVM classifier, specifying the kernel type (*linear*)
and the regularization parameter (*C*):

.. code-block:: python

   >>> classifier = mlpy.Svm(kernel = 'linear', C = 1.0)  # initialize the svm classifier 
   
Define a stratified 10-fold resampling scheme. *idx* holds the sample
indexes as a list of (train, test) pairs, one pair per fold:

.. code-block:: python
   
   >>> idx = mlpy.kfoldS(cl = y, sets = 10)
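
To make the idea of stratification concrete, the following is a
minimal NumPy sketch of one way to build such index pairs. It is not
the implementation of ``mlpy.kfoldS``; the function name
``stratified_kfold``, its ``seed`` parameter and the demo labels are
hypothetical:

.. code-block:: python

   import numpy as np

   def stratified_kfold(labels, sets, seed=0):
       """Return a list of (train_idx, test_idx) pairs, one per fold."""
       labels = np.asarray(labels)
       rng = np.random.RandomState(seed)
       folds = [[] for _ in range(sets)]
       for cls in np.unique(labels):
           # Shuffle the members of this class, then split them into
           # `sets` chunks of near-equal size, one chunk per fold.
           members = np.flatnonzero(labels == cls)
           rng.shuffle(members)
           for i, chunk in enumerate(np.array_split(members, sets)):
               folds[i].extend(chunk.tolist())
       all_idx = np.arange(labels.shape[0])
       pairs = []
       for test in folds:
           test_idx = np.sort(np.array(test, dtype=int))
           # Everything that is not in the test fold is used for training.
           train_idx = np.setdiff1d(all_idx, test_idx)
           pairs.append((train_idx, test_idx))
       return pairs

   y_demo = np.repeat([-1, 1], 15)  # hypothetical labels, 15 per class
   idx = stratified_kfold(y_demo, sets=10)
   print(len(idx))  # 10

Because the split is done class by class, every test fold keeps
roughly the class proportions of the full data set.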

For each fold, build the training and test data, train the model on
*xtr* and test it on *xts*. Performance is measured by averaging the
prediction error over all folds:

.. code-block:: python

   >>> pred_err = 0.0
   >>> for idxtr, idxts in idx:
   ...     xtr, xts = x[idxtr], x[idxts]       # training and test samples
   ...     ytr, yts = y[idxtr], y[idxts]       # training and test labels
   ...     ret = classifier.compute(xtr, ytr)  # compute the model
   ...     pred = classifier.predict(xts)      # test the model on test data
   ...     pred_err += mlpy.err(yts, pred)     # compute the prediction error
   >>> av_pred_err = pred_err / len(idx)       # compute the average prediction error
   >>> av_pred_err
   0.17499999999999999
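
The averaging step can be sketched with NumPy alone. This assumes the
per-fold error is the fraction of misclassified test samples, which is
how ``mlpy.err`` is used above but is not confirmed here. The helper
``error_rate`` and the two hand-made folds are hypothetical:

.. code-block:: python

   import numpy as np

   def error_rate(y_true, y_pred):
       """Fraction of samples whose predicted label differs from the true one."""
       y_true = np.asarray(y_true)
       y_pred = np.asarray(y_pred)
       return float(np.mean(y_true != y_pred))

   # Two hand-made folds as (true labels, predicted labels).
   fold_results = [
       (np.array([1, -1, 1, -1]), np.array([1, -1, -1, -1])),  # 1 error in 4
       (np.array([-1, 1, 1, -1]), np.array([-1, 1, 1, -1])),   # no errors
   ]

   pred_err = 0.0
   for yts, pred in fold_results:
       pred_err += error_rate(yts, pred)
   av_pred_err = pred_err / len(fold_results)
   print(av_pred_err)  # 0.125

Each fold contributes equally to this average. If the folds differ in
size, the result can differ slightly from the error computed over all
test samples pooled together.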