.. currentmodule:: mlpy

Tutorial
========

If you are new to Python and NumPy, see: http://docs.python.org/tutorial/, http://www.scipy.org/Tentative_NumPy_Tutorial and http://matplotlib.sourceforge.net/.

A learning problem usually considers a set of p-dimensional
samples (observations) and tries to predict properties of
unknown data.
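
In array terms, such a dataset is usually stored as an (n x p) NumPy
matrix of samples together with a length-n vector of known properties
(labels). A minimal illustrative sketch (the arrays below are made up
for illustration and are not part of mlpy):

>>> import numpy as np
>>> samples = np.random.rand(10, 3)  # n=10 observations, p=3 attributes each
>>> labels = np.array([0, 1] * 5)    # one known property (class) per observation
>>> samples.shape, labels.shape
((10, 3), (10,))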

Tutorial 1 - Iris Dataset
-------------------------

The well-known Iris dataset represents 3 kinds of Iris flowers with 150
observations and 4 attributes: sepal length, sepal width, petal
length and petal width.

Dimensionality reduction and learning tasks can be performed
with the mlpy library in just a few commands.

Download :download:`Iris dataset <data/iris.csv>`

Load the modules:

>>> import numpy as np
>>> import mlpy 
>>> import matplotlib.pyplot as plt # required for plotting

Load the Iris dataset:

>>> iris = np.loadtxt('iris.csv', delimiter=',')
>>> x, y = iris[:, :4], iris[:, 4].astype(int) # x: (observations x attributes) matrix, y: classes (1: setosa, 2: versicolor, 3: virginica)
>>> x.shape
(150, 4)
>>> y.shape
(150,)
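
As a quick check (plain NumPy, nothing mlpy-specific), the class labels
are the three integers listed in the comment above:

>>> np.unique(y)
array([1, 2, 3])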

Dimensionality reduction by Principal Component Analysis (PCA):

>>> pca = mlpy.PCA() # new PCA instance
>>> pca.learn(x) # learn from data
>>> z = pca.transform(x, k=2) # embed x into the k=2 dimensional subspace
>>> z.shape
(150, 2)
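
As an optional check, computed directly with NumPy rather than through the
mlpy PCA object, we can verify that the first two principal components
capture most of the variance of the four original attributes:

>>> evals = np.linalg.eigvalsh(np.cov(x, rowvar=False))[::-1]  # eigenvalues, largest first
>>> frac = evals[:2].sum() / evals.sum()  # close to 0.98 for this dataset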

Plot the principal components:

>>> plt.set_cmap(plt.cm.Paired)
>>> fig1 = plt.figure(1)
>>> title = plt.title("PCA on iris dataset")
>>> plot = plt.scatter(z[:, 0], z[:, 1], c=y)
>>> labx = plt.xlabel("First component")
>>> laby = plt.ylabel("Second component")
>>> plt.show()

.. image:: images/iris_pca.png

Learning by Kernel Support Vector Machines (SVMs) on principal components:

>>> linear_svm = mlpy.LibSvm(kernel_type='linear') # new linear SVM instance
>>> linear_svm.learn(z, y) # learn from principal components
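
As an optional sanity check, the `pred()` method (used below to classify
grid points) can also be applied to the training points themselves to see
how many are classified correctly; note this is a resubstitution estimate,
not a proper validation:

>>> yhat = linear_svm.pred(z)  # predict on the training points
>>> acc = np.mean(yhat == y)   # fraction of correctly classified training samples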

For plotting purposes, we build the grid where we will compute the
predictions (`zgrid`):

>>> xmin, xmax = z[:,0].min()-0.1, z[:,0].max()+0.1
>>> ymin, ymax = z[:,1].min()-0.1, z[:,1].max()+0.1
>>> xx, yy = np.meshgrid(np.arange(xmin, xmax, 0.01), np.arange(ymin, ymax, 0.01))
>>> zgrid = np.c_[xx.ravel(), yy.ravel()]

Now we perform the predictions on the grid. The `pred()` method
returns the prediction for each point in `zgrid`:

>>> yp = linear_svm.pred(zgrid)
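
`yp` contains one predicted class per row of `zgrid`, so it can be
reshaped back to the grid shape for plotting:

>>> yp.shape[0] == zgrid.shape[0]
True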

Plot the predictions:

>>> plt.set_cmap(plt.cm.Paired)
>>> fig2 = plt.figure(2)
>>> title = plt.title("SVM (linear kernel) on principal components")
>>> plot1 = plt.pcolormesh(xx, yy, yp.reshape(xx.shape))
>>> plot2 = plt.scatter(z[:, 0], z[:, 1], c=y)
>>> labx = plt.xlabel("First component")
>>> laby = plt.ylabel("Second component")
>>> limx = plt.xlim(xmin, xmax)
>>> limy = plt.ylim(ymin, ymax)
>>> plt.show()

.. image:: images/iris_svm_linear.png

We can try different kernels to obtain:

.. image:: images/iris_svm_gaussian.png
.. image:: images/iris_svm_poly.png
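
A minimal sketch of how such variants might be constructed, assuming that
`LibSvm` accepts the standard libsvm kernel names (`'rbf'` for the Gaussian
kernel, `'poly'` for the polynomial kernel) and the usual `gamma`/`degree`
parameters:

>>> rbf_svm = mlpy.LibSvm(kernel_type='rbf', gamma=10)    # Gaussian kernel; gamma chosen arbitrarily
>>> poly_svm = mlpy.LibSvm(kernel_type='poly', degree=3)  # polynomial kernel of degree 3
>>> rbf_svm.learn(z, y)
>>> yp_rbf = rbf_svm.pred(zgrid)  # predictions on the same grid as before

The grid predictions can then be plotted exactly as in the linear case above.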