-------------------------------------
--- Python interface of LIBLINEAR ---
-------------------------------------

Table of Contents
=================

- Introduction
- Installation
- Quick Start
- Design Description
- Data Structures
- Utility Functions
- Additional Information

Introduction
============

Python (http://www.python.org/) is a programming language suitable for rapid
development. This tool provides a simple Python interface to LIBLINEAR, a library
for large linear classification (http://www.csie.ntu.edu.tw/~cjlin/liblinear). The
interface is easy to use because its usage mirrors that of LIBLINEAR. It is
built on the Python standard library module "ctypes".

Installation
============

On Unix systems, type

> make

The interface needs only the LIBLINEAR shared library, which is generated by
the above command. We assume that the shared library is in the LIBLINEAR
main directory or in the system path.

On Windows, the shared library liblinear.dll is provided in the directory
`..\windows'. You can also copy it to the system directory (e.g.,
`C:\WINDOWS\system32\' for Windows XP). To regenerate the shared library,
please follow the instructions for building Windows binaries in the LIBLINEAR
README.

Quick Start
===========

There are two levels of usage. The high-level one uses utility functions
in liblinearutil.py, and its usage is the same as that of the LIBLINEAR MATLAB
interface.

>>> from liblinearutil import *
# Read data in LIBSVM format
>>> y, x = svm_read_problem('../heart_scale')
>>> m = train(y[:200], x[:200], '-c 4')
>>> p_label, p_acc, p_val = predict(y[200:], x[200:], m)

# Construct a problem in Python format
# Dense data
>>> y, x = [1,-1], [[1,0,1], [-1,0,-1]]
# Sparse data
>>> y, x = [1,-1], [{1:1, 3:1}, {1:-1,3:-1}]
>>> prob  = problem(y, x)
>>> param = parameter('-c 4 -B 1')
>>> m = train(prob, param)

# Other utility functions
>>> save_model('heart_scale.model', m)
>>> m = load_model('heart_scale.model')
>>> p_label, p_acc, p_val = predict(y, x, m, '-b 1')
>>> ACC = evaluations(y, p_label)

# Getting online help
>>> help(train)

The low-level usage directly calls the C interface functions imported by
liblinear.py. Note that all arguments and return values are in ctypes format,
so you need to handle them carefully.

>>> from liblinear import *
>>> prob = problem([1,-1], [{1:1, 3:1}, {1:-1,3:-1}])
>>> param = parameter('-c 4')
>>> m = liblinear.train(prob, param) # m is a ctypes pointer to a model
# Convert a Python-format instance to feature_nodearray, a ctypes structure
>>> x0, max_idx = gen_feature_nodearray({1:1, 3:1})
>>> label = liblinear.predict(m, x0)

Design Description
==================

There are two files, liblinear.py and liblinearutil.py, which correspond to the
low-level and high-level use of the interface, respectively.

In liblinear.py, we adopt the Python built-in library "ctypes," so that
Python can directly access C structures and interface functions defined
in linear.h.
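
As an illustration of how ctypes exposes a C structure, the following is a
minimal sketch that mirrors the feature_node struct of linear.h (an int index
and a double value). The names FeatureNode and make_node_array are hypothetical
and are not part of liblinear.py; the sketch also assumes the LIBLINEAR
convention that a feature vector ends with a node whose index is -1.

```python
import ctypes


class FeatureNode(ctypes.Structure):
    # Hypothetical mirror of `struct feature_node { int index; double value; };`
    _fields_ = [("index", ctypes.c_int), ("value", ctypes.c_double)]


def make_node_array(pairs):
    """Build a C array of FeatureNode from (index, value) pairs.

    One extra slot is allocated and its index is set to -1 to mark the end
    of the vector (assumed convention).
    """
    arr = (FeatureNode * (len(pairs) + 1))()
    for i, (index, value) in enumerate(pairs):
        arr[i].index = index
        arr[i].value = value
    arr[len(pairs)].index = -1
    return arr


nodes = make_node_array([(1, 1.0), (3, 1.0)])
print([(n.index, n.value) for n in nodes])
# prints [(1, 1.0), (3, 1.0), (-1, 0.0)]
```

Fields of such a structure are read and written like ordinary attributes, but
the memory behind them is a C array, which is why out-of-range access is not
caught the way it would be for a Python list of tuples.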

Advanced users can use the structures and functions in liblinear.py directly.
To avoid handling ctypes structures, liblinearutil.py provides some easy-to-use
functions. Their usage is similar to that of the LIBLINEAR MATLAB interface.

Data Structures
===============

Three data structures derived from linear.h are feature_node, problem, and
parameter. They all contain fields with the same names as in
linear.h. Access these fields carefully because you are directly using a C
structure instead of a Python object. The following description introduces
additional fields and methods.

Before using the data structures, execute the following command to load the
LIBLINEAR shared library:

    >>> from liblinear import *

- class feature_node:

    Construct a feature_node.

    >>> node = feature_node(idx, val)

    idx: an integer indicating the feature index.

    val: a float indicating the feature value.

- Function: gen_feature_nodearray(xi [,feature_max=None [,issparse=True]])

    Generate a feature vector from a Python list/tuple or a dictionary:

    >>> xi, max_idx = gen_feature_nodearray({1:1, 3:1, 5:-2})

    xi: the returned feature_nodearray (a ctypes structure)

    max_idx: the maximal feature index of xi

    issparse: if issparse == True, zero feature values are removed. The default
              value is True, which keeps the representation sparse.

    feature_max: if feature_max is assigned, features with indices larger than
                 feature_max are removed.
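
The filtering described above can be sketched in pure Python. This is not the
code of gen_feature_nodearray; the helper name to_index_value_pairs is
hypothetical, it returns plain (index, value) tuples instead of a ctypes
array, and it assumes that positions of a list/tuple are numbered from 1, as
feature indices are in the LIBSVM data format.

```python
def to_index_value_pairs(xi, feature_max=None, issparse=True):
    """Return (pairs, max_idx) for one instance given as dict, list, or tuple."""
    if isinstance(xi, dict):
        items = list(xi.items())
    elif isinstance(xi, (list, tuple)):
        # Assumption: dense positions map to feature indices 1, 2, 3, ...
        items = list(enumerate(xi, start=1))
    else:
        raise TypeError("xi should be a dictionary, list, or tuple")

    if feature_max is not None:
        # Drop features whose index is larger than feature_max.
        items = [(i, v) for i, v in items if i <= feature_max]
    if issparse:
        # Drop zero-valued features.
        items = [(i, v) for i, v in items if v != 0]

    items.sort()
    max_idx = items[-1][0] if items else 0
    return items, max_idx


print(to_index_value_pairs({1: 1, 3: 1, 5: -2}))
# prints ([(1, 1), (3, 1), (5, -2)], 5)
print(to_index_value_pairs([1, 0, 1]))
# prints ([(1, 1), (3, 1)], 3)
```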

- class problem:

    Construct a problem instance.

    >>> prob = problem(y, x [, bias=-1])

    y: a Python list/tuple of l labels (type must be int/double).

    x: a Python list/tuple of l data instances. Each element of x must be
       an instance of list/tuple/dictionary type.

    bias: if bias >= 0, instance x becomes [x; bias]; if bias < 0, no bias
          term is added (default -1).

    You can also modify the bias value by

    >>> prob.set_bias(1)

    Note that if your x contains sparse data (i.e., dictionary), the internal
    ctypes data format is still sparse.
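
To make the notation [x; bias] concrete, here is a minimal sketch on a sparse
(dictionary) instance. The helper append_bias is hypothetical and is not part
of the interface; it assumes that the bias is stored as one extra feature at
index n_features + 1, where n_features is the largest feature index in the
data.

```python
def append_bias(instance, n_features, bias):
    """Return a copy of a sparse instance extended with a bias feature.

    If bias < 0, the instance is returned unchanged (no bias term).
    """
    augmented = dict(instance)
    if bias >= 0:
        # Assumption: the bias occupies the index right after the last feature.
        augmented[n_features + 1] = bias
    return augmented


print(append_bias({1: 1, 3: 1}, 3, 1))
# prints {1: 1, 3: 1, 4: 1}
print(append_bias({1: 1, 3: 1}, 3, -1))
# prints {1: 1, 3: 1}
```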

- class parameter:

    Construct a parameter instance.

    >>> param = parameter('training_options')

    If 'training_options' is empty, LIBLINEAR default values are applied.

    Set param to LIBLINEAR default values.

    >>> param.set_to_default_values()

    Parse a string of options.

    >>> param.parse_options('training_options')

    Show values of parameters.

    >>> param.show()
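
For readers unfamiliar with the option strings used above (for example
'-c 4 -B 1' or '-s 3 -c 5 -q'), the following sketch shows one way such a
string can be split into flag/value pairs. It is not the parser used by the
parameter class; the function split_options is hypothetical, and it assumes
that '-q' is the only flag that takes no value.

```python
def split_options(options):
    """Split an option string such as '-c 4 -B 1 -q' into a {flag: value} dict."""
    tokens = options.split()
    parsed = {}
    i = 0
    while i < len(tokens):
        flag = tokens[i]
        if not flag.startswith("-"):
            raise ValueError("expected a flag, got %r" % flag)
        if flag == "-q":
            # Assumption: '-q' is a switch without a value.
            parsed[flag] = True
            i += 1
        else:
            if i + 1 >= len(tokens):
                raise ValueError("missing value for %s" % flag)
            parsed[flag] = tokens[i + 1]
            i += 2
    return parsed


print(split_options("-s 3 -c 5 -q"))
# prints {'-s': '3', '-c': '5', '-q': True}
```

Values are kept as strings in this sketch; converting them to int or float
depends on the meaning of each flag.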

- class model:

    There are two ways to obtain an instance of model:

    >>> model_ = train(y, x)
    >>> model_ = load_model('model_file_name')

    Note that the interface functions liblinear.train and
    liblinear.load_model return a ctypes pointer to a model, which is
    different from the model object returned by train and load_model
    in liblinearutil.py. We provide a function toPyModel for the
    conversion:

    >>> model_ptr = liblinear.train(prob, param)
    >>> model_ = toPyModel(model_ptr)

    If you obtain a model in a way other than the above approaches,
    handle it carefully to avoid memory leaks or segmentation faults.

    Some interface functions to access LIBLINEAR models are wrapped as
    members of the class model:

    >>> type = model_.get_type()
    >>> nr_feature =  model_.get_nr_feature()
    >>> nr_class = model_.get_nr_class()
    >>> class_labels = model_.get_labels()
    >>> is_prob_model = model_.is_probability_model()

Utility Functions
=================

To use utility functions, type

    >>> from liblinearutil import *

The above command loads
    train()            : train a linear model
    predict()          : predict testing data
    svm_read_problem() : read the data from a LIBSVM-format file.
    load_model()       : load a LIBLINEAR model.
    save_model()       : save model to a file.
    evaluations()      : evaluate prediction results.

- Function: train

    There are three ways to call train()

    >>> model = train(y, x [, 'training_options'])
    >>> model = train(prob [, 'training_options'])
    >>> model = train(prob, param)

    y: a list/tuple of l training labels (type must be int/double).

    x: a list/tuple of l training instances. The feature vector of
       each training instance is an instance of list/tuple or dictionary.

    training_options: a string in the same form as that for LIBLINEAR command
                      mode.

    prob: a problem instance generated by calling
          problem(y, x).

    param: a parameter instance generated by calling
           parameter('training_options').

    model: the returned model instance. See linear.h for details of this
           structure. If '-v' is specified, cross validation is
           conducted and the returned model is just a scalar: cross-validation
           accuracy for classification and mean-squared error for regression.

    To train on the same data many times with different
    parameters, the second and the third ways should be faster.

    Examples:

    >>> y, x = svm_read_problem('../heart_scale')
    >>> prob = problem(y, x)
    >>> param = parameter('-s 3 -c 5 -q')
    >>> m = train(y, x, '-c 5')
    >>> m = train(prob, '-w1 5 -c 5')
    >>> m = train(prob, param)
    >>> CV_ACC = train(y, x, '-v 3')

- Function: predict

    To predict testing data with a model, use

    >>> p_labels, p_acc, p_vals = predict(y, x, model [,'predicting_options'])

    y: a list/tuple of l true labels (type must be int/double). It is used
       for calculating the accuracy. Use [] if true labels are
       unavailable.

    x: a list/tuple of l predicting instances. The feature vector of
       each predicting instance is an instance of list/tuple or dictionary.

    predicting_options: a string of predicting options in the same format as
                        that of LIBLINEAR.

    model: a model instance.

    p_labels: a list of predicted labels

    p_acc: testing accuracy

    p_vals: a list of decision values or probability estimates (if '-b 1'
            is specified). If k is the number of classes, for decision values,
            each element contains the results of predicting k binary-class
            SVMs. If k = 2 and the solver is not MCSVM_CS, only one decision
            value is returned. For probabilities, each element contains k values
            indicating the probability that the testing instance is in each class.
            Note that the order of classes here is the same as the 'model.label'
            field in the model structure.

    Example:

    >>> m = train(y, x, '-c 5')
    >>> p_labels, p_acc, p_vals = predict(y, x, m)
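
The relation between one element of p_vals and the predicted label can be
sketched as follows. The function label_from_decision_values is hypothetical
and is not part of liblinearutil.py. It assumes that `labels` lists the
classes in the same order as the 'model.label' field, that the class with the
largest decision value wins, and that in the single-value binary case a
positive value selects the first label.

```python
def label_from_decision_values(dec_values, labels):
    """Pick a label from one instance's decision values.

    dec_values: decision values for one instance (k values, or 1 if k = 2).
    labels: class labels in model order.
    """
    if len(labels) == 2 and len(dec_values) == 1:
        # Assumption: positive value -> first label, otherwise second label.
        return labels[0] if dec_values[0] > 0 else labels[1]
    # Multi-class case: take the class with the largest decision value.
    best = max(range(len(labels)), key=lambda k: dec_values[k])
    return labels[best]


print(label_from_decision_values([0.7], [1, -1]))
# prints 1
print(label_from_decision_values([0.1, 2.0, -1.0], [3, 5, 7]))
# prints 5
```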

- Functions: svm_read_problem/load_model/save_model

    Usage examples:

    >>> y, x = svm_read_problem('data.txt')
    >>> m = load_model('model_file')
    >>> save_model('model_file', m)
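
For reference, the LIBSVM data format stores one instance per line as
"label index:value index:value ...". The sketch below parses such lines into
the (y, x) pair shape used throughout this document, with each instance as a
dictionary. It is not the implementation of svm_read_problem; the function
parse_libsvm_lines is hypothetical and takes an iterable of strings rather
than a file name.

```python
def parse_libsvm_lines(lines):
    """Parse LIBSVM-format lines into (labels, instances).

    labels: list of float labels.
    instances: list of {feature_index: feature_value} dictionaries.
    """
    labels, instances = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        labels.append(float(fields[0]))
        features = {}
        for item in fields[1:]:
            index, value = item.split(":")
            features[int(index)] = float(value)
        instances.append(features)
    return labels, instances


y, x = parse_libsvm_lines(["+1 1:0.5 3:-1", "-1 2:1"])
print(y)
# prints [1.0, -1.0]
print(x)
# prints [{1: 0.5, 3: -1.0}, {2: 1.0}]
```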

- Function: evaluations

    Calculate the accuracy from the true values (ty) and the predicted
    values (pv):

    >>> ACC = evaluations(ty, pv)

    ty: a list of true values.

    pv: a list of predicted values.

    ACC: accuracy.
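
The accuracy computation can be sketched as below. The function
accuracy_percent is hypothetical and is not the code of evaluations(); it
assumes that accuracy is reported as a percentage and that ty and pv are
non-empty lists of equal length.

```python
def accuracy_percent(ty, pv):
    """Return the percentage of positions where ty and pv agree."""
    if len(ty) != len(pv):
        raise ValueError("ty and pv must have the same length")
    if not ty:
        raise ValueError("ty and pv must not be empty")
    correct = sum(1 for t, p in zip(ty, pv) if t == p)
    return 100.0 * correct / len(ty)


print(accuracy_percent([1, -1, 1, 1], [1, -1, -1, 1]))
# prints 75.0
```

Note that pv here holds predicted labels, one per instance, not the decision
values or probability estimates.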


Additional Information
======================

This interface was written by Hsiang-Fu Yu from the Department of Computer
Science, National Taiwan University. If you find this tool useful, please
cite LIBLINEAR as follows:

R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin.
LIBLINEAR: A Library for Large Linear Classification, Journal of
Machine Learning Research 9(2008), 1871-1874. Software available at
http://www.csie.ntu.edu.tw/~cjlin/liblinear

For any questions, please contact Chih-Jen Lin <cjlin@csie.ntu.edu.tw>
or check the FAQ page:

http://www.csie.ntu.edu.tw/~cjlin/liblinear/faq.html