-------------------------------------
--- Python interface of LIBLINEAR ---
-------------------------------------
Table of Contents
=================
- Introduction
- Installation
- Quick Start
- Design Description
- Data Structures
- Utility Functions
- Additional Information
Introduction
============
Python (http://www.python.org/) is a programming language suitable for rapid
development. This tool provides a simple Python interface to LIBLINEAR, a library
for large linear classification (http://www.csie.ntu.edu.tw/~cjlin/liblinear). The
interface is very easy to use as the usage is the same as that of LIBLINEAR. The
interface is developed with the built-in Python library "ctypes."
Installation
============
On Unix systems, type
> make
The interface needs only the LIBLINEAR shared library, which is generated by
the above command. We assume that the shared library is in the LIBLINEAR
main directory or in the system path.
For Windows, the shared library liblinear.dll is ready in the directory
`..\windows'. You can also copy it to the system directory (e.g.,
`C:\WINDOWS\system32\' for Windows XP). To regenerate the shared library,
please follow the instructions for building Windows binaries in the LIBLINEAR README.
Quick Start
===========
There are two levels of usage. The high-level one uses utility functions
in liblinearutil.py and the usage is the same as the LIBLINEAR MATLAB interface.
>>> from liblinearutil import *
# Read data in LIBSVM format
>>> y, x = svm_read_problem('../heart_scale')
>>> m = train(y[:200], x[:200], '-c 4')
>>> p_label, p_acc, p_val = predict(y[200:], x[200:], m)
# Construct problem in python format
# Dense data
>>> y, x = [1,-1], [[1,0,1], [-1,0,-1]]
# Sparse data
>>> y, x = [1,-1], [{1:1, 3:1}, {1:-1,3:-1}]
>>> prob = problem(y, x)
>>> param = parameter('-c 4 -B 1')
>>> m = train(prob, param)
# Other utility functions
>>> save_model('heart_scale.model', m)
>>> m = load_model('heart_scale.model')
>>> p_label, p_acc, p_val = predict(y, x, m, '-b 1')
>>> ACC = evaluations(y, p_label)
# Getting online help
>>> help(train)
The low-level use directly calls C interfaces imported by liblinear.py. Note that
all arguments and return values are in ctypes format. You need to handle them
carefully.
>>> from liblinear import *
>>> prob = problem([1,-1], [{1:1, 3:1}, {1:-1,3:-1}])
>>> param = parameter('-c 4')
>>> m = liblinear.train(prob, param) # m is a ctype pointer to a model
# Convert a Python-format instance to feature_nodearray, a ctypes structure
>>> x0, max_idx = gen_feature_nodearray({1:1, 3:1})
>>> label = liblinear.predict(m, x0)
Design Description
==================
There are two files, liblinear.py and liblinearutil.py, which correspond to
low-level and high-level use of the interface, respectively.
In liblinear.py, we adopt the Python built-in library "ctypes," so that
Python can directly access C structures and interface functions defined
in linear.h.
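As a rough illustration of what "directly access C structures" means, the
sketch below mirrors the feature_node struct of linear.h (an int index and a
double value) with ctypes and builds an array terminated by index -1. The
names FeatureNode and to_node_array are hypothetical and chosen only for this
example; liblinear.py has its own definitions.

```python
from ctypes import Structure, c_int, c_double


class FeatureNode(Structure):
    # Mirrors `struct feature_node { int index; double value; };` in linear.h.
    # Hypothetical name: liblinear.py defines its own class for this.
    _fields_ = [("index", c_int), ("value", c_double)]


def to_node_array(pairs):
    """Build a C array from (index, value) pairs, ending with index -1."""
    nodes = (FeatureNode * (len(pairs) + 1))()
    for k, (idx, val) in enumerate(pairs):
        nodes[k].index = idx
        nodes[k].value = val
    nodes[len(pairs)].index = -1  # end-of-vector marker read by the C code
    return nodes


arr = to_node_array([(1, 1.0), (3, 1.0)])
print([(n.index, n.value) for n in arr])
```

Because the fields are plain C types, assigning a value of the wrong type
raises a TypeError instead of being silently accepted, which is one reason
the low-level interface needs more care than ordinary Python objects.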
While advanced users can use structures/functions in liblinear.py, to
avoid handling ctypes structures, in liblinearutil.py we provide some easy-to-use
functions. The usage is similar to that of the LIBLINEAR MATLAB interface.
Data Structures
===============
Four data structures derived from linear.h are feature_node, problem,
parameter, and model. They all contain fields with the same names in
linear.h. Access these fields carefully because you directly use a C structure
instead of a Python object. The following description introduces additional
fields and methods.
Before using the data structures, execute the following command to load the
LIBLINEAR shared library:
>>> from liblinear import *
- class feature_node:
Construct a feature_node.
>>> node = feature_node(idx, val)
idx: an integer indicating the feature index.
val: a float indicating the feature value.
- Function: gen_feature_nodearray(xi [,feature_max=None [,issparse=True]])
Generate a feature vector from a Python list/tuple or a dictionary:
>>> xi, max_idx = gen_feature_nodearray({1:1, 3:1, 5:-2})
xi: the returned feature_nodearray (a ctypes structure)
max_idx: the maximal feature index of xi
issparse: if issparse == True, zero feature values are removed. The default
value is True for the sparsity.
feature_max: if feature_max is assigned, features with indices larger than
feature_max are removed.
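The effect of issparse and feature_max can be sketched in pure Python. The
helper below is hypothetical (to_sparse_pairs is not part of liblinear.py) and
returns plain (index, value) pairs rather than a ctypes array. It assumes that
dense lists are indexed from 1, as the dense and sparse Quick Start examples
suggest.

```python
def to_sparse_pairs(xi, feature_max=None, issparse=True):
    """Return (pairs, max_idx) for a dense list/tuple or a sparse dict."""
    if isinstance(xi, dict):
        items = sorted(xi.items())
    elif isinstance(xi, (list, tuple)):
        # Assumption: the first element of a dense vector is feature 1.
        items = list(enumerate(xi, start=1))
    else:
        raise TypeError("xi should be a dict, list, or tuple")
    if feature_max is not None:
        # Drop features whose index exceeds feature_max.
        items = [(j, v) for j, v in items if j <= feature_max]
    if issparse:
        # Drop zero-valued features.
        items = [(j, v) for j, v in items if v != 0]
    max_idx = items[-1][0] if items else 0
    return items, max_idx


print(to_sparse_pairs([1, 0, 1]))
print(to_sparse_pairs({1: 1, 3: 1, 5: -2}, feature_max=3))
```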
- class problem:
Construct a problem instance
>>> prob = problem(y, x [, bias=-1])
y: a Python list/tuple of l labels (type must be int/double).
x: a Python list/tuple of l data instances. Each element of x must be
an instance of list/tuple/dictionary type.
bias: if bias >= 0, instance x becomes [x; bias]; if < 0, no bias term
added (default -1)
You can also modify the bias value by
>>> prob.set_bias(1)
Note that if your x contains sparse data (i.e., dictionary), the internal
ctypes data format is still sparse.
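To make the [x; bias] notation concrete, here is a minimal pure-Python sketch
of the augmentation on sparse (dictionary) instances. The helper add_bias is
hypothetical, and it assumes the bias feature takes index n+1, where n is the
largest feature index in the data. The class problem does this work
internally on ctypes data.

```python
def add_bias(x, bias):
    """Return copies of sparse instances, with a bias feature if bias >= 0."""
    if bias < 0:
        # Negative bias: instances are left unchanged.
        return [dict(xi) for xi in x]
    # Largest feature index over all non-empty instances (0 if there is none).
    n = max((max(xi) for xi in x if xi), default=0)
    augmented = []
    for xi in x:
        xi = dict(xi)
        xi[n + 1] = bias  # assumed position of the bias feature
        augmented.append(xi)
    return augmented


x = [{1: 1, 3: 1}, {1: -1, 3: -1}]
print(add_bias(x, 1))
print(add_bias(x, -1))
```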
- class parameter:
Construct a parameter instance
>>> param = parameter('training_options')
If 'training_options' is empty, LIBLINEAR default values are applied.
Set param to LIBLINEAR default values.
>>> param.set_to_default_values()
Parse a string of options.
>>> param.parse_options('training_options')
Show values of parameters.
>>> param.show()
- class model:
There are two ways to obtain an instance of model:
>>> model_ = train(y, x)
>>> model_ = load_model('model_file_name')
Note that the returned structure of interface functions
liblinear.train and liblinear.load_model is a ctypes pointer to a
model, which is different from the model object returned
by train and load_model in liblinearutil.py. We provide a
function toPyModel for the conversion:
>>> model_ptr = liblinear.train(prob, param)
>>> model_ = toPyModel(model_ptr)
If you obtain a model in a way other than the above approaches,
handle it carefully to avoid memory leaks or segmentation faults.
Some interface functions to access LIBLINEAR models are wrapped as
members of the class model:
>>> type = model_.get_type()
>>> nr_feature = model_.get_nr_feature()
>>> nr_class = model_.get_nr_class()
>>> class_labels = model_.get_labels()
>>> is_prob_model = model_.is_probability_model()
Utility Functions
=================
To use utility functions, type
>>> from liblinearutil import *
The above command loads
train() : train a linear model
predict() : predict testing data
svm_read_problem() : read the data from a LIBSVM-format file.
load_model() : load a LIBLINEAR model.
save_model() : save model to a file.
evaluations() : evaluate prediction results.
- Function: train
There are three ways to call train()
>>> model = train(y, x [, 'training_options'])
>>> model = train(prob [, 'training_options'])
>>> model = train(prob, param)
y: a list/tuple of l training labels (type must be int/double).
x: a list/tuple of l training instances. The feature vector of
each training instance is an instance of list/tuple or dictionary.
training_options: a string in the same form as that for LIBLINEAR
command-line mode.
prob: a problem instance generated by calling
problem(y, x).
param: a parameter instance generated by calling
parameter('training_options')
model: the returned model instance. See linear.h for details of this
structure. If '-v' is specified, cross validation is
conducted and the returned model is just a scalar: cross-validation
accuracy for classification and mean-squared error for regression.
To train the same data many times with different
parameters, the second and the third ways should be faster.
Examples:
>>> y, x = svm_read_problem('../heart_scale')
>>> prob = problem(y, x)
>>> param = parameter('-s 3 -c 5 -q')
>>> m = train(y, x, '-c 5')
>>> m = train(prob, '-w1 5 -c 5')
>>> m = train(prob, param)
>>> CV_ACC = train(y, x, '-v 3')
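One common reason to train the same data many times is to compare values of
the -c option by cross validation. The sketch below shows only the selection
logic. best_c and the score table are hypothetical, and the scores are
made-up stand-ins so that the block runs without the shared library.

```python
def best_c(cv_score, candidates):
    """Return the (c, score) pair with the highest cross-validation score."""
    scored = [(c, cv_score(c)) for c in candidates]
    return max(scored, key=lambda item: item[1])


# Made-up accuracies standing in for real cross-validation results.
fake_scores = {0.5: 80.0, 1: 82.5, 4: 84.0, 16: 83.0}
c, acc = best_c(fake_scores.get, [0.5, 1, 4, 16])
print(c, acc)
```

With the library loaded, cv_score could be something like
lambda c: train(prob, '-c %g -v 5 -q' % c), reusing one problem instance for
every call. For regression the returned value is a mean-squared error, so the
smallest score should be selected instead.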
- Function: predict
To predict testing data with a model, use
>>> p_labels, p_acc, p_vals = predict(y, x, model [,'predicting_options'])
y: a list/tuple of l true labels (type must be int/double). It is used
for calculating the accuracy. Use [] if true labels are
unavailable.
x: a list/tuple of l predicting instances. The feature vector of
each predicting instance is an instance of list/tuple or dictionary.
predicting_options: a string of predicting options in the same format as
that of LIBLINEAR.
model: a model instance.
p_labels: a list of predicted labels
p_acc: testing accuracy
p_vals: a list of decision values or probability estimates (if '-b 1'
is specified). If k is the number of classes, for decision values,
each element includes results of predicting k binary-class
SVMs. If k = 2 and solver is not MCSVM_CS, only one decision value
is returned. For probabilities, each element contains k values
indicating the probability that the testing instance is in each class.
Note that the order of classes here is the same as 'model.label'
field in the model structure.
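Since each element of p_vals follows the label order of the model, pairing
the two makes probability estimates easier to read. The sketch below uses
made-up numbers. probs_by_label is a hypothetical helper, and in real use the
label list would come from the model, for example from get_labels().

```python
def probs_by_label(labels, p_vals):
    """Pair each row of probability estimates with the model's label order."""
    return [dict(zip(labels, row)) for row in p_vals]


labels = [1, -1]                    # label order stored in the model
p_vals = [[0.9, 0.1], [0.3, 0.7]]   # made-up estimates for two instances
for probs in probs_by_label(labels, p_vals):
    print(probs)
```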
Example:
>>> m = train(y, x, '-c 5')
>>> p_labels, p_acc, p_vals = predict(y, x, m)
- Functions: svm_read_problem/load_model/save_model
See the usage by examples:
>>> y, x = svm_read_problem('data.txt')
>>> m = load_model('model_file')
>>> save_model('model_file', m)
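For reference, each line of a LIBSVM-format file holds a label followed by
index:value pairs. The sketch below parses such lines into the (y, x) form
that svm_read_problem returns. parse_libsvm_lines is a hypothetical helper
that works on an iterable of strings, and it is not the library's own reader.

```python
def parse_libsvm_lines(lines):
    """Parse 'label idx:val idx:val ...' lines into (labels, instances)."""
    y, x = [], []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        y.append(float(parts[0]))
        xi = {}
        for item in parts[1:]:
            idx, val = item.split(":")
            xi[int(idx)] = float(val)
        x.append(xi)
    return y, x


y, x = parse_libsvm_lines(["+1 1:0.5 3:-1", "-1 2:1"])
print(y)
print(x)
```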
- Function: evaluations
Calculate some evaluations using the true values (ty) and predicted
values (pv):
>>> ACC = evaluations(ty, pv)
ty: a list of true values.
pv: a list of predicted values.
ACC: accuracy.
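The accuracy computation itself is simple enough to sketch in pure Python.
The helper below is hypothetical and assumes accuracy is reported as a
percentage of exact matches between true and predicted labels; check the
value returned by evaluations() if the scale matters.

```python
def accuracy(ty, pv):
    """Percentage of positions where the true and predicted labels match."""
    if len(ty) != len(pv):
        raise ValueError("ty and pv must have the same length")
    if not ty:
        raise ValueError("ty and pv must not be empty")
    correct = sum(1 for t, p in zip(ty, pv) if t == p)
    return 100.0 * correct / len(ty)


print(accuracy([1, -1, 1, 1], [1, -1, -1, 1]))
```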
Additional Information
======================
This interface was written by Hsiang-Fu Yu from Department of Computer
Science, National Taiwan University. If you find this tool useful, please
cite LIBLINEAR as follows:
R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin.
LIBLINEAR: A Library for Large Linear Classification, Journal of
Machine Learning Research 9(2008), 1871-1874. Software available at
http://www.csie.ntu.edu.tw/~cjlin/liblinear
For any questions, please contact Chih-Jen Lin <cjlin@csie.ntu.edu.tw>,
or check the FAQ page:
http://www.csie.ntu.edu.tw/~cjlin/liblinear/faq.html