===========
Development
===========

This project is a community effort, and everyone is welcome to
contribute.

Bug Tracker
===========

If you experience difficulties using the package, do not hesitate
to submit a ticket to the
`Bug Tracker <http://sourceforge.net/apps/trac/scikit-learn/report/1>`_.

Feature requests and patches are also welcome there.

Code
====


Git repo
--------

You can check out the latest sources with the command::

    git clone git://scikit-learn.git.sourceforge.net/gitroot/scikit-learn/scikit-learn

or if you have write privileges::

    git clone ssh://USERNAME@scikit-learn.git.sourceforge.net/gitroot/scikit-learn/scikit-learn

If you have contributed some code and would like to have write
privileges in the git repository, please contact me (Fabian
Pedregosa <fabian.pedregosa@inria.fr>) and I'll give you write
privileges.

If you run the development version, it is cumbersome to reinstall the
package each time you update the sources. It is thus preferable to
add the directory of your source checkout to your ``PYTHONPATH`` and
build the extensions in place::

    python setup.py build_ext --inplace
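
Python looks for modules in the directories listed in ``sys.path``, and
``PYTHONPATH`` entries are added to that list at startup. As a rough
per-script alternative to editing the environment variable, a sketch
like the following puts your checkout first so that it takes precedence
over an installed copy. The ``use_checkout`` helper and its default path
are hypothetical; adjust the path to wherever you cloned the repository::

    import os
    import sys

    # Hypothetical location of the git checkout; adjust to your setup.
    CHECKOUT_DIR = os.path.expanduser("~/src/scikit-learn")


    def use_checkout(path=CHECKOUT_DIR):
        """Put ``path`` first on the module search path (hypothetical helper)."""
        path = os.path.abspath(path)
        # Avoid adding the same directory twice on repeated calls.
        if path not in sys.path:
            sys.path.insert(0, path)
        return path

Call ``use_checkout()`` before importing the package. Setting
``PYTHONPATH`` remains the simpler option for interactive work.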


Patches
-------
Patches are the preferred way to contribute to a project if you do not
have write privileges.

Let's suppose that you have the latest sources from the git repository
and that you just made some modifications that you'd like to share with
the world. You might proceed as follows:

1. Create the patch files. The command::

       git format-patch origin

   will create a series of patch files, one per commit that is not yet
   in ``origin``, containing the changes you made to the code base.

2. Send these files to the mailing list, or attach them to an issue in
   the issue tracker, and a developer with write access will apply them
   to the main repository.

3. Wait for a reply. You should soon be told whether your patch was
   committed.


EasyFix Issues
^^^^^^^^^^^^^^

The best way to get your feet wet is to pick up an issue from the
`issue tracker
<https://sourceforge.net/apps/trac/scikit-learn/report>`_ that is
labeled as EasyFix. Solving such an issue requires little knowledge of
the code base, but you are still helping the project and letting more
experienced developers concentrate on other issues.


Roadmap
-------

`Here <http://sourceforge.net/apps/trac/scikit-learn/roadmap>`_ you
will find a detailed roadmap, with a description of what is planned to
be implemented in the following releases.

.. _packaging:

Packaging
^^^^^^^^^

You can also help by making binary distributions for Windows and Mac OS X,
or packages for Linux distributions.

Developers web site
===================
More information can be found at the developers' web site,
http://sourceforge.net/apps/trac/scikit-learn/wiki, which contains a
wiki, an issue tracker, and a roadmap.

Documentation
=============

I am glad to accept any sort of documentation: function docstrings,
rst docs (like this one), tutorials, etc. The rst docs live in the
source code repository, under the ``doc/`` directory.

You can edit them using any text editor and generate the HTML docs by
typing ``make html`` from the ``doc/`` directory. That should create a
``_build/html/`` directory with HTML files that are viewable in a web
browser.


API guidelines
==============

The following are some guidelines on how new code should be
written. Of course, there are special cases and there will be
exceptions to these rules. However, following these rules when
submitting new code makes the review easier so new code can be
integrated in less time.


Estimators
----------

The API has one predominant object: the estimator. An estimator is an
object that fits a model based on some training data and is capable of
inferring some properties on new data. For instance, it can be a
classifier or a regressor.
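
As an illustration, a minimal estimator might look like the sketch
below. ``MeanRegressor`` is a hypothetical toy class, not part of the
package: it "fits" by learning the mean of the training targets and
infers a value for new data by predicting that mean for every sample::

    import numpy as np


    class MeanRegressor(object):
        """Hypothetical toy regressor: always predicts the mean training target."""

        def fit(self, X, y):
            # This toy model ignores the features in X.
            y = np.asanyarray(y)
            # Learn the single model parameter from the training data.
            self.mean_ = y.mean()
            return self

        def predict(self, T):
            T = np.asanyarray(T)
            # One prediction per new sample (row of T).
            return np.repeat(self.mean_, T.shape[0])

For example, ``MeanRegressor().fit([[0.], [1.], [2.]], [1., 2., 3.])``
predicts 2.0 for any new sample.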


Instantiation
^^^^^^^^^^^^^

This concerns the object creation. The object's ``__init__`` method might
accept as arguments constants that determine the estimator behavior
(like the C constant in SVMs).

It should not, however, take the actual training data as an argument, as
this is left to the ``fit()`` method::

    clf1 = SVM(impl='c_svm')                 # OK: sets a parameter only
    clf2 = SVM(C=2.3)                        # OK: sets a parameter only
    clf3 = SVM([[1, 2], [2, 3]], [-1, 1])    # WRONG! training data belong in fit()
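
The convention can be sketched with a hypothetical ``ToySVM`` class
(not the package's ``SVM``): the constructor only records constants such
as ``C`` and the equally hypothetical ``kernel`` parameter, and training
data reach the object only through ``fit()``::

    class ToySVM(object):
        """Hypothetical estimator illustrating the constructor convention."""

        def __init__(self, C=1.0, kernel='linear'):
            # Store the constants that control the estimator's behavior;
            # no training data are accepted or processed here.
            self.C = C
            self.kernel = kernel

        def fit(self, X, y):
            # Training data are passed here, never to __init__.
            # The actual SVM optimization is omitted from this sketch.
            return self

Usage mirrors the correct cases above:
``ToySVM(C=2.3).fit([[1, 2], [2, 3]], [-1, 1])``.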


Fitting
^^^^^^^

The next thing you'll probably want to do is to estimate some
parameters in the model. This is implemented in the ``fit()`` method.

The fit method takes as argument the training data, which can be one
array in the case of unsupervised learning, or two arrays in the case
of supervised learning.

Note that the model is fitted using ``X`` and ``y``, but the object holds
no reference to ``X`` or ``y``. There are, however, some exceptions to
this, as in the case of precomputed kernels, where you need to store
these data so that the predict method can access them.

The parameters are:

    * X : array-like, with shape = [N, D], where N is the number of
      samples and D is the number of features.
    * y : array, with shape = [N], where N is the number of samples.

``X.shape[0]`` should be the same as ``y.shape[0]``. If this requirement
is not met, an exception should be raised.

``y`` might be dropped in the case of unsupervised learning.

The method should return the object (``self``).
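
The rules above can be sketched with a hypothetical
``CentroidClassifier`` that assigns each sample to the nearest class
mean. It checks that ``X`` and ``y`` have the same number of samples,
keeps only the learned centroids (no reference to ``X`` or ``y``), and
returns ``self``. Raising ``ValueError`` is an assumption of this sketch,
since the TODO list in this document leaves the exception type open::

    import numpy as np


    class CentroidClassifier(object):
        """Hypothetical classifier: predicts the class of the nearest centroid."""

        def fit(self, X, y):
            X = np.asanyarray(X, dtype=float)
            y = np.asanyarray(y)
            # X has shape [N, D] and y has shape [N]; reject mismatched inputs.
            if X.shape[0] != y.shape[0]:
                raise ValueError("X has %d samples but y has %d"
                                 % (X.shape[0], y.shape[0]))
            # Store only the learned parameters, not X or y themselves.
            self.classes_ = np.unique(y)
            self.centroids_ = np.array([X[y == c].mean(axis=0)
                                        for c in self.classes_])
            return self  # allows clf.fit(X, y).predict(T)

        def predict(self, T):
            T = np.asanyarray(T, dtype=float)
            # Squared Euclidean distance from every sample to every centroid.
            diff = T[:, np.newaxis, :] - self.centroids_[np.newaxis, :, :]
            dist = (diff ** 2).sum(axis=2)
            return self.classes_[dist.argmin(axis=1)]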


Python tuples
^^^^^^^^^^^^^

In addition to numpy arrays, all methods should be able to accept
Python tuples as arguments. In practice, this means you should call
``numpy.asanyarray`` at the beginning of each public method that accepts
arrays.
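
A minimal sketch of this pattern, using a hypothetical public function
``center`` that subtracts the per-feature mean. Because the input goes
through ``numpy.asanyarray`` first, a tuple of tuples works just like an
array, while an ndarray argument is passed through without a copy::

    import numpy as np


    def center(X):
        """Hypothetical public function: subtract the mean of each feature."""
        # Convert tuples or lists to an array; ndarray inputs (including
        # subclasses) are returned unchanged by asanyarray.
        X = np.asanyarray(X)
        return X - X.mean(axis=0)


    # A tuple of tuples is accepted just like a numpy array.
    centered = center(((1., 2.), (3., 4.)))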


Optional Arguments
^^^^^^^^^^^^^^^^^^

In iterative algorithms, the number of iterations should be specified by
an int called ``niter``.
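
For instance, a hypothetical least-squares regressor fitted by batch
gradient descent would take ``niter`` in its constructor. The class
name, the ``step`` parameter and the default values are assumptions of
this sketch; ``step`` must be small enough for the iterations to
converge::

    import numpy as np


    class GDLinearRegression(object):
        """Hypothetical least-squares regressor fitted by gradient descent."""

        def __init__(self, niter=100, step=0.1):
            # The number of iterations is an int called ``niter``.
            self.niter = niter
            self.step = step

        def fit(self, X, y):
            X = np.asanyarray(X, dtype=float)
            y = np.asanyarray(y, dtype=float)
            n_samples, n_features = X.shape
            w = np.zeros(n_features)
            for _ in range(self.niter):
                # Gradient of the loss ||X w - y||^2 / (2 N).
                grad = X.T.dot(X.dot(w) - y) / n_samples
                w -= self.step * grad
            self.coef_ = w
            return self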


TODO
----
Some things still need to be decided:

    * what should happen when ``predict()`` is called before ``fit()``?
    * which exception should be raised when the shapes of the arrays
      passed to ``fit()`` do not match?


Specific models
---------------

In linear models, coefficients are stored in an array called ``coef_``,
and the independent term is stored in ``intercept_``.
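
A sketch of this naming convention, using a hypothetical ordinary least
squares class built on ``numpy.linalg.lstsq``. Centering the data first
is one common way to recover the independent term; it is a choice made
for this example, not necessarily how the package's linear models work::

    import numpy as np


    class LeastSquares(object):
        """Hypothetical ordinary least squares model using coef_ / intercept_."""

        def fit(self, X, y):
            X = np.asanyarray(X, dtype=float)
            y = np.asanyarray(y, dtype=float)
            # Center the data so the intercept can be recovered afterwards.
            X_mean = X.mean(axis=0)
            y_mean = y.mean()
            coef, _, _, _ = np.linalg.lstsq(X - X_mean, y - y_mean, rcond=None)
            self.coef_ = coef                             # one weight per feature
            self.intercept_ = y_mean - X_mean.dot(coef)   # independent term
            return self

        def predict(self, T):
            T = np.asanyarray(T, dtype=float)
            return T.dot(self.coef_) + self.intercept_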