File: related_projects.rst

package info (click to toggle)
scikit-learn 0.20.2%2Bdfsg-6
  • links: PTS, VCS
  • area: main
  • in suites: buster
  • size: 51,036 kB
  • sloc: python: 108,171; ansic: 8,722; cpp: 5,651; makefile: 192; sh: 40
file content (282 lines) | stat: -rw-r--r-- 11,538 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
.. _related_projects:

=====================================
Related Projects
=====================================

Projects implementing the scikit-learn estimator API are encouraged to use
the `scikit-learn-contrib template <https://github.com/scikit-learn-contrib/project-template>`_
which facilitates best practices for testing and documenting estimators.
The `scikit-learn-contrib GitHub organisation <https://github.com/scikit-learn-contrib/scikit-learn-contrib>`_
also accepts high-quality contributions of repositories conforming to this
template.

Below is a list of sister-projects, extensions and domain specific packages.

Interoperability and framework enhancements
-------------------------------------------

These tools adapt scikit-learn for use with other technologies or otherwise
enhance the functionality of scikit-learn's estimators.

**Data formats**

- `sklearn_pandas <https://github.com/paulgb/sklearn-pandas/>`_ bridge for
  scikit-learn pipelines and pandas data frame with dedicated transformers.
  
- `sklearn_xarray <https://github.com/phausamann/sklearn-xarray/>`_ provides
  compatibility of scikit-learn estimators with xarray data structures.

**Auto-ML**

- `auto_ml <https://github.com/ClimbsRocks/auto_ml/>`_
  Automated machine learning for production and analytics, built on scikit-learn
  and related projects. Trains a pipeline wth all the standard machine learning 
  steps. Tuned for prediction speed and ease of transfer to production environments. 

- `auto-sklearn <https://github.com/automl/auto-sklearn/>`_
  An automated machine learning toolkit and a drop-in replacement for a
  scikit-learn estimator

- `TPOT <https://github.com/rhiever/tpot>`_
  An automated machine learning toolkit that optimizes a series of scikit-learn
  operators to design a machine learning pipeline, including data and feature
  preprocessors as well as the estimators. Works as a drop-in replacement for a
  scikit-learn estimator.

- `scikit-optimize <https://scikit-optimize.github.io/>`_
  A library to minimize (very) expensive and noisy black-box functions. It
  implements several methods for sequential model-based optimization, and
  includes a replacement for ``GridSearchCV`` or ``RandomizedSearchCV`` to do
  cross-validated parameter search using any of these strategies.

**Experimentation frameworks**

- `REP <https://github.com/yandex/REP>`_ Environment for conducting data-driven
  research in a consistent and reproducible way

- `ML Frontend <https://github.com/jeff1evesque/machine-learning>`_ provides
  dataset management and SVM fitting/prediction through
  `web-based <https://github.com/jeff1evesque/machine-learning#web-interface>`_
  and `programmatic <https://github.com/jeff1evesque/machine-learning#programmatic-interface>`_
  interfaces.

- `Scikit-Learn Laboratory
  <https://skll.readthedocs.io/en/latest/index.html>`_  A command-line
  wrapper around scikit-learn that makes it easy to run machine learning
  experiments with multiple learners and large feature sets.

- `Xcessiv <https://github.com/reiinakano/xcessiv>`_ is a notebook-like
  application for quick, scalable, and automated hyperparameter tuning
  and stacked ensembling. Provides a framework for keeping track of 
  model-hyperparameter combinations.

**Model inspection and visualisation**

- `eli5 <https://github.com/TeamHG-Memex/eli5/>`_ A library for
  debugging/inspecting machine learning models and explaining their
  predictions.

- `mlxtend <https://github.com/rasbt/mlxtend>`_ Includes model visualization
  utilities.

- `scikit-plot <https://github.com/reiinakano/scikit-plot>`_ A visualization library
  for quick and easy generation of common plots in data analysis and machine learning.

- `yellowbrick <https://github.com/DistrictDataLabs/yellowbrick>`_ A suite of
  custom matplotlib visualizers for scikit-learn estimators to support visual feature
  analysis, model selection, evaluation, and diagnostics.


**Model export for production**

- `sklearn-pmml <https://github.com/alex-pirozhenko/sklearn-pmml>`_
  Serialization of (some) scikit-learn estimators into PMML.

- `sklearn2pmml <https://github.com/jpmml/sklearn2pmml>`_
  Serialization of a wide variety of scikit-learn estimators and transformers
  into PMML with the help of `JPMML-SkLearn <https://github.com/jpmml/jpmml-sklearn>`_
  library.

- `sklearn-porter <https://github.com/nok/sklearn-porter>`_
  Transpile trained scikit-learn models to C, Java, Javascript and others.

- `sklearn-compiledtrees <https://github.com/ajtulloch/sklearn-compiledtrees/>`_
  Generate a C++ implementation of the predict function for decision trees (and
  ensembles) trained by sklearn. Useful for latency-sensitive production
  environments.


Other estimators and tasks
--------------------------

Not everything belongs or is mature enough for the central scikit-learn
project. The following are projects providing interfaces similar to
scikit-learn for additional learning algorithms, infrastructures
and tasks.

**Structured learning**

- `Seqlearn <https://github.com/larsmans/seqlearn>`_  Sequence classification
  using HMMs or structured perceptron.

- `HMMLearn <https://github.com/hmmlearn/hmmlearn>`_ Implementation of hidden
  markov models that was previously part of scikit-learn.

- `PyStruct <https://pystruct.github.io>`_ General conditional random fields
  and structured prediction.

- `pomegranate <https://github.com/jmschrei/pomegranate>`_ Probabilistic modelling
  for Python, with an emphasis on hidden Markov models.

- `sklearn-crfsuite <https://github.com/TeamHG-Memex/sklearn-crfsuite>`_
  Linear-chain conditional random fields
  (`CRFsuite <http://www.chokkan.org/software/crfsuite/>`_ wrapper with
  sklearn-like API).

**Deep neural networks etc.**

- `pylearn2 <http://deeplearning.net/software/pylearn2/>`_ A deep learning and
  neural network library build on theano with scikit-learn like interface.

- `sklearn_theano <https://sklearn-theano.github.io/>`_ scikit-learn compatible
  estimators, transformers, and datasets which use Theano internally

- `nolearn <https://github.com/dnouri/nolearn>`_ A number of wrappers and
  abstractions around existing neural network libraries

- `keras <https://github.com/fchollet/keras>`_ Deep Learning library capable of
  running on top of either TensorFlow or Theano.

- `lasagne <https://github.com/Lasagne/Lasagne>`_ A lightweight library to
  build and train neural networks in Theano.
  
- `skorch <https://github.com/dnouri/skorch>`_ A scikit-learn compatible 
  neural network library that wraps PyTorch.

**Broad scope**

- `mlxtend <https://github.com/rasbt/mlxtend>`_ Includes a number of additional
  estimators as well as model visualization utilities.

- `sparkit-learn <https://github.com/lensacom/sparkit-learn>`_ Scikit-learn
  API and functionality for PySpark's distributed modelling.

**Other regression and classification**

- `xgboost <https://github.com/dmlc/xgboost>`_ Optimised gradient boosted decision
  tree library.

- `ML-Ensemble <https://mlens.readthedocs.io/>`_ Generalized
  ensemble learning (stacking, blending, subsemble, deep ensembles,
  etc.).

- `lightning <https://github.com/scikit-learn-contrib/lightning>`_ Fast
  state-of-the-art linear model solvers (SDCA, AdaGrad, SVRG, SAG, etc...).

- `py-earth <https://github.com/scikit-learn-contrib/py-earth>`_ Multivariate
  adaptive regression splines

- `Kernel Regression <https://github.com/jmetzen/kernel_regression>`_
  Implementation of Nadaraya-Watson kernel regression with automatic bandwidth
  selection

- `gplearn <https://github.com/trevorstephens/gplearn>`_ Genetic Programming
  for symbolic regression tasks.

- `multiisotonic <https://github.com/alexfields/multiisotonic>`_ Isotonic
  regression on multidimensional features.

- `scikit-multilearn <https://scikit.ml>`_ Multi-label classification with 
  focus on label space manipulation.

- `seglearn <https://github.com/dmbee/seglearn>`_ Time series and sequence 
  learning using sliding window segmentation.

**Decomposition and clustering**

- `lda <https://github.com/ariddell/lda/>`_: Fast implementation of latent
  Dirichlet allocation in Cython which uses `Gibbs sampling
  <https://en.wikipedia.org/wiki/Gibbs_sampling>`_ to sample from the true
  posterior distribution. (scikit-learn's
  :class:`sklearn.decomposition.LatentDirichletAllocation` implementation uses
  `variational inference
  <https://en.wikipedia.org/wiki/Variational_Bayesian_methods>`_ to sample from
  a tractable approximation of a topic model's posterior distribution.)

- `Sparse Filtering <https://github.com/jmetzen/sparse-filtering>`_
  Unsupervised feature learning based on sparse-filtering

- `kmodes <https://github.com/nicodv/kmodes>`_ k-modes clustering algorithm for
  categorical data, and several of its variations.

- `hdbscan <https://github.com/scikit-learn-contrib/hdbscan>`_ HDBSCAN and Robust Single
  Linkage clustering algorithms for robust variable density clustering.

- `spherecluster <https://github.com/clara-labs/spherecluster>`_ Spherical
  K-means and mixture of von Mises Fisher clustering routines for data on the
  unit hypersphere.

**Pre-processing**

- `categorical-encoding
  <https://github.com/scikit-learn-contrib/categorical-encoding>`_ A
  library of sklearn compatible categorical variable encoders.

- `imbalanced-learn
  <https://github.com/scikit-learn-contrib/imbalanced-learn>`_ Various
  methods to under- and over-sample datasets.

Statistical learning with Python
--------------------------------
Other packages useful for data analysis and machine learning.

- `Pandas <https://pandas.pydata.org/>`_ Tools for working with heterogeneous and
  columnar data, relational queries, time series and basic statistics.

- `theano <http://deeplearning.net/software/theano/>`_ A CPU/GPU array
  processing framework geared towards deep learning research.

- `statsmodels <https://www.statsmodels.org>`_ Estimating and analysing
  statistical models. More focused on statistical tests and less on prediction
  than scikit-learn.

- `PyMC <https://pymc-devs.github.io/pymc/>`_ Bayesian statistical models and
  fitting algorithms.

- `Sacred <https://github.com/IDSIA/Sacred>`_ Tool to help you configure,
  organize, log and reproduce experiments

- `Seaborn <http://stanford.edu/~mwaskom/software/seaborn/>`_ Visualization library based on
  matplotlib. It provides a high-level interface for drawing attractive statistical graphics.

- `Deep Learning <http://deeplearning.net/software_links/>`_ A curated list of deep learning
  software libraries.


Domain specific packages
~~~~~~~~~~~~~~~~~~~~~~~~

- `scikit-image <https://scikit-image.org/>`_ Image processing and computer
  vision in python.

- `Natural language toolkit (nltk) <https://www.nltk.org/>`_ Natural language
  processing and some machine learning.

- `gensim <https://radimrehurek.com/gensim/>`_  A library for topic modelling,
  document indexing and similarity retrieval

- `NiLearn <https://nilearn.github.io/>`_ Machine learning for neuro-imaging.

- `AstroML <https://www.astroml.org/>`_  Machine learning for astronomy.

- `MSMBuilder <http://msmbuilder.org/>`_  Machine learning for protein
  conformational dynamics time series.

- `scikit-surprise <https://surpriselib.com/>`_ A scikit for building and
  evaluating recommender systems.

Snippets and tidbits
---------------------

The `wiki <https://github.com/scikit-learn/scikit-learn/wiki/Third-party-projects-and-code-snippets>`_ has more!