.. _related:

.. currentmodule:: statsmodels


Related Packages
================

These are some Python packages that have a related purpose and can be
useful in combination with statsmodels. The selection is biased towards
packages that might be directly useful for data handling and statistical
analysis, and towards those with a BSD-compatible license, which means that
we are free to look at the source to learn about different implementations
or algorithms.
The following descriptions are taken from the websites with small adjustments.



Data Handling
-------------

Scikits.timeseries
^^^^^^^^^^^^^^^^^^

http://pypi.python.org/pypi/scikits.timeseries

"Time series manipulation

The scikits.timeseries module provides classes and functions for manipulating,
reporting, and plotting time series of various frequencies. The focus is on
convenient data access and manipulation while leveraging the existing
mathematical functionality in Numpy and SciPy."

License: BSD
Language: Python, C, binary distributions available


*Comments*

Timeseries is based on NumPy's MaskedArray and is designed for handling data
with missing values. It also includes functions for statistical analysis.
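
As a minimal sketch of the masked-array idea, using only NumPy's ``numpy.ma``
module and not the scikits.timeseries API, the following marks a sentinel value
as missing and computes statistics over the remaining observations. The data
and the ``-999.0`` sentinel are made up for illustration.

.. code-block:: python

    import numpy as np

    # Hypothetical readings; -999.0 marks a missing observation (an assumption).
    raw = np.array([1.0, 2.0, -999.0, 4.0])
    series = np.ma.masked_values(raw, -999.0)

    valid_mean = series.mean()   # averages only the unmasked entries
    n_valid = series.count()     # number of non-missing values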


Pandas
^^^^^^

http://pypi.python.org/pypi/pandas

"This project aims to provide the following:

* A set of fast NumPy-based data structures optimized for panel, time series,
  and cross-sectional data analysis.
* A set of tools for loading such data from various sources and providing
  efficient ways to persist the data.
* A robust statistics and econometrics library which closely integrates with
  the core data structures."

License: New BSD
Language: Python, Cython,
binary distribution available for win32-py25, but easy to build with MinGW

*Comments*

Uses statsmodels as an optional dependency for statistical analysis, but has
additional statistical and econometrics algorithms that focus on panel data
analysis, mostly in the time dimension. It has several data structures that
allow dictionary access to the underlying 1, 2, or 3 dimensional arrays. It
was initially focused on a two-dimensional representation of the data, but
now also allows for different representations of three-dimensional arrays. It
allows for arbitrary axis labels, but also offers a convenient time series
class.
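
The dictionary-style access to labeled data can be sketched as follows. This
assumes a current pandas release rather than the version contemporary with
this page, and the column names and values are invented for illustration.

.. code-block:: python

    import pandas as pd

    # Two made-up columns indexed by three consecutive dates.
    frame = pd.DataFrame(
        {"gdp": [1.0, 2.0, 3.0], "cpi": [10.0, 11.0, 12.0]},
        index=pd.date_range("2000-01-01", periods=3, freq="D"),
    )

    gdp = frame["gdp"]       # dictionary-style access returns a labeled 1-D Series
    values = frame.values    # the underlying 2-D NumPy array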


Tabular
^^^^^^^

http://pypi.python.org/pypi/tabular

"Tabular data container and associated convenience routines in Python

Tabular is a package of Python modules for working with tabular data. Its main
object is the tabarray class, a data structure for holding and manipulating
tabular data.

The tabarray object is based on the ndarray object from the Numerical Python
package (NumPy), and the Tabular package is built to interface well with NumPy
in general. "

License: MIT
Language: Python

*Comments*

Uses NumPy's structured arrays as its basic building block. Focused on
spreadsheet-style operations for working with two-dimensional tables and
associated data handling and analysis.
The code of tabular is instructive reading for anyone working with
structured arrays.
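
For readers unfamiliar with structured arrays, here is a short sketch in plain
NumPy. It does not use the tabular package itself, and the field names and
rows are invented.

.. code-block:: python

    import numpy as np

    # Each row is a record with named, typed fields.
    table = np.array(
        [("a", 1, 2.5), ("b", 2, 3.5), ("a", 3, 4.5)],
        dtype=[("group", "U1"), ("count", "i4"), ("value", "f8")],
    )

    value_col = table["value"]                # column access by field name
    group_a = table[table["group"] == "a"]    # row filter with a boolean mask
    total_a = group_a["value"].sum()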


La
^^

http://pypi.python.org/pypi/la

"Label the rows, columns, any dimension, of your NumPy arrays.

The main class of the la package is a labeled array, larry. A larry consists of
a data array and a label list. The data array is stored as a NumPy array and
the label list as a list of lists. "

License: BSD
Language: Python

*Comments*

The data handling is similar in intent to pandas but closer to working
with standard NumPy ndarrays. The main addition to NumPy arrays is
arbitrary labels for each axis of the array. Larry delegates to NumPy
functions but does not subclass NumPy's ndarray. It also provides functions
for basic descriptive statistics.
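
The design described above (a data array plus a list of label lists, with
delegation instead of subclassing) might be sketched like this. The
``LabeledArray`` class and its methods are hypothetical and are not the
``la.larry`` API.

.. code-block:: python

    import numpy as np


    class LabeledArray:
        """Hypothetical labeled array: NumPy data plus one label list per axis."""

        def __init__(self, data, labels):
            self.data = np.asarray(data)
            self.labels = [list(axis_labels) for axis_labels in labels]
            if len(self.labels) != self.data.ndim:
                raise ValueError("need one label list per axis")
            for axis, axis_labels in enumerate(self.labels):
                if len(axis_labels) != self.data.shape[axis]:
                    raise ValueError("label count must match axis length")

        def get(self, *keys):
            # Translate one label per axis into integer positions.
            index = tuple(
                self.labels[axis].index(key) for axis, key in enumerate(keys)
            )
            return self.data[index]

        def mean(self, axis):
            # Delegate to NumPy and drop the labels of the reduced axis.
            remaining = [l for i, l in enumerate(self.labels) if i != axis]
            return LabeledArray(self.data.mean(axis=axis), remaining)


    arr = LabeledArray([[1.0, 2.0], [3.0, 4.0]], [["r1", "r2"], ["c1", "c2"]])
    cell = arr.get("r2", "c1")
    col_means = arr.mean(axis=0)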




Data Analysis
-------------

Pymc
^^^^

http://pypi.python.org/pypi/pymc

"Bayesian estimation, particularly using Markov chain Monte Carlo (MCMC), is
an increasingly relevant approach to statistical estimation.
PyMC is a python module that implements the Metropolis-Hastings algorithm
as a python class, and is extremely flexible and applicable to a large suite
of problems."

License: MIT, Academic Free License (?)
Language: Python, C, Fortran
binary (bundle ?) installer

*Comments*

This is to some extent the modern Bayesian analog of statsmodels. It is by
far the most mature project in this group, statsmodels included.
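
To illustrate the algorithm named in the description, here is a minimal
random-walk Metropolis sampler, the special case of Metropolis-Hastings with a
symmetric proposal. It is a sketch in plain NumPy and does not use or mirror
the PyMC API; the function names, step size, and target density are
assumptions chosen for the example.

.. code-block:: python

    import numpy as np


    def metropolis_hastings(log_density, start, n_samples, step=1.0, seed=0):
        """Random-walk Metropolis sampler for a one-dimensional target."""
        rng = np.random.default_rng(seed)
        samples = np.empty(n_samples)
        current = start
        current_logp = log_density(current)
        for i in range(n_samples):
            proposal = current + step * rng.normal()
            proposal_logp = log_density(proposal)
            # Symmetric proposal: accept with probability
            # min(1, p(proposal) / p(current)), compared on the log scale.
            if np.log(rng.uniform()) < proposal_logp - current_logp:
                current, current_logp = proposal, proposal_logp
            samples[i] = current
        return samples


    def standard_normal_logpdf(x):
        # Log density of N(0, 1) up to an additive constant.
        return -0.5 * x * x


    draws = metropolis_hastings(standard_normal_logpdf, start=0.0, n_samples=5000)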


Scikits.talkbox
^^^^^^^^^^^^^^^

http://pypi.python.org/pypi/scikits.talkbox

Talkbox is a set of Python modules for speech/signal processing. The goal of
this toolbox is to be a sandbox for features that may end up in SciPy at some
point.

License: BSD
Language: Python, C optional


*Comments*

Although specialized in speech processing, talkbox has some accessible and
useful functions for time series analysis, especially a fast implementation
for estimating AR models (with ...) and for computing the spectral density
from the estimated AR coefficients.
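
The two steps mentioned here, fitting an AR model and deriving a spectral
density from its coefficients, can be sketched with the Yule-Walker equations
in plain NumPy. This is not talkbox's implementation or API; the function
names and the simulated AR(1) series are assumptions for illustration.

.. code-block:: python

    import numpy as np


    def yule_walker_ar(x, order):
        """Estimate AR coefficients and noise variance via Yule-Walker."""
        x = np.asarray(x, dtype=float)
        x = x - x.mean()
        n = len(x)
        # Biased sample autocovariances r[0], ..., r[order].
        r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])
        # Solve the Toeplitz system R a = r[1:].
        R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
        coeffs = np.linalg.solve(R, r[1:])
        noise_var = r[0] - np.dot(coeffs, r[1:])
        return coeffs, noise_var


    def ar_spectral_density(coeffs, noise_var, freqs):
        """S(f) = sigma^2 / |1 - sum_k a_k exp(-2 pi i f k)|^2, f in cycles/sample."""
        k = np.arange(1, len(coeffs) + 1)
        transfer = 1.0 - np.exp(-2j * np.pi * np.outer(freqs, k)) @ coeffs
        return noise_var / np.abs(transfer) ** 2


    # Simulate an AR(1) process with coefficient 0.6 and recover it.
    rng = np.random.default_rng(0)
    noise = rng.normal(size=2000)
    x = np.zeros(2000)
    for t in range(1, 2000):
        x[t] = 0.6 * x[t - 1] + noise[t]

    coeffs, noise_var = yule_walker_ar(x, order=1)
    density = ar_spectral_density(coeffs, noise_var, np.linspace(0.0, 0.5, 5))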


Nitime
^^^^^^
http://github.com/fperez/nitime

"Nitime is a library for time-series analysis of data from neuroscience experiments.

It contains a core of numerical algorithms for time-series analysis both in
the time and spectral domains, a set of container objects to represent
time-series, and auxiliary objects that expose a high level interface to the
numerical machinery and make common analysis tasks easy to express with
compact and semantically clear code."

License: BSD
Language: Python

*Comments*

Although focused on neuroscience, the algorithms for time series analysis are
independent of the data representation and can be used with NumPy arrays.
The current focus is on spectral analysis, including coherence between several
time series.
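
As a rough illustration of coherence between two series held in NumPy arrays,
the sketch below uses ``scipy.signal.coherence`` rather than the nitime API.
The signals, the shared frequency of 0.1 cycles per sample, and the segment
length are all assumptions made for the example.

.. code-block:: python

    import numpy as np
    from scipy import signal

    rng = np.random.default_rng(0)
    t = np.arange(4096)

    # Two noisy series that share a sinusoid at 0.1 cycles per sample.
    common = np.sin(2 * np.pi * 0.1 * t)
    x = common + rng.normal(size=t.size)
    y = common + rng.normal(size=t.size)

    # Magnitude-squared coherence estimated with Welch's method.
    freqs, coh = signal.coherence(x, y, fs=1.0, nperseg=256)
    peak_freq = freqs[np.argmax(coh)]

The coherence should be close to one near the shared frequency and small
elsewhere, since the noise terms are independent.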


KF - Kalman Filter
^^^^^^^^^^^^^^^^^^

http://pypi.python.org/pypi/KF

"This project was started to test different available tools to track mutual
funds and hedge fund using Capital Asset Pricing Model (CAPM thereafter)
introduced by Sharpe and Arbitrage Pricing Theory (APT thereafter) introduced
by Ross."

License: BSD (to be verified)
Language: Python (requires cvxopt)


*Comments*

Very young project, but with a focus similar to, although narrower than, that
of pandas and (parts of) statsmodels. Uses a Kalman filter for rolling linear
regression and allows for equality and inequality constraints in the
estimation. Includes its own time series class, and the estimation seems (?)
to depend on it.
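
The idea of using a Kalman filter for rolling regression can be sketched as
follows, treating the regression coefficients as a random-walk state. This is
a generic textbook formulation in NumPy, not the KF package's code; it omits
the equality and inequality constraints, and the function name, variance
settings, and simulated data are assumptions.

.. code-block:: python

    import numpy as np


    def kalman_regression(y, X, state_var=1e-4, obs_var=1.0):
        """Track time-varying regression coefficients with a Kalman filter."""
        n, k = X.shape
        beta = np.zeros(k)
        P = np.eye(k) * 1e3          # large initial uncertainty
        Q = np.eye(k) * state_var    # random-walk variance of the coefficients
        path = np.empty((n, k))
        for t in range(n):
            x = X[t]
            P = P + Q                        # predict step
            resid = y[t] - x @ beta          # one-step-ahead forecast error
            S = x @ P @ x + obs_var          # forecast error variance
            gain = P @ x / S                 # Kalman gain
            beta = beta + gain * resid       # update coefficients
            P = P - np.outer(gain, x @ P)    # update covariance
            path[t] = beta
        return path


    # Simulated data with constant true coefficients (2, 3).
    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    X = np.column_stack([np.ones(500), x])
    y = 2.0 + 3.0 * x + 0.1 * rng.normal(size=500)

    path = kalman_regression(y, X)
    final_beta = path[-1]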



Domain-specific Data Analysis
-----------------------------

The following packages contain interesting statistical algorithms; however,
they are tightly focused on their application, and are or might be more
difficult to use "from the outside". (Descriptions are taken from websites.)

Pymvpa
^^^^^^

PyMVPA is a Python module intended to ease pattern classification analyses of
large datasets.

http://pymvpa.org/

License: MIT

Nipy
^^^^

Nipy aims to provide a complete Python environment for the analysis of
structural and functional neuroimaging data.

http://nipy.sourceforge.net/

License: BSD

Biopython
^^^^^^^^^

Biopython is a set of tools for biological computation.

http://biopython.org/wiki/Main_Page

License: http://www.biopython.org/DIST/LICENSE (similar to MIT (?))

Pysal
^^^^^

A library for exploratory spatial analysis and geocomputation.

http://code.google.com/p/pysal/

License: BSD

glu-genetics
^^^^^^^^^^^^

A broad array of tools to store, clean, and analyze data generated by
whole-genome or candidate gene association scans.

http://code.google.com/p/glu-genetics/

License: BSD


Other packages
--------------

There is a large number of machine learning packages in Python, many of
them with a well-established code base. Unfortunately, none of the packages
with a wider coverage of algorithms has a scipy-compatible license.
A listing can be found at http://mloss.org/software/language/python/.
scikits.learn includes several machine learning algorithms and is currently
undergoing cleanup and enhancement
(http://pypi.python.org/pypi/scikits.learn/0.1).

Other packages are available that provide additional functionality,
notably openopt, which offers optimization routines beyond the ones in
scipy.