.. _related:
.. currentmodule:: statsmodels
Related Packages
================
These are some Python packages that have a related purpose and can be
useful in combination with statsmodels. The selection in this list is
biased towards packages that might be directly useful for data handling and
statistical analysis, and towards those that have a BSD-compatible license,
which means that we are free to read their source to learn about
different implementations or different algorithms.
The following descriptions are taken from the websites with small adjustments.
Data Handling
-------------
Scikits.timeseries
^^^^^^^^^^^^^^^^^^
http://pypi.python.org/pypi/scikits.timeseries
"Time series manipulation
The scikits.timeseries module provides classes and functions for manipulating,
reporting, and plotting time series of various frequencies. The focus is on
convenient data access and manipulation while leveraging the existing
mathematical functionality in Numpy and SciPy."
License: BSD
Language: Python, C, binary distributions available
*Comments*
Timeseries is based on NumPy's MaskedArray and is designed for handling data
with missing values. It also includes functions for statistical analysis.
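
As a rough illustration of the masked-array idea mentioned above, the following
sketch uses only ``numpy.ma`` (not scikits.timeseries itself). The data and the
``-999.0`` missing-value sentinel are assumptions made up for the example.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical readings; -999.0 is an assumed sentinel for "missing".
raw = np.array([1.0, 2.0, -999.0, 4.0, 5.0])

# Mask the sentinel so that reductions skip it.
series = ma.masked_values(raw, -999.0)

mean_valid = series.mean()      # average over the unmasked entries only
n_valid = series.count()        # number of unmasked entries
filled = series.filled(np.nan)  # plain ndarray with NaN where data is missing
```

Here the mean is taken over the four valid observations, so the masked entry
does not distort the result.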
Pandas
^^^^^^
http://pypi.python.org/pypi/pandas
"This project aims to provide the following
* A set of fast NumPy-based data structures optimized for panel, time series,
and cross-sectional data analysis.
* A set of tools for loading such data from various sources and providing
efficient ways to persist the data.
* A robust statistics and econometrics library which closely integrates with
the core data structures."
License: New BSD
Language: Python, Cython,
binary distribution available for win32-py25, but easy to build with MinGW
*Comments*
Uses statsmodels as an optional dependency for statistical analysis, but has
additional statistical and econometrics algorithms that focus on panel data
analysis, mostly in the time dimension. It has several data structures that
allow dictionary access to the underlying 1, 2, or 3 dimensional arrays. It
was initially focused on a two-dimensional representation of the data, but
now also allows for different representations of three-dimensional arrays. It
allows for arbitrary axis labels, but also offers a convenient time series
class.
Tabular
^^^^^^^
http://pypi.python.org/pypi/tabular
"Tabular data container and associated convenience routines in Python
Tabular is a package of Python modules for working with tabular data. Its main
object is the tabarray class, a data structure for holding and manipulating
tabular data.
The tabarray object is based on the ndarray object from the Numerical Python
package (NumPy), and the Tabular package is built to interface well with NumPy
in general. "
License: MIT
Language: Python
*Comments*
Uses NumPy's structured arrays as its basic building block. Focused on
spreadsheet-style operations for working with two-dimensional tables and
associated data handling and analysis.
Reading the tabular source code is instructive for learning how to work with
structured arrays.
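
For readers unfamiliar with structured arrays, here is a minimal sketch using
plain NumPy (not the tabular API). The column names and values are invented for
illustration.

```python
import numpy as np

# Hypothetical table with named, typed columns.
table = np.array(
    [("a", 1, 2.5), ("b", 2, 3.5), ("a", 3, 4.5)],
    dtype=[("group", "U1"), ("count", "i8"), ("value", "f8")],
)

values = table["value"]                         # column access by field name
group_a = table[table["group"] == "a"]          # row filter with a boolean mask
by_count = np.sort(table, order="count")[::-1]  # rows sorted by count, descending
total_a = group_a["value"].sum()                # aggregate over the filtered rows
```

Each row is a record and each field behaves like a typed column, which is what
makes spreadsheet-style filtering and sorting possible on top of an ndarray.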
La
^^
http://pypi.python.org/pypi/la
"Label the rows, columns, any dimension, of your NumPy arrays.
The main class of the la package is a labeled array, larry. A larry consists of
a data array and a label list. The data array is stored as a NumPy array and
the label list as a list of lists. "
License: BSD
Language: Python
*Comments*
The data handling is similar in intent to pandas but closer to working
with standard NumPy ndarrays. The main addition to NumPy arrays is
arbitrary labels for each axis of the array. Larry delegates to NumPy
functions but does not subclass NumPy's ndarray. It also provides functions
for basic descriptive statistics.
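
The design described above (a data array plus a list of label lists, delegating
to NumPy rather than subclassing) can be sketched roughly as follows. The class
``LabeledArray`` and its methods are hypothetical and are not the la API.

```python
import numpy as np


class LabeledArray:
    """Toy labeled array: a NumPy data array plus one label list per axis."""

    def __init__(self, data, labels):
        self.data = np.asarray(data)
        self.labels = [list(axis_labels) for axis_labels in labels]
        if len(self.labels) != self.data.ndim:
            raise ValueError("need one label list per axis")
        for axis, axis_labels in enumerate(self.labels):
            if len(axis_labels) != self.data.shape[axis]:
                raise ValueError("label count must match axis length")

    def get(self, *keys):
        """Look up a single element using one label per axis."""
        index = tuple(self.labels[axis].index(key) for axis, key in enumerate(keys))
        return self.data[index]

    def mean(self, axis):
        """Delegate to NumPy and drop the labels of the reduced axis."""
        kept = [lab for i, lab in enumerate(self.labels) if i != axis]
        return LabeledArray(self.data.mean(axis=axis), kept)


arr = LabeledArray([[1.0, 2.0], [3.0, 4.0]], [["r1", "r2"], ["c1", "c2"]])
col_means = arr.mean(axis=0)  # labels of the remaining axis are preserved
```

Because the container holds an ndarray instead of subclassing it, every
operation has to decide explicitly what happens to the labels, as ``mean`` does
here.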
Data Analysis
-------------
Pymc
^^^^
http://pypi.python.org/pypi/pymc
"Bayesian estimation, particularly using Markov chain Monte Carlo (MCMC), is
an increasingly relevant approach to statistical estimation.
PyMC is a python module that implements the Metropolis-Hastings algorithm
as a python class, and is extremely flexible and applicable to a large suite
of problems."
License: MIT, Academic Free License (?)
Language: Python, C, Fortran
binary (bundle ?) installer
*Comments*
This is to some extent the modern Bayesian analog of statsmodels. It is by
far the most mature project in this group, statsmodels included.
Scikits.talkbox
^^^^^^^^^^^^^^^
http://pypi.python.org/pypi/scikits.talkbox
Talkbox is a set of Python modules for speech/signal processing. The goal of this
toolbox is to be a sandbox for features which may end up in scipy at some
point.
License: BSD
Language: Python, C optional
*Comments*
Although specialized for speech processing, talkbox has some accessible and
useful functions for time series analysis, especially a fast implementation
for estimating AR models (with ...) and spectral density based on estimated
AR coefficients.
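
To make the idea of an AR-based spectral density concrete, here is a small
NumPy sketch. It uses the Yule-Walker equations as one common estimator, which
is an assumption for illustration and not necessarily the method talkbox
implements. The function names ``yule_walker`` and ``ar_spectrum`` are
hypothetical.

```python
import numpy as np


def yule_walker(x, order):
    """Estimate AR coefficients and innovation variance via Yule-Walker."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased sample autocovariances r[0], ..., r[order].
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])
    # Solve the Toeplitz system R a = r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    sigma2 = r[0] - np.dot(a, r[1:])
    return a, sigma2


def ar_spectrum(a, sigma2, freqs):
    """Spectral density of an AR model at frequencies in cycles per sample."""
    k = np.arange(1, len(a) + 1)
    # Denominator of the transfer function: 1 - sum_k a_k exp(-2 pi i f k).
    denom = 1.0 - np.exp(-2j * np.pi * np.outer(freqs, k)) @ a
    return sigma2 / np.abs(denom) ** 2


# Simulate an AR(1) process x[t] = 0.6 * x[t-1] + e[t] as test data.
rng = np.random.default_rng(0)
e = rng.standard_normal(5000)
x = np.zeros_like(e)
for t in range(1, len(e)):
    x[t] = 0.6 * x[t - 1] + e[t]

coef, sigma2 = yule_walker(x, order=1)
freqs = np.linspace(0.0, 0.5, 6)
psd = ar_spectrum(coef, sigma2, freqs)
```

For a positive AR(1) coefficient the estimated density is largest at frequency
zero and decreases towards the Nyquist frequency.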
Nitime
^^^^^^
http://github.com/fperez/nitime
"Nitime is a library for time-series analysis of data from neuroscience experiments.
It contains a core of numerical algorithms for time-series analysis both in
the time and spectral domains, a set of container objects to represent
time-series, and auxiliary objects that expose a high level interface to the
numerical machinery and make common analysis tasks easy to express with
compact and semantically clear code."
License: BSD
Language: Python
*Comments*
Although focused on neuroscience, the algorithms for time series analysis are
independent of the data representation and can be used with NumPy arrays.
The current focus is on spectral analysis, including coherence between several
time series.
KF - Kalman Filter
^^^^^^^^^^^^^^^^^^
http://pypi.python.org/pypi/KF
"This project was started to test different available tools to track mutual
funds and hedge fund using Capital Asset Pricing Model (CAPM thereafter)
introduced by Sharpe and Arbitrage Pricing Theory (APT thereafter) introduced
by Ross."
License: BSD (to be checked)
Language: Python (requires cvxopt)
*Comments*
A very young project with a focus similar to, although narrower than, that of
pandas and (parts of) statsmodels. Uses a Kalman filter for rolling linear
regression and allows for equality and inequality constraints in the estimation.
Includes its own time series class, and the estimation seems (?) to depend on
it.
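
The following is a minimal sketch of how a Kalman filter can produce rolling
regression coefficients, written with plain NumPy and not taken from the KF
package. It assumes the coefficients follow a random walk, omits the equality
and inequality constraints mentioned above, and the function name
``kalman_regression`` and its parameters are hypothetical.

```python
import numpy as np


def kalman_regression(y, X, obs_var=1.0, state_var=1e-4):
    """Filter coefficients beta_t in y_t = X_t @ beta_t + noise,
    assuming beta_t follows a random walk with variance state_var."""
    n, k = X.shape
    beta = np.zeros(k)
    P = np.eye(k) * 1e3         # diffuse prior covariance
    Q = np.eye(k) * state_var   # random-walk (state) noise covariance
    path = np.empty((n, k))
    for t in range(n):
        x = X[t]
        P = P + Q                            # predict step
        S = x @ P @ x + obs_var              # innovation variance
        K = P @ x / S                        # Kalman gain
        beta = beta + K * (y[t] - x @ beta)  # update with the new observation
        P = P - np.outer(K, x) @ P
        path[t] = beta
    return path


# Simulated data with constant true coefficients (0.5, 2.0).
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = X @ np.array([0.5, 2.0]) + 0.1 * rng.standard_normal(n)
betas = kalman_regression(y, X, obs_var=0.01)
```

Each row of ``betas`` is the coefficient estimate after seeing the data up to
that time step, so the array can be read as a rolling regression path. A larger
``state_var`` lets the estimates adapt faster at the cost of more noise.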
Domain-specific Data Analysis
-----------------------------
The following packages contain interesting statistical algorithms; however,
they are tightly focused on their application, and are or might be more
difficult to use "from the outside". (Descriptions are taken from websites.)
Pymvpa
^^^^^^
PyMVPA is a Python module intended to ease pattern classification analyses of
large datasets
http://pymvpa.org/
License: MIT
Nipy
^^^^
Nipy aims to provide a complete Python environment for the analysis of
structural and functional neuroimaging data
http://nipy.sourceforge.net/
License: BSD
Biopython
^^^^^^^^^
Biopython is a set of tools for biological computation
http://biopython.org/wiki/Main_Page
License: http://www.biopython.org/DIST/LICENSE similar to MIT (?)
Pysal
^^^^^
A library for exploratory spatial analysis and geocomputation
http://code.google.com/p/pysal/
License: BSD
glu-genetics
^^^^^^^^^^^^
A broad array of tools to store, clean, and analyze data generated by
whole-genome or candidate gene association scans.
http://code.google.com/p/glu-genetics/
License: BSD
Other packages
--------------
There are a large number of machine learning packages in Python, many of
them with a well-established code base. Unfortunately, none of the packages
with a wider coverage of algorithms has a scipy-compatible license.
A listing can be found at http://mloss.org/software/language/python/
scikits.learn includes several machine learning algorithms and is currently
undergoing cleanup and enhancement: http://pypi.python.org/pypi/scikits.learn/0.1
Other packages are available that provide additional functionality,
especially openopt, which offers optimization routines beyond
the ones in scipy.