1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215
|
.. _whatsnew_0162:
Version 0.16.2 (June 12, 2015)
------------------------------
{{ header }}
This is a minor bug-fix release from 0.16.1 and includes a large number of
bug fixes along some new features (:meth:`~DataFrame.pipe` method), enhancements, and performance improvements.
We recommend that all users upgrade to this version.
Highlights include:
- A new ``pipe`` method, see :ref:`here <whatsnew_0162.enhancements.pipe>`
- Documentation on how to use numba_ with *pandas*, see :ref:`here <enhancingperf.numba>`
.. contents:: What's new in v0.16.2
:local:
:backlinks: none
.. _numba: http://numba.pydata.org
.. _whatsnew_0162.enhancements:
New features
~~~~~~~~~~~~
.. _whatsnew_0162.enhancements.pipe:
Pipe
^^^^
We've introduced a new method :meth:`DataFrame.pipe`. As suggested by the name, ``pipe``
should be used to pipe data through a chain of function calls.
The goal is to avoid confusing nested function calls like
.. code-block:: python
# df is a DataFrame
# f, g, and h are functions that take and return DataFrames
f(g(h(df), arg1=1), arg2=2, arg3=3) # noqa F821
The logic flows from inside out, and function names are separated from their keyword arguments.
This can be rewritten as
.. code-block:: python
(
df.pipe(h) # noqa F821
.pipe(g, arg1=1) # noqa F821
.pipe(f, arg2=2, arg3=3) # noqa F821
)
Now both the code and the logic flow from top to bottom. Keyword arguments are next to
their functions. Overall the code is much more readable.
In the example above, the functions ``f``, ``g``, and ``h`` each expected the DataFrame as the first positional argument.
When the function you wish to apply takes its data anywhere other than the first argument, pass a tuple
of ``(function, keyword)`` indicating where the DataFrame should flow. For example:
.. code-block:: ipython
In [1]: import statsmodels.formula.api as sm
In [2]: bb = pd.read_csv("data/baseball.csv", index_col="id")
# sm.ols takes (formula, data)
In [3]: (
...: bb.query("h > 0")
...: .assign(ln_h=lambda df: np.log(df.h))
...: .pipe((sm.ols, "data"), "hr ~ ln_h + year + g + C(lg)")
...: .fit()
...: .summary()
...: )
...:
Out[3]:
<class 'statsmodels.iolib.summary.Summary'>
"""
OLS Regression Results
==============================================================================
Dep. Variable: hr R-squared: 0.685
Model: OLS Adj. R-squared: 0.665
Method: Least Squares F-statistic: 34.28
Date: Tue, 22 Nov 2022 Prob (F-statistic): 3.48e-15
Time: 05:35:23 Log-Likelihood: -205.92
No. Observations: 68 AIC: 421.8
Df Residuals: 63 BIC: 432.9
Df Model: 4
Covariance Type: nonrobust
===============================================================================
coef std err t P>|t| [0.025 0.975]
-------------------------------------------------------------------------------
Intercept -8484.7720 4664.146 -1.819 0.074 -1.78e+04 835.780
C(lg)[T.NL] -2.2736 1.325 -1.716 0.091 -4.922 0.375
ln_h -1.3542 0.875 -1.547 0.127 -3.103 0.395
year 4.2277 2.324 1.819 0.074 -0.417 8.872
g 0.1841 0.029 6.258 0.000 0.125 0.243
==============================================================================
Omnibus: 10.875 Durbin-Watson: 1.999
Prob(Omnibus): 0.004 Jarque-Bera (JB): 17.298
Skew: 0.537 Prob(JB): 0.000175
Kurtosis: 5.225 Cond. No. 1.49e+07
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.49e+07. This might indicate that there are
strong multicollinearity or other numerical problems.
"""
The pipe method is inspired by unix pipes, which stream text through
processes. More recently dplyr_ and magrittr_ have introduced the
popular ``(%>%)`` pipe operator for R_.
See the :ref:`documentation <basics.pipe>` for more. (:issue:`10129`)
.. _dplyr: https://github.com/tidyverse/dplyr
.. _magrittr: https://github.com/smbache/magrittr
.. _R: http://www.r-project.org
.. _whatsnew_0162.enhancements.other:
Other enhancements
^^^^^^^^^^^^^^^^^^
- Added ``rsplit`` to Index/Series StringMethods (:issue:`10303`)
- Removed the hard-coded size limits on the ``DataFrame`` HTML representation
in the IPython notebook, and leave this to IPython itself (only for IPython
v3.0 or greater). This eliminates the duplicate scroll bars that appeared in
the notebook with large frames (:issue:`10231`).
Note that the notebook has a ``toggle output scrolling`` feature to limit the
display of very large frames (by clicking left of the output).
You can also configure the way DataFrames are displayed using the pandas
options, see here :ref:`here <options.frequently_used>`.
- ``axis`` parameter of ``DataFrame.quantile`` now accepts also ``index`` and ``column``. (:issue:`9543`)
.. _whatsnew_0162.api:
API changes
~~~~~~~~~~~
- ``Holiday`` now raises ``NotImplementedError`` if both ``offset`` and ``observance`` are used in the constructor instead of returning an incorrect result (:issue:`10217`).
.. _whatsnew_0162.performance:
Performance improvements
~~~~~~~~~~~~~~~~~~~~~~~~
- Improved ``Series.resample`` performance with ``dtype=datetime64[ns]`` (:issue:`7754`)
- Increase performance of ``str.split`` when ``expand=True`` (:issue:`10081`)
.. _whatsnew_0162.bug_fixes:
Bug fixes
~~~~~~~~~
- Bug in ``Series.hist`` raises an error when a one row ``Series`` was given (:issue:`10214`)
- Bug where ``HDFStore.select`` modifies the passed columns list (:issue:`7212`)
- Bug in ``Categorical`` repr with ``display.width`` of ``None`` in Python 3 (:issue:`10087`)
- Bug in ``to_json`` with certain orients and a ``CategoricalIndex`` would segfault (:issue:`10317`)
- Bug where some of the nan functions do not have consistent return dtypes (:issue:`10251`)
- Bug in ``DataFrame.quantile`` on checking that a valid axis was passed (:issue:`9543`)
- Bug in ``groupby.apply`` aggregation for ``Categorical`` not preserving categories (:issue:`10138`)
- Bug in ``to_csv`` where ``date_format`` is ignored if the ``datetime`` is fractional (:issue:`10209`)
- Bug in ``DataFrame.to_json`` with mixed data types (:issue:`10289`)
- Bug in cache updating when consolidating (:issue:`10264`)
- Bug in ``mean()`` where integer dtypes can overflow (:issue:`10172`)
- Bug where ``Panel.from_dict`` does not set dtype when specified (:issue:`10058`)
- Bug in ``Index.union`` raises ``AttributeError`` when passing array-likes. (:issue:`10149`)
- Bug in ``Timestamp``'s' ``microsecond``, ``quarter``, ``dayofyear``, ``week`` and ``daysinmonth`` properties return ``np.int`` type, not built-in ``int``. (:issue:`10050`)
- Bug in ``NaT`` raises ``AttributeError`` when accessing to ``daysinmonth``, ``dayofweek`` properties. (:issue:`10096`)
- Bug in Index repr when using the ``max_seq_items=None`` setting (:issue:`10182`).
- Bug in getting timezone data with ``dateutil`` on various platforms ( :issue:`9059`, :issue:`8639`, :issue:`9663`, :issue:`10121`)
- Bug in displaying datetimes with mixed frequencies; display 'ms' datetimes to the proper precision. (:issue:`10170`)
- Bug in ``setitem`` where type promotion is applied to the entire block (:issue:`10280`)
- Bug in ``Series`` arithmetic methods may incorrectly hold names (:issue:`10068`)
- Bug in ``GroupBy.get_group`` when grouping on multiple keys, one of which is categorical. (:issue:`10132`)
- Bug in ``DatetimeIndex`` and ``TimedeltaIndex`` names are lost after timedelta arithmetic ( :issue:`9926`)
- Bug in ``DataFrame`` construction from nested ``dict`` with ``datetime64`` (:issue:`10160`)
- Bug in ``Series`` construction from ``dict`` with ``datetime64`` keys (:issue:`9456`)
- Bug in ``Series.plot(label="LABEL")`` not correctly setting the label (:issue:`10119`)
- Bug in ``plot`` not defaulting to matplotlib ``axes.grid`` setting (:issue:`9792`)
- Bug causing strings containing an exponent, but no decimal to be parsed as ``int`` instead of ``float`` in ``engine='python'`` for the ``read_csv`` parser (:issue:`9565`)
- Bug in ``Series.align`` resets ``name`` when ``fill_value`` is specified (:issue:`10067`)
- Bug in ``read_csv`` causing index name not to be set on an empty DataFrame (:issue:`10184`)
- Bug in ``SparseSeries.abs`` resets ``name`` (:issue:`10241`)
- Bug in ``TimedeltaIndex`` slicing may reset freq (:issue:`10292`)
- Bug in ``GroupBy.get_group`` raises ``ValueError`` when group key contains ``NaT`` (:issue:`6992`)
- Bug in ``SparseSeries`` constructor ignores input data name (:issue:`10258`)
- Bug in ``Categorical.remove_categories`` causing a ``ValueError`` when removing the ``NaN`` category if underlying dtype is floating-point (:issue:`10156`)
- Bug where infer_freq infers time rule (WOM-5XXX) unsupported by to_offset (:issue:`9425`)
- Bug in ``DataFrame.to_hdf()`` where table format would raise a seemingly unrelated error for invalid (non-string) column names. This is now explicitly forbidden. (:issue:`9057`)
- Bug to handle masking empty ``DataFrame`` (:issue:`10126`).
- Bug where MySQL interface could not handle numeric table/column names (:issue:`10255`)
- Bug in ``read_csv`` with a ``date_parser`` that returned a ``datetime64`` array of other time resolution than ``[ns]`` (:issue:`10245`)
- Bug in ``Panel.apply`` when the result has ndim=0 (:issue:`10332`)
- Bug in ``read_hdf`` where ``auto_close`` could not be passed (:issue:`9327`).
- Bug in ``read_hdf`` where open stores could not be used (:issue:`10330`).
- Bug in adding empty ``DataFrames``, now results in a ``DataFrame`` that ``.equals`` an empty ``DataFrame`` (:issue:`10181`).
- Bug in ``to_hdf`` and ``HDFStore`` which did not check that complib choices were valid (:issue:`4582`, :issue:`8874`).
.. _whatsnew_0.16.2.contributors:
Contributors
~~~~~~~~~~~~
.. contributors:: v0.16.1..v0.16.2
|