1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284
|
.. _whatsnew_0141:
Version 0.14.1 (July 11, 2014)
------------------------------
{{ header }}
This is a minor release from 0.14.0 and includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
users upgrade to this version.
- Highlights include:
- New methods :meth:`~pandas.DataFrame.select_dtypes` to select columns
based on the dtype and :meth:`~pandas.Series.sem` to calculate the
standard error of the mean.
- Support for dateutil timezones (see :ref:`docs <timeseries.timezone>`).
- Support for ignoring full line comments in the :func:`~pandas.read_csv`
text parser.
- New documentation section on :ref:`Options and Settings <options>`.
- Lots of bug fixes.
- :ref:`Enhancements <whatsnew_0141.enhancements>`
- :ref:`API Changes <whatsnew_0141.api>`
- :ref:`Performance Improvements <whatsnew_0141.performance>`
- :ref:`Experimental Changes <whatsnew_0141.experimental>`
- :ref:`Bug Fixes <whatsnew_0141.bug_fixes>`
.. _whatsnew_0141.api:
API changes
~~~~~~~~~~~
- Openpyxl now raises a ValueError on construction of the openpyxl writer
instead of warning on pandas import (:issue:`7284`).
- For ``StringMethods.extract``, when no match is found, the result - only
containing ``NaN`` values - now also has ``dtype=object`` instead of
``float`` (:issue:`7242`)
- ``Period`` objects no longer raise a ``TypeError`` when compared using ``==``
with another object that *isn't* a ``Period``. Instead
when comparing a ``Period`` with another object using ``==`` if the other
object isn't a ``Period`` ``False`` is returned. (:issue:`7376`)
- Previously, the behaviour on resetting the time or not in
``offsets.apply``, ``rollforward`` and ``rollback`` operations differed
between offsets. With the support of the ``normalize`` keyword for all offsets(see
below) with a default value of False (preserve time), the behaviour changed for certain
offsets (BusinessMonthBegin, MonthEnd, BusinessMonthEnd, CustomBusinessMonthEnd,
BusinessYearBegin, LastWeekOfMonth, FY5253Quarter, LastWeekOfMonth, Easter):
.. code-block:: ipython
In [6]: from pandas.tseries import offsets
In [7]: d = pd.Timestamp('2014-01-01 09:00')
# old behaviour < 0.14.1
In [8]: d + offsets.MonthEnd()
Out[8]: pd.Timestamp('2014-01-31 00:00:00')
Starting from 0.14.1 all offsets preserve time by default. The old
behaviour can be obtained with ``normalize=True``
.. ipython:: python
:suppress:
import pandas.tseries.offsets as offsets
d = pd.Timestamp("2014-01-01 09:00")
.. ipython:: python
# new behaviour
d + offsets.MonthEnd()
d + offsets.MonthEnd(normalize=True)
Note that for the other offsets the default behaviour did not change.
- Add back ``#N/A N/A`` as a default NA value in text parsing, (regression from 0.12) (:issue:`5521`)
- Raise a ``TypeError`` on inplace-setting with a ``.where`` and a non ``np.nan`` value as this is inconsistent
with a set-item expression like ``df[mask] = None`` (:issue:`7656`)
.. _whatsnew_0141.enhancements:
Enhancements
~~~~~~~~~~~~
- Add ``dropna`` argument to ``value_counts`` and ``nunique`` (:issue:`5569`).
- Add :meth:`~pandas.DataFrame.select_dtypes` method to allow selection of
columns based on dtype (:issue:`7316`). See :ref:`the docs <basics.selectdtypes>`.
- All ``offsets`` supports the ``normalize`` keyword to specify whether
``offsets.apply``, ``rollforward`` and ``rollback`` resets the time (hour,
minute, etc) or not (default ``False``, preserves time) (:issue:`7156`):
.. code-block:: python
import pandas.tseries.offsets as offsets
day = offsets.Day()
day.apply(pd.Timestamp("2014-01-01 09:00"))
day = offsets.Day(normalize=True)
day.apply(pd.Timestamp("2014-01-01 09:00"))
- ``PeriodIndex`` is represented as the same format as ``DatetimeIndex`` (:issue:`7601`)
- ``StringMethods`` now work on empty Series (:issue:`7242`)
- The file parsers ``read_csv`` and ``read_table`` now ignore line comments provided by
the parameter ``comment``, which accepts only a single character for the C reader.
In particular, they allow for comments before file data begins (:issue:`2685`)
- Add ``NotImplementedError`` for simultaneous use of ``chunksize`` and ``nrows``
for read_csv() (:issue:`6774`).
- Tests for basic reading of public S3 buckets now exist (:issue:`7281`).
- ``read_html`` now sports an ``encoding`` argument that is passed to the
underlying parser library. You can use this to read non-ascii encoded web
pages (:issue:`7323`).
- ``read_excel`` now supports reading from URLs in the same way
that ``read_csv`` does. (:issue:`6809`)
- Support for dateutil timezones, which can now be used in the same way as
pytz timezones across pandas. (:issue:`4688`)
.. ipython:: python
rng = pd.date_range(
"3/6/2012 00:00", periods=10, freq="D", tz="dateutil/Europe/London"
)
rng.tz
See :ref:`the docs <timeseries.timezone>`.
- Implemented ``sem`` (standard error of the mean) operation for ``Series``,
``DataFrame``, ``Panel``, and ``Groupby`` (:issue:`6897`)
- Add ``nlargest`` and ``nsmallest`` to the ``Series`` ``groupby`` allowlist,
which means you can now use these methods on a ``SeriesGroupBy`` object
(:issue:`7053`).
- All offsets ``apply``, ``rollforward`` and ``rollback`` can now handle ``np.datetime64``, previously results in ``ApplyTypeError`` (:issue:`7452`)
- ``Period`` and ``PeriodIndex`` can contain ``NaT`` in its values (:issue:`7485`)
- Support pickling ``Series``, ``DataFrame`` and ``Panel`` objects with
non-unique labels along *item* axis (``index``, ``columns`` and ``items``
respectively) (:issue:`7370`).
- Improved inference of datetime/timedelta with mixed null objects. Regression from 0.13.1 in interpretation of an object Index
with all null elements (:issue:`7431`)
.. _whatsnew_0141.performance:
Performance
~~~~~~~~~~~
- Improvements in dtype inference for numeric operations involving yielding performance gains for dtypes: ``int64``, ``timedelta64``, ``datetime64`` (:issue:`7223`)
- Improvements in Series.transform for significant performance gains (:issue:`6496`)
- Improvements in DataFrame.transform with ufuncs and built-in grouper functions for significant performance gains (:issue:`7383`)
- Regression in groupby aggregation of datetime64 dtypes (:issue:`7555`)
- Improvements in ``MultiIndex.from_product`` for large iterables (:issue:`7627`)
.. _whatsnew_0141.experimental:
Experimental
~~~~~~~~~~~~
- ``pandas.io.data.Options`` has a new method, ``get_all_data`` method, and now consistently returns a
MultiIndexed ``DataFrame`` (:issue:`5602`)
- ``io.gbq.read_gbq`` and ``io.gbq.to_gbq`` were refactored to remove the
dependency on the Google ``bq.py`` command line client. This submodule
now uses ``httplib2`` and the Google ``apiclient`` and ``oauth2client`` API client
libraries which should be more stable and, therefore, reliable than
``bq.py``. See :ref:`the docs <io.bigquery>`. (:issue:`6937`).
.. _whatsnew_0141.bug_fixes:
Bug fixes
~~~~~~~~~
- Bug in ``DataFrame.where`` with a symmetric shaped frame and a passed other of a DataFrame (:issue:`7506`)
- Bug in Panel indexing with a MultiIndex axis (:issue:`7516`)
- Regression in datetimelike slice indexing with a duplicated index and non-exact end-points (:issue:`7523`)
- Bug in setitem with list-of-lists and single vs mixed types (:issue:`7551`:)
- Bug in time ops with non-aligned Series (:issue:`7500`)
- Bug in timedelta inference when assigning an incomplete Series (:issue:`7592`)
- Bug in groupby ``.nth`` with a Series and integer-like column name (:issue:`7559`)
- Bug in ``Series.get`` with a boolean accessor (:issue:`7407`)
- Bug in ``value_counts`` where ``NaT`` did not qualify as missing (``NaN``) (:issue:`7423`)
- Bug in ``to_timedelta`` that accepted invalid units and misinterpreted 'm/h' (:issue:`7611`, :issue:`6423`)
- Bug in line plot doesn't set correct ``xlim`` if ``secondary_y=True`` (:issue:`7459`)
- Bug in grouped ``hist`` and ``scatter`` plots use old ``figsize`` default (:issue:`7394`)
- Bug in plotting subplots with ``DataFrame.plot``, ``hist`` clears passed ``ax`` even if the number of subplots is one (:issue:`7391`).
- Bug in plotting subplots with ``DataFrame.boxplot`` with ``by`` kw raises ``ValueError`` if the number of subplots exceeds 1 (:issue:`7391`).
- Bug in subplots displays ``ticklabels`` and ``labels`` in different rule (:issue:`5897`)
- Bug in ``Panel.apply`` with a MultiIndex as an axis (:issue:`7469`)
- Bug in ``DatetimeIndex.insert`` doesn't preserve ``name`` and ``tz`` (:issue:`7299`)
- Bug in ``DatetimeIndex.asobject`` doesn't preserve ``name`` (:issue:`7299`)
- Bug in MultiIndex slicing with datetimelike ranges (strings and Timestamps), (:issue:`7429`)
- Bug in ``Index.min`` and ``max`` doesn't handle ``nan`` and ``NaT`` properly (:issue:`7261`)
- Bug in ``PeriodIndex.min/max`` results in ``int`` (:issue:`7609`)
- Bug in ``resample`` where ``fill_method`` was ignored if you passed ``how`` (:issue:`2073`)
- Bug in ``TimeGrouper`` doesn't exclude column specified by ``key`` (:issue:`7227`)
- Bug in ``DataFrame`` and ``Series`` bar and barh plot raises ``TypeError`` when ``bottom``
and ``left`` keyword is specified (:issue:`7226`)
- Bug in ``DataFrame.hist`` raises ``TypeError`` when it contains non numeric column (:issue:`7277`)
- Bug in ``Index.delete`` does not preserve ``name`` and ``freq`` attributes (:issue:`7302`)
- Bug in ``DataFrame.query()``/``eval`` where local string variables with the @
sign were being treated as temporaries attempting to be deleted
(:issue:`7300`).
- Bug in ``Float64Index`` which didn't allow duplicates (:issue:`7149`).
- Bug in ``DataFrame.replace()`` where truthy values were being replaced
(:issue:`7140`).
- Bug in ``StringMethods.extract()`` where a single match group Series
would use the matcher's name instead of the group name (:issue:`7313`).
- Bug in ``isnull()`` when ``mode.use_inf_as_null == True`` where isnull
wouldn't test ``True`` when it encountered an ``inf``/``-inf``
(:issue:`7315`).
- Bug in inferred_freq results in None for eastern hemisphere timezones (:issue:`7310`)
- Bug in ``Easter`` returns incorrect date when offset is negative (:issue:`7195`)
- Bug in broadcasting with ``.div``, integer dtypes and divide-by-zero (:issue:`7325`)
- Bug in ``CustomBusinessDay.apply`` raises ``NameError`` when ``np.datetime64`` object is passed (:issue:`7196`)
- Bug in ``MultiIndex.append``, ``concat`` and ``pivot_table`` don't preserve timezone (:issue:`6606`)
- Bug in ``.loc`` with a list of indexers on a single-multi index level (that is not nested) (:issue:`7349`)
- Bug in ``Series.map`` when mapping a dict with tuple keys of different lengths (:issue:`7333`)
- Bug all ``StringMethods`` now work on empty Series (:issue:`7242`)
- Fix delegation of ``read_sql`` to ``read_sql_query`` when query does not contain 'select' (:issue:`7324`).
- Bug where a string column name assignment to a ``DataFrame`` with a
``Float64Index`` raised a ``TypeError`` during a call to ``np.isnan``
(:issue:`7366`).
- Bug where ``NDFrame.replace()`` didn't correctly replace objects with
``Period`` values (:issue:`7379`).
- Bug in ``.ix`` getitem should always return a Series (:issue:`7150`)
- Bug in MultiIndex slicing with incomplete indexers (:issue:`7399`)
- Bug in MultiIndex slicing with a step in a sliced level (:issue:`7400`)
- Bug where negative indexers in ``DatetimeIndex`` were not correctly sliced
(:issue:`7408`)
- Bug where ``NaT`` wasn't repr'd correctly in a ``MultiIndex`` (:issue:`7406`,
:issue:`7409`).
- Bug where bool objects were converted to ``nan`` in ``convert_objects``
(:issue:`7416`).
- Bug in ``quantile`` ignoring the axis keyword argument (:issue:`7306`)
- Bug where ``nanops._maybe_null_out`` doesn't work with complex numbers
(:issue:`7353`)
- Bug in several ``nanops`` functions when ``axis==0`` for
1-dimensional ``nan`` arrays (:issue:`7354`)
- Bug where ``nanops.nanmedian`` doesn't work when ``axis==None``
(:issue:`7352`)
- Bug where ``nanops._has_infs`` doesn't work with many dtypes
(:issue:`7357`)
- Bug in ``StataReader.data`` where reading a 0-observation dta failed (:issue:`7369`)
- Bug in ``StataReader`` when reading Stata 13 (117) files containing fixed width strings (:issue:`7360`)
- Bug in ``StataWriter`` where encoding was ignored (:issue:`7286`)
- Bug in ``DatetimeIndex`` comparison doesn't handle ``NaT`` properly (:issue:`7529`)
- Bug in passing input with ``tzinfo`` to some offsets ``apply``, ``rollforward`` or ``rollback`` resets ``tzinfo`` or raises ``ValueError`` (:issue:`7465`)
- Bug in ``DatetimeIndex.to_period``, ``PeriodIndex.asobject``, ``PeriodIndex.to_timestamp`` doesn't preserve ``name`` (:issue:`7485`)
- Bug in ``DatetimeIndex.to_period`` and ``PeriodIndex.to_timestamp`` handle ``NaT`` incorrectly (:issue:`7228`)
- Bug in ``offsets.apply``, ``rollforward`` and ``rollback`` may return normal ``datetime`` (:issue:`7502`)
- Bug in ``resample`` raises ``ValueError`` when target contains ``NaT`` (:issue:`7227`)
- Bug in ``Timestamp.tz_localize`` resets ``nanosecond`` info (:issue:`7534`)
- Bug in ``DatetimeIndex.asobject`` raises ``ValueError`` when it contains ``NaT`` (:issue:`7539`)
- Bug in ``Timestamp.__new__`` doesn't preserve nanosecond properly (:issue:`7610`)
- Bug in ``Index.astype(float)`` where it would return an ``object`` dtype
``Index`` (:issue:`7464`).
- Bug in ``DataFrame.reset_index`` loses ``tz`` (:issue:`3950`)
- Bug in ``DatetimeIndex.freqstr`` raises ``AttributeError`` when ``freq`` is ``None`` (:issue:`7606`)
- Bug in ``GroupBy.size`` created by ``TimeGrouper`` raises ``AttributeError`` (:issue:`7453`)
- Bug in single column bar plot is misaligned (:issue:`7498`).
- Bug in area plot with tz-aware time series raises ``ValueError`` (:issue:`7471`)
- Bug in non-monotonic ``Index.union`` may preserve ``name`` incorrectly (:issue:`7458`)
- Bug in ``DatetimeIndex.intersection`` doesn't preserve timezone (:issue:`4690`)
- Bug in ``rolling_var`` where a window larger than the array would raise an error(:issue:`7297`)
- Bug with last plotted timeseries dictating ``xlim`` (:issue:`2960`)
- Bug with ``secondary_y`` axis not being considered for timeseries ``xlim`` (:issue:`3490`)
- Bug in ``Float64Index`` assignment with a non scalar indexer (:issue:`7586`)
- Bug in ``pandas.core.strings.str_contains`` does not properly match in a case insensitive fashion when ``regex=False`` and ``case=False`` (:issue:`7505`)
- Bug in ``expanding_cov``, ``expanding_corr``, ``rolling_cov``, and ``rolling_corr`` for two arguments with mismatched index (:issue:`7512`)
- Bug in ``to_sql`` taking the boolean column as text column (:issue:`7678`)
- Bug in grouped ``hist`` doesn't handle ``rot`` kw and ``sharex`` kw properly (:issue:`7234`)
- Bug in ``.loc`` performing fallback integer indexing with ``object`` dtype indices (:issue:`7496`)
- Bug (regression) in ``PeriodIndex`` constructor when passed ``Series`` objects (:issue:`7701`).
.. _whatsnew_0.14.1.contributors:
Contributors
~~~~~~~~~~~~
.. contributors:: v0.14.0..v0.14.1
|