.. _whatsnew_0130:
v0.13.0 (January 3, 2014)
---------------------------
This is a major release from 0.12.0 and includes a number of API changes, several new features and
enhancements along with a large number of bug fixes.
Highlights include:
- support for a new index type ``Float64Index``, and other Indexing enhancements
- ``HDFStore`` has a new string based syntax for query specification
- support for new methods of interpolation
- updated ``timedelta`` operations
- a new string manipulation method ``extract``
- Nanosecond support for Offsets
- ``isin`` for DataFrames
Several experimental features are added, including:
- new ``eval/query`` methods for expression evaluation
- support for ``msgpack`` serialization
- an i/o interface to Google's ``BigQuery``
There are several new or updated docs sections, including:
- :ref:`Comparison with SQL<compare_with_sql>`, which should be useful for those familiar with SQL but still learning pandas.
- :ref:`Comparison with R<compare_with_r>`, idiom translations from R to pandas.
- :ref:`Enhancing Performance<enhancingperf>`, ways to enhance pandas performance with ``eval/query``.
.. warning::
In 0.13.0 ``Series`` has internally been refactored to no longer sub-class ``ndarray``
but instead subclass ``NDFrame``, similar to the rest of the pandas containers. This should be
a transparent change with only very limited API implications. See :ref:`Internal Refactoring<whatsnew_0130.refactoring>`
API changes
~~~~~~~~~~~
- ``read_excel`` now supports an integer in its ``sheetname`` argument giving
the index of the sheet to read in (:issue:`4301`).
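For illustration, a minimal sketch (``report.xls`` is a hypothetical placeholder file):
.. code-block:: python
import pandas as pd
# read the second sheet of the workbook by its zero-based position,
# rather than by its name
df = pd.read_excel('report.xls', sheetname=1)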
- Text parser now treats anything that reads like inf ("inf", "Inf", "-Inf",
"iNf", etc.) as infinity. (:issue:`4220`, :issue:`4219`), affecting
``read_table``, ``read_csv``, etc.
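A minimal sketch of the new behavior, using ``StringIO`` from ``pandas.compat`` so the example is self-contained:
.. code-block:: python
import pandas as pd
from pandas.compat import StringIO
data = 'A,B\n1.0,inf\n2.0,-Inf\n3.0,iNf'
# every spelling of inf now parses to +/- infinity (float64)
df = pd.read_csv(StringIO(data))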
- ``pandas`` now is Python 2/3 compatible without the need for 2to3 thanks to
@jtratner. As a result, pandas now uses iterators more extensively. This
also led to the introduction of substantive parts of Benjamin
Peterson's ``six`` library into compat. (:issue:`4384`, :issue:`4375`,
:issue:`4372`)
- ``pandas.util.compat`` and ``pandas.util.py3compat`` have been merged into
``pandas.compat``. ``pandas.compat`` now includes many functions allowing
2/3 compatibility. It contains both list and iterator versions of range,
filter, map and zip, plus other necessary elements for Python 3
compatibility. ``lmap``, ``lzip``, ``lrange`` and ``lfilter`` all produce
lists instead of iterators, for compatibility with ``numpy``, subscripting
and ``pandas`` constructors. (:issue:`4384`, :issue:`4375`, :issue:`4372`)
- ``Series.get`` with negative indexers now returns the same as ``[]`` (:issue:`4390`)
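A sketch of the unified behavior; a non-integer index is assumed here so that ``[]`` falls back to positional indexing:
.. code-block:: python
import pandas as pd
s = pd.Series([1, 2, 3], index=list('abc'))
s.get(-1)  # now returns 3, the same as s[-1]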
- Changes to how ``Index`` and ``MultiIndex`` handle metadata (``levels``,
``labels``, and ``names``) (:issue:`4039`):
.. code-block:: python
# previously, you would have set levels or labels directly
index.levels = [[1, 2, 3, 4], [1, 2, 4, 4]]
# now, you use the set_levels or set_labels methods
index = index.set_levels([[1, 2, 3, 4], [1, 2, 4, 4]])
# similarly, for names, you can rename the object
# but setting names is not deprecated
index = index.set_names(["bob", "cranberry"])
# and all methods take an inplace kwarg - but return None
index.set_names(["bob", "cranberry"], inplace=True)
- **All** division with ``NDFrame`` objects is now *truedivision*, regardless
of the future import. This means that operating on pandas objects will by default
use *floating point* division, and return a floating point dtype.
You can use ``//`` and ``floordiv`` to do integer division.
Integer division
.. code-block:: ipython
In [3]: arr = np.array([1, 2, 3, 4])
In [4]: arr2 = np.array([5, 3, 2, 1])
In [5]: arr / arr2
Out[5]: array([0, 0, 1, 4])
In [6]: Series(arr) // Series(arr2)
Out[6]:
0 0
1 0
2 1
3 4
dtype: int64
True Division
.. code-block:: ipython
In [7]: pd.Series(arr) / pd.Series(arr2) # no future import required
Out[7]:
0 0.200000
1 0.666667
2 1.500000
3 4.000000
dtype: float64
- Infer and downcast dtype if ``downcast='infer'`` is passed to ``fillna/ffill/bfill`` (:issue:`4604`)
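A sketch of the inferred downcast:
.. code-block:: python
import numpy as np
import pandas as pd
s = pd.Series([1.0, np.nan, 3.0])  # float64 because of the nan
# all values are integral after filling, so with downcast='infer'
# the result is downcast to int64
s.fillna(2, downcast='infer')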
- ``__nonzero__`` for all NDFrame objects will now raise a ``ValueError``; this reverts to the (:issue:`1073`, :issue:`4633`)
behavior. See :ref:`gotchas<gotchas.truth>` for a more detailed discussion.
This prevents doing boolean comparison on *entire* pandas objects, which is inherently ambiguous. These all will raise a ``ValueError``.
.. code-block:: python
if df:
....
df1 and df2
s1 and s2
Added the ``.bool()`` method to ``NDFrame`` objects to facilitate evaluation of single-element boolean Series:
.. ipython:: python
Series([True]).bool()
Series([False]).bool()
DataFrame([[True]]).bool()
DataFrame([[False]]).bool()
- All non-Index NDFrames (``Series``, ``DataFrame``, ``Panel``, ``Panel4D``,
``SparsePanel``, etc.), now support the entire set of arithmetic operators
and arithmetic flex methods (add, sub, mul, etc.). ``SparsePanel`` does not
support ``pow`` or ``mod`` with non-scalars. (:issue:`3765`)
- ``Series`` and ``DataFrame`` now have a ``mode()`` method to calculate the
statistical mode(s) by axis/Series. (:issue:`5367`)
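A short sketch:
.. code-block:: python
import pandas as pd
s = pd.Series([1, 1, 2, 3, 3, 3])
s.mode()   # Series containing the most frequent value(s)
df = pd.DataFrame({'A': [1, 1, 2], 'B': [4, 5, 5]})
df.mode()  # mode of each column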
- Chained assignment will now by default warn if the user is assigning to a copy. This can be changed
with the option ``mode.chained_assignment``, allowed options are ``raise/warn/None``. See :ref:`the docs<indexing.view_versus_copy>`.
.. ipython:: python
dfc = DataFrame({'A':['aaa','bbb','ccc'],'B':[1,2,3]})
pd.set_option('chained_assignment','warn')
The following warning / exception will show if this is attempted.
.. ipython:: python
:okwarning:
dfc.loc[0]['A'] = 1111
::
Traceback (most recent call last)
...
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
Here is the correct method of assignment.
.. ipython:: python
dfc.loc[0,'A'] = 11
dfc
- ``Panel.reindex`` has the following call signature ``Panel.reindex(items=None, major_axis=None, minor_axis=None, **kwargs)``
to conform with other ``NDFrame`` objects. See :ref:`Internal Refactoring<whatsnew_0130.refactoring>` for more information.
- ``Series.argmin`` and ``Series.argmax`` are now aliased to ``Series.idxmin`` and ``Series.idxmax``. These return the *index* of the
min or max element respectively. Prior to 0.13.0 these would return the position of the min / max element. (:issue:`6214`)
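A sketch of the new aliasing:
.. code-block:: python
import pandas as pd
s = pd.Series([3, 1, 2], index=list('abc'))
s.idxmin()  # 'b', the index label of the minimum
s.argmin()  # also 'b' now; prior to 0.13.0 this returned the position 1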
Prior Version Deprecations/Changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
These were announced changes in 0.12 or prior that are taking effect as of 0.13.0
- Remove deprecated ``Factor`` (:issue:`3650`)
- Remove deprecated ``set_printoptions/reset_printoptions`` (:issue:`3046`)
- Remove deprecated ``_verbose_info`` (:issue:`3215`)
- Remove deprecated ``read_clipboard/to_clipboard/ExcelFile/ExcelWriter`` from ``pandas.io.parsers`` (:issue:`3717`)
These are available as functions in the main pandas namespace (e.g. ``pd.read_clipboard``)
- default for ``tupleize_cols`` is now ``False`` for both ``to_csv`` and ``read_csv``. Fair warning in 0.12 (:issue:`3604`)
- default for ``display.max_seq_len`` is now 100 rather than ``None``. This activates
truncated display ("...") of long sequences in various places. (:issue:`3391`)
Deprecations
~~~~~~~~~~~~
Deprecated in 0.13.0
- deprecated ``iterkv``, which will be removed in a future release (this was
an alias of iteritems used to bypass ``2to3``'s changes).
(:issue:`4384`, :issue:`4375`, :issue:`4372`)
- deprecated the string method ``match``, whose role is now performed more
idiomatically by ``extract``. In a future release, the default behavior
of ``match`` will change to become analogous to ``contains``, which returns
a boolean indexer. (Their
distinction is strictness: ``match`` relies on ``re.match`` while
``contains`` relies on ``re.search``.) In this release, the deprecated
behavior is the default, but the new behavior is available through the
keyword argument ``as_indexer=True``.
Indexing API Changes
~~~~~~~~~~~~~~~~~~~~
Prior to 0.13, it was impossible to use a label indexer (``.loc/.ix``) to set a value that
was not contained in the index of a particular axis. (:issue:`2578`). See :ref:`the docs<indexing.basics.partial_setting>`
In the ``Series`` case this is effectively an appending operation
.. ipython:: python
s = Series([1,2,3])
s
s[5] = 5.
s
.. ipython:: python
dfi = DataFrame(np.arange(6).reshape(3,2),
columns=['A','B'])
dfi
This would previously raise a ``KeyError``
.. ipython:: python
dfi.loc[:,'C'] = dfi.loc[:,'A']
dfi
This is like an ``append`` operation.
.. ipython:: python
dfi.loc[3] = 5
dfi
A Panel setting operation on an arbitrary axis aligns the input to the Panel
.. ipython:: python
p = pd.Panel(np.arange(16).reshape(2,4,2),
items=['Item1','Item2'],
major_axis=pd.date_range('2001/1/12',periods=4),
minor_axis=['A','B'],dtype='float64')
p
p.loc[:,:,'C'] = Series([30,32],index=p.items)
p
p.loc[:,:,'C']
Float64Index API Change
~~~~~~~~~~~~~~~~~~~~~~~
- Added a new index type, ``Float64Index``. This will be automatically created when passing floating values in index creation.
This enables a pure label-based slicing paradigm that makes ``[],ix,loc`` for scalar indexing and slicing work exactly the
same. See :ref:`the docs<indexing.float64index>`, (:issue:`263`)
Construction with floating point values yields a ``Float64Index`` by default.
.. ipython:: python
index = Index([1.5, 2, 3, 4.5, 5])
index
s = Series(range(5),index=index)
s
Scalar selection for ``[],.ix,.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``)
.. ipython:: python
s[3]
s.ix[3]
s.loc[3]
The only positional indexing is via ``iloc``
.. ipython:: python
s.iloc[3]
A scalar index that is not found will raise ``KeyError``
Slicing is ALWAYS on the values of the index, for ``[],ix,loc`` and ALWAYS positional with ``iloc``
.. ipython:: python
s[2:4]
s.ix[2:4]
s.loc[2:4]
s.iloc[2:4]
In float indexes, slicing using floats is allowed
.. ipython:: python
s[2.1:4.6]
s.loc[2.1:4.6]
- Indexing on other index types is preserved (with positional fallback for ``[],ix``), with the exception that floating point slicing
on a non-``Float64Index`` will now raise a ``TypeError``.
.. code-block:: ipython
In [1]: Series(range(5))[3.5]
TypeError: the label [3.5] is not a proper indexer for this index type (Int64Index)
In [1]: Series(range(5))[3.5:4.5]
TypeError: the slice start [3.5] is not a proper indexer for this index type (Int64Index)
Using a scalar float indexer will be deprecated in a future version, but is allowed for now.
.. code-block:: ipython
In [3]: Series(range(5))[3.0]
Out[3]: 3
HDFStore API Changes
~~~~~~~~~~~~~~~~~~~~
- Query Format Changes. A much more string-like query format is now supported. See :ref:`the docs<io.hdf5-query>`.
.. ipython:: python
path = 'test.h5'
dfq = DataFrame(randn(10,4),
columns=list('ABCD'),
index=date_range('20130101',periods=10))
dfq.to_hdf(path,'dfq',format='table',data_columns=True)
Use boolean expressions, with in-line function evaluation.
.. ipython:: python
read_hdf(path,'dfq',
where="index>Timestamp('20130104') & columns=['A', 'B']")
Use an inline column reference
.. ipython:: python
read_hdf(path,'dfq',
where="A>0 or C>0")
.. ipython:: python
:suppress:
import os
os.remove(path)
- the ``format`` keyword now replaces the ``table`` keyword; allowed values are ``fixed(f)`` or ``table(t)``.
The same defaults as prior to 0.13.0 remain, e.g. ``put`` implies ``fixed`` format and ``append`` implies
``table`` format. This default format can be set as an option by setting ``io.hdf.default_format``.
.. ipython:: python
path = 'test.h5'
df = DataFrame(randn(10,2))
df.to_hdf(path,'df_table',format='table')
df.to_hdf(path,'df_table2',append=True)
df.to_hdf(path,'df_fixed')
with get_store(path) as store:
print(store)
.. ipython:: python
:suppress:
import os
os.remove(path)
- Significant table writing performance improvements
- handle a passed ``Series`` in table format (:issue:`4330`)
- can now serialize a ``timedelta64[ns]`` dtype in a table (:issue:`3577`), See :ref:`the docs<io.hdf5-timedelta>`.
- added an ``is_open`` property to indicate whether the underlying file handle is open;
a closed store will now report 'CLOSED' when viewed (rather than raising an error)
(:issue:`4409`)
- closing a ``HDFStore`` now closes that instance of the ``HDFStore``,
but will only close the actual file if the reference count (maintained by ``PyTables``) over all of the open handles
is 0. Essentially you have a local instance of ``HDFStore`` referenced by a variable. Once you
close it, it will report closed. Other references (to the same file) will continue to operate
until they themselves are closed. Performing an action on a closed file will raise a
``ClosedFileError``
.. ipython:: python
path = 'test.h5'
df = DataFrame(randn(10,2))
store1 = HDFStore(path)
store2 = HDFStore(path)
store1.append('df',df)
store2.append('df2',df)
store1
store2
store1.close()
store2
store2.close()
store2
.. ipython:: python
:suppress:
import os
os.remove(path)
- removed the ``_quiet`` attribute, replaced by a ``DuplicateWarning`` when retrieving
duplicate rows from a table (:issue:`4367`)
- removed the ``warn`` argument from ``open``. Instead a ``PossibleDataLossError`` exception will
be raised if you try to use ``mode='w'`` with an OPEN file handle (:issue:`4367`)
- allow a passed locations array or mask as a ``where`` condition (:issue:`4467`).
See :ref:`the docs<io.hdf5-where_mask>` for an example.
- added the keyword ``dropna=True`` to ``append`` to control whether rows that are entirely nan are written
to the store (the default is ``True``; all-nan rows are NOT written), also settable
via the option ``io.hdf.dropna_table`` (:issue:`4625`)
- pass through store creation arguments; can be used to support in-memory stores
DataFrame repr Changes
~~~~~~~~~~~~~~~~~~~~~~
The HTML and plain text representations of :class:`DataFrame` now show
a truncated view of the table once it exceeds a certain size, rather
than switching to the short info view (:issue:`4886`, :issue:`5550`).
This makes the representation more consistent as small DataFrames get
larger.
.. image:: _static/df_repr_truncated.png
:alt: Truncated HTML representation of a DataFrame
To get the info view, call :meth:`DataFrame.info`. If you prefer the
info view as the repr for large DataFrames, you can set this by running
``set_option('display.large_repr', 'info')``.
Enhancements
~~~~~~~~~~~~
- ``df.to_clipboard()`` learned a new ``excel`` keyword that lets you
paste DataFrame data directly into Excel (enabled by default). (:issue:`5070`).
- ``read_html`` now raises a ``URLError`` instead of catching and raising a
``ValueError`` (:issue:`4303`, :issue:`4305`)
- Added a test for ``read_clipboard()`` and ``to_clipboard()`` (:issue:`4282`)
- Clipboard functionality now works with PySide (:issue:`4282`)
- Added a more informative error message when plot arguments contain
overlapping color and style arguments (:issue:`4402`)
- ``to_dict`` now takes ``records`` as a possible outtype. Returns an array
of column-keyed dictionaries. (:issue:`4936`)
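A minimal sketch:
.. code-block:: python
import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y']})
# a list of row dictionaries, keyed by column
df.to_dict(outtype='records')
# [{'A': 1, 'B': 'x'}, {'A': 2, 'B': 'y'}]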
- ``NaN`` handling in ``get_dummies`` (:issue:`4446`) with ``dummy_na``
.. ipython:: python
# previously, nan was erroneously counted as 2 here
# now it is not counted at all
get_dummies([1, 2, np.nan])
# unless requested
get_dummies([1, 2, np.nan], dummy_na=True)
- ``timedelta64[ns]`` operations. See :ref:`the docs<timedeltas.timedeltas_convert>`.
.. warning::
Most of these operations require ``numpy >= 1.7``
Using the new top-level ``to_timedelta``, you can convert a scalar or array from the standard
timedelta format (produced by ``to_csv``) into a timedelta type (``np.timedelta64`` in ``nanoseconds``).
.. ipython:: python
to_timedelta('1 days 06:05:01.00003')
to_timedelta('15.5us')
to_timedelta(['1 days 06:05:01.00003','15.5us','nan'])
to_timedelta(np.arange(5),unit='s')
to_timedelta(np.arange(5),unit='d')
A Series of dtype ``timedelta64[ns]`` can now be divided by another
``timedelta64[ns]`` object, or astyped, to yield a ``float64`` dtyped Series. This
is frequency conversion. See :ref:`the docs<timedeltas.timedeltas_convert>`.
.. ipython:: python
from datetime import timedelta
td = Series(date_range('20130101',periods=4))-Series(date_range('20121201',periods=4))
td[2] += np.timedelta64(timedelta(minutes=5,seconds=3))
td[3] = np.nan
td
# to days
td / np.timedelta64(1,'D')
td.astype('timedelta64[D]')
# to seconds
td / np.timedelta64(1,'s')
td.astype('timedelta64[s]')
Dividing or multiplying a ``timedelta64[ns]`` Series by an integer or integer Series
.. ipython:: python
td * -1
td * Series([1,2,3,4])
Absolute ``DateOffset`` objects can act equivalently to ``timedeltas``
.. ipython:: python
from pandas import offsets
td + offsets.Minute(5) + offsets.Milli(5)
Fillna is now supported for timedeltas
.. ipython:: python
td.fillna(0)
td.fillna(timedelta(days=1,seconds=5))
You can do numeric reduction operations on timedeltas.
.. ipython:: python
td.mean()
td.quantile(.1)
- ``plot(kind='kde')`` now accepts the optional parameters ``bw_method`` and
``ind``, passed to scipy.stats.gaussian_kde() (for scipy >= 0.11.0) to set
the bandwidth, and to gkde.evaluate() to specify the indices at which it
is evaluated, respectively. See scipy docs. (:issue:`4298`)
- DataFrame constructor now accepts a numpy masked record array (:issue:`3478`)
- The new vectorized string method ``extract`` returns regular expression
matches more conveniently.
.. ipython:: python
:okwarning:
Series(['a1', 'b2', 'c3']).str.extract('[ab](\d)')
Elements that do not match return ``NaN``. Extracting a regular expression
with more than one group returns a DataFrame with one column per group.
.. ipython:: python
:okwarning:
Series(['a1', 'b2', 'c3']).str.extract('([ab])(\d)')
Elements that do not match return a row of ``NaN``.
Thus, a Series of messy strings can be *converted* into a
like-indexed Series or DataFrame of cleaned-up or more useful strings,
without necessitating ``get()`` to access tuples or ``re.match`` objects.
Named groups like
.. ipython:: python
:okwarning:
Series(['a1', 'b2', 'c3']).str.extract(
'(?P<letter>[ab])(?P<digit>\d)')
and optional groups can also be used.
.. ipython:: python
:okwarning:
Series(['a1', 'b2', '3']).str.extract(
'(?P<letter>[ab])?(?P<digit>\d)')
- ``read_stata`` now accepts Stata 13 format (:issue:`4291`)
- ``read_fwf`` now infers the column specifications from the first 100 rows of
the file if the data has correctly separated and properly aligned columns
using the delimiter provided to the function (:issue:`4488`).
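A sketch, using ``StringIO`` so the example is self-contained:
.. code-block:: python
import pandas as pd
from pandas.compat import StringIO
data = ('id8141    360.242940\n'
'id1594    444.953632\n'
'id1849    364.136849')
# no colspecs/widths given: the column specifications are
# inferred from the alignment of the first rows
df = pd.read_fwf(StringIO(data), header=None)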
- support for nanosecond times as an offset
.. warning::
These operations require ``numpy >= 1.7``
Period conversions in the range of seconds and below were reworked and extended
up to nanoseconds. Periods in the nanosecond range are now available.
.. ipython:: python
date_range('2013-01-01', periods=5, freq='5N')
or with frequency as offset
.. ipython:: python
date_range('2013-01-01', periods=5, freq=pd.offsets.Nano(5))
Timestamps can be modified in the nanosecond range
.. ipython:: python
t = Timestamp('20130101 09:01:02')
t + pd.tseries.offsets.Nano(123)
- A new method, ``isin`` for DataFrames, which plays nicely with boolean indexing. The argument to ``isin``, what we're comparing the DataFrame to, can be a DataFrame, Series, dict, or array of values. See :ref:`the docs<indexing.basics.indexing_isin>` for more.
To get the rows where any of the conditions are met:
.. ipython:: python
dfi = DataFrame({'A': [1, 2, 3, 4], 'B': ['a', 'b', 'f', 'n']})
dfi
other = DataFrame({'A': [1, 3, 3, 7], 'B': ['e', 'f', 'f', 'e']})
mask = dfi.isin(other)
mask
dfi[mask.any(1)]
- ``Series`` now supports a ``to_frame`` method to convert it to a single-column DataFrame (:issue:`5164`)
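For example:
.. code-block:: python
import pandas as pd
s = pd.Series([1, 2, 3], name='vals')
s.to_frame()  # a single-column DataFrame; the column is named 'vals'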
- All R datasets listed at http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html can now be loaded into pandas objects
.. code-block:: python
# note that pandas.rpy was deprecated in v0.16.0
import pandas.rpy.common as com
com.load_data('Titanic')
- ``tz_localize`` can infer a fall daylight savings transition based on the structure
of the unlocalized data (:issue:`4230`), see :ref:`the docs<timeseries.timezone>`
- ``DatetimeIndex`` is now in the API documentation, see :ref:`the docs<api.datetimeindex>`
- :meth:`~pandas.io.json.json_normalize` is a new method to allow you to create a flat table
from semi-structured JSON data. See :ref:`the docs<io.json_normalize>` (:issue:`1067`)
- Added PySide support for the qtpandas DataFrameModel and DataFrameWidget.
- Python csv parser now supports usecols (:issue:`4335`)
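A sketch; the multi-character separator here implicitly routes parsing through the Python engine, which now honors ``usecols``:
.. code-block:: python
import pandas as pd
from pandas.compat import StringIO
data = 'a::b::c\n1::2::3\n4::5::6'
# keep only the first and third columns
pd.read_csv(StringIO(data), sep='::', usecols=[0, 2])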
- Frequencies gained several new offsets:
* ``LastWeekOfMonth`` (:issue:`4637`)
* ``FY5253``, and ``FY5253Quarter`` (:issue:`4511`)
- DataFrame has a new ``interpolate`` method, similar to Series (:issue:`4434`, :issue:`1892`)
.. ipython:: python
df = DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
'B': [.25, np.nan, np.nan, 4, 12.2, 14.4]})
df.interpolate()
Additionally, the ``method`` argument to ``interpolate`` has been expanded
to include ``'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
'barycentric', 'krogh', 'piecewise_polynomial', 'pchip', 'polynomial', 'spline'``
The new methods require scipy_. Consult the Scipy reference guide_ and documentation_ for more information
about when the various methods are appropriate. See :ref:`the docs<missing_data.interpolate>`.
Interpolate now also accepts a ``limit`` keyword argument.
This works similarly to ``fillna``'s limit:
.. ipython:: python
ser = Series([1, 3, np.nan, np.nan, np.nan, 11])
ser.interpolate(limit=2)
- Added ``wide_to_long`` panel data convenience function. See :ref:`the docs<reshaping.melt>`.
.. ipython:: python
np.random.seed(123)
df = pd.DataFrame({"A1970" : {0 : "a", 1 : "b", 2 : "c"},
"A1980" : {0 : "d", 1 : "e", 2 : "f"},
"B1970" : {0 : 2.5, 1 : 1.2, 2 : .7},
"B1980" : {0 : 3.2, 1 : 1.3, 2 : .1},
"X" : dict(zip(range(3), np.random.randn(3)))
})
df["id"] = df.index
df
wide_to_long(df, ["A", "B"], i="id", j="year")
.. _scipy: http://www.scipy.org
.. _documentation: http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation
.. _guide: http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
- ``to_csv`` now takes a ``date_format`` keyword argument that specifies how
output datetime objects should be formatted. Datetimes encountered in the
index, columns, and values will all have this formatting applied. (:issue:`4313`)
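For example (``out.csv`` is a placeholder file name):
.. code-block:: python
import pandas as pd
df = pd.DataFrame({'A': pd.date_range('20130101', periods=3)})
# datetimes in the values (and any in the index or columns)
# are written as YYYYMMDD
df.to_csv('out.csv', date_format='%Y%m%d')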
- ``DataFrame.plot`` will scatter plot x versus y by passing ``kind='scatter'`` (:issue:`2215`)
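A minimal sketch:
.. code-block:: python
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(50, 2), columns=['x', 'y'])
df.plot(x='x', y='y', kind='scatter')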
- Added support for Google Analytics v3 API segment IDs that also supports v2 IDs. (:issue:`5271`)
.. _whatsnew_0130.experimental:
Experimental
~~~~~~~~~~~~
- The new :func:`~pandas.eval` function implements expression evaluation using
``numexpr`` behind the scenes. This results in large speedups for
complicated expressions involving large DataFrames/Series. For example,
.. ipython:: python
nrows, ncols = 20000, 100
df1, df2, df3, df4 = [DataFrame(randn(nrows, ncols))
for _ in range(4)]
.. ipython:: python
# eval with NumExpr backend
%timeit pd.eval('df1 + df2 + df3 + df4')
.. ipython:: python
# pure Python evaluation
%timeit df1 + df2 + df3 + df4
For more details, see :ref:`the docs<enhancingperf.eval>`
- Similar to ``pandas.eval``, :class:`~pandas.DataFrame` has a new
``DataFrame.eval`` method that evaluates an expression in the context of
the ``DataFrame``. For example,
.. ipython:: python
:suppress:
try:
del a
except NameError:
pass
try:
del b
except NameError:
pass
.. ipython:: python
df = DataFrame(randn(10, 2), columns=['a', 'b'])
df.eval('a + b')
- A :meth:`~pandas.DataFrame.query` method has been added that allows
you to select elements of a ``DataFrame`` using a natural query syntax
nearly identical to Python syntax. For example,
.. ipython:: python
:suppress:
try:
del a
except NameError:
pass
try:
del b
except NameError:
pass
try:
del c
except NameError:
pass
.. ipython:: python
n = 20
df = DataFrame(np.random.randint(n, size=(n, 3)), columns=['a', 'b', 'c'])
df.query('a < b < c')
selects all the rows of ``df`` where ``a < b < c`` evaluates to ``True``.
For more details see :ref:`the docs<indexing.query>`.
- ``pd.read_msgpack()`` and ``pd.to_msgpack()`` are now a supported method of serialization
of arbitrary pandas (and Python) objects in a lightweight portable binary format. See :ref:`the docs<io.msgpack>`
.. warning::
Since this is an EXPERIMENTAL LIBRARY, the storage format may not be stable until a future release.
.. ipython:: python
df = DataFrame(np.random.rand(5,2),columns=list('AB'))
df.to_msgpack('foo.msg')
pd.read_msgpack('foo.msg')
s = Series(np.random.rand(5),index=date_range('20130101',periods=5))
pd.to_msgpack('foo.msg', df, s)
pd.read_msgpack('foo.msg')
You can pass ``iterator=True`` to iterate over the unpacked results
.. ipython:: python
for o in pd.read_msgpack('foo.msg',iterator=True):
print(o)
.. ipython:: python
:suppress:
:okexcept:
os.remove('foo.msg')
- ``pandas.io.gbq`` provides a simple way to extract from, and load data into,
Google's BigQuery Data Sets by way of pandas DataFrames. BigQuery is a high
performance SQL-like database service, useful for performing ad-hoc queries
against extremely large datasets. :ref:`See the docs <io.bigquery>`
.. code-block:: python
import pandas as pd
from pandas.io import gbq
# A query to select the average monthly temperatures
# in the year 2000 across the USA. The dataset,
# publicdata:samples.gsod, is available on all BigQuery accounts,
# and is based on NOAA gsod data.
query = """SELECT station_number as STATION,
month as MONTH, AVG(mean_temp) as MEAN_TEMP
FROM publicdata:samples.gsod
WHERE YEAR = 2000
GROUP BY STATION, MONTH
ORDER BY STATION, MONTH ASC"""
# Fetch the result set for this query
# Your Google BigQuery Project ID
# To find this, see your dashboard:
# https://console.developers.google.com/iam-admin/projects?authuser=0
projectid = 'xxxxxxxxx'  # placeholder
df = gbq.read_gbq(query, project_id=projectid)
# Use pandas to process and reshape the dataset
df2 = df.pivot(index='STATION', columns='MONTH', values='MEAN_TEMP')
df3 = pd.concat([df2.min(), df2.mean(), df2.max()],
axis=1, keys=["Min Temp", "Mean Temp", "Max Temp"])
The resulting DataFrame is::
> df3
Min Temp Mean Temp Max Temp
MONTH
1 -53.336667 39.827892 89.770968
2 -49.837500 43.685219 93.437932
3 -77.926087 48.708355 96.099998
4 -82.892858 55.070087 97.317240
5 -92.378261 61.428117 102.042856
6 -77.703334 65.858888 102.900000
7 -87.821428 68.169663 106.510714
8 -89.431999 68.614215 105.500000
9 -86.611112 63.436935 107.142856
10 -78.209677 56.880838 92.103333
11 -50.125000 48.861228 94.996428
12 -50.332258 42.286879 94.396774
.. warning::
To use this module, you will need a BigQuery account. See
<https://cloud.google.com/products/big-query> for details.
As of 10/10/13, there is a bug in Google's API preventing result sets
from being larger than 100,000 rows. A patch is scheduled for the week of
10/14/13.
.. _whatsnew_0130.refactoring:
Internal Refactoring
~~~~~~~~~~~~~~~~~~~~
In 0.13.0 there is a major refactor primarily to subclass ``Series`` from
``NDFrame``, which is the base class currently for ``DataFrame`` and ``Panel``,
to unify methods and behaviors. Series formerly subclassed directly from
``ndarray``. (:issue:`4080`, :issue:`3862`, :issue:`816`)
.. warning::
There are two potential incompatibilities from < 0.13.0
- Using certain numpy functions would previously return a ``Series`` if passed a ``Series``
as an argument. This seems only to affect ``np.ones_like``, ``np.empty_like``,
``np.diff`` and ``np.where``. These now return ``ndarrays``.
.. ipython:: python
s = Series([1,2,3,4])
Numpy Usage
.. ipython:: python
np.ones_like(s)
np.diff(s)
np.where(s>1,s,np.nan)
Pandonic Usage
.. ipython:: python
Series(1,index=s.index)
s.diff()
s.where(s>1)
- Passing a ``Series`` directly to a cython function expecting an ``ndarray`` type will no
longer work directly; you must pass ``Series.values``. See :ref:`Enhancing Performance<enhancingperf.ndarray>`
- ``Series(0.5)`` would previously return the scalar ``0.5``; instead this will return a 1-element ``Series``
- This change breaks ``rpy2<=2.3.8``. An issue has been opened against rpy2 and a workaround
is detailed in :issue:`5698`. Thanks @JanSchulz.
- Pickle compatibility is preserved for pickles created prior to 0.13. These must be unpickled with ``pd.read_pickle``, see :ref:`Pickling<io.pickle>`.
- Refactor of series.py/frame.py/panel.py to move common code to generic.py
- added ``_setup_axes`` to create generic ``NDFrame`` structures
- moved methods
- ``from_axes,_wrap_array,axes,ix,loc,iloc,shape,empty,swapaxes,transpose,pop``
- ``__iter__,keys,__contains__,__len__,__neg__,__invert__``
- ``convert_objects,as_blocks,as_matrix,values``
- ``__getstate__,__setstate__`` (compat remains in frame/panel)
- ``__getattr__,__setattr__``
- ``_indexed_same,reindex_like,align,where,mask``
- ``fillna,replace`` (``Series`` replace is now consistent with ``DataFrame``)
- ``filter`` (also added axis argument to selectively filter on a different axis)
- ``reindex,reindex_axis,take``
- ``truncate`` (moved to become part of ``NDFrame``)
- These are API changes which make ``Panel`` more consistent with ``DataFrame``
- ``swapaxes`` on a ``Panel`` with the same axes specified now returns a copy
- support attribute access for setting
- filter supports the same API as the original ``DataFrame`` filter
- Reindex called with no arguments will now return a copy of the input object
- ``TimeSeries`` is now an alias for ``Series``. The property ``is_time_series``
can be used to distinguish (if desired)
- Refactor of Sparse objects to use BlockManager
- Created a new block type in internals, ``SparseBlock``, which can hold multi-dtypes
and is non-consolidatable. ``SparseSeries`` and ``SparseDataFrame`` now inherit
more methods from their hierarchy (Series/DataFrame), and no longer inherit
from ``SparseArray`` (which instead is the object of the ``SparseBlock``)
- Sparse suite now supports integration with non-sparse data. Non-float sparse
data is supportable (partially implemented)
- Operations on sparse structures within DataFrames should preserve sparseness,
merging type operations will convert to dense (and back to sparse), so might
be somewhat inefficient
- enable setitem on ``SparseSeries`` for boolean/integer/slices
- ``SparsePanels`` implementation is unchanged (e.g. not using BlockManager, needs work)
- added ``ftypes`` method to Series/DataFrame, similar to ``dtypes``, but indicates
if the underlying is sparse/dense (as well as the dtype)
- All ``NDFrame`` objects can now use ``__finalize__()`` to specify various
values to propagate to new objects from an existing one (e.g. ``name`` in ``Series`` will
follow more automatically now)
- Internal type checking is now done via a suite of generated classes, allowing ``isinstance(value, klass)``
without having to directly import the klass, courtesy of @jtratner
- Bug in Series update where the parent frame is not updating its cache based on
changes (:issue:`4080`) or types (:issue:`3217`), fillna (:issue:`3386`)
- Indexing with dtype conversions fixed (:issue:`4463`, :issue:`4204`)
- Refactor ``Series.reindex`` to core/generic.py (:issue:`4604`, :issue:`4618`), allow ``method=`` in reindexing
on a Series to work
- ``Series.copy`` no longer accepts the ``order`` parameter and is now consistent with ``NDFrame`` copy
- Refactor ``rename`` methods to core/generic.py; fixes ``Series.rename`` for (:issue:`4605`), and adds ``rename``
with the same signature for ``Panel``
- Refactor ``clip`` methods to core/generic.py (:issue:`4798`)
- Refactor of ``_get_numeric_data/_get_bool_data`` to core/generic.py, allowing Series/Panel functionality
- ``Series`` (for index) / ``Panel`` (for items) now allow attribute access to its elements (:issue:`1903`)
.. ipython:: python
s = Series([1,2,3],index=list('abc'))
s.b
s.a = 5
s
Bug Fixes
~~~~~~~~~
See :ref:`V0.13.0 Bug Fixes<release.bug_fixes-0.13.0>` for an extensive list of bugs that have been fixed in 0.13.0.
See the :ref:`full release notes
<release>` or issue tracker
on GitHub for a complete list of all API changes, Enhancements and Bug Fixes.