.. _whatsnew_0130:

v0.13.0 (January 3, 2014)
---------------------------

This is a major release from 0.12.0 and includes a number of API changes, several new features and
enhancements along with a large number of bug fixes.

Highlights include:

- support for a new index type ``Float64Index``, and other Indexing enhancements
- ``HDFStore`` has a new string based syntax for query specification
- support for new methods of interpolation
- updated ``timedelta`` operations
- a new string manipulation method ``extract``
- Nanosecond support for Offsets
- ``isin`` for DataFrames

Several experimental features are added, including:

- new ``eval/query`` methods for expression evaluation
- support for ``msgpack`` serialization
- an i/o interface to Google's ``BigQuery``

There are several new or updated docs sections including:

- :ref:`Comparison with SQL<compare_with_sql>`, which should be useful for those familiar with SQL but still learning pandas.
- :ref:`Comparison with R<compare_with_r>`, idiom translations from R to pandas.
- :ref:`Enhancing Performance<enhancingperf>`, ways to enhance pandas performance with ``eval/query``.

.. warning::

   In 0.13.0 ``Series`` has internally been refactored to no longer sub-class ``ndarray``
   but instead subclass ``NDFrame``, similar to the rest of the pandas containers. This should be
   a transparent change with only very limited API implications. See :ref:`Internal Refactoring<whatsnew_0130.refactoring>`

API changes
~~~~~~~~~~~

- ``read_excel`` now supports an integer in its ``sheetname`` argument giving
  the index of the sheet to read in (:issue:`4301`).
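
  For example (the file name here is hypothetical):

  .. code-block:: python

     # read the second sheet by position rather than by name
     df = pd.read_excel('data.xlsx', sheetname=1)
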
- Text parser now treats anything that reads like inf ("inf", "Inf", "-Inf",
  "iNf", etc.) as infinity. (:issue:`4220`, :issue:`4219`), affecting
  ``read_table``, ``read_csv``, etc.
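
  For example, a minimal sketch using ``StringIO`` to stand in for a file:

  .. code-block:: python

     from pandas.compat import StringIO

     data = 'A\n1.5\ninf\n-Inf\niNf'
     pd.read_csv(StringIO(data))  # the last three values all parse as +/- infinity
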
- ``pandas`` is now Python 2/3 compatible without the need for 2to3 thanks to
  @jtratner. As a result, pandas now uses iterators more extensively. This
  also led to the introduction of substantive parts of Benjamin
  Peterson's ``six`` library into compat. (:issue:`4384`, :issue:`4375`,
  :issue:`4372`)
- ``pandas.util.compat`` and ``pandas.util.py3compat`` have been merged into
  ``pandas.compat``. ``pandas.compat`` now includes many functions allowing
  2/3 compatibility. It contains both list and iterator versions of range,
  filter, map and zip, plus other necessary elements for Python 3
  compatibility. ``lmap``, ``lzip``, ``lrange`` and ``lfilter`` all produce
  lists instead of iterators, for compatibility with ``numpy``, subscripting
  and ``pandas`` constructors. (:issue:`4384`, :issue:`4375`, :issue:`4372`)
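
  A quick sketch of the list-producing helpers:

  .. code-block:: python

     from pandas.compat import lrange, lzip

     lrange(5)            # always a list: [0, 1, 2, 3, 4]
     lzip('ab', [1, 2])   # always a list of tuples: [('a', 1), ('b', 2)]
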
- ``Series.get`` with negative indexers now returns the same as ``[]`` (:issue:`4390`)
- Changes to how ``Index`` and ``MultiIndex`` handle metadata (``levels``,
  ``labels``, and ``names``) (:issue:`4039`):

  .. code-block:: python

      # previously, you would have set levels or labels directly
      index.levels = [[1, 2, 3, 4], [1, 2, 4, 4]]

      # now, you use the set_levels or set_labels methods
      index = index.set_levels([[1, 2, 3, 4], [1, 2, 4, 4]])

      # similarly, for names, you can rename the object
      # but setting names is not deprecated
      index = index.set_names(["bob", "cranberry"])

      # and all methods take an inplace kwarg - but return None
      index.set_names(["bob", "cranberry"], inplace=True)

- **All** division with ``NDFrame`` objects is now *truedivision*, regardless
  of the future import. This means that operating on pandas objects will by default
  use *floating point* division, and return a floating point dtype.
  You can use ``//`` and ``floordiv`` to do integer division.

  Integer division

  .. code-block:: ipython

      In [3]: arr = np.array([1, 2, 3, 4])

      In [4]: arr2 = np.array([5, 3, 2, 1])

      In [5]: arr / arr2
      Out[5]: array([0, 0, 1, 4])

      In [6]: Series(arr) // Series(arr2)
      Out[6]:
      0    0
      1    0
      2    1
      3    4
      dtype: int64

  True Division

  .. code-block:: ipython

      In [7]: pd.Series(arr) / pd.Series(arr2) # no future import required
      Out[7]:
      0    0.200000
      1    0.666667
      2    1.500000
      3    4.000000
      dtype: float64

- Infer and downcast dtype if ``downcast='infer'`` is passed to ``fillna/ffill/bfill`` (:issue:`4604`)
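
  A minimal sketch of the new behavior:

  .. code-block:: python

     s = Series([1., np.nan, 3.])
     s.fillna(2, downcast='infer')  # all values are integral, so the result is int64
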
- ``__nonzero__`` for all NDFrame objects will now raise a ``ValueError``; this reverts
  to the (:issue:`1073`) behavior (:issue:`4633`). See :ref:`gotchas<gotchas.truth>` for a more detailed discussion.

  This prevents doing boolean comparison on *entire* pandas objects, which is inherently ambiguous. These all will raise a ``ValueError``.

  .. code-block:: python

      if df:
         ....
      df1 and df2
      s1 and s2

  Added the ``.bool()`` method to ``NDFrame`` objects to facilitate evaluating single-element boolean Series and DataFrames:

  .. ipython:: python

     Series([True]).bool()
     Series([False]).bool()
     DataFrame([[True]]).bool()
     DataFrame([[False]]).bool()

- All non-Index NDFrames (``Series``, ``DataFrame``, ``Panel``, ``Panel4D``,
  ``SparsePanel``, etc.), now support the entire set of arithmetic operators
  and arithmetic flex methods (add, sub, mul, etc.). ``SparsePanel`` does not
  support ``pow`` or ``mod`` with non-scalars. (:issue:`3765`)
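
  For example, the flex methods mirror the operators:

  .. code-block:: python

     df = DataFrame({'A': [1, 2], 'B': [3, 4]})
     df.add(df)   # same as df + df
     df.pow(2)    # same as df ** 2
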
- ``Series`` and ``DataFrame`` now have a ``mode()`` method to calculate the
  statistical mode(s) by axis/Series. (:issue:`5367`)
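
  A minimal sketch:

  .. code-block:: python

     s = Series([1, 1, 2, 3, 3, 3])
     s.mode()   # a Series containing the single mode, 3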

- Chained assignment will now by default warn if the user is assigning to a copy. This can be changed
  with the option ``mode.chained_assignment``, allowed options are ``raise/warn/None``. See :ref:`the docs<indexing.view_versus_copy>`.

  .. ipython:: python

     dfc = DataFrame({'A':['aaa','bbb','ccc'],'B':[1,2,3]})
     pd.set_option('mode.chained_assignment','warn')

  The following warning / exception will be shown if this is attempted.

  .. ipython:: python
     :okwarning:

     dfc.loc[0]['A'] = 1111

  ::

     Traceback (most recent call last)
        ...
     SettingWithCopyWarning:
        A value is trying to be set on a copy of a slice from a DataFrame.
        Try using .loc[row_index,col_indexer] = value instead

  Here is the correct method of assignment.

  .. ipython:: python

     dfc.loc[0,'A'] = 11
     dfc

- ``Panel.reindex`` has the following call signature ``Panel.reindex(items=None, major_axis=None, minor_axis=None, **kwargs)``
  to conform with other ``NDFrame`` objects. See :ref:`Internal Refactoring<whatsnew_0130.refactoring>` for more information.

- ``Series.argmin`` and ``Series.argmax`` are now aliased to ``Series.idxmin`` and ``Series.idxmax``. These return the *index* of the
  min or max element respectively. Prior to 0.13.0 these would return the position of the min / max element. (:issue:`6214`)
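
  For example:

  .. code-block:: python

     s = Series([3, 1, 2], index=list('abc'))
     s.idxmin()   # 'b', the index *label* of the minimum
     s.argmin()   # now also 'b' (previously the *position*, 1)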

Prior Version Deprecations/Changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These were announced changes in 0.12 or prior that are taking effect as of 0.13.0

- Remove deprecated ``Factor`` (:issue:`3650`)
- Remove deprecated ``set_printoptions/reset_printoptions`` (:issue:`3046`)
- Remove deprecated ``_verbose_info`` (:issue:`3215`)
- Remove deprecated ``read_clipboard/to_clipboard/ExcelFile/ExcelWriter`` from ``pandas.io.parsers`` (:issue:`3717`)
  These are available as functions in the main pandas namespace (e.g. ``pd.read_clipboard``)
- default for ``tupleize_cols`` is now ``False`` for both ``to_csv`` and ``read_csv``. Fair warning in 0.12 (:issue:`3604`)
- default for ``display.max_seq_len`` is now 100 rather than ``None``. This activates
  truncated display ("...") of long sequences in various places. (:issue:`3391`)

Deprecations
~~~~~~~~~~~~

Deprecated in 0.13.0

- deprecated ``iterkv``, which will be removed in a future release (this was
  an alias of iteritems used to bypass ``2to3``'s changes).
  (:issue:`4384`, :issue:`4375`, :issue:`4372`)
- deprecated the string method ``match``, whose role is now performed more
  idiomatically by ``extract``. In a future release, the default behavior
  of ``match`` will change to become analogous to ``contains``, which returns
  a boolean indexer. (Their
  distinction is strictness: ``match`` relies on ``re.match`` while
  ``contains`` relies on ``re.search``.) In this release, the deprecated
  behavior is the default, but the new behavior is available through the
  keyword argument ``as_indexer=True``.
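
  A sketch of the two behaviors:

  .. code-block:: python

     s = Series(['a1', 'b2', 'c3'])
     s.str.match('([ab])(\d)')                   # deprecated: returns the groups
     s.str.match('([ab])(\d)', as_indexer=True)  # new: returns a boolean indexer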

Indexing API Changes
~~~~~~~~~~~~~~~~~~~~

Prior to 0.13, it was impossible to use a label indexer (``.loc/.ix``) to set a value that
was not contained in the index of a particular axis. (:issue:`2578`). See :ref:`the docs<indexing.basics.partial_setting>`

In the ``Series`` case this is effectively an appending operation

.. ipython:: python

   s = Series([1,2,3])
   s
   s[5] = 5.
   s

.. ipython:: python

   dfi = DataFrame(np.arange(6).reshape(3,2),
                   columns=['A','B'])
   dfi

This would previously raise a ``KeyError``.

.. ipython:: python

   dfi.loc[:,'C'] = dfi.loc[:,'A']
   dfi

This is like an ``append`` operation.

.. ipython:: python

   dfi.loc[3] = 5
   dfi

A Panel setting operation on an arbitrary axis aligns the input to the Panel

.. ipython:: python

   p = pd.Panel(np.arange(16).reshape(2,4,2),
               items=['Item1','Item2'],
               major_axis=pd.date_range('2001/1/12',periods=4),
               minor_axis=['A','B'],dtype='float64')
   p
   p.loc[:,:,'C'] = Series([30,32],index=p.items)
   p
   p.loc[:,:,'C']

Float64Index API Change
~~~~~~~~~~~~~~~~~~~~~~~

- Added a new index type, ``Float64Index``. This will be automatically created when passing floating values in index creation.
  This enables a pure label-based slicing paradigm that makes ``[],ix,loc`` for scalar indexing and slicing work exactly the
  same. See :ref:`the docs<indexing.float64index>`, (:issue:`263`)

  Construction is by default for floating type values.

  .. ipython:: python

     index = Index([1.5, 2, 3, 4.5, 5])
     index
     s = Series(range(5),index=index)
     s

  Scalar selection for ``[],.ix,.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``)

  .. ipython:: python

     s[3]
     s.ix[3]
     s.loc[3]

  The only positional indexing is via ``iloc``

  .. ipython:: python

     s.iloc[3]

  A scalar index that is not found will raise ``KeyError``

  Slicing is ALWAYS on the values of the index, for ``[],ix,loc`` and ALWAYS positional with ``iloc``

  .. ipython:: python

     s[2:4]
     s.ix[2:4]
     s.loc[2:4]
     s.iloc[2:4]

  In float indexes, slicing using floats is allowed

  .. ipython:: python

     s[2.1:4.6]
     s.loc[2.1:4.6]

- Indexing on other index types is preserved (and positional fallback for ``[],ix``), with the exception that floating point slicing
  on non-``Float64Index`` indexes will now raise a ``TypeError``.

  .. code-block:: ipython

     In [1]: Series(range(5))[3.5]
     TypeError: the label [3.5] is not a proper indexer for this index type (Int64Index)

     In [1]: Series(range(5))[3.5:4.5]
     TypeError: the slice start [3.5] is not a proper indexer for this index type (Int64Index)

  Using a scalar float indexer will be deprecated in a future version, but is allowed for now.

  .. code-block:: ipython

     In [3]: Series(range(5))[3.0]
     Out[3]: 3

HDFStore API Changes
~~~~~~~~~~~~~~~~~~~~

- Query Format Changes. A much more string-like query format is now supported. See :ref:`the docs<io.hdf5-query>`.

  .. ipython:: python

     path = 'test.h5'
     dfq = DataFrame(randn(10,4),
              columns=list('ABCD'),
              index=date_range('20130101',periods=10))
     dfq.to_hdf(path,'dfq',format='table',data_columns=True)

  Use boolean expressions, with in-line function evaluation.

  .. ipython:: python

     read_hdf(path,'dfq',
         where="index>Timestamp('20130104') & columns=['A', 'B']")

  Use an inline column reference

  .. ipython:: python

     read_hdf(path,'dfq',
         where="A>0 or C>0")

  .. ipython:: python
     :suppress:

     import os
     os.remove(path)

- the ``format`` keyword now replaces the ``table`` keyword; allowed values are ``fixed(f)`` or ``table(t)``.
  The same defaults as prior to 0.13.0 remain, e.g. ``put`` implies ``fixed`` format and ``append`` implies
  ``table`` format. This default format can be set as an option by setting ``io.hdf.default_format``.

  .. ipython:: python

     path = 'test.h5'
     df = DataFrame(randn(10,2))
     df.to_hdf(path,'df_table',format='table')
     df.to_hdf(path,'df_table2',append=True)
     df.to_hdf(path,'df_fixed')
     with get_store(path) as store:
        print(store)

  .. ipython:: python
     :suppress:

     import os
     os.remove(path)
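
  For example, to make ``table`` the default format (a sketch):

  .. code-block:: python

     pd.set_option('io.hdf.default_format', 'table')
     df.to_hdf(path, 'df')   # now stored in table format, without passing format='table'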

- Significant table writing performance improvements
- handle a passed ``Series`` in table format (:issue:`4330`)
- can now serialize a ``timedelta64[ns]`` dtype in a table (:issue:`3577`), See :ref:`the docs<io.hdf5-timedelta>`.
- added an ``is_open`` property to indicate if the underlying file handle is open;
  a closed store will now report 'CLOSED' when viewing the store (rather than raising an error)
  (:issue:`4409`)
- closing an ``HDFStore`` now will close that instance of the ``HDFStore``
  but will only close the actual file if the reference count (kept by ``PyTables``) over all of the open handles
  is 0. Essentially you have a local instance of ``HDFStore`` referenced by a variable. Once you
  close it, it will report closed. Other references (to the same file) will continue to operate
  until they themselves are closed. Performing an action on a closed file will raise
  ``ClosedFileError``

  .. ipython:: python

     path = 'test.h5'
     df = DataFrame(randn(10,2))
     store1 = HDFStore(path)
     store2 = HDFStore(path)
     store1.append('df',df)
     store2.append('df2',df)

     store1
     store2
     store1.close()
     store2
     store2.close()
     store2

  .. ipython:: python
     :suppress:

     import os
     os.remove(path)

- removed the ``_quiet`` attribute, replaced by a ``DuplicateWarning`` if retrieving
  duplicate rows from a table (:issue:`4367`)
- removed the ``warn`` argument from ``open``. Instead a ``PossibleDataLossError`` exception will
  be raised if you try to use ``mode='w'`` with an OPEN file handle (:issue:`4367`)
- allow a passed locations array or mask as a ``where`` condition (:issue:`4467`).
  See :ref:`the docs<io.hdf5-where_mask>` for an example.
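
  A sketch, assuming a table named ``'df'`` is already in the store:

  .. code-block:: python

     with get_store(path) as store:
        coords = store.select_as_coordinates('df', 'index > 5')
        store.select('df', where=coords)
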
- added the keyword ``dropna=True`` to ``append`` to control whether ALL-nan rows are written
  to the store (default is ``True``; ALL-nan rows are NOT written), also settable
  via the option ``io.hdf.dropna_table`` (:issue:`4625`)
- pass through store creation arguments; can be used to support in-memory stores

DataFrame repr Changes
~~~~~~~~~~~~~~~~~~~~~~

The HTML and plain text representations of :class:`DataFrame` now show
a truncated view of the table once it exceeds a certain size, rather
than switching to the short info view (:issue:`4886`, :issue:`5550`).
This makes the representation more consistent as small DataFrames get
larger.

.. image:: _static/df_repr_truncated.png
   :alt: Truncated HTML representation of a DataFrame

To get the info view, call :meth:`DataFrame.info`. If you prefer the
info view as the repr for large DataFrames, you can set this by running
``set_option('display.large_repr', 'info')``.

Enhancements
~~~~~~~~~~~~

- ``df.to_clipboard()`` learned a new ``excel`` keyword that lets you
  paste df data directly into Excel (enabled by default). (:issue:`5070`).
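
  For example:

  .. code-block:: python

     df.to_clipboard(excel=True)   # tab-delimited, so it pastes into cells
     df.to_clipboard(excel=False)  # plain repr string instead
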
- ``read_html`` now raises a ``URLError`` instead of catching and raising a
  ``ValueError`` (:issue:`4303`, :issue:`4305`)
- Added a test for ``read_clipboard()`` and ``to_clipboard()`` (:issue:`4282`)
- Clipboard functionality now works with PySide (:issue:`4282`)
- Added a more informative error message when plot arguments contain
  overlapping color and style arguments (:issue:`4402`)
- ``to_dict`` now takes ``records`` as a possible outtype.  Returns an array
  of column-keyed dictionaries. (:issue:`4936`)
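
  A minimal sketch:

  .. code-block:: python

     df = DataFrame({'A': [1, 2], 'B': ['x', 'y']})
     df.to_dict(outtype='records')
     # [{'A': 1, 'B': 'x'}, {'A': 2, 'B': 'y'}]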

- ``NaN`` handling in ``get_dummies`` (:issue:`4446`) with ``dummy_na``

  .. ipython:: python

     # previously, nan was erroneously counted as 2 here
     # now it is not counted at all
     get_dummies([1, 2, np.nan])

     # unless requested
     get_dummies([1, 2, np.nan], dummy_na=True)


- ``timedelta64[ns]`` operations. See :ref:`the docs<timedeltas.timedeltas_convert>`.

  .. warning::

     Most of these operations require ``numpy >= 1.7``

  Using the new top-level ``to_timedelta``, you can convert a scalar or array from the standard
  timedelta format (produced by ``to_csv``) into a timedelta type (``np.timedelta64`` in ``nanoseconds``).

  .. ipython:: python

     to_timedelta('1 days 06:05:01.00003')
     to_timedelta('15.5us')
     to_timedelta(['1 days 06:05:01.00003','15.5us','nan'])
     to_timedelta(np.arange(5),unit='s')
     to_timedelta(np.arange(5),unit='d')

  A Series of dtype ``timedelta64[ns]`` can now be divided by another
  ``timedelta64[ns]`` object, or astyped to yield a ``float64`` dtyped Series. This
  is frequency conversion. See :ref:`the docs<timedeltas.timedeltas_convert>`.

  .. ipython:: python

     from datetime import timedelta
     td = Series(date_range('20130101',periods=4))-Series(date_range('20121201',periods=4))
     td[2] += np.timedelta64(timedelta(minutes=5,seconds=3))
     td[3] = np.nan
     td

     # to days
     td / np.timedelta64(1,'D')
     td.astype('timedelta64[D]')

     # to seconds
     td / np.timedelta64(1,'s')
     td.astype('timedelta64[s]')

  Dividing or multiplying a ``timedelta64[ns]`` Series by an integer or integer Series

  .. ipython:: python

     td * -1
     td * Series([1,2,3,4])

  Absolute ``DateOffset`` objects can act equivalently to ``timedeltas``

  .. ipython:: python

     from pandas import offsets
     td + offsets.Minute(5) + offsets.Milli(5)

  Fillna is now supported for timedeltas

  .. ipython:: python

     td.fillna(0)
     td.fillna(timedelta(days=1,seconds=5))

  You can do numeric reduction operations on timedeltas.

  .. ipython:: python

     td.mean()
     td.quantile(.1)

- ``plot(kind='kde')`` now accepts the optional parameters ``bw_method`` and
  ``ind``, passed to scipy.stats.gaussian_kde() (for scipy >= 0.11.0) to set
  the bandwidth, and to gkde.evaluate() to specify the indices at which it
  is evaluated, respectively. See scipy docs. (:issue:`4298`)
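
  A sketch (requires scipy, and matplotlib for the actual plot):

  .. code-block:: python

     s = Series(np.random.randn(1000))
     s.plot(kind='kde', bw_method=0.3, ind=np.linspace(-3, 3, 100))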

- DataFrame constructor now accepts a numpy masked record array (:issue:`3478`)
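
  A sketch using ``numpy.ma.mrecords`` (the exact masking idiom here is illustrative):

  .. code-block:: python

     from numpy.ma import mrecords

     mrec = mrecords.fromarrays([[1, 2, 3], [1.5, 2.5, 3.5]],
                                names=['A', 'B'])
     mrec.mask[1] = (True, False)   # mask out 'A' in the second row
     DataFrame(mrec)                # masked entries become NaN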

- The new vectorized string method ``extract`` returns regular expression
  matches more conveniently.

  .. ipython:: python
     :okwarning:

     Series(['a1', 'b2', 'c3']).str.extract('[ab](\d)')

  Elements that do not match return ``NaN``. Extracting a regular expression
  with more than one group returns a DataFrame with one column per group.


  .. ipython:: python
     :okwarning:

     Series(['a1', 'b2', 'c3']).str.extract('([ab])(\d)')

  Elements that do not match return a row of ``NaN``.
  Thus, a Series of messy strings can be *converted* into a
  like-indexed Series or DataFrame of cleaned-up or more useful strings,
  without necessitating ``get()`` to access tuples or ``re.match`` objects.

  Named groups like

  .. ipython:: python
     :okwarning:

     Series(['a1', 'b2', 'c3']).str.extract(
             '(?P<letter>[ab])(?P<digit>\d)')

  and optional groups can also be used.

  .. ipython:: python
     :okwarning:

     Series(['a1', 'b2', '3']).str.extract(
             '(?P<letter>[ab])?(?P<digit>\d)')

- ``read_stata`` now accepts Stata 13 format (:issue:`4291`)

- ``read_fwf`` now infers the column specifications from the first 100 rows of
  the file if the data has correctly separated and properly aligned columns
  using the delimiter provided to the function (:issue:`4488`).
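
  For example (the file name is hypothetical):

  .. code-block:: python

     # no colspecs or widths given -- they are inferred from the first 100 rows
     df = pd.read_fwf('fixed_width.txt')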

- support for nanosecond times as an offset

  .. warning::

     These operations require ``numpy >= 1.7``

  Period conversions in the range of seconds and below were reworked and extended
  up to nanoseconds. Periods in the nanosecond range are now available.

  .. ipython:: python

     date_range('2013-01-01', periods=5, freq='5N')

  or with frequency as offset

  .. ipython:: python

     date_range('2013-01-01', periods=5, freq=pd.offsets.Nano(5))

  Timestamps can be modified in the nanosecond range

  .. ipython:: python

     t = Timestamp('20130101 09:01:02')
     t + pd.tseries.offsets.Nano(123)

- A new method, ``isin`` for DataFrames, which plays nicely with boolean indexing. The argument to ``isin``, what we're comparing the DataFrame to, can be a DataFrame, Series, dict, or array of values. See :ref:`the docs<indexing.basics.indexing_isin>` for more.

  To get the rows where any of the conditions are met:

  .. ipython:: python

     dfi = DataFrame({'A': [1, 2, 3, 4], 'B': ['a', 'b', 'f', 'n']})
     dfi
     other = DataFrame({'A': [1, 3, 3, 7], 'B': ['e', 'f', 'f', 'e']})
     mask = dfi.isin(other)
     mask
     dfi[mask.any(1)]

- ``Series`` now supports a ``to_frame`` method to convert it to a single-column DataFrame (:issue:`5164`)
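
  A minimal sketch:

  .. code-block:: python

     s = Series([1, 2, 3], name='A')
     s.to_frame()   # a DataFrame with the single column 'A'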

- All R datasets listed at http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html can now be loaded into pandas objects

  .. code-block:: python

     # note that pandas.rpy was deprecated in v0.16.0
     import pandas.rpy.common as com
     com.load_data('Titanic')

- ``tz_localize`` can infer a fall daylight savings transition based on the structure
  of the unlocalized data (:issue:`4230`), see :ref:`the docs<timeseries.timezone>`
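
  A sketch using the ``infer_dst`` keyword, with times spanning the US fall transition:

  .. code-block:: python

     rng = DatetimeIndex(['11/06/2011 00:00', '11/06/2011 01:00',
                          '11/06/2011 01:00', '11/06/2011 02:00'])
     rng.tz_localize('US/Eastern', infer_dst=True)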

- ``DatetimeIndex`` is now in the API documentation, see :ref:`the docs<api.datetimeindex>`

- :meth:`~pandas.io.json.json_normalize` is a new method to allow you to create a flat table
  from semi-structured JSON data. See :ref:`the docs<io.json_normalize>` (:issue:`1067`)
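
  A minimal sketch:

  .. code-block:: python

     from pandas.io.json import json_normalize

     data = [{'state': 'Florida',
              'counties': [{'name': 'Dade', 'population': 12345},
                           {'name': 'Broward', 'population': 40000}]},
             {'state': 'Ohio',
              'counties': [{'name': 'Summit', 'population': 1234}]}]
     json_normalize(data, 'counties', ['state'])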

- Added PySide support for the qtpandas DataFrameModel and DataFrameWidget.

- Python csv parser now supports usecols (:issue:`4335`)
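
  For example (the file name is hypothetical):

  .. code-block:: python

     pd.read_csv('data.csv', usecols=['A', 'C'], engine='python')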

- Frequencies gained several new offsets:

  * ``LastWeekOfMonth`` (:issue:`4637`)
  * ``FY5253``, and ``FY5253Quarter`` (:issue:`4511`)
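
  For example, a sketch using ``LastWeekOfMonth``:

  .. code-block:: python

     from pandas.tseries.offsets import LastWeekOfMonth

     # the Monday (weekday=0) of the last week of each month
     date_range('2013-01-01', periods=3, freq=LastWeekOfMonth(weekday=0))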


- DataFrame has a new ``interpolate`` method, similar to Series (:issue:`4434`, :issue:`1892`)

  .. ipython:: python

      df = DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
                      'B': [.25, np.nan, np.nan, 4, 12.2, 14.4]})
      df.interpolate()

  Additionally, the ``method`` argument to ``interpolate`` has been expanded
  to include ``'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
  'barycentric', 'krogh', 'piecewise_polynomial', 'pchip', 'polynomial', 'spline'``.
  The new methods require scipy_. Consult the Scipy reference guide_ and documentation_ for more information
  about when the various methods are appropriate. See :ref:`the docs<missing_data.interpolate>`.

  Interpolate now also accepts a ``limit`` keyword argument.
  This works similarly to ``fillna``'s ``limit``:

  .. ipython:: python

    ser = Series([1, 3, np.nan, np.nan, np.nan, 11])
    ser.interpolate(limit=2)

- Added ``wide_to_long`` panel data convenience function. See :ref:`the docs<reshaping.melt>`.

  .. ipython:: python

    np.random.seed(123)
    df = pd.DataFrame({"A1970" : {0 : "a", 1 : "b", 2 : "c"},
                       "A1980" : {0 : "d", 1 : "e", 2 : "f"},
                       "B1970" : {0 : 2.5, 1 : 1.2, 2 : .7},
                       "B1980" : {0 : 3.2, 1 : 1.3, 2 : .1},
                       "X"     : dict(zip(range(3), np.random.randn(3)))
                      })
    df["id"] = df.index
    df
    wide_to_long(df, ["A", "B"], i="id", j="year")

.. _scipy: http://www.scipy.org
.. _documentation: http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation
.. _guide: http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html

- ``to_csv`` now takes a ``date_format`` keyword argument that specifies how
  output datetime objects should be formatted. Datetimes encountered in the
  index, columns, and values will all have this formatting applied. (:issue:`4313`)
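
  A minimal sketch (the output file name is hypothetical):

  .. code-block:: python

     df = DataFrame({'A': date_range('20130101', periods=3)})
     df.to_csv('out.csv', date_format='%Y%m%d')
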
- ``DataFrame.plot`` will scatter plot x versus y by passing ``kind='scatter'`` (:issue:`2215`)
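
  For example:

  .. code-block:: python

     df = DataFrame(np.random.randn(20, 2), columns=['x', 'y'])
     df.plot(kind='scatter', x='x', y='y')
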
- Added support for Google Analytics v3 API segment IDs that also supports v2 IDs. (:issue:`5271`)

.. _whatsnew_0130.experimental:

Experimental
~~~~~~~~~~~~

- The new :func:`~pandas.eval` function implements expression evaluation using
  ``numexpr`` behind the scenes. This results in large speedups for
  complicated expressions involving large DataFrames/Series. For example,

  .. ipython:: python

     nrows, ncols = 20000, 100
     df1, df2, df3, df4 = [DataFrame(randn(nrows, ncols))
                           for _ in range(4)]

  .. ipython:: python

     # eval with NumExpr backend
     %timeit pd.eval('df1 + df2 + df3 + df4')

  .. ipython:: python

     # pure Python evaluation
     %timeit df1 + df2 + df3 + df4

  For more details, see :ref:`the docs<enhancingperf.eval>`

- Similar to ``pandas.eval``, :class:`~pandas.DataFrame` has a new
  ``DataFrame.eval`` method that evaluates an expression in the context of
  the ``DataFrame``. For example,

  .. ipython:: python
     :suppress:

     try:
        del a
     except NameError:
        pass

     try:
        del b
     except NameError:
        pass

  .. ipython:: python

     df = DataFrame(randn(10, 2), columns=['a', 'b'])
     df.eval('a + b')

- :meth:`~pandas.DataFrame.query` method has been added that allows
  you to select elements of a ``DataFrame`` using a natural query syntax
  nearly identical to Python syntax. For example,

  .. ipython:: python
     :suppress:

     try:
        del a
     except NameError:
        pass

     try:
        del b
     except NameError:
        pass

     try:
        del c
     except NameError:
        pass

  .. ipython:: python

     n = 20
     df = DataFrame(np.random.randint(n, size=(n, 3)), columns=['a', 'b', 'c'])
     df.query('a < b < c')

  selects all the rows of ``df`` where ``a < b < c`` evaluates to ``True``.
  For more details see :ref:`the docs<indexing.query>`.

- ``pd.read_msgpack()`` and ``pd.to_msgpack()`` are now a supported method of serialization
  of arbitrary pandas (and Python) objects in a lightweight portable binary format. See :ref:`the docs<io.msgpack>`

  .. warning::

     Since this is an EXPERIMENTAL LIBRARY, the storage format may not be stable until a future release.

  .. ipython:: python

     df = DataFrame(np.random.rand(5,2),columns=list('AB'))
     df.to_msgpack('foo.msg')
     pd.read_msgpack('foo.msg')

     s = Series(np.random.rand(5),index=date_range('20130101',periods=5))
     pd.to_msgpack('foo.msg', df, s)
     pd.read_msgpack('foo.msg')

  You can pass ``iterator=True`` to iterate over the unpacked results

  .. ipython:: python

     for o in pd.read_msgpack('foo.msg',iterator=True):
        print(o)

  .. ipython:: python
     :suppress:
     :okexcept:

     os.remove('foo.msg')

- ``pandas.io.gbq`` provides a simple way to extract from, and load data into,
  Google's BigQuery Data Sets by way of pandas DataFrames. BigQuery is a high
  performance SQL-like database service, useful for performing ad-hoc queries
  against extremely large datasets. :ref:`See the docs <io.bigquery>`

  .. code-block:: python

     import pandas
     from pandas.io import gbq

     # A query to select the average monthly temperatures in
     # the year 2000 across the USA. The dataset,
     # publicdata:samples.gsod, is available on all BigQuery accounts,
     # and is based on NOAA gsod data.

     query = """SELECT station_number as STATION,
     month as MONTH, AVG(mean_temp) as MEAN_TEMP
     FROM publicdata:samples.gsod
     WHERE YEAR = 2000
     GROUP BY STATION, MONTH
     ORDER BY STATION, MONTH ASC"""

     # Fetch the result set for this query

     # Your Google BigQuery Project ID
     # To find this, see your dashboard:
     # https://console.developers.google.com/iam-admin/projects?authuser=0
     projectid = 'xxxxxxxxx'

     df = gbq.read_gbq(query, project_id=projectid)

     # Use pandas to process and reshape the dataset

     df2 = df.pivot(index='STATION', columns='MONTH', values='MEAN_TEMP')
     df3 = pandas.concat([df2.min(), df2.mean(), df2.max()],
                         axis=1, keys=["Min Temp", "Mean Temp", "Max Temp"])

  The resulting DataFrame is::

     > df3
                 Min Temp  Mean Temp    Max Temp
      MONTH
      1     -53.336667  39.827892   89.770968
      2     -49.837500  43.685219   93.437932
      3     -77.926087  48.708355   96.099998
      4     -82.892858  55.070087   97.317240
      5     -92.378261  61.428117  102.042856
      6     -77.703334  65.858888  102.900000
      7     -87.821428  68.169663  106.510714
      8     -89.431999  68.614215  105.500000
      9     -86.611112  63.436935  107.142856
      10    -78.209677  56.880838   92.103333
      11    -50.125000  48.861228   94.996428
      12    -50.332258  42.286879   94.396774

  .. warning::

     To use this module, you will need a BigQuery account. See
     <https://cloud.google.com/products/big-query> for details.

     As of 10/10/13, there is a bug in Google's API preventing result sets
     from being larger than 100,000 rows. A patch is scheduled for the week of
     10/14/13.

.. _whatsnew_0130.refactoring:

Internal Refactoring
~~~~~~~~~~~~~~~~~~~~

In 0.13.0 there is a major refactor primarily to subclass ``Series`` from
``NDFrame``, which is the base class currently for ``DataFrame`` and ``Panel``,
to unify methods and behaviors. Series formerly subclassed directly from
``ndarray``. (:issue:`4080`, :issue:`3862`, :issue:`816`)

.. warning::

   There are several potential incompatibilities from < 0.13.0

   - Using certain numpy functions would previously return a ``Series`` if passed a ``Series``
     as an argument. This seems only to affect ``np.ones_like``, ``np.empty_like``,
     ``np.diff`` and ``np.where``. These now return ``ndarrays``.

     .. ipython:: python

        s = Series([1,2,3,4])

     Numpy Usage

     .. ipython:: python

        np.ones_like(s)
        np.diff(s)
        np.where(s>1,s,np.nan)

     Pandonic Usage

     .. ipython:: python

        Series(1,index=s.index)
        s.diff()
        s.where(s>1)

   - Passing a ``Series`` directly to a cython function expecting an ``ndarray`` type will no
     longer work directly; you must pass ``Series.values``. See :ref:`Enhancing Performance<enhancingperf.ndarray>`

   - ``Series(0.5)`` would previously return the scalar ``0.5``, instead this will return a 1-element ``Series``

   - This change breaks ``rpy2<=2.3.8``. An issue has been opened against rpy2 and a workaround
     is detailed in :issue:`5698`. Thanks @JanSchulz.

- Pickle compatibility is preserved for pickles created prior to 0.13. These must be unpickled with ``pd.read_pickle``, see :ref:`Pickling<io.pickle>`.

- Refactor of series.py/frame.py/panel.py to move common code to generic.py

  - added ``_setup_axes`` to create generic ``NDFrame`` structures
  - moved methods

    - ``from_axes,_wrap_array,axes,ix,loc,iloc,shape,empty,swapaxes,transpose,pop``
    - ``__iter__,keys,__contains__,__len__,__neg__,__invert__``
    - ``convert_objects,as_blocks,as_matrix,values``
    - ``__getstate__,__setstate__`` (compat remains in frame/panel)
    - ``__getattr__,__setattr__``
    - ``_indexed_same,reindex_like,align,where,mask``
    - ``fillna,replace`` (``Series`` replace is now consistent with ``DataFrame``)
    - ``filter`` (also added axis argument to selectively filter on a different axis)
    - ``reindex,reindex_axis,take``
    - ``truncate`` (moved to become part of ``NDFrame``)

- These are API changes which make ``Panel`` more consistent with ``DataFrame``

  - ``swapaxes`` on a ``Panel`` with the same axes specified now returns a copy
  - support attribute access for setting
  - filter supports the same API as the original ``DataFrame`` filter

- Reindex called with no arguments will now return a copy of the input object

- ``TimeSeries`` is now an alias for ``Series``. The property ``is_time_series``
  can be used to distinguish (if desired)
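
  A minimal sketch:

  .. code-block:: python

     s = Series(range(3), index=date_range('2013-01-01', periods=3))
     s.is_time_series   # True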

- Refactor of Sparse objects to use BlockManager

  - Created a new block type in internals, ``SparseBlock``, which can hold multi-dtypes
    and is non-consolidatable. ``SparseSeries`` and ``SparseDataFrame`` now inherit
    more methods from their hierarchy (Series/DataFrame), and no longer inherit
    from ``SparseArray`` (which instead is the object of the ``SparseBlock``)
  - Sparse suite now supports integration with non-sparse data. Non-float sparse
    data is supportable (partially implemented)
  - Operations on sparse structures within DataFrames should preserve sparseness,
    merging type operations will convert to dense (and back to sparse), so might
    be somewhat inefficient
  - enable setitem on ``SparseSeries`` for boolean/integer/slices
  - ``SparsePanels`` implementation is unchanged (e.g. not using BlockManager, needs work)

- added ``ftypes`` method to Series/DataFrame, similar to ``dtypes``, but indicates
  if the underlying is sparse/dense (as well as the dtype)
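
  For example:

  .. code-block:: python

     df = DataFrame({'A': [1, 2], 'B': [1., 2.]})
     df.ftypes   # int64:dense and float64:dense
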
- All ``NDFrame`` objects can now use ``__finalize__()`` to specify various
  values to propagate to new objects from an existing one (e.g. ``name`` in ``Series`` will
  follow more automatically now)
- Internal type checking is now done via a suite of generated classes, allowing ``isinstance(value, klass)``
  without having to directly import the klass, courtesy of @jtratner
- Bug in Series update where the parent frame is not updating its cache based on
  changes (:issue:`4080`) or types (:issue:`3217`), fillna (:issue:`3386`)
- Indexing with dtype conversions fixed (:issue:`4463`, :issue:`4204`)
- Refactor ``Series.reindex`` to core/generic.py (:issue:`4604`, :issue:`4618`), allow ``method=`` in reindexing
  on a Series to work
- ``Series.copy`` no longer accepts the ``order`` parameter and is now consistent with ``NDFrame`` copy
- Refactor ``rename`` methods to core/generic.py; fixes ``Series.rename`` for (:issue:`4605`), and adds ``rename``
  with the same signature for ``Panel``
- Refactor ``clip`` methods to core/generic.py (:issue:`4798`)
- Refactor of ``_get_numeric_data/_get_bool_data`` to core/generic.py, allowing Series/Panel functionality
- ``Series`` (for index) / ``Panel`` (for items) now allow attribute access to their elements  (:issue:`1903`)

  .. ipython:: python

     s = Series([1,2,3],index=list('abc'))
     s.b
     s.a = 5
     s

Bug Fixes
~~~~~~~~~

See :ref:`V0.13.0 Bug Fixes<release.bug_fixes-0.13.0>` for an extensive list of bugs that have been fixed in 0.13.0.

See the :ref:`full release notes
<release>` or issue tracker
on GitHub for a complete list of all API changes, Enhancements and Bug Fixes.