.. _whatsnew_0110:

v0.11.0 (April 22, 2013)
------------------------

This is a major release from 0.10.1 and includes many new features and
enhancements along with a large number of bug fixes. The methods of Selecting
Data have had quite a number of additions, and Dtype support is now full-fledged.
There are also a number of important API changes that long-time pandas users should
pay close attention to.

There is a new section in the documentation, :ref:`10 Minutes to Pandas <10min>`,
primarily geared to new users.

There is a new section in the documentation, :ref:`Cookbook <cookbook>`, a collection
of useful recipes in pandas (and that we want contributions!).

There are several libraries that are now :ref:`Recommended Dependencies <install.recommended_dependencies>`

Selection Choices
~~~~~~~~~~~~~~~~~

Starting in 0.11.0, object selection has had a number of user-requested additions in
order to support more explicit location based indexing. Pandas now supports
three types of multi-axis indexing.

- ``.loc`` is strictly label based, will raise ``KeyError`` when the items are not found, allowed inputs are:

  - A single label, e.g. ``5`` or ``'a'``, (note that ``5`` is interpreted as a *label* of the index. This use is **not** an integer position along the index)
  - A list or array of labels ``['a', 'b', 'c']``
  - A slice object with labels ``'a':'f'``, (note that contrary to usual python slices, **both** the start and the stop are included!)
  - A boolean array

  See more at :ref:`Selection by Label <indexing.label>`

- ``.iloc`` is strictly integer position based (from ``0`` to ``length-1`` of the axis), will raise ``IndexError`` when the requested indicies are out of bounds. Allowed inputs are:

  - An integer e.g. ``5``
  - A list or array of integers ``[4, 3, 0]``
  - A slice object with ints ``1:7``
  - A boolean array

  See more at :ref:`Selection by Position <indexing.integer>`

- ``.ix`` supports mixed integer and label based access. It is primarily label based, but will fallback to integer positional access. ``.ix`` is the most general and will support
  any of the inputs to ``.loc`` and ``.iloc``, as well as support for floating point label schemes. ``.ix`` is especially useful when dealing with mixed positional and label
  based hierarchial indexes.

  As using integer slices with ``.ix`` have different behavior depending on whether the slice
  is interpreted as position based or label based, it's usually better to be
  explicit and use ``.iloc`` or ``.loc``.

  See more at :ref:`Advanced Indexing <advanced>` and :ref:`Advanced Hierarchical <advanced.advanced_hierarchical>`.


Selection Deprecations
~~~~~~~~~~~~~~~~~~~~~~

Starting in version 0.11.0, these methods *may* be deprecated in future versions.

- ``irow``
- ``icol``
- ``iget_value``

See the section :ref:`Selection by Position <indexing.integer>` for substitutes.

Dtypes
~~~~~~

Numeric dtypes will propagate and can coexist in DataFrames. If a dtype is passed (either directly via the ``dtype`` keyword, a passed ``ndarray``, or a passed ``Series``, then it will be preserved in DataFrame operations. Furthermore, different numeric dtypes will **NOT** be combined. The following example will give you a taste.

.. ipython:: python

   df1 = DataFrame(randn(8, 1), columns = ['A'], dtype = 'float32')
   df1
   df1.dtypes
   df2 = DataFrame(dict( A = Series(randn(8),dtype='float16'),
                         B = Series(randn(8)),
                         C = Series(randn(8),dtype='uint8') ))
   df2
   df2.dtypes

   # here you get some upcasting
   df3 = df1.reindex_like(df2).fillna(value=0.0) + df2
   df3
   df3.dtypes

Dtype Conversion
~~~~~~~~~~~~~~~~

This is lower-common-denomicator upcasting, meaning you get the dtype which can accomodate all of the types

.. ipython:: python

   df3.values.dtype

Conversion

.. ipython:: python

   df3.astype('float32').dtypes

Mixed Conversion

.. ipython:: python
   :okwarning:

   df3['D'] = '1.'
   df3['E'] = '1'
   df3.convert_objects(convert_numeric=True).dtypes

   # same, but specific dtype conversion
   df3['D'] = df3['D'].astype('float16')
   df3['E'] = df3['E'].astype('int32')
   df3.dtypes

Forcing Date coercion (and setting ``NaT`` when not datelike)

.. ipython:: python
   :okwarning:

   from datetime import datetime
   s = Series([datetime(2001,1,1,0,0), 'foo', 1.0, 1,
               Timestamp('20010104'), '20010105'],dtype='O')
   s.convert_objects(convert_dates='coerce')

Dtype Gotchas
~~~~~~~~~~~~~

**Platform Gotchas**

Starting in 0.11.0, construction of DataFrame/Series will use default dtypes of ``int64`` and ``float64``,
*regardless of platform*. This is not an apparent change from earlier versions of pandas. If you specify
dtypes, they *WILL* be respected, however (:issue:`2837`)

The following will all result in ``int64`` dtypes

.. ipython:: python

    DataFrame([1,2],columns=['a']).dtypes
    DataFrame({'a' : [1,2] }).dtypes
    DataFrame({'a' : 1 }, index=range(2)).dtypes

Keep in mind that ``DataFrame(np.array([1,2]))`` **WILL** result in ``int32`` on 32-bit platforms!


**Upcasting Gotchas**

Performing indexing operations on integer type data can easily upcast the data.
The dtype of the input data will be preserved in cases where ``nans`` are not introduced.

.. ipython:: python

   dfi = df3.astype('int32')
   dfi['D'] = dfi['D'].astype('int64')
   dfi
   dfi.dtypes

   casted = dfi[dfi>0]
   casted
   casted.dtypes

While float dtypes are unchanged.

.. ipython:: python

   df4 = df3.copy()
   df4['A'] = df4['A'].astype('float32')
   df4.dtypes

   casted = df4[df4>0]
   casted
   casted.dtypes

Datetimes Conversion
~~~~~~~~~~~~~~~~~~~~

Datetime64[ns] columns in a DataFrame (or a Series) allow the use of ``np.nan`` to indicate a nan value,
in addition to the traditional ``NaT``, or not-a-time. This allows convenient nan setting in a generic way.
Furthermore ``datetime64[ns]`` columns are created by default, when passed datetimelike objects (*this change was introduced in 0.10.1*)
(:issue:`2809`, :issue:`2810`)

.. ipython:: python

   df = DataFrame(randn(6,2),date_range('20010102',periods=6),columns=['A','B'])
   df['timestamp'] = Timestamp('20010103')
   df

   # datetime64[ns] out of the box
   df.get_dtype_counts()

   # use the traditional nan, which is mapped to NaT internally
   df.ix[2:4,['A','timestamp']] = np.nan
   df

Astype conversion on ``datetime64[ns]`` to ``object``, implicity converts ``NaT`` to ``np.nan``

.. ipython:: python

   import datetime
   s = Series([datetime.datetime(2001, 1, 2, 0, 0) for i in range(3)])
   s.dtype
   s[1] = np.nan
   s
   s.dtype
   s = s.astype('O')
   s
   s.dtype


API changes
~~~~~~~~~~~

  - Added to_series() method to indicies, to facilitate the creation of indexers
    (:issue:`3275`)

  - ``HDFStore``

    - added the method ``select_column`` to select a single column from a table as a Series.
    - deprecated the ``unique`` method, can be replicated by ``select_column(key,column).unique()``
    - ``min_itemsize`` parameter to ``append`` will now automatically create data_columns for passed keys

Enhancements
~~~~~~~~~~~~

  - Improved performance of df.to_csv() by up to 10x in some cases. (:issue:`3059`)

  - Numexpr is now a :ref:`Recommended Dependencies <install.recommended_dependencies>`, to accelerate certain
    types of numerical and boolean operations

  - Bottleneck is now a :ref:`Recommended Dependencies <install.recommended_dependencies>`, to accelerate certain
    types of ``nan`` operations

  - ``HDFStore``

    - support ``read_hdf/to_hdf`` API similar to ``read_csv/to_csv``

      .. ipython:: python
          :suppress:

          from pandas.compat import lrange

      .. ipython:: python

          df = DataFrame(dict(A=lrange(5), B=lrange(5)))
          df.to_hdf('store.h5','table',append=True)
          read_hdf('store.h5', 'table', where = ['index>2'])

      .. ipython:: python
          :suppress:
          :okexcept:

          os.remove('store.h5')

    - provide dotted attribute access to ``get`` from stores, e.g. ``store.df == store['df']``

    - new keywords ``iterator=boolean``, and ``chunksize=number_in_a_chunk`` are
      provided to support iteration on ``select`` and ``select_as_multiple`` (:issue:`3076`)

  - You can now select timestamps from an *unordered* timeseries similarly to an *ordered* timeseries (:issue:`2437`)

  - You can now select with a string from a DataFrame with a datelike index, in a similar way to a Series (:issue:`3070`)

    .. ipython:: python

        idx = date_range("2001-10-1", periods=5, freq='M')
        ts = Series(np.random.rand(len(idx)),index=idx)
        ts['2001']

        df = DataFrame(dict(A = ts))
        df['2001']

  - ``Squeeze`` to possibly remove length 1 dimensions from an object.

    .. ipython:: python

       p = Panel(randn(3,4,4),items=['ItemA','ItemB','ItemC'],
                          major_axis=date_range('20010102',periods=4),
                          minor_axis=['A','B','C','D'])
       p
       p.reindex(items=['ItemA']).squeeze()
       p.reindex(items=['ItemA'],minor=['B']).squeeze()

  - In ``pd.io.data.Options``,

    + Fix bug when trying to fetch data for the current month when already
      past expiry.
    + Now using lxml to scrape html instead of BeautifulSoup (lxml was faster).
    + New instance variables for calls and puts are automatically created
      when a method that creates them is called. This works for current month
      where the instance variables are simply ``calls`` and ``puts``. Also
      works for future expiry months and save the instance variable as
      ``callsMMYY`` or ``putsMMYY``, where ``MMYY`` are, respectively, the
      month and year of the option's expiry.
    + ``Options.get_near_stock_price`` now allows the user to specify the
      month for which to get relevant options data.
    + ``Options.get_forward_data`` now has optional kwargs ``near`` and
      ``above_below``. This allows the user to specify if they would like to
      only return forward looking data for options near the current stock
      price. This just obtains the data from Options.get_near_stock_price
      instead of Options.get_xxx_data() (:issue:`2758`).

  - Cursor coordinate information is now displayed in time-series plots.

  - added option `display.max_seq_items` to control the number of
    elements printed per sequence pprinting it.  (:issue:`2979`)

  - added option `display.chop_threshold` to control display of small numerical
    values. (:issue:`2739`)

  - added option `display.max_info_rows` to prevent verbose_info from being
    calculated for frames above 1M rows (configurable). (:issue:`2807`, :issue:`2918`)

  - value_counts() now accepts a "normalize" argument, for normalized
    histograms. (:issue:`2710`).

  - DataFrame.from_records now accepts not only dicts but any instance of
    the collections.Mapping ABC.

  - added option `display.mpl_style` providing a sleeker visual style
    for plots. Based on https://gist.github.com/huyng/816622 (:issue:`3075`).

  - Treat boolean values as integers (values 1 and 0) for numeric
    operations. (:issue:`2641`)

  - to_html() now accepts an optional "escape" argument to control reserved
    HTML character escaping (enabled by default) and escapes ``&``, in addition
    to ``<`` and ``>``.  (:issue:`2919`)

See the :ref:`full release notes
<release>` or issue tracker
on GitHub for a complete list.