Python – IndexingError using Boolean Indexing


I am trying to index a DataFrame using a boolean Series, similar to here:

In [1]: import pandas as pd
In [2]: idx = pd.Index(["USD.CAD", "AUD.NZD", "EUR.USD", "GBP.USD"],
   ...:                name="Currency Pair")
In [3]: pairs = pd.DataFrame({"mean":[3.6,5.1,3.6,2.7], "count":[1,5,8,2]}, index=idx)
In [4]: mask = pairs.reset_index().loc[:,"Currency Pair"].str.contains("USD")

In [5]: pairs.reset_index()[mask]
  Currency Pair  count  mean
0       USD.CAD      1   3.6
2       EUR.USD      8   3.6
3       GBP.USD      2   2.7

The above works as expected. However, when I try the same thing with the original DataFrame, without resetting the index, I get the following error:

In [6]: pairs[mask]
C:\Anaconda\lib\site-packages\pandas\core\ UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  "DataFrame index.", UserWarning)
IndexingError                             Traceback (most recent call last)
<ipython-input-6-9eca5ffbdaf7> in <module>()
----> 1 pairs[mask]

C:\Anaconda\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
   1772         if isinstance(key, (Series, np.ndarray, Index, list)):
   1773             # either boolean or fancy integer index
-> 1774             return self._getitem_array(key)
   1775         elif isinstance(key, DataFrame):
   1776             return self._getitem_frame(key)

C:\Anaconda\lib\site-packages\pandas\core\frame.pyc in _getitem_array(self, key)
   1812             # _check_bool_indexer will throw exception if Series key cannot
   1813             # be reindexed to match DataFrame rows
-> 1814             key = _check_bool_indexer(self.index, key)
   1815             indexer = key.nonzero()[0]
   1816             return self.take(indexer, axis=0, convert=False)

C:\Anaconda\lib\site-packages\pandas\core\indexing.pyc in _check_bool_indexer(ax, key)
   1637         mask = com.isnull(result.values)
   1638         if mask.any():
-> 1639             raise IndexingError('Unalignable boolean Series key provided')
   1641         result = result.astype(bool).values

IndexingError: Unalignable boolean Series key provided

I am confused by this error, since my impression was that it is raised when the length of the boolean index differs from that of the DataFrame. That is not the case here, as can be seen below.

In [7]: len(mask)
Out[7]: 4
In [8]: len(pairs)
Out[8]: 4
In [9]: len(pairs.reset_index())
Out[9]: 4

Best Solution

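I figured I would write up the solution @EdChum indicated in the comments. The issue, as he pointed out, is that mask.index does not agree with pairs.index: mask was built from pairs.reset_index(), so it is labelled 0 to 3, while pairs is labelled by currency pair. pandas aligns a boolean Series key to the DataFrame on index labels rather than by position, so matching lengths are not enough. Replacing the index of mask with the index from pairs gives the expected behaviour.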

In [10]: mask.index = pairs.index.copy()
In [11]: pairs[mask]
               count  mean
Currency Pair             
USD.CAD            1   3.6
EUR.USD            8   3.6
GBP.USD            2   2.7
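
For completeness, here is a minimal sketch of two alternatives that avoid overwriting mask.index. It reuses the data from the question and assumes a pandas version that has Series.reindex, Series.isna and Index.str.contains. The names aligned, all_missing, by_position and by_index are only illustrative. The first part shows why the alignment fails; the two options then select the same rows without touching the mask's index.

```python
import pandas as pd

idx = pd.Index(["USD.CAD", "AUD.NZD", "EUR.USD", "GBP.USD"], name="Currency Pair")
pairs = pd.DataFrame({"mean": [3.6, 5.1, 3.6, 2.7], "count": [1, 5, 8, 2]}, index=idx)

# The mask from the question carries the integer index 0..3 created by reset_index().
mask = pairs.reset_index().loc[:, "Currency Pair"].str.contains("USD")

# Aligning the mask to the currency-pair labels finds no matching labels,
# so every entry comes back missing. This is what makes the key "unalignable".
aligned = mask.reindex(pairs.index)
all_missing = bool(aligned.isna().all())

# Option 1: pass the underlying boolean array, which is applied by position
# and therefore skips index alignment entirely.
by_position = pairs[mask.values]

# Option 2: build the mask from the DataFrame's own index, so there is
# nothing to align in the first place.
by_index = pairs[pairs.index.str.contains("USD")]
```

Option 1 relies on mask and pairs having the same row order, which holds here because mask was derived from pairs.reset_index(). Option 2 is probably the safer choice when the filter only depends on the index labels.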