Python – pandas – groupby by partial string

pandaspythonstring

I would like to group a DataFrame by partial substrings. This is a sample .csv file:

GridCode,Key
1000,Colour
1000,Colours
1001,Behaviours
1001,Behaviour
1002,Favourite
1003,COLORS
1004,Honours

What I did so far is importing the file as df = pd.read_csv(sample.csv), and then I put all the strings to lowercases with df['Key'] = df['Key'].str.lower(). The first thing I tried is groupby by GridCode and Key with:

g = df.groupby([df['GridCode'],df['Key']]).size()

then unstack and fill:

d = g.unstack().fillna(0)

and the resulting DataFrame is:

Key       behaviour  behaviours  colors  colour  colours  favourite  honours
GridCode                                                                    
1000              0           0       0       1        1          0        0
1001              1           1       0       0        0          0        0
1002              0           0       0       0        0          1        0
1003              0           0       1       0        0          0        0
1004              0           0       0       0        0          0        1

Now what I would like to do is to group only strings containing the substring 'our', in this case avoiding only the colors Key, creating a new column with the desired substring.
The expected result would be like:

Key       'our'
GridCode                                                                    
1000        2              
1001        2
1002        1
1003        0
1004        1

I tried also to mask the DataFrame with masked = df['Key'].str.contains('our'), then df1 = df[mask], but I can't figured out how to make a new column with the new groupby counts. Any help would be really appreciated.

Best Solution

>>> import re  # for the re.IGNORECASE flag
>>> df['Key'].str.contains('our', re.IGNORECASE).groupby(df['GridCode']).sum()
GridCode
1000        2
1001        2
1002        1
1003        0
1004        1
Name: Key, dtype: float64

also, instead of

df.groupby([df['GridCode'],df['Key']])

it is better to do:

df.groupby(['GridCode', 'Key'])