Python – How to apply function to dataframe in place

pandas, python, scipy, vectorization

Is there a way I could use a scipy function like norm.cdf in place on a numpy.array (or pandas.DataFrame), using a variant of numpy.apply, numpy.apply_along_axis, etc.?


The background is, I have a table of z-score values that I would like to convert to CDF values of the norm distribution. I'm currently using norm.cdf from scipy for this.

I'm currently manipulating a dataframe that has non-numeric values.

      Name      Val1      Val2      Val3      Val4 
0        A -1.540369 -0.077779  0.979606 -0.667112   
1        B -0.787154  0.048412  0.775444 -0.510904   
2        C -0.477234  0.414388  1.250544 -0.411658   
3        D -1.430851  0.258759  1.247752 -0.883293   
4        E -0.360181  0.485465  1.123589 -0.379157

(Making the Name variable an index is a solution, but in my actual dataset, the names are not alphabetical characters.)

To modify only the numeric data, I'm using df._get_numeric_data(), a private method that returns a new dataframe containing only the original dataframe's numeric columns. However, there is no corresponding setter. Hence, if I call

norm.cdf(df._get_numeric_data())

this won't change df's original data.
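
A minimal sketch of that behaviour, assuming a two-row stand-in for the table above (the names `before` and `result` are only for illustration): norm.cdf returns a new array, so df keeps its z-scores unless the result is assigned back.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

# Two-row stand-in for the table in the question.
df = pd.DataFrame({"Name": ["A", "B"], "Val1": [-1.540369, -0.787154]})
before = df.copy()

# norm.cdf computes and returns a new NumPy array; nothing is written back to df.
result = norm.cdf(df.select_dtypes(include=[np.number]).to_numpy())

# True if df still holds the original z-scores.
print(df.equals(before))
```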

I'm trying to circumvent this by applying norm.cdf to the numeric dataframe inplace, so this changes my original dataset.

Best Solution

I think I would prefer select_dtypes over _get_numeric_data:

In [11]: df.select_dtypes(include=[np.number])
Out[11]:
       Val1      Val2      Val3      Val4
0 -1.540369 -0.077779  0.979606 -0.667112
1 -0.787154  0.048412  0.775444 -0.510904
2 -0.477234  0.414388  1.250544 -0.411658
3 -1.430851  0.258759  1.247752 -0.883293
4 -0.360181  0.485465  1.123589 -0.379157

Although apply doesn't offer an inplace option, you could do something like the following (which I would argue is more explicit anyway):

num_df = df.select_dtypes(include=[np.number])
df[num_df.columns] = norm.cdf(num_df.values)
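
For completeness, here is a self-contained sketch of the same idea with the imports it needs. The small dataframe is a stand-in built from the first two rows of the question's table, and `num_cols` is just an illustrative name. The non-numeric Name column is left alone because only the selected columns are assigned to.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

# Stand-in for the question's table (first two rows, two value columns).
df = pd.DataFrame({
    "Name": ["A", "B"],
    "Val1": [-1.540369, -0.787154],
    "Val2": [-0.077779, 0.048412],
})

# Pick out the numeric column labels, then overwrite those columns
# with the CDF of their current values.
num_cols = df.select_dtypes(include=[np.number]).columns
df[num_cols] = norm.cdf(df[num_cols].to_numpy())

print(df)
```

Strictly speaking this assigns new values to the selected columns rather than mutating the underlying buffers, but the effect on df is what the question asks for.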