DataFrame apply function using another DataFrame

Question

I'm trying to apply a function to all the columns of a pandas DataFrame. The function divides each column (treated as a pandas Series) by a parameter stored in another DataFrame (df_reference), which I look up through the column name (Series.name).

However, the operation is not working and the resulting df is full of NaN values. I think the problem is in the way I'm inferring the column name on each iteration.

Here is the code:

# This is an example of the df I'd like to operate on:

import numpy as np
import pandas as pd

df = pd.DataFrame({'P01': np.random.random(50),
                   'P02': np.random.random(50)},
                  index=pd.period_range(start='2015-03-09', periods=50))

>>> df

              P01          P02
2015-03-09  0.575955    0.735709
2015-03-10  0.290656    0.989249
2015-03-11  0.859850    0.387678
2015-03-12  0.939810    0.085914
2015-03-13  0.278855    0.031567
   ...        ...         ...

# This is an example of the reference df I'd like to look values up in:

df_reference = pd.DataFrame({'ID':['P01', 'P02'], 'Lat':[37.261, 37.258],
                             'Lon':[-6.431, -6.433], 'Z':[-0.63, -0.825]})

>>> df_reference

    ID    Lat     Lon      Z
0   P01 37.261  -6.431  -0.630
1   P02 37.258  -6.433  -0.825

Apply operation:

df.apply(lambda x: x/df_reference.loc[df_reference['ID']==x.name]['Z'], axis=1)

Result:

            P01 P02
2015-03-09  NaN NaN
2015-03-10  NaN NaN
2015-03-11  NaN NaN
2015-03-12  NaN NaN
   ...      ... ...

Any clue on what could be happening?

x.name does not contain the column name but the index label, since you use axis=1.
– Corralien, Commented Feb 5, 2022 at 12:51
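
To make the comment above concrete, here is a minimal sketch of what `x.name` holds under each `axis` setting. The small three-row frame and the variable names `names_by_column` and `names_by_row` are made up for illustration; only the column labels and the start date are taken from the question.

```python
import numpy as np
import pandas as pd

# Small deterministic stand-in for the df in the question (illustrative values).
df = pd.DataFrame(
    {"P01": np.arange(1.0, 4.0), "P02": np.arange(4.0, 7.0)},
    index=pd.period_range(start="2015-03-09", periods=3),
)

# axis=0 (the default): the function receives one column at a time,
# so x.name is the column label.
names_by_column = df.apply(lambda x: x.name)

# axis=1: the function receives one row at a time,
# so x.name is the row's index label (a Period here), not 'P01'/'P02'.
names_by_row = df.apply(lambda x: x.name, axis=1)

print(list(names_by_column))
print([str(n) for n in names_by_row])
```

With `axis=1`, a lookup such as `df_reference['ID'] == x.name` compares the IDs against a date, so it never matches any row of `df_reference`.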

Corralien · Accepted Answer · 2022-02-05 13:18:56Z

2

Try:

>>> df / df_reference.set_index('ID')['Z']

# OR

>>> df.apply(lambda x: x/(df_reference.set_index('ID').loc[x.name].Z))

                 P01       P02
2015-03-09 -1.130257 -0.633978
2015-03-10 -0.367410 -0.655255
2015-03-11 -1.358091 -0.405920
2015-03-12 -0.085972 -0.637737
2015-03-13 -0.031896 -0.306626
2015-03-14 -0.934217 -0.257150
2015-03-15 -0.081206 -0.461807
2015-03-16 -1.100641 -1.202574
2015-03-17 -0.523478 -0.354512
2015-03-18 -0.303866 -1.030580
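
As a rough check of why both forms give the same result, the sketch below uses small hand-picked values instead of random ones (the numbers in `df` are assumptions chosen so that the quotients are easy to read; the `Z` values come from the question). Dividing a DataFrame by a Series aligns the Series index with the DataFrame's column labels, which is why `set_index('ID')` is the key step.

```python
import numpy as np
import pandas as pd

# Illustrative data: each column holds 1x and 2x the magnitude of its Z value.
df = pd.DataFrame(
    {"P01": [0.63, 1.26], "P02": [0.825, 1.65]},
    index=pd.period_range(start="2015-03-09", periods=2),
)
df_reference = pd.DataFrame({"ID": ["P01", "P02"], "Z": [-0.63, -0.825]})

# Series of Z values indexed by ID, so its labels match df's column labels.
z = df_reference.set_index("ID")["Z"]

# Vectorized form: the Series index is aligned with df.columns.
vectorized = df / z

# Column-wise form: default axis=0, so col.name is 'P01' or 'P02',
# and z.loc[col.name] is a scalar rather than a one-element Series.
via_apply = df.apply(lambda col: col / z.loc[col.name])

print(vectorized)
print(np.allclose(vectorized, via_apply))
```

The vectorized form is usually preferable, since it avoids calling a Python function once per column.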

edited Feb 5, 2022 at 13:18

answered Feb 5, 2022 at 12:53

Corralien


6 Comments

Corralien Over a year ago

Is this what you expect?

Miguel Gonzalez Over a year ago

Nope... I'd need to link each column to its ID element (P01 or P02)

Corralien Over a year ago

Can you update your post with the expected output from 2015-03-09 to 2015-03-12 please?

Corralien Over a year ago

Can you explain how you got this result, please?

Miguel Gonzalez Over a year ago

I forgot they were random arrays... Of course the result will vary on each execution. It's working for me now, just by setting the index as you say! Thanks a lot!
