
I am having problems with a calculation in pandas DataFrames.

I am calculating the

  • %-change per column
  • %-share per column based on the sum of the row

Now I want to combine them into a single %-change per row, weighting each column's %-change by that column's %-share. I would do it like this:

(col A %-change * col A %-share) + (col B %-change * col B %-share) + (col C %-change * col C %-share) and so on.
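Written as a formula, with c_{t,j} the %-change and s_{t,j} the %-share of column j in row t, and n the number of value columns (notation introduced here only for illustration), the target for each row t is:

```latex
w_t = \sum_{j=1}^{n} c_{t,j} \, s_{t,j}
```

The sum runs over all n columns, so any solution that builds the list of columns from the data rather than by hand stays dynamic as n changes.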

I thought of two ways to do that, but neither works. And, of course, I would like to do it as dynamically as possible, since the number of columns will vary.

Here is some code to get a fitting dataframe:

    import pandas as pd
    import numpy as np

    df1 = pd.DataFrame(np.array([
    ['2019-04-29', 5, 5, 10],
    ['2019-04-28', 10, 10, 50],
    ['2019-04-27', 20, 5, 100]]),
    columns=['Date', 'attr1', 'attr2', 'attr3'])

    df1["attr1"] = pd.to_numeric(df1["attr1"])
    df1["attr2"] = pd.to_numeric(df1["attr2"])
    df1["attr3"] = pd.to_numeric(df1["attr3"])

    df1 = df1.set_index(['Date'])
    col_list= list(df1)

    df1['sum'] = df1[col_list].sum(axis=1)

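    for i in col_list:
        # %-change relative to the previous row, in percent
        df1[i + ' %-change'] = df1[i].pct_change() * 100
        # share of the row total, as a fraction between 0 and 1
        df1[i + ' %-share'] = df1[i] / df1['sum']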
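Side note on the setup: np.array() with a mix of date strings and numbers stores every value as a string, which is why the three pd.to_numeric() calls are needed. A minimal alternative sketch, building the same frame from a dict so the columns are numeric from the start (same values and Date index as above):

```python
import pandas as pd

# A dict of Python lists keeps each column numeric, so no pd.to_numeric()
# calls are needed; the dates go straight into an index named "Date".
df1 = pd.DataFrame(
    {"attr1": [5, 10, 20], "attr2": [5, 10, 5], "attr3": [10, 50, 100]},
    index=pd.Index(["2019-04-29", "2019-04-28", "2019-04-27"], name="Date"),
)
col_list = list(df1)

print(df1.dtypes)  # integer dtypes rather than object (string) columns
```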

Idea 1: Basically I am trying the formula mentioned above, i.e. multiply the %-share and the %-change of each column and sum the results. I thought about using .sum, but I don't know how to use it in this specific situation.

    df1[i + ' %-change incl share'] = (df1[i + ' %-share'] * df1[i + ' %-change'])
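Idea 1 can be finished without typing out any column names: build the product Series for every column in col_list, put them side by side and sum across each row. A minimal sketch that rebuilds the question's sample data so it runs on its own; the result column name "weighted %-change" is made up for illustration. skipna=False keeps the first row NaN (it has no previous row) instead of silently turning it into 0.

```python
import pandas as pd

# Sample data from the question, built with numeric columns directly
df1 = pd.DataFrame(
    {"attr1": [5, 10, 20], "attr2": [5, 10, 5], "attr3": [10, 50, 100]},
    index=pd.Index(["2019-04-29", "2019-04-28", "2019-04-27"], name="Date"),
)
col_list = list(df1)
df1["sum"] = df1[col_list].sum(axis=1)
for i in col_list:
    df1[i + " %-change"] = df1[i].pct_change() * 100
    df1[i + " %-share"] = df1[i] / df1["sum"]

# One product Series per column; the names come from col_list,
# so this works for any number of columns.
products = [df1[i + " %-share"] * df1[i + " %-change"] for i in col_list]

# Place the products side by side and add them up row by row
# ("weighted %-change" is a hypothetical column name).
df1["weighted %-change"] = pd.concat(products, axis=1).sum(axis=1, skipna=False)

print(df1["weighted %-change"])
```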

Idea 2: I thought about doing this in two steps. Step one would be to multiply the %-share and the %-change of each column; step two would be to sum up the new columns in a separate line of code. The problem is that I can't make it dynamic: I would have to select the columns to sum manually, and with, say, 100 columns that would be tedious indeed.

I also thought about writing the newly calculated columns into a new dataframe so that I have all the columns I need in one place, but the data in that dataframe makes no sense to me.

    df2[i + ' %-change incl share'] = (df1[i + ' %-share'] * df1[i + ' %-change'])
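For Idea 2, the second step does not need a hand-written column list: DataFrame.filter(like=...) selects every column whose name contains a given substring. A sketch under the same assumptions as before (sample data rebuilt inline, hypothetical "weighted %-change" result column). Keeping the product columns in df1 rather than in a separate df2 also keeps them aligned on the same Date index.

```python
import pandas as pd

# Sample data from the question, built with numeric columns directly
df1 = pd.DataFrame(
    {"attr1": [5, 10, 20], "attr2": [5, 10, 5], "attr3": [10, 50, 100]},
    index=pd.Index(["2019-04-29", "2019-04-28", "2019-04-27"], name="Date"),
)
col_list = list(df1)
df1["sum"] = df1[col_list].sum(axis=1)
for i in col_list:
    df1[i + " %-change"] = df1[i].pct_change() * 100
    df1[i + " %-share"] = df1[i] / df1["sum"]

# Step 1: one product column per attribute, stored in df1 itself
for i in col_list:
    df1[i + " %-change incl share"] = df1[i + " %-share"] * df1[i + " %-change"]

# Step 2: select all product columns by their common name part and sum per row
product_cols = df1.filter(like=" %-change incl share")
df1["weighted %-change"] = product_cols.sum(axis=1, skipna=False)

print(product_cols.columns.tolist())
print(df1["weighted %-change"])
```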

As always any help is greatly appreciated.

Thanks a lot!

Answer


You could try using DataFrame.apply with a lambda function.

    # get the column names out
    # (run this before adding any '... %-change incl share' columns, since those
    # would match both filters)
    changeCol = [col for col in df1.columns if 'change' in col]
    shareCol = [col for col in df1.columns if 'share' in col]

    # calculate: for each row (axis=1), sum %-change * %-share over all columns
    result = df1.apply(lambda x: sum([x[changeCol[i]] * x[shareCol[i]] for i in range(len(changeCol))]), axis=1)

    # afterwards you could append the result to your dataframe or do whatever you like
    # df1["result"] = result
    # ...
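One thing worth checking, assuming the goal is for the weighted figure to equal the %-change of the 'sum' column: each %-change is measured against the previous row, so that identity holds only when the changes are weighted by the previous row's shares (share.shift(1)), not by the same row's shares as in the formula above. A sketch that computes both weightings without apply, on the question's sample data, so you can compare them with the %-change of the total:

```python
import pandas as pd

# Sample data from the question, built with numeric columns directly
df = pd.DataFrame(
    {"attr1": [5, 10, 20], "attr2": [5, 10, 5], "attr3": [10, 50, 100]},
    index=pd.Index(["2019-04-29", "2019-04-28", "2019-04-27"], name="Date"),
)
total = df.sum(axis=1)

change = df.pct_change() * 100       # %-change vs. the previous row, in percent
share = df.div(total, axis=0)        # share of the same row's total
prev_share = share.shift(1)          # share of the previous row's total

# Vectorised weighted sums (no apply); skipna=False keeps the first row NaN
same_row = (change * share).sum(axis=1, skipna=False)
prev_row = (change * prev_share).sum(axis=1, skipna=False)
total_change = total.pct_change() * 100

print(pd.DataFrame({
    "same-row share": same_row,
    "previous-row share": prev_row,
    "%-change of sum": total_change,
}))
```

With this data the previous-row weighting reproduces the %-change of the total (250 % and about 78.57 %), while the same-row weighting gives about 314.29 % and 94 %. Which one is right depends on what the combined figure is meant to represent.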


Reference: pandas.DataFrame.apply
