
I have a DataFrame DF1 that looks like this:

Account Name  Task Type  Flag   Cost
Account 1     Repair     True   $100
Account 2     Repair     True   $200
Account 3     Repair     False  $300

DF2 looks like this:

Country  Percentage
US       30%
Canada   20%
India    50%

I want to create DF3 from DF1 and DF2 as follows:

  1. Take the rows where Flag is True.
  2. For each of those rows, create one row per row of DF2, with a new column 'Calculated_Cost' equal to DF1's 'Cost' multiplied by DF2's 'Percentage'.
  3. Keep the rows where Flag is False once, with NaN for Country and Calculated_Cost.

The final output should look like this:

Account Name  Task Type  Flag   Cost  Country  Calculated_Cost
Account 1     Repair     True   $100  US       $30
Account 1     Repair     True   $100  Canada   $20
Account 1     Repair     True   $100  India    $50
Account 2     Repair     True   $200  US       $60
Account 2     Repair     True   $200  Canada   $40
Account 2     Repair     True   $200  India    $100
Account 3     Repair     False  $300  NaN      NaN
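One way to express all three steps in a single merge is sketched below. This is an alternative to the approaches in the answers, not taken from them: DF2 gets a constant Flag column set to True, and DF1 is left-merged on Flag, so flagged rows match every country while unflagged rows find no match and get NaN. It assumes the Cost and Percentage values are formatted exactly as shown ('$100', '30%') and that Flag holds the strings 'True'/'False'.

```python
import pandas as pd

# Sample data as shown in the question (Flag stored as strings here)
df1 = pd.DataFrame({
    'Account Name': ['Account 1', 'Account 2', 'Account 3'],
    'Task Type': ['Repair', 'Repair', 'Repair'],
    'Flag': ['True', 'True', 'False'],
    'Cost': ['$100', '$200', '$300'],
})
df2 = pd.DataFrame({
    'Country': ['US', 'Canada', 'India'],
    'Percentage': ['30%', '20%', '50%'],
})

# Convert the text columns to booleans and numbers
df1['Flag'] = df1['Flag'].astype(str) == 'True'
df1['Cost'] = df1['Cost'].str.lstrip('$').astype(int)
df2['Percentage'] = df2['Percentage'].str.rstrip('%').astype(int) / 100

# Every DF2 row carries Flag=True, so only flagged DF1 rows find matches;
# the left merge keeps each unflagged row once, with NaN in the DF2 columns
df3 = df1.merge(df2.assign(Flag=True), on='Flag', how='left')
df3['Calculated_Cost'] = df3['Cost'] * df3.pop('Percentage')
print(df3)
```

The helper Flag column on DF2 plays the role of a join key, so no separate concat step is needed for the False rows.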
  • Welcome to SO. Your problem is well formulated, but what have you tried? Is there a minimal reproducible example? Commented May 10, 2022 at 5:13
  • Hello. I am a beginner-level coder and I'm a bit stumped by this problem. I have tried several approaches, such as iterating through the rows of the DataFrame and writing a function to do the multiplication, but I don't have the skill to solve it yet. I only have bits and pieces of code, and I don't think they would be helpful. Commented May 10, 2022 at 5:21

2 Answers


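Convert Cost and Percentage to numbers, cross-join the rows where Flag is True with DF2, then append the rows where Flag is False: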

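import pandas as pd

# Strip the '$' and '%' symbols so both columns become numeric
df1['Cost'] = df1['Cost'].str.lstrip('$').astype(int)
df2['Percentage'] = df2['Percentage'].str.rstrip('%').astype(int).div(100)
# Flag must be boolean for the masks below; astype(str) also handles real booleans
df1['Flag'] = df1['Flag'].astype(str) == 'True'

# Pair each flagged row with every country, then append the unflagged rows
df = pd.concat([df1[df1['Flag']].merge(df2, how='cross'), df1[~df1['Flag']]])
df['Calculated_Cost'] = df['Cost'].mul(df.pop('Percentage'))
print(df)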
  Account Name Task Type   Flag  Cost Country  Calculated_Cost
0    Account 1    Repair   True   100      US             30.0
1    Account 1    Repair   True   100  Canada             20.0
2    Account 1    Repair   True   100   India             50.0
3    Account 2    Repair   True   200      US             60.0
4    Account 2    Repair   True   200  Canada             40.0
5    Account 2    Repair   True   200   India            100.0
2    Account 3    Repair  False   300     NaN              NaN
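The printed result stores Cost and Calculated_Cost as numbers (100, 30.0, ...), while the question's target table shows dollar strings such as $30. If that display form is wanted, one option is to format the numeric columns back into strings as a final step. This is a sketch under assumptions: the frame below is a two-row excerpt of the result typed in by hand, and the two-decimal '$1,234.00' format is a choice (the question shows '$30' without decimals), so adjust the format string as needed.

```python
import numpy as np
import pandas as pd

# Hand-typed excerpt shaped like the printed result above
df = pd.DataFrame({
    'Account Name': ['Account 1', 'Account 3'],
    'Cost': [100, 300],
    'Country': ['US', np.nan],
    'Calculated_Cost': [30.0, np.nan],
})

# Turn numbers back into currency strings; na_action='ignore' leaves NaN untouched
for col in ['Cost', 'Calculated_Cost']:
    df[col] = df[col].map('${:,.2f}'.format, na_action='ignore')
print(df)
```

Do this only for display: once the columns are strings they can no longer be used in arithmetic.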

2 Comments

Hello, thanks for the answer, jezrael. I tried the code above, but I am getting this error: raise KeyError(f"None of [{key}] are in the [{axis_name}]") KeyError: "None of [Index(['True', 'True', 'False'], dtype='object')] are in the [columns]"
In @jezrael's example, Flag already holds booleans, whereas your Flag column holds the strings 'True' and 'False'. Use df1.Flag = df1.Flag == 'True' to convert them to booleans.
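Why the error occurs, as a minimal sketch: indexing a DataFrame with a Series of strings is interpreted as a list of column labels, whereas a boolean Series is treated as a row mask. The small frame below is a trimmed-down, hypothetical version of DF1, and the astype(str) step is an assumption added so the same line works whether Flag holds strings or real booleans.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Account Name': ['Account 1', 'Account 2', 'Account 3'],
    'Flag': ['True', 'True', 'False'],  # strings, not booleans
})

# With string values, df1[df1['Flag']] looks for columns named 'True'/'False'
# and raises KeyError. Comparing to 'True' yields a real boolean Series.
df1['Flag'] = df1['Flag'].astype(str) == 'True'

flagged = df1[df1['Flag']]      # rows where Flag is True
unflagged = df1[~df1['Flag']]   # rows where Flag is False
print(flagged)
print(unflagged)
```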

I'm sure there is a more efficient way to do this, but I got it working with the following code:

import pandas as pd

df1 = pd.DataFrame(
    {
        'Account Name': ['Account 1', 'Account 2', 'Account 3'],
        'Task Type': ['Repair', 'Repair', 'Repair'],
        'Flag': ['True', 'True', 'False'],
        'Cost': ['$100', '$200', '$300'],
    }
)

df2 = pd.DataFrame(
    {
        'Country': ['US', 'Canada', 'India'],
        'Percentage': ['30%', '20%', '50%'],
    }
)

# Strip '$' and '%' so the columns can be multiplied
df1['Cost'] = df1['Cost'].str.lstrip('$').astype(int)
df2['Percentage'] = df2['Percentage'].str.rstrip('%').astype(int).div(100)

# Split on the (string) Flag column
filtered_df_true = df1.loc[df1['Flag'] == 'True']
filtered_df_false = df1.loc[df1['Flag'] == 'False']

# Cross join via a constant helper column: every flagged row meets every country
df3 = filtered_df_true.assign(key=1).merge(df2.assign(key=1), how='outer', on='key')
df3['Calculated_Cost'] = df3['Cost'] * df3['Percentage']

# Append the unflagged rows (their Country and Calculated_Cost become NaN)
result = pd.concat([df3, filtered_df_false])
result = result.drop(columns=['key', 'Percentage'])
print(result)
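The constant key column (key=1 on both sides) is the usual way to get a cross join on pandas versions before 1.2; from 1.2 onward, merge(..., how='cross') produces the same pairing without the helper column, which is what the other answer uses. The attempt above uses how='outer', but since every row shares key=1, the default inner join gives the same rows. A small sketch comparing the two on trimmed-down, already-numeric versions of the frames (the reduced columns are an assumption for brevity):

```python
import pandas as pd

left = pd.DataFrame({'Account Name': ['Account 1', 'Account 2'], 'Cost': [100, 200]})
right = pd.DataFrame({'Country': ['US', 'Canada', 'India'], 'Percentage': [0.3, 0.2, 0.5]})

# Pre-1.2 idiom: join on a constant helper column, then drop it
via_key = (left.assign(key=1)
           .merge(right.assign(key=1), on='key')
           .drop(columns='key'))

# pandas >= 1.2: built-in cross join
via_cross = left.merge(right, how='cross')

# Both pair every left row with every right row (2 x 3 = 6 rows)
pd.testing.assert_frame_equal(via_key, via_cross)
print(via_cross)
```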

1 Comment

Since this is your own attempt, please edit your question and add it to the end.
