I have two input dataframes (df1 and df2) with identical structure, and I want to create a third one (output_df) containing all row combinations of the two inputs.
import pandas as pd

df1 = pd.DataFrame([["John","18","a"],["Jane","19","b"],["Jim","20","c"]],columns=['Name','Age','Function'])
df2 = pd.DataFrame([["Don","21","d"],["Diana","22","e"],["Dave","23","f"]],columns=['Name','Age','Function'])
output_df=pd.DataFrame([["John_Don","18_21","a_d"],
["John_Diana","18_22","a_e"],
["John_Dave","18_23","a_f"],
["Jane_Don","19_21","b_d"],
["Jane_Diana","19_22","b_e"],
["Jane_Dave","19_23","b_f"],
["Jim_Don","20_21","c_d"],
["Jim_Diana","20_22","c_e"],
["Jim_Dave","20_23","c_f"]],columns=['Name','Age','Function'])
Each cell of the new dataframe would hold the sum ("+") of the corresponding cells from the two input dataframes. (I am aware that strings get concatenated; that is what I am after when the inputs are strings.)
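To make the target concrete, here is a minimal sketch of one way to get "all row combinations, combined column by column". It uses a cross join via `DataFrame.merge(how="cross")`, which needs pandas 1.2 or later. The helper name `combine_all_rows` and its `sep` parameter are hypothetical, introduced only for this illustration. Note that plain "+" on strings gives "JohnDon", so the underscores in the expected output need an explicit separator; the sketch assumes string columns whenever `sep` is used.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["John", "18", "a"], ["Jane", "19", "b"], ["Jim", "20", "c"]],
    columns=["Name", "Age", "Function"],
)
df2 = pd.DataFrame(
    [["Don", "21", "d"], ["Diana", "22", "e"], ["Dave", "23", "f"]],
    columns=["Name", "Age", "Function"],
)


def combine_all_rows(left, right, sep=""):
    """Hypothetical helper: pair every row of `left` with every row of `right`."""
    # Cross join: one output row per (left row, right row) pair.
    # The shared column names get the suffixes "_l" and "_r".
    pairs = left.merge(right, how="cross", suffixes=("_l", "_r"))
    combined = {}
    for col in left.columns:
        if sep:
            # Assumes string columns: join the two values with the separator.
            combined[col] = pairs[col + "_l"] + sep + pairs[col + "_r"]
        else:
            # Plain "+": concatenates strings, adds numbers.
            combined[col] = pairs[col + "_l"] + pairs[col + "_r"]
    return pd.DataFrame(combined)


output_df = combine_all_rows(df1, df2, sep="_")
print(output_df)
```

Calling `combine_all_rows(df1, df2)` without a separator applies plain "+" instead, which is what the description above asks for.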
The code below creates output_df, but the result is empty and the code takes too long to run. This sample only covers 10 records from each input dataframe (10x10 combinations). Eventually, I will be dealing with thousands of records from each dataframe.
Q1: What am I missing when populating the output dataframe?
Q2: How can I make my code more efficient?
import pandas

output_df=pandas.DataFrame(columns=['Name','Age','Function'])
i=0
for lendf1 in range(10):
    for lendf2 in range(10):
        output_df=output_df.append(pandas.Series(),ignore_index=True)
        i=i+1
        for column in output_df:
            output_df[column][i]=df1[column][lendf1:lendf1+1]+df2[column][lendf2:lendf2+1]
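Two things in the loop above are likely contributors, offered here as a hedged reading rather than a definitive diagnosis. First, pandas aligns Series on index labels when adding them, so adding a slice labelled 0 to a slice labelled 1 yields NaN instead of a combined value. Second, growing a dataframe row by row copies it on every iteration, and `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0. The sketch below shows the alignment effect and one loop-free alternative that builds all row pairs by position with NumPy; the variable names `left_pos` and `right_pos` are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [["John", "18", "a"], ["Jane", "19", "b"], ["Jim", "20", "c"]],
    columns=["Name", "Age", "Function"],
)
df2 = pd.DataFrame(
    [["Don", "21", "d"], ["Diana", "22", "e"], ["Dave", "23", "f"]],
    columns=["Name", "Age", "Function"],
)

# The slices keep their original index labels (0 and 1 here), so "+"
# finds no matching labels and every resulting entry is missing.
misaligned = df1["Name"][0:1] + df2["Name"][1:2]
print(misaligned)

# Row positions for every (df1 row, df2 row) pair:
# left_pos  = 0,0,0,1,1,1,2,2,2   right_pos = 0,1,2,0,1,2,0,1,2
left_pos = np.repeat(np.arange(len(df1)), len(df2))
right_pos = np.tile(np.arange(len(df2)), len(df1))

# Select the paired rows, reset the index so both sides share the same
# labels, then add the whole frames in one vectorized step.
output_df = (
    df1.iloc[left_pos].reset_index(drop=True)
    + df2.iloc[right_pos].reset_index(drop=True)
)
print(output_df)
```

Because the work happens in a single vectorized addition, the cost is dominated by the size of the output (len(df1) * len(df2) rows), which for thousands of input records per dataframe means millions of output rows.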
Regarding the suggestion to use pandas.concat: how would pandas.concat create a df with 100 elements, based on two 10-row dfs as input? I am not looking for horizontal combinations, but for all combinations (10x10) of my inputs.