
I have 2 input dataframes (df1 and df2) with identical structure, and I want to create a third one (output_df) containing every combination of one row from df1 with one row from df2.

import pandas as pd
df1 = pd.DataFrame([["John","18","a"],["Jane","19","b"],["Jim","20","c"]],columns=['Name','Age','Function'])
df2 = pd.DataFrame([["Don","21","d"],["Diana","22","e"],["Dave","23","f"]],columns=['Name','Age','Function'])

output_df=pd.DataFrame([["John_Don","18_21","a_d"],
["John_Diana","18_22","a_e"],
["John_Dave","18_23","a_f"],
["Jane_Don","19_21","b_d"],
["Jane_Diana","19_22","b_e"],
["Jane_Dave","19_23","b_f"],
["Jim_Don","20_21","c_d"],
["Jim_Diana","20_22","c_e"],
["Jim_Dave","20_23","c_f"]],columns=['Name','Age','Function'])

Each cell of the new dataframe is the "sum" (+) of the corresponding cells of the input dataframes, with an underscore in between as in the expected output above. (I am aware that strings get concatenated; that is exactly what I am after when the inputs are strings.)
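For contrast, here is a small sketch (rebuilding the df1/df2 from above) of what plain element-wise `+` does: pandas aligns the two frames on their index labels, so row 0 is combined only with row 0, row 1 with row 1, and so on. That gives 3 rows, not the 9 combinations wanted here.

```python
import pandas as pd

df1 = pd.DataFrame([["John", "18", "a"], ["Jane", "19", "b"], ["Jim", "20", "c"]],
                   columns=["Name", "Age", "Function"])
df2 = pd.DataFrame([["Don", "21", "d"], ["Diana", "22", "e"], ["Dave", "23", "f"]],
                   columns=["Name", "Age", "Function"])

# Element-wise "+" aligns on the shared index labels 0, 1, 2, so each row
# of df1 is joined only with the same-numbered row of df2.
paired = df1 + "_" + df2
print(paired)
```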

The code below creates output_df, but the result is empty and the code takes too long to run. This sample only loops over 10 records from each input (10x10 combinations); eventually I will be dealing with thousands of records from each dataframe.

Q1: what am I missing when populating the output dataframe?

Q2: how can I make my code more efficient?

output_df=pandas.DataFrame(columns=['Name','Age','Function'])
i=0
for lendf1 in range (10):
    for lendf2 in range(10):
        output_df=output_df.append(pandas.Series(),ignore_index=True)
        i=i+1
        for column in output_df:
            output_df[column][i]=df1[column][lendf1:lendf1+1]+df2[column][lendf2:lendf2+1]
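A minimal sketch of how the loop above could be repaired, assuming the goal is the output_df shown earlier (values joined with `_`). Three things look wrong in the original: `df1[column][lendf1:lendf1+1]` is a one-row Series rather than a scalar, and adding two such slices aligns them on their index labels, which gives NaN whenever the two row positions differ; `output_df[column][i] = ...` is chained assignment, which may write to a temporary copy instead of output_df; and `i` is incremented before it is used. Collecting plain rows in a list and building the DataFrame once avoids all three, and also avoids `DataFrame.append`, which was removed in pandas 2.0.

```python
import pandas as pd

df1 = pd.DataFrame([["John", "18", "a"], ["Jane", "19", "b"], ["Jim", "20", "c"]],
                   columns=["Name", "Age", "Function"])
df2 = pd.DataFrame([["Don", "21", "d"], ["Diana", "22", "e"], ["Dave", "23", "f"]],
                   columns=["Name", "Age", "Function"])

columns = list(df1.columns)  # assumes both frames have the same column order
rows = []
for i in range(len(df1)):          # every row position of df1...
    for j in range(len(df2)):      # ...combined with every row position of df2
        # .iat returns a scalar, so "+" concatenates the strings instead of
        # aligning two one-row Series on their index labels.
        rows.append([df1.iat[i, k] + "_" + df2.iat[j, k]
                     for k in range(len(columns))])

# Build the frame once at the end instead of appending inside the loop.
output_df = pd.DataFrame(rows, columns=columns)
print(output_df)
```

This still runs len(df1) x len(df2) Python-level iterations, so it fixes correctness more than speed.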
  • Have you tried using pandas.concat(...)? Commented Mar 21, 2019 at 12:30
  • Could you post some example data from df1 and df2, as well as the expected output? Commented Mar 21, 2019 at 12:35
  • @Antry - can pandas.concat create a df with 100 rows from two 10-row dfs as input? I am not looking for horizontal combinations, but for all combinations (10x10) of my inputs. Commented Mar 21, 2019 at 12:40
  • @horace_vr Sorry, I'm not understanding the type of transformation you're trying to accomplish. Commented Mar 21, 2019 at 13:21
  • @PeterLeimbigler - I added sample input and outputs Commented Mar 21, 2019 at 17:30

2 Answers


I believe you are looking for this:

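import numpy as np
import pandas as pd
first = pd.Series(['a', 'b', 'c', 'd', 'e'])
second = pd.Series(['f', 'g', 'h', 'i', 'j'])
# Convert to object arrays: recent pandas versions may not allow ufunc.outer on a Series directly.
pd.DataFrame(np.add.outer(first.to_numpy(dtype=object), second.to_numpy(dtype=object)))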

Output:

    0   1   2   3   4
0  af  ag  ah  ai  aj
1  bf  bg  bh  bi  bj
2  cf  cg  ch  ci  cj
3  df  dg  dh  di  dj
4  ef  eg  eh  ei  ej

Note that this works one column at a time: the inputs should be one-dimensional pd.Series, not whole dataframes.
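The same idea can be applied column by column to produce the full output_df from the question. A sketch, assuming both frames share the same string columns: `np.add.outer` builds a len(df1) x len(df2) grid for each column, and `ravel()` flattens it row by row, so df1's row i paired with df2's row j ends up at position i * len(df2) + j. The separator is appended to df1's values first to match the expected output. `outer_combine` is a hypothetical helper name used only for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([["John", "18", "a"], ["Jane", "19", "b"], ["Jim", "20", "c"]],
                   columns=["Name", "Age", "Function"])
df2 = pd.DataFrame([["Don", "21", "d"], ["Diana", "22", "e"], ["Dave", "23", "f"]],
                   columns=["Name", "Age", "Function"])

def outer_combine(left, right, sep="_"):
    """Hypothetical helper: combine every row of `left` with every row of `right`."""
    combined = {}
    for col in left.columns:
        # Object arrays make np.add fall back to Python's str "+".
        a = (left[col] + sep).to_numpy(dtype=object)
        b = right[col].to_numpy(dtype=object)
        # Shape (len(left), len(right)); ravel() walks it in df1-major order.
        combined[col] = np.add.outer(a, b).ravel()
    return pd.DataFrame(combined)

output_df = outer_combine(df1, df2)
print(output_df)
```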


I think you are trying to concatenate the columns of both dataframes. Please try the following code and see whether it works for you.

import pandas as pd

df1 = pd.DataFrame([["John","18","a"],["Jane","19","b"],["Jim","20","c"]],columns=['Name','Age','Function'])
df2 = pd.DataFrame([["Don","21","d"],["Diana","22","e"],["Dave","23","f"]],columns=['Name','Age','Function'])

cols = list(df1)

out_list = []
for ind1, row1 in df1.iterrows():
    for ind2, row2 in df2.iterrows():
        # Join the matching columns of this df1 row and this df2 row with "_"
        out_list.append([row1[col] + '_' + row2[col] for col in cols])

outdf = pd.DataFrame(out_list, columns=cols)
print(outdf)
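For thousands of rows per input, a vectorized alternative to `iterrows()` is a cross join. A sketch, assuming pandas 1.2 or later (where `merge` accepts `how="cross"`) and string columns: the cross merge yields every (df1 row, df2 row) pair with suffixed column names, and each output column is then the two suffixed columns joined with `_`. Keep in mind that the result has len(df1) x len(df2) rows, so 5,000 rows on each side already means 25 million rows.

```python
import pandas as pd

df1 = pd.DataFrame([["John", "18", "a"], ["Jane", "19", "b"], ["Jim", "20", "c"]],
                   columns=["Name", "Age", "Function"])
df2 = pd.DataFrame([["Don", "21", "d"], ["Diana", "22", "e"], ["Dave", "23", "f"]],
                   columns=["Name", "Age", "Function"])

# Every row of df1 paired with every row of df2, in df1-major order.
# Overlapping column names get the suffixes, e.g. Name_1 and Name_2.
pairs = df1.merge(df2, how="cross", suffixes=("_1", "_2"))

cols = list(df1.columns)
# Vectorized string concatenation, one column at a time.
outdf = pd.DataFrame({c: pairs[c + "_1"] + "_" + pairs[c + "_2"] for c in cols})
print(outdf)
```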

3 Comments

With your example, I would need 9 rows as output: all combinations of one row from df1 with one row from df2.
@horace_vr could you please post sample output?
@horace_vr I have updated the code; please try it.
