Concatenating 2 pandas DataFrames does not work

Question

I have 2 dataframes of size 31789x7 and 31789x3. I want to create a 31789x10 dataframe. This works in principle with

df3 = pd.concat([df1, df2], axis=1)

on artificial data in half a second. But on my real data, the concat does not finish within 10 minutes. If I do it "manually" with:

for c in df2:
    df1[c] = df2[c]

it crashes with:

ValueError: cannot reindex from a duplicate axis

What is the problem here? (ignore_index=True does not help)

I'm confused: is your issue that pd.concat is taking a long time, or that your for loop is throwing an error? In either event, a Minimal Complete Verifiable Example would help!
– Aaron Brock, Commented Apr 25, 2018 at 14:08
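The error message points at repeated labels in the row index. The question does not show the data, so the following is only a minimal sketch that assumes df2's index contains a duplicated label; the toy frames are hypothetical. It shows how to check for duplicates and reproduces the ValueError from the loop. Note that with axis=1, `ignore_index=True` only resets the labels along the concatenation axis (the columns), which is consistent with it not helping here.

```python
import pandas as pd

# Hypothetical toy data: same number of rows, but df2's index repeats label 1.
df1 = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 1, 2])
df2 = pd.DataFrame({"b": [4, 5, 6]}, index=[0, 1, 1])

# Quick diagnostics: is each index unique, and which labels repeat?
print(df1.index.is_unique)                          # True
print(df2.index.is_unique)                          # False
print(df2.index[df2.index.duplicated()].tolist())   # [1]

# Column assignment aligns df2["b"] on df1's index by label. With repeated
# labels in df2 that alignment is ambiguous, so pandas raises ValueError
# (the exact message differs between pandas versions).
try:
    df1["b"] = df2["b"]
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

On the real frames, checking `df1.index.is_unique` and `df2.index.is_unique` is a cheap first step before trying any of the answers below.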

BENY · Accepted Answer · 2018-04-25 14:32:46Z

You can try reindex and then assign only the values:

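# Add df2's columns to df1 as NaN placeholders, in front of df1's own columns
df1 = df1.reindex(columns=list(df2) + list(df1))
# Assign the raw NumPy values: rows are matched by position, not by index label
df1[list(df2)] = df2.values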
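A small sketch of why this avoids the error, using hypothetical toy frames whose row labels are duplicated and unrelated. Because `df2.values` is a plain NumPy array with no index, pandas writes it into the new columns row by row and never compares labels.

```python
import pandas as pd

# Hypothetical toy frames: equal row counts, unrelated (and duplicated) labels.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]}, index=[0, 1, 1])
df2 = pd.DataFrame({"c": [7, 8, 9]}, index=[5, 5, 6])

# Step 1: add df2's column names to df1 as all-NaN columns, placed first.
df1 = df1.reindex(columns=list(df2) + list(df1))

# Step 2: assign a plain ndarray; values are written in row order and the
# (duplicated) index labels are left untouched.
df1[list(df2)] = df2.values

print(df1)
```

This relies on both frames already being in the same row order; nothing checks that row i of df1 really belongs with row i of df2.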

answered Apr 25, 2018 at 14:27

Comment: @jezrael Yep, that is true, because a NumPy array can only hold a single dtype.
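To illustrate the dtype caveat from this comment, here is a minimal sketch with a hypothetical df2 that mixes an integer and a string column. The `infer_objects()` step is one possible way to recover specific dtypes afterwards; it is not part of the original answer.

```python
import pandas as pd

# Hypothetical df2 with one integer and one string column.
df2 = pd.DataFrame({"n": [1, 2, 3], "s": ["x", "y", "z"]})

# .values must fit every column into a single NumPy array, so mixed column
# types fall back to the generic object dtype.
arr = df2.values
print(arr.dtype)  # object

# Hypothetical target frame, filled the same way as in the answer above.
df1 = pd.DataFrame({"a": [0.1, 0.2, 0.3]})
df1 = df1.reindex(columns=list(df2) + list(df1))
df1[list(df2)] = arr

# infer_objects() re-derives a more specific dtype for object columns where
# it can, e.g. turning the integer column back into int64.
df1 = df1.infer_objects()
print(df1["n"].dtype)  # int64
```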

jezrael · Accepted Answer · 2018-04-25 14:38:54Z

One idea is to create a default RangeIndex first:

# Replace both indexes with a default RangeIndex (0..n-1), then concat by position
df3 = pd.concat([df1.reset_index(drop=True),
                 df2.reset_index(drop=True)], axis=1)
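A quick sketch of the concat variant on hypothetical toy frames whose indexes are both duplicated and different from each other. After `reset_index(drop=True)` both frames share the same 0..n-1 index, so concat lines rows up by position.

```python
import pandas as pd

# Hypothetical toy frames whose indexes are duplicated and do not match.
df1 = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 1, 1])
df2 = pd.DataFrame({"b": [4, 5, 6]}, index=[9, 9, 8])

# reset_index(drop=True) discards the old labels and installs RangeIndex(0..n-1)
# on each frame, so concat pairs rows purely by position.
df3 = pd.concat([df1.reset_index(drop=True),
                 df2.reset_index(drop=True)], axis=1)
print(df3)
```

If the original labels are still needed, `reset_index()` without `drop=True` keeps them as an ordinary column instead of throwing them away.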

# Alternatively, reset both indexes in place and then use the loop from the question
df1.reset_index(drop=True, inplace=True)
df2.reset_index(drop=True, inplace=True)

for c in df2:
    df1[c] = df2[c]
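The in-place version mutates both input frames and silently pads with NaN if the row counts differ. A minimal sketch of a non-mutating wrapper with a length check follows; `add_columns_by_position` is a hypothetical helper name, not a pandas API.

```python
import pandas as pd


def add_columns_by_position(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return `left` with `right`'s columns appended.

    Rows are paired by position; both frames' index labels are discarded.
    """
    if len(left) != len(right):
        # Positional pairing only makes sense for equal row counts.
        raise ValueError(f"row counts differ: {len(left)} vs {len(right)}")
    out = left.reset_index(drop=True)      # new frame; caller's df is untouched
    right = right.reset_index(drop=True)   # local rebinding, not in place
    for c in right:
        # Both indexes are now identical RangeIndexes, so alignment is trivial.
        out[c] = right[c]
    return out


df1 = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 1, 1])
df2 = pd.DataFrame({"b": [4, 5, 6]}, index=[7, 7, 8])
print(add_columns_by_position(df1, df2))
```

Returning a new frame instead of using `inplace=True` keeps the caller's df1 and df2 (and their original indexes) intact, at the cost of one extra copy.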

If all columns have the same type (e.g. integers), use numpy.hstack:

import numpy as np
# Stack the raw arrays side by side and reuse both sets of column labels
c = df1.columns.append(df2.columns)
df = pd.DataFrame(np.hstack((df1.values, df2.values)), columns=c)
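A short sketch, on hypothetical all-integer toy frames, of why the "same types" condition matters: `np.hstack` works on plain arrays, so indexes play no role, but NumPy upcasts every column to one common dtype when the inputs differ.

```python
import numpy as np
import pandas as pd

# Hypothetical all-integer frames with clashing, duplicated indexes.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 0])
df2 = pd.DataFrame({"c": [5, 6]}, index=[7, 8])

# np.hstack glues the 2-D arrays side by side; no index is involved, and the
# result gets a fresh RangeIndex.
c = df1.columns.append(df2.columns)
df = pd.DataFrame(np.hstack((df1.values, df2.values)), columns=c)
print(df)

# With a float column in the mix, NumPy upcasts everything to a common dtype:
# int64 and float64 become float64 for every column.
df2f = pd.DataFrame({"c": [0.5, 1.5]})
mixed = np.hstack((df1.values, df2f.values))
print(mixed.dtype)  # float64
```

In newer pandas versions, `DataFrame.to_numpy()` is the recommended spelling of `.values`; it behaves the same way here.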

answered Apr 25, 2018 at 14:28
