I have 2 dataframes of size 31789x7 and 31789x3. I want to create a 31789x10 dataframe. This works in principle with

df3 = pd.concat([df1, df2], axis=1)

for artificial data in half a second. But on my data the concat does not finish within 10 min. If I do it "manually" with:

for c in df2:
    df1[c] = df2[c]

it crashes with:

ValueError: cannot reindex from a duplicate axis

What is the problem here? (ignore_index=True does not help)

    I'm confused: is your issue that pd.concat is taking a long time, or that your for loop is throwing an error? In either event, a Minimal, Complete, and Verifiable Example would help! Commented Apr 25, 2018 at 14:08

2 Answers

You can try reindexing the columns and then assigning only the values, which bypasses index alignment:

df1=df1.reindex(columns=list(df2)+list(df1))
df1[list(df2)]=df2.values
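
A minimal sketch of this approach on toy data. The frames below are hypothetical, with duplicate index labels added on purpose, and are not the asker's data. It also shows a column-by-column variant that keeps each column's dtype, since `df2.values` converts everything to one common type.

```python
import pandas as pd

# Hypothetical toy frames; df1 has duplicate index labels on purpose.
df1 = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 0, 1])
df2 = pd.DataFrame({"x": [10, 20, 30], "y": [0.1, 0.2, 0.3]}, index=[5, 6, 7])

# Approach from the answer: add df2's columns, then fill them by position.
by_values = df1.reindex(columns=list(df2) + list(df1))
by_values[list(df2)] = df2.values  # plain array, so no index alignment happens

# Variant that keeps dtypes: assign one column at a time as an array.
per_column = df1.copy()
for c in df2:
    per_column[c] = df2[c].to_numpy()

print(by_values)
print(per_column.dtypes)
```

In the first result the integer column `x` ends up as float, because `df2.values` had to fit integers and floats into one array. In both cases the rows are matched purely by position, so this is only appropriate when both frames are already in the same row order.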

1 Comment

@jezrael Yep, that is true, because an np.array can only hold one type.

One idea is to create a default RangeIndex first:

df3 = pd.concat([df1.reset_index(drop=True), 
                 df2.reset_index(drop=True)], axis=1)

Alternatively, reset both indexes in place and then assign column by column:

df1.reset_index(drop=True, inplace=True)
df2.reset_index(drop=True, inplace=True)

for c in df2:
    df1[c] = df2[c]
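
The error message suggests that at least one of the indexes contains duplicate labels. That is an assumption, because the question does not show the data. A rough sketch with hypothetical toy frames shows how to check for this, how the column assignment fails, and how resetting the index avoids it:

```python
import pandas as pd

# Hypothetical toy frames whose indexes contain duplicates and differ.
df1 = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 0, 1])
df2 = pd.DataFrame({"x": [10, 20, 30]}, index=[0, 1, 1])

# Quick diagnosis: are the index labels unique?
print(df1.index.is_unique, df2.index.is_unique)

# Assigning a Series aligns it on the index, which fails with duplicates.
try:
    df1.copy()["x"] = df2["x"]
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# With a fresh RangeIndex on both sides, rows are matched by position.
df3 = pd.concat([df1.reset_index(drop=True),
                 df2.reset_index(drop=True)], axis=1)
print(df3)
```

Note that `reset_index(drop=True)` discards the old labels, so it only gives the intended result when the rows of both frames are already in matching order.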

If all columns have the same type (e.g. integers), use numpy.hstack:

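import numpy as np
import pandas as pd

# Combine the column labels, then stack the raw arrays side by side.
c = df1.columns.append(df2.columns)
df = pd.DataFrame(np.hstack((df1.values, df2.values)), columns=c)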
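
To illustrate why the same-type condition matters, here is a small sketch with hypothetical frames. With integer-only inputs the stacked result stays integer. When one frame holds strings, NumPy falls back to a single common type and the integer column comes back as `object`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frames for illustration only.
ints = pd.DataFrame({"a": [1, 2]})
more_ints = pd.DataFrame({"b": [3, 4]})
strings = pd.DataFrame({"s": ["u", "v"]})

# Same dtype everywhere: the stacked array keeps the integer type.
same = pd.DataFrame(
    np.hstack((ints.to_numpy(), more_ints.to_numpy())),
    columns=ints.columns.append(more_ints.columns),
)

# Mixed dtypes: the stacked array can hold only one type, so it becomes object.
mixed = pd.DataFrame(
    np.hstack((ints.to_numpy(), strings.to_numpy())),
    columns=ints.columns.append(strings.columns),
)

print(same.dtypes)
print(mixed.dtypes)
```

For mixed-type data, the reset_index approaches above are likely the safer choice, since they keep each column's dtype.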
