Combine two columns with same column name using pandas

Question

I have a dataframe (df) where two columns are of different length. I would like to combine these two columns as one column. How do I do that?

The table looks like the following:

Col_1	Col_2	Col_1	Col3
A1	12	A1	345
A2	34	A2	980
A3	098	A3	543
		A4	8765
		A5	765

I would like to combine the two Col_1 columns into a single column in the table. The output should look like the following:

Desired Output:

Col_1	Col_2	Col3
A1	12	345
A2	34	980
A3	098	543
A4		8765
A5		765

What I have tried so far:

df1 = df.columns[0]
df2 = df.columns[2]
df_merge_col = pd.merge(df1, df2, on='Col_1')

or

df["Col_1"] = df["Col_1"].astype(str) + df["Col_1"]

mozway · Accepted Answer · 2021-09-23 12:10:10Z

2

Here is a generic solution using MultiIndex and stack.

In summary, it de-duplicates the columns by adding a unique id, which we then use to stack the data. The rest is only sorting/cleaning.

(df.set_axis(pd.MultiIndex.from_arrays([df.columns,
                                        df.groupby(level=0, axis=1).cumcount()
                                       ]), axis=1)
   .stack(level=1)
   .sort_index(level=1)
   .droplevel(1)
   .drop_duplicates(subset=df.columns[df.columns.duplicated()])
)

output:

  Col_1   Col_2  Col_3
0    A1    12.0  345.0
1    A2    34.0  980.0
2    A3    98.0  543.0
3    A4  8765.0    NaN
4    A5   765.0    NaN
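To make the "unique id" step from the summary easier to follow, here is a small sketch of just that part. The sample frame is my assumption about how the question's table looks once loaded (missing cells as NaN). The counter is computed from a Series of the column labels rather than with `groupby(level=0, axis=1)`, because as far as I know the `axis` argument of `groupby` is deprecated in recent pandas releases (2.1 and later); the names `labels`, `occurrence` and `dedup` are only illustrative.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's table (duplicate "Col_1" label).
df = pd.DataFrame(
    [
        ["A1", 12, "A1", 345],
        ["A2", 34, "A2", 980],
        ["A3", 98, "A3", 543],
        [np.nan, np.nan, "A4", 8765],
        [np.nan, np.nan, "A5", 765],
    ],
    columns=["Col_1", "Col_2", "Col_1", "Col3"],
)

# Number each occurrence of a label: the first "Col_1" gets 0, the second gets 1.
labels = pd.Series(df.columns)
occurrence = labels.groupby(labels).cumcount()

# Pair every label with its occurrence number so the column index is unique.
dedup = df.set_axis(pd.MultiIndex.from_arrays([df.columns, occurrence]), axis=1)

print(dedup.columns.tolist())
# [('Col_1', 0), ('Col_2', 0), ('Col_1', 1), ('Col3', 0)]
```

Once the columns are unique like this, the second level can be stacked and cleaned up as in the chain above.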

edited Sep 23, 2021 at 12:10

answered Sep 23, 2021 at 11:56

1 Comment

mozway Over a year ago

I edited the code slightly to avoid modifying the original df, but the approach is the same.

jezrael · Accepted Answer · 2021-09-23 11:55:59Z

1

Because the column names are duplicated, select the columns by position with DataFrame.iloc, remove the rows where Col_1 is NaN with DataFrame.dropna, and then combine the two parts with an outer join:

df1 = df.iloc[:, :2].dropna(subset=['Col_1'])
df2 = df.iloc[:, 2:].dropna(subset=['Col_1'])
df_merge_col = pd.merge(df1, df2, on='Col_1', how='outer')
print(df_merge_col)
  Col_1  Col_2  Col3
0    A1   12.0   345
1    A2   34.0   980
2    A3   98.0   543
3    A4    NaN  8765
4    A5    NaN   765
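For anyone who wants to run this end to end, here is a minimal sketch that first rebuilds the question's table. The frame construction is my assumption about how the data looks after loading (missing cells as NaN, Col_2 numeric); the processing lines follow the same iloc / dropna / merge steps as above.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's table; duplicate column labels
# are allowed when they are passed explicitly.
df = pd.DataFrame(
    [
        ["A1", 12, "A1", 345],
        ["A2", 34, "A2", 980],
        ["A3", 98, "A3", 543],
        [np.nan, np.nan, "A4", 8765],
        [np.nan, np.nan, "A5", 765],
    ],
    columns=["Col_1", "Col_2", "Col_1", "Col3"],
)

# Select each half by position, because df["Col_1"] would return both columns.
df1 = df.iloc[:, :2].dropna(subset=["Col_1"])
df2 = df.iloc[:, 2:].dropna(subset=["Col_1"])

# The outer join keeps keys that exist in only one of the halves (A4, A5).
df_merge_col = pd.merge(df1, df2, on="Col_1", how="outer")
print(df_merge_col)
```

Selecting by position is what makes this work: after the split, each half has only one Col_1, so `subset=` and `on=` are unambiguous.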

edited Sep 23, 2021 at 11:55

answered Sep 23, 2021 at 11:50

2 Comments

Eliza R Over a year ago

I have the same problem, but with more than two columns with the same name. Is there any way to generalize this solution?

jezrael Over a year ago

@ElizaR - hmm, if mozway's solution does not work for you, the best option is to post a new question.

Daweo · Accepted Answer · 2021-09-23 11:59:39Z

0

You might use np.where for this task in the following way:

import numpy as np
import pandas as pd
df = pd.DataFrame([['A1', 'A1'], ['A2', 'A2'], ['A3', None], ['A4', None]])
combined = np.where(df[0].isnull(), df[1], df[0])
df.drop(columns=[0, 1], inplace=True)
df[0] = combined
print(df)

output

    0
0  A1
1  A2
2  A3
3  A4

np.where works like a ternary operator for pandas.Series: where column 0 has no value, the value is taken from column 1; otherwise it is taken from column 0.
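The same "take the first available value per row" idea can be sketched for any number of same-named columns, which is what one of the comments above asks about. This is only a sketch under assumptions: `coalesce_duplicate_columns` is a hypothetical helper name, it uses `bfill(axis=1)` in place of np.where, and it assumes the duplicates are aligned row by row, as they are in the question's table.

```python
import numpy as np
import pandas as pd


def coalesce_duplicate_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep one column per label, taking the first
    non-null value in each row across the columns that share that label."""
    combined = {}
    for name in frame.columns.unique():
        # A boolean mask selects every column carrying this label.
        block = frame.loc[:, frame.columns == name]
        # Back-fill across the columns, then keep the leftmost one.
        combined[name] = block.bfill(axis=1).iloc[:, 0]
    return pd.DataFrame(combined)


# Assumed reconstruction of the question's table.
df = pd.DataFrame(
    [
        ["A1", 12, "A1", 345],
        ["A2", 34, "A2", 980],
        ["A3", 98, "A3", 543],
        [np.nan, np.nan, "A4", 8765],
        [np.nan, np.nan, "A5", 765],
    ],
    columns=["Col_1", "Col_2", "Col_1", "Col3"],
)

result = coalesce_duplicate_columns(df)
print(result["Col_1"].tolist())
# ['A1', 'A2', 'A3', 'A4', 'A5']
```

For two columns this does the same job as the np.where call; with three or more duplicates the back-fill simply looks further to the right before giving up.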

answered Sep 23, 2021 at 11:59
