2

I have a dataframe (df) where two columns are of different length. I would like to combine these two columns as one column. How do I do that?

The table looks like the following:

Col_1 Col_2 Col_1 Col3
A1 12 A1 345
A2 34 A2 980
A3 098 A3 543
A4 8765
A5 765

I would like to combine Col_1 as one column in the table. The output should look like the following:

Desired Output:

Col_1 Col_2 Col3
A1 12 345
A2 34 980
A3 098 543
A4 8765
A5 765

What I tried so far?

df1 = df.columns[0]
df2 = df.columns[2]
df_merge_col = pd.merge(df1, df2, on='Col_1')

or

df["Col_1"] = df["Col_1"].astype(str) + df["Col_1"]

3 Answers 3

2

Here is a generic solution using MultiIndex and stack.

In summary, it de-duplicates the columns by adding a unique id, which we then use to stack the data. The rest is only sorting/cleaning.

(df.set_axis(pd.MultiIndex.from_arrays([df.columns,
                                        df.groupby(level=0, axis=1).cumcount()
                                       ]), axis=1)
   .stack(level=1)
   .sort_index(level=1)
   .droplevel(1)
   .drop_duplicates(subset=df.columns[df.columns.duplicated()])
)

output:

  Col_1   Col_2  Col_3
0    A1    12.0  345.0
1    A2    34.0  980.0
2    A3    98.0  543.0
3    A4  8765.0    NaN
4    A5   765.0    NaN
Sign up to request clarification or add additional context in comments.

1 Comment

I edited a bit the code to avoid modifying the original df, but it's the same
1

If duplicated columns names for filtering by position use DataFrame.iloc, then remove NaNs rows by DataFrame.dropna with outer join:

df1 = df.iloc[:, :2].dropna(subset=['Col_1'])
df2 = df.iloc[:, 2:].dropna(subset=['Col_1'])
df_merge_col = pd.merge(df1, df2, on='Col_1', how='outer')
print (df_merge_col)
  Col_1  Col_2  Col3
0    A1   12.0   345
1    A2   34.0   980
2    A3   98.0   543
3    A4    NaN  8765
4    A5    NaN   765

2 Comments

I have the same problem but with more then 2 columns with the same name . Is there any way to generalize this solution?
@ElizaR - hmm, if mozway solution not working the best post new question.
0

You might use np.where for this task following way:

import numpy as np
import pandas as pd
df = pd.DataFrame([['A1', 'A1'], ['A2', 'A2'], ['A3', None], ['A4', None]])
combined = np.where(df[0].isnull(), df[1], df[0])
df.drop(columns=[0, 1], inplace=True)
df[0] = combined
print(df)

output

    0
0  A1
1  A2
2  A3
3  A4

np.where is like ternary operator for pandas.Series where column 0 has not value get value from column 1 otherwise from column 0

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.