merge dataframe with the same columns name

Question

Hi, I have a dataframe that looks like this:

Unnamed: 0	X1	Unnamed: 1	X2	Unnamed: 1	X3	Unnamed: 2	X4
1970-01-31	5.0	1970-01-31	1.0	1970-01-31	1.0	1980-01-30	1.0
1970-02-26	6.0	1970-02-26	3.0	1970-02-26	3.0	1980-02-26	3.0

I have many columns (631) that look like this.

I would like to have:

date	X1	X2	X3	X4
1970-01-31	5.0	1.0	1.0	na
1970-02-26	6.0	3.0	3.0	na
1980-01-30	na	na	na	1.0
1980-02-26	na	na	na	3.0

I tried:

res_df = pd.concat(
    df2[[date, X]].rename(columns={date: "date"})
    for date, X in zip(df2.columns[::2], df2.columns[1::2])
).pivot_table(index="date")

It works on a small sample but not on my full data, maybe because the column name 'Unnamed: 1' appears more than once in my df. I get this error message:

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Where is the data sourced from? You have duplicate column names there. It might be possible to get a cleaner solution if we can control or influence how the input dataframe is created.
– sammywemmy, Commented Aug 17, 2022 at 12:12
It is from an Excel file. I read multiple worksheets with this instruction: df = pd.concat(pd.read_excel(workbook_url, sheet_name=None, skiprows=[0]), axis=1).droplevel(0, axis=1)
– Jacques Tebeka, Commented Aug 17, 2022 at 16:30

jezrael · Accepted Answer · 2022-08-18 06:14:16Z

Create an index from each date column and use axis=1 in concat:

res_df = (pd.concat((df2[[date, X]].set_index(date) 
                     for date, X in zip(df2.columns[::2],  df2.columns[1::2])), axis=1)
            .rename_axis('date')
            .reset_index())
print (res_df)
         date   X1   X2   X3   X4
0  1970-01-31  5.0  1.0  1.0  NaN
1  1970-02-26  6.0  3.0  3.0  NaN
2  1980-01-30  NaN  NaN  NaN  1.0
3  1980-02-26  NaN  NaN  NaN  3.0
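
For readers who want to reproduce this on the question's sample, here is a minimal, self-contained sketch. It is a variation on the approach above, not the answer's exact code: it picks each (date, value) pair by position with iloc, on the assumption that the columns strictly alternate date, value, date, value. That way the repeated 'Unnamed: 1' label is never looked up by name. The sample frame and the variable names pieces and pair are illustrative only. Note that concat with axis=1 still needs the dates within each pair to be unique.

```python
import pandas as pd

# Sample frame mirroring the question, including the repeated "Unnamed: 1" label.
df2 = pd.DataFrame(
    [
        ["1970-01-31", 5.0, "1970-01-31", 1.0, "1970-01-31", 1.0, "1980-01-30", 1.0],
        ["1970-02-26", 6.0, "1970-02-26", 3.0, "1970-02-26", 3.0, "1980-02-26", 3.0],
    ],
    columns=["Unnamed: 0", "X1", "Unnamed: 1", "X2",
             "Unnamed: 1", "X3", "Unnamed: 2", "X4"],
)

# Assumption: columns alternate (date, value), so pairs can be taken by position.
pieces = []
for i in range(0, df2.shape[1], 2):
    pair = df2.iloc[:, [i, i + 1]].copy()
    value_name = pair.columns[1]
    pair.columns = ["date", value_name]      # give every date column the same name
    pair = pair.dropna(subset=["date"])      # shorter sheets leave empty rows
    pieces.append(pair.set_index("date")[value_name])

# Align all value columns on the union of the dates.
res_df = (
    pd.concat(pieces, axis=1)
    .sort_index()
    .rename_axis("date")
    .reset_index()
)
print(res_df)
```

Selecting by position sidesteps the duplicate labels entirely, so no renaming step is needed before the concat.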

EDIT: The error suggests that your DataFrame has duplicated column names. A possible solution is to deduplicate them before applying the solution above:

df = pd.DataFrame(columns=['a','a','b'], index=[0])

# check whether any column names are duplicated
print (df.columns[df.columns.duplicated(keep=False)])
Index(['a', 'a'], dtype='object')

# source: https://stackoverflow.com/a/43792894/2901002
# note: ParserBase._maybe_dedup_names is a private pandas method and may not exist in every pandas version
df.columns = pd.io.parsers.ParserBase({'names':df.columns})._maybe_dedup_names(df.columns)
print (df.columns)
Index(['a', 'a.1', 'b'], dtype='object')
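
Because the call above goes through a private pandas method, a version-independent alternative may be useful. The sketch below uses a hypothetical helper, dedup_columns (not part of pandas), that appends .1, .2, ... to repeated labels in plain Python. It assumes no existing label already looks like 'a.1'; if one does, the result could still contain a clash.

```python
import pandas as pd

def dedup_columns(columns):
    """Suffix repeated labels with .1, .2, ... (hypothetical helper, not a pandas API)."""
    seen = {}
    out = []
    for name in columns:
        count = seen.get(name, 0)
        # First occurrence keeps its label; later ones get a numeric suffix.
        out.append(name if count == 0 else f"{name}.{count}")
        seen[name] = count + 1
    return out

df = pd.DataFrame(columns=["a", "a", "b"], index=[0])
df.columns = dedup_columns(df.columns)
print(list(df.columns))  # ['a', 'a.1', 'b']
```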

edited Aug 18, 2022 at 6:14

answered Aug 17, 2022 at 10:31

jezrael

3 Comments

Jacques Tebeka Over a year ago

Still an error, which is: ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

Jacques Tebeka Over a year ago

It still works when the column names are all different, but not when some of them are the same!

jezrael Over a year ago

@Jacques - Answer was edited.
