0

Hi i have a dataframe that looks like that :

Unnamed: 0 X1 Unnamed: 1 X2 Unnamed: 1 X3 Unnamed: 2 X4
1970-01-31 5.0 1970-01-31 1.0 1970-01-31 1.0 1980-01-30 1.0
1970-02-26 6.0 1970-02-26 3.0 1970-02-26 3.0 1980-02-26 3.0

I have many columns (631) that looks like that.

I would like to have :

date X1 X2 X3 X4
1970-01-31 5.0 1.0 1.0 na
1970-02-26 6.0 3.0 3.0 na
1980-01-30 na na na 1.0
1980-02-26 na na na 3.0

I tried :

res_df = pd.concat(
    df2[[date, X]].rename(columns={date: "date"}) for date, X in zip(df2.columns[::2], 
    df2.columns[1::2])
    ).pivot_table(index="date")

It works for small data but do not work for mine. Maybe because I have the same columns name 'Unnamed: 1' in my df. I have a message error:

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

2
  • where is the data sourced from? you have duplicate column names there. It might be possible to get a cleaner solution, if we can control/influence how the input dataframe is created Commented Aug 17, 2022 at 12:12
  • It is from an excel file. I read multiple worksheets with instruction : > df = pd.concat(pd.read_excel(workbook_url, sheet_name=None, skiprows=[0]), axis=1).droplevel(0, axis=1) Commented Aug 17, 2022 at 16:30

1 Answer 1

1

Crete index by date varible and use axis=1 in concat:

res_df = (pd.concat((df2[[date, X]].set_index(date) 
                     for date, X in zip(df2.columns[::2],  df2.columns[1::2])), axis=1)
            .rename_axis('date')
            .reset_index())
print (res_df)
         date   X1   X2   X3   X4
0  1970-01-31  5.0  1.0  1.0  NaN
1  1970-02-26  6.0  3.0  3.0  NaN
2  1980-01-30  NaN  NaN  NaN  1.0
3  1980-02-26  NaN  NaN  NaN  3.0

EDIT: Error seems like duplicated columns names in your DataFrame, possible solution is deduplicated before apply solution above:

df = pd.DataFrame(columns=['a','a','b'], index=[0])

#you can test if duplicated columns names
print (df.columns[df.columns.duplicated(keep=False)])
Index(['a', 'a'], dtype='object')

#https://stackoverflow.com/a/43792894/2901002
df.columns = pd.io.parsers.ParserBase({'names':df.columns})._maybe_dedup_names(df.columns)
print (df.columns)
Index(['a', 'a.1', 'b'], dtype='object')
Sign up to request clarification or add additional context in comments.

3 Comments

Still an error which is : ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
It still works when I do not have the same columns name but not where I have the same one !
@Jacques - Answer was edited.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.