pandas add index to column name

Question

I have a data frame with about 100 columns whose names repeat because the data is organized by week. It looks something like this:

hours	hours	clicks	clicks	days	days	minutes	minutes
week 1	week 2	week 1	week 2	week 1	week 2	week 1	week 2
2	2	2	3	6	2	2	3
1	7	6	3	8	2	9	3

I would like the output to look like this:

hours_w1	hours_w2	clicks_w1	clicks_w2	days_w1	days_w2	minutes_w1	minutes_w2
2	2	2	3	6	2	2	3
1	7	6	3	8	2	9	3

I know I can just rename the columns, but because I have over 100 columns I'm looking for a more efficient way.

I tried add_suffix, but I only managed to add the same suffix to every column, whereas I need a different index for each week.

Any idea how to do this?

Thanks!!

Please provide the output of df.head().to_dict() for clarity; it's unclear whether you have a MultiIndex or a row of data as the header.
– mozway, Commented Jan 16, 2023 at 16:00

Corralien · Accepted Answer · 2023-01-16 16:47:46Z

2

Extract the suffixes from the first row, add them to the column names, and finally remove the first row.

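# Strip the ".1", ".2", ... that read_csv/read_excel append to duplicate column names
df.columns = df.columns.str.split('.').str[0]
# Build "_w1", "_w2", ... from the first and last characters of "week 1", "week 2", ...
suffixes = '_' + df.iloc[0].str[0] + df.iloc[0].str[-1]
df.columns += suffixes
# Drop the row that held the week labels
df = df.iloc[1:]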

Output:

>>> df
  hours_w1 hours_w2 clicks_w1 clicks_w2 days_w1 days_w2 minutes_w1 minutes_w2
1        2        2         2         3       6       2          2          3
2        1        7         6         3       8       2          9          3
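
With roughly 100 columns the week numbers may run past 9, in which case taking only the last character of "week 10" would give "w0". The following is a minimal sketch of the same idea that pulls the whole number with a regular expression and strips only a trailing ".<digits>" from the mangled names. The tab-separated sample (including the "week 10" column) and the variable names are assumptions made up for illustration, not the asker's data.

```python
import io
import pandas as pd

# Hypothetical tab-separated input mirroring the question's layout,
# with a "week 10" column added to show a multi-digit week number.
raw = (
    "hours\thours\tclicks\tclicks\n"
    "week 1\tweek 10\tweek 1\tweek 10\n"
    "2\t2\t2\t3\n"
    "1\t7\t6\t3\n"
)
df = pd.read_csv(io.StringIO(raw), sep="\t")
# read_csv renames the duplicates: hours, hours.1, clicks, clicks.1

# Remove only a trailing ".<digits>" so other dots in a name survive.
base = df.columns.str.replace(r"\.\d+$", "", regex=True)

# Take the full week number from the first row ("week 10" -> "10").
week = df.iloc[0].str.extract(r"(\d+)$", expand=False)

df.columns = [f"{b}_w{w}" for b, w in zip(base, week)]

# Drop the label row; the remaining values were read as strings,
# so convert them back to numbers.
df = df.iloc[1:].reset_index(drop=True)
df = df.apply(pd.to_numeric)
```

The final to_numeric step is optional; it is there because a column that contains the "week N" label is read as text.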

edited Jan 16, 2023 at 16:47

answered Jan 16, 2023 at 16:03

Corralien


2 Comments

kri Over a year ago

Thanks! It works well for the first column (with _w1), but for the second column (week 2) I get hours.1_w2. Any idea why it adds the .1 and how to remove it?

Corralien Over a year ago

The problem arises when you read the file with read_csv or read_excel, which rename duplicate column names. Run df.columns = df.columns.str.split('.').str[0] first. I updated my answer.
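
If the file can be read again, one way to sidestep the ".1" renaming may be to tell read_csv that the first two rows together form the header, which yields MultiIndex columns that can then be flattened. This is a sketch under the assumption that the data comes from a tab-separated file laid out like the question's sample; the inline string stands in for that file.

```python
import io
import pandas as pd

# Hypothetical stand-in for the asker's file (tab-separated).
raw = (
    "hours\thours\tclicks\tclicks\n"
    "week 1\tweek 2\tweek 1\tweek 2\n"
    "2\t2\t2\t3\n"
    "1\t7\t6\t3\n"
)

# header=[0, 1] treats both rows as a two-level column header,
# so each column is a (name, week) pair and the pairs are unique.
df = pd.read_csv(io.StringIO(raw), sep="\t", header=[0, 1])

# Flatten ("hours", "week 2") into "hours_w2".
df.columns = [f"{name}_w{week.split()[-1]}" for name, week in df.columns]
```

Because the week labels never land in the data, the columns keep their numeric dtype and there is no first row to drop.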

Arya Sadeghi · Accepted Answer · 2023-01-16 16:00:37Z

1

First, change the first row:

df.iloc[0] = df.iloc[0].apply(lambda x:'w1' if x == 'week 1' else 'w2')

Then you can merge it with the column name like this:

df.columns = [i + '_' + j for i, j in zip(df.columns, df.iloc[0])]

And then you can remove the first row:

df = df.iloc[1:]
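
The three steps above can be sketched end to end as follows. Instead of the two-way if/else, this version turns "week N" into "wN" with a string replacement so that any week number is handled. The small frame is a made-up stand-in for the asker's data, built in memory with duplicate column names.

```python
import pandas as pd

# Hypothetical frame with the week labels still in the first data row.
df = pd.DataFrame(
    [["week 1", "week 2", "week 1", "week 2"],
     [2, 2, 2, 3],
     [1, 7, 6, 3]],
    columns=["hours", "hours", "clicks", "clicks"],
)

# "week 1" -> "w1", "week 12" -> "w12"; no per-week branch needed.
weeks = df.iloc[0].str.replace(r"^week\s*", "w", regex=True)

# Join each column name with its week label.
df.columns = [f"{col}_{wk}" for col, wk in zip(df.columns, weeks)]

# Remove the label row and renumber the remaining rows from 0.
df = df.iloc[1:].reset_index(drop=True)
```

If the frame came from read_csv or read_excel, the duplicate names would already carry ".1"-style suffixes and would need stripping before this step.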

answered Jan 16, 2023 at 16:00

Arya Sadeghi
