
I have a data frame with about 100 columns whose names repeat because the data is organized by week. It looks something like this:

hours hours clicks clicks days days minutes minutes
week 1 week 2 week 1 week 2 week 1 week 2 week 1 week 2
2 2 2 3 6 2 2 3
1 7 6 3 8 2 9 3

I would like the output to look like this:

hours_w1 hours_w2 clicks_w1 clicks_w2 days_w1 days_w2 minutes_w1 minutes_w2
2 2 2 3 6 2 2 3
1 7 6 3 8 2 9 3

I know I can rename the columns by hand, but with over 100 columns I'm looking for a more efficient way.

I tried add_suffix, but it only adds the same suffix to every column, whereas I need a different suffix for each week.

Any idea how to do this?

Thanks!!

    Please provide the output of df.head().to_dict() for clarity; it's unclear whether the header is a MultiIndex or the week labels are a row of data. Commented Jan 16, 2023 at 16:00

2 Answers


Extract the suffixes from the first row then add them to the column names and finally remove the first row.

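# Undo the ".1", ".2" suffixes that read_csv/read_excel append to duplicate column names
df.columns = df.columns.str.split('.').str[0]
# Build "_w1", "_w2" from the first and last characters of "week 1", "week 2" (single-digit weeks only)
suffixes = '_' + df.iloc[0].str[0] + df.iloc[0].str[-1]
df.columns += suffixes
# Drop the row that held the week labels
df = df.iloc[1:]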

Output:

>>> df
  hours_w1 hours_w2 clicks_w1 clicks_w2 days_w1 days_w2 minutes_w1 minutes_w2
1        2        2         2         3       6       2          2          3
2        1        7         6         3       8       2          9          3
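
For anyone who wants to try this end to end, here is a minimal sketch, assuming the data came from a CSV file with repeated header names. The inline csv_text sample is hypothetical and only mirrors the question. The sketch also swaps in a regular expression for the str[0]/str[-1] slicing so that labels such as "week 10" would keep both digits, and it converts the remaining rows to numbers.

```python
import io

import pandas as pd

# Hypothetical sample mirroring the question: repeated header names,
# with the week labels sitting in the first data row.
csv_text = """hours,hours,clicks,clicks
week 1,week 2,week 1,week 2
2,2,2,3
1,7,6,3
"""

df = pd.read_csv(io.StringIO(csv_text))
# read_csv renames repeated headers by appending ".1", ".2", ...
print(list(df.columns))  # ['hours', 'hours.1', 'clicks', 'clicks.1']

# Strip the ".N" de-duplication suffix to recover the base names.
base_names = df.columns.str.split('.').str[0]

# Pull the full week number out of each label in the first row.
week_numbers = df.iloc[0].str.extract(r'(\d+)', expand=False)

df.columns = [f"{name}_w{week}" for name, week in zip(base_names, week_numbers)]

# Drop the label row and turn the remaining strings into numbers.
df = df.iloc[1:].reset_index(drop=True).apply(pd.to_numeric)
print(df)
```

Note that splitting on '.' would also truncate any column name that legitimately contains a dot, so this only suits names like the ones in the question.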

2 Comments

Thanks! It works well for the first column (with _w1), but for the second column (week 2) I get hours.1_w2. Any idea why it adds the .1 and how to remove it?
The .1 is added when you read the file with read_csv or read_excel, which rename duplicate column names. Run df.columns = df.columns.str.split('.').str[0] first. I updated my answer.

First, shorten the labels in the first row ('week 1' becomes 'w1', and so on for any number of weeks):

df.iloc[0] = df.iloc[0].str.replace('week ', 'w')

Then merge it with the column names like this:

df.columns = [i + '_' + j for i, j in zip(df.columns, df.iloc[0])]

And then you can remove the first row:

df = df.iloc[1:]
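
If the original file can be read again, another option is to let pandas treat both header rows as a MultiIndex and then flatten it, which avoids editing and dropping the first row afterwards. This is only a sketch under that assumption: the inline csv_text and the name flat are hypothetical, and a real file would be passed to read_csv (or read_excel) in place of the StringIO object.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: two header rows, then the data.
csv_text = """hours,hours,clicks,clicks
week 1,week 2,week 1,week 2
2,2,2,3
1,7,6,3
"""

# header=[0, 1] tells pandas that the first two lines together form the header.
flat = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

# Each column label is now a tuple such as ('hours', 'week 1');
# join the two parts into a single name such as 'hours_w1'.
flat.columns = [
    f"{name}_{week.replace('week ', 'w')}" for name, week in flat.columns
]
print(flat)
```

Because the week labels are consumed as part of the header, the data columns should come back as numbers rather than strings.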
