Rename columns with regex in Python

Question

I'm trying to rename a bunch of columns in my df in Python. Because more than 1,000 of them need renaming, I'm trying to do it with a regex, since I saw that Python allows this. More specifically, every column ending in _Sum should be renamed, with the _Sum part replaced by '_max' (e.g. column1_Sum -> column1_max). I've tried the following code:

df = df.rename(columns=lambda x: re.sub('(.+)_Sum$','$1_max',x))

But it just replaces every column name literally with '$1_max'. I've previously worked with regex in other programs, and I always thought that $1 refers to the previously captured group, in this case everything before the '_', so I don't really know what I'm doing wrong here.

Use r"\1_max" instead

– The fourth bird, Commented Aug 10, 2020 at 8:55
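To illustrate the comment above, here is a minimal sketch using only the standard `re` module. The list of column names and the helper `rename_sum` are hypothetical stand-ins, not part of the original question. Python's `re.sub` writes backreferences in the replacement as `\1` (or `\g<1>`), not `$1`, so `'$1_max'` is inserted as literal text.

```python
import re

# Hypothetical column names standing in for the real DataFrame's columns.
columns = ["column1_Sum", "column2_Sum", "Sum_total", "other"]


def rename_sum(name):
    # \1 refers to the first capture group. The raw string (r"...") stops
    # Python from interpreting the backslash before re.sub sees it.
    return re.sub(r"(.+)_Sum$", r"\1_max", name)


renamed = [rename_sum(c) for c in columns]
print(renamed)  # ['column1_max', 'column2_max', 'Sum_total', 'other']
```

The same function could be passed to pandas as `df.rename(columns=rename_sum)`. The `\g<1>` spelling is equivalent and is the safer choice when the replacement text that follows the group starts with a digit.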

Shovalt · Accepted Answer · 2020-08-10 09:06:45Z


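You don't need capturing groups for your specific problem. You can simply do the following (pass regex=True explicitly, because from pandas 2.0 onward str.replace treats the pattern as a literal string by default):

df.columns = df.columns.str.replace(r'_Sum$', '_max', regex=True)

In case you do eventually need capturing groups, you can pass a callable that receives the match object, for example:

df.columns = df.columns.str.replace(r'(.+)_Sum$', lambda m: f'{m.group(1)}_max', regex=True)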

See: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.replace.html
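As a rough end-to-end check of the two approaches in this answer, the sketch below builds a small DataFrame with made-up column names (an assumption for illustration only) and applies both the plain suffix replacement and the callable form. It passes `regex=True` explicitly so the pattern is treated as a regular expression regardless of the pandas version's default.

```python
import pandas as pd

# Hypothetical data; only the column names matter here.
df = pd.DataFrame({"column1_Sum": [1, 2], "column2_Sum": [3, 4], "other": [5, 6]})

# Plain suffix replacement: no capture group needed.
simple = df.columns.str.replace(r"_Sum$", "_max", regex=True)

# Callable replacement: receives a re.Match and builds the new name
# from the first capture group.
with_group = df.columns.str.replace(
    r"(.+)_Sum$", lambda m: f"{m.group(1)}_max", regex=True
)

df.columns = simple
print(list(df.columns))  # ['column1_max', 'column2_max', 'other']
```

Both variants produce the same names here; the callable form only pays off when the new name has to be computed from the captured text.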

edited Aug 10, 2020 at 9:06

answered Aug 10, 2020 at 8:59

