Replace string in pandas df column name

Question

I have a pandas DataFrame whose columns are named in the form "string_string". I'm trying to rename them by removing the "_" and everything after it. For example, I want to change "12527_AC9E5" to "12527". I've tried various replace options, and I can replace a specific part of the string (e.g., I can replace all the "_"), but when I introduce wildcards I do not get the desired result.

Below are some of the things I thought would work, but don't. If I remove the wildcards they work (i.e., they replace the "_").

df = df.rename(columns=lambda x: x.sub('_.+', ''))

df.columns = df.columns.str.replace('_.+','')

Any help appreciated

DeepSpace · Accepted Answer · 2015-11-05 11:50:09Z

18

Just split on '_' and take the first element. You can do this for every column with a dictionary comprehension:

df = df.rename(columns={col: col.split('_')[0] for col in df.columns})
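As a minimal sketch of the rename above, assuming a small made-up frame (every column name except "12527_AC9E5", and all of the values, are hypothetical):

```python
import pandas as pd

# Hypothetical frame; only "12527_AC9E5" comes from the question.
df = pd.DataFrame({"12527_AC9E5": [1, 2], "30411_B7F21": [3, 4], "plain": [5, 6]})

# Map each old name to the part before the first underscore.
new_names = {col: col.split("_")[0] for col in df.columns}
df = df.rename(columns=new_names)

print(list(df.columns))
```

A name without an underscore maps to itself, so it is left unchanged. If two columns share the same prefix, this would produce duplicate column names, which may need separate handling.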


3 Comments

abissett Over a year ago

Thanks, that answers the question! Can any light be shed on why the replace using wildcards doesn't work? The reason I ask is that I can do this very easily in Perl, but I'm getting a bit muddled trying to understand Python's regex handling.
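On the wildcard attempts from the question, one likely cause for the first one is that Python strings have no sub method; regex substitution lives in the re module as re.sub(pattern, repl, string). A minimal sketch on plain strings, where strip_suffix is a hypothetical helper name and the sample names other than "12527_AC9E5" are made up:

```python
import re

def strip_suffix(name):
    # Remove the first underscore that is followed by at least one
    # character, together with everything after it.
    return re.sub(r"_.+", "", name)

names = ["12527_AC9E5", "a_b_c", "plain"]
print([strip_suffix(n) for n in names])
```

The same function could be passed to rename, e.g. df.rename(columns=strip_suffix). For the second attempt, recent pandas versions treat the pattern given to str.replace as a literal string unless regex=True is passed, so whether '_.+' is read as a regex there depends on the installed version.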

PyBoss Over a year ago

@DeepSpace What if one of the columns doesn't contain '_'? It gives me the error "IndexError: list index out of range". Please advise how to handle it.

DeepSpace Over a year ago

@PyBoss You should not get that error, since splitting on a character that does not occur in the string still returns a list with a single element. Please open a new question with your exact issue.
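The behaviour described in that reply can be checked on a plain string. This is only a sketch; the second half shows one way an IndexError can arise, not necessarily what happened in the commenter's code:

```python
# A name without the separator comes back as a one-element list,
# so indexing [0] is safe.
parts = "12527".split("_")
print(parts)
print(parts[0])

# An IndexError needs an index beyond the end of the list, e.g. [1] here.
try:
    "12527".split("_")[1]
except IndexError as exc:
    print("IndexError:", exc)
```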
