I have a DataFrame in pandas whose column names have the form "string_string", and I'm trying to rename them by removing the "_" and everything after it. For example, I want to change "12527_AC9E5" to "12527". I've tried various replace options, and I can replace a specific part of the string (e.g., I can replace all the "_"), but when I introduce wildcards I don't get the desired result.

Below are some of the things I thought would work, but don't. If I remove the wildcards, they work (i.e., they replace the "_").

df = df.rename(columns=lambda x: x.sub('_.+', ''))

df.columns = df.columns.str.replace('_.+','')
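A likely explanation for both failures, offered as a sketch rather than a diagnosis of the asker's exact setup: inside the lambda, `x` is a plain Python `str`, and `str` has no `.sub` method (regex substitution lives in `re.sub`), so the first attempt raises `AttributeError`. For the second, recent pandas versions (2.0 and later) treat the pattern passed to `.str.replace` as a literal string unless `regex=True` is given, so `'_.+'` only matches that exact text. The sample column names below are made up for illustration.

```python
import re

import pandas as pd

# Hypothetical sample frame with "number_code" style column names
df = pd.DataFrame([[1, 2, 3]], columns=["12527_AC9E5", "12528_BX1", "plain"])

# Attempt 1, corrected: call re.sub, since str objects have no .sub method
renamed = df.rename(columns=lambda x: re.sub(r"_.+", "", x))

# Attempt 2, corrected: tell pandas explicitly that the pattern is a regex
df.columns = df.columns.str.replace(r"_.+", "", regex=True)

print(list(renamed.columns))  # ['12527', '12528', 'plain']
print(list(df.columns))       # ['12527', '12528', 'plain']
```

Passing `regex=True` explicitly also avoids depending on whichever default a given pandas version uses.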

Any help appreciated.

1 Answer

Just split on '_' and take the first element. You can take advantage of a dictionary comprehension:

df = df.rename(columns={col: col.split('_')[0] for col in df.columns})
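A small self-contained run of the approach above, with made-up sample columns, plus a vectorised variant (an alternative, not part of the original answer) that applies the same split through the pandas string accessor on the column `Index`:

```python
import pandas as pd

# Hypothetical sample data; only the column names matter here
df = pd.DataFrame([[1, 2]], columns=["12527_AC9E5", "12528_BX1"])

# The answer's approach: map each old name to the text before the first "_"
df = df.rename(columns={col: col.split("_")[0] for col in df.columns})
print(list(df.columns))  # ['12527', '12528']

# Vectorised alternative: split every name, then take element 0 of each list
df2 = pd.DataFrame([[1, 2]], columns=["12527_AC9E5", "12528_BX1"])
df2.columns = df2.columns.str.split("_").str[0]
print(list(df2.columns))  # ['12527', '12528']
```

Note that if two columns share the same prefix (e.g. "12527_A" and "12527_B"), either approach leaves duplicate column names.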

3 Comments

Thanks, that answers the question! Can anyone shed light on why the replace with wildcards doesn't work? I ask because I can do this very easily in Perl, but I'm getting a bit muddled trying to understand regex in Python...
@DeepSpace What if one of the columns doesn't contain '_'? It gives me the error "IndexError: list index out of range". Please advise how to handle it.
@PyBoss You should not get that error, since splitting on a character that doesn't occur in the string still returns a list with a single element. Please open a new question with your exact issue.
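A quick check of the point in the last comment: `str.split` returns a one-element list when the separator is absent, so `[0]` is always a valid index and a name without "_" comes back unchanged. Indexing `[1]` is shown only as one hypothetical way to produce the reported `IndexError`; the asker's actual code isn't known.

```python
# Splitting on a separator that is absent yields a single-element list
with_sep = "12527_AC9E5".split("_")
without_sep = "12527".split("_")
print(with_sep)        # ['12527', 'AC9E5']
print(without_sep)     # ['12527']
print(without_sep[0])  # 12527

# Asking for the part AFTER "_" is what fails when there is no "_"
try:
    "12527".split("_")[1]
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```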
