Delete part of a string before a specific pattern

Question

I have a pandas DataFrame with a column from which I have to retrieve specific names. The problem is that those names are not always in the same place, and the values in that column do not all have the same length, so I cannot use the split function. However, I have noticed that before those names there is always a combination of 4 to 7 digits. I believe it's the identifier for the name.
So how can I use a regular expression to go through that column and retrieve the names I need? Here is an example from the Jupyter notebook:

 df['info']
 csx_Gb009_broken screen_231400_Iphone 7
 000345_SamsungS8_tfes_Vodafone_is56t34_3G
 Ins45_56003_Huawei P8_

What I want is something like this:

 df['Phones']
 Iphone 7
 SamsungS8
 Huawei P8

I want to have something like the above, knowing that those names come after a combination of 4 to 7 digits and end with an underscore.

Wiktor Stribiżew · Accepted Answer · 2018-09-23 21:53:55Z

You may use

df['Phones'] = df['info'].str.extract(r'\d{4}_([^_]+)')

The pattern matches:

\d{4} - 4 digits (in a longer run of 5 to 7 digits, these are the last four before the underscore)
_ - an underscore
([^_]+) - Capturing group 1 (this value will be returned by str.extract): one or more chars other than _.
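For reference, here is a minimal, self-contained sketch that runs the pattern against the three sample rows from the question. The `expand=False` argument is an addition, not part of the original answer: it makes `str.extract` return a Series rather than a one-column DataFrame. The `Phones_strict` column and its pattern are also assumptions added for illustration: a tighter variant that requires a standalone run of 4 to 7 digits, in case the data contains digits embedded in longer alphanumeric tokens.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "info": [
        "csx_Gb009_broken screen_231400_Iphone 7",
        "000345_SamsungS8_tfes_Vodafone_is56t34_3G",
        "Ins45_56003_Huawei P8_",
    ]
})

# Pattern from the answer: 4 digits, an underscore, then capture
# everything up to the next underscore (or the end of the string).
# expand=False (an addition) returns a Series instead of a DataFrame.
df["Phones"] = df["info"].str.extract(r"\d{4}_([^_]+)", expand=False)

# Hypothetical stricter variant: the digit run must be 4 to 7 digits long
# and must not be preceded by a letter or digit.
strict_pattern = r"(?<![A-Za-z0-9])\d{4,7}_([^_]+)"
df["Phones_strict"] = df["info"].str.extract(strict_pattern, expand=False)

print(df["Phones"].tolist())
# ['Iphone 7', 'SamsungS8', 'Huawei P8']
```

On this sample both patterns give the same result. Rows with no match would come back as NaN, so it may be worth checking for missing values before using the new column.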
