Removing rows with digits and strings in pandas dataframe

Question

I am trying to remove the rows that contain only digits or only letters. For example, below is a sample pandas DataFrame column:

col1:

business
served business
02446681
C96305407PLA
P0116711

In my results, I need only the values below, because the first and second rows contain only letters and the third row contains only digits.

col1:

C96305407PLA
P0116711

Any suggestions would be appreciated!

BENY · Accepted Answer · 2018-10-10 19:05:35Z

4

Using two str.contains calls, one requiring a digit and one requiring a letter:

df[df.business.str.contains(r'\d+') & df.business.str.contains(r'[A-Za-z]')]
Out[48]: 
       business
2  C96305407PLA
3      P0116711
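
For reference, a self-contained sketch of the same idea, assuming the data sits in a column named col1 as in the question (the snippet above uses a column called business). The variable names has_digit, has_letter and result are only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question; the column name `col1` comes from there.
df = pd.DataFrame(
    {"col1": ["business", "served business", "02446681", "C96305407PLA", "P0116711"]}
)

# One boolean mask per requirement: at least one digit, at least one ASCII letter.
has_digit = df["col1"].str.contains(r"\d")
has_letter = df["col1"].str.contains(r"[A-Za-z]")

# Keep only the rows that satisfy both masks.
result = df[has_digit & has_letter]
print(result)
```

Naming the two masks separately makes it easy to inspect which condition a given row fails.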

answered Oct 10, 2018 at 19:05

piRSquared · Accepted Answer · 2018-10-10 19:23:51Z

3

A simpler regex, but it would also keep a row such as '123 456', because both '3 ' and ' 4' satisfy the pattern.

df[df.col1.str.contains(r'\d\D|\D\d')]

           col1
3  C96305407PLA
4      P0116711

This addresses the shortcoming of the regex above by matching only when a digit is immediately followed by a letter, or a letter is immediately followed by a digit.

df[df.col1.str.contains(r'(?i)\d[a-z]|[a-z]\d')]

           col1
3  C96305407PLA
4      P0116711
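
To make the difference between the two patterns concrete, here is a small sketch on a hypothetical Series that includes the '123 456' case. The sample values and the names loose and strict are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample that includes the digits-and-space edge case.
s = pd.Series(["123 456", "02446681", "business", "P0116711"])

# Loose pattern: a digit next to any non-digit, so a space also counts.
loose = s.str.contains(r"\d\D|\D\d")

# Strict pattern: a digit directly next to an ASCII letter (case-insensitive).
strict = s.str.contains(r"(?i)\d[a-z]|[a-z]\d")

print(s[loose].tolist())   # ['123 456', 'P0116711']
print(s[strict].tolist())  # ['P0116711']
```

Note that the strict pattern requires the letter and the digit to be adjacent, so a value such as 'abc-123' would not match it.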

answered Oct 10, 2018 at 19:06

Vaishali · Accepted Answer · 2018-10-10 19:05:00Z

3

Use str.extract and drop the rows that do not match.

df['col1'].str.extract(r'([A-Za-z]+\d+)', expand=False).dropna()

3    C96305407
4     P0116711
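
Because the pattern captures only the letters-then-digits portion, the result above shows C96305407 rather than the full C96305407PLA. A minimal sketch, assuming the goal is to keep the original rows unchanged, uses the extract result only as a mask. The names matched and full_rows are illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame(
    {"col1": ["business", "served business", "02446681", "C96305407PLA", "P0116711"]}
)

# Extract the first run of letters followed by digits; non-matching rows become NaN.
matched = df["col1"].str.extract(r"([A-Za-z]+\d+)", expand=False)

# Use the non-null positions as a mask so the original values are preserved.
full_rows = df[matched.notna()]
print(full_rows)
```

This pattern only matches when letters come before digits, so a value such as '123abc' would be dropped.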

answered Oct 10, 2018 at 19:05
