4

I am trying to remove the rows that contain only digits or only letters. For example, here is a sample pandas DataFrame column:

col1:

business
served business
02446681
C96305407PLA
P0116711

The result should contain only the values below, because the first and second rows contain only letters and the third row contains only digits.

col1:

C96305407PLA
P0116711

Any suggestions would be appreciated!

3 Answers

4

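Use two str.contains checks, one for digits and one for letters, and combine them with & (the column here is named business, as in the output below):

df[df.business.str.contains(r'\d+') & df.business.str.contains(r'[A-Za-z]')]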
Out[48]: 
       business
2  C96305407PLA
3      P0116711
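
For reference, here is a minimal self-contained sketch of the same two-mask idea. It assumes the column is named col1, as in the question, and rebuilds the sample data by hand; the variable names has_digit, has_letter and mixed are only illustrative.

```python
import pandas as pd

# Sample data copied from the question; the column name `col1` is taken from there.
df = pd.DataFrame({
    "col1": ["business", "served business", "02446681",
             "C96305407PLA", "P0116711"]
})

# One boolean mask per condition. Raw strings keep "\d" from being
# treated as a (deprecated) Python string escape.
has_digit = df["col1"].str.contains(r"\d")
has_letter = df["col1"].str.contains(r"[A-Za-z]")

# Keep only the rows where both conditions hold.
mixed = df[has_digit & has_letter]
print(mixed)
```

Because the two checks are independent, this keeps a row whenever it contains at least one digit and at least one ASCII letter, regardless of where they appear in the string.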

3

Using pandas.Series.str.contains with regex

A simpler regex matches any digit next to a non-digit. It would also keep a row such as '123 456', because both '3 ' and ' 4' satisfy the pattern.

df[df.col1.str.contains(r'\d\D|\D\d')]

           col1
3  C96305407PLA
4      P0116711

This addresses the shortcoming of the regex above by matching only when a digit is immediately followed by a letter, or a letter is immediately followed by a digit. The (?i) flag makes the letter class case-insensitive.

df[df.col1.str.contains(r'(?i)\d[a-z]|[a-z]\d')]

           col1
3  C96305407PLA
4      P0116711
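
The difference between the two patterns can be checked without pandas. The sketch below uses the standard re module, relying on str.contains performing a search-style match; the names loose, strict and samples are illustrative, and '123 456' is the hypothetical extra row mentioned above.

```python
import re

# The two patterns from this answer, written as raw strings.
loose = re.compile(r"\d\D|\D\d")             # a digit next to any non-digit
strict = re.compile(r"(?i)\d[a-z]|[a-z]\d")  # a digit next to an ASCII letter

samples = ["business", "02446681", "C96305407PLA", "P0116711", "123 456"]

# re.search looks for the pattern anywhere in the string.
loose_hits = [s for s in samples if loose.search(s)]
strict_hits = [s for s in samples if strict.search(s)]

print(loose_hits)   # includes '123 456' because of the space
print(strict_hits)  # only the values that mix letters and digits
```

Note that the stricter pattern needs a letter and a digit to be adjacent, so a value like '12 ab' (digits and letters separated by a space) would not match it. Whether that matters depends on the data.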

3

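Use str.extract and drop the rows that do not match. Note that this returns only the matched part (letters followed by digits), so any trailing letters are cut off, as the first value in the output shows.

df['col1'].str.extract(r'([A-Za-z]+\d+)', expand=False).dropna()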

3    C96305407
4     P0116711
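
If the full original value is needed rather than the extracted part, one option is to use the same pattern as a boolean filter instead. This is a sketch under the assumption that the data matches the question's sample; the names col, extracted and kept are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
col = pd.Series(
    ["business", "served business", "02446681", "C96305407PLA", "P0116711"],
    name="col1",
)

# str.extract returns only the captured group, so "PLA" is lost.
extracted = col.str.extract(r"([A-Za-z]+\d+)", expand=False).dropna()

# Using the same pattern as a mask keeps the original strings intact.
kept = col[col.str.contains(r"[A-Za-z]+\d+")]

print(extracted)
print(kept)
```

Both variants require letters to come before digits somewhere in the string, so a value that starts with digits and ends with letters would be dropped by either one.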
