I have a pandas DataFrame with really long text in a column. I want to select all rows where that column contains ABC. I was able to do this using the following:

 df[df['Column'].str.contains('ABC', na=False)]

What I want to do after that is extract all values from this field that consist of the prefix and the next 5 letters. So after finding a matching row, I would want to get ABC1234 or ABC7899.

I hope this makes sense.

1 Answer

You can use str.extract with a regular expression that captures ABC followed by exactly 5 digits. It returns the first match in each row and NaN where nothing matches:

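import pandas as pd

df = pd.DataFrame({'Column':['ABC12345 is in this column', 'Not in this one CCD11111','Also in this one ABC99882']})
# Raw string keeps \d intact; expand=False returns a Series for the single capture group
df['capture'] = df.Column.str.extract(r'(ABC\d{5})', expand=False)
# Rows without a match hold NaN in 'capture', so drop them
df.dropna(inplace=True)
print(df)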

Output

                      Column   capture
0  ABC12345 is in this column  ABC12345
2   Also in this one ABC99882  ABC99882
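
Since the question asks for all values and str.extract only returns the first match per row, here is a minimal sketch for text that may contain several codes in one cell. The sample strings and the names all_matches, long_form and codes are made up for illustration; the pattern still assumes ABC followed by exactly 5 digits.

```python
import pandas as pd

# Hypothetical sample data: the first row contains two codes, the last is missing
df = pd.DataFrame({
    'Column': [
        'ABC12345 is in this column, and so is ABC67890',
        'Not in this one CCD11111',
        'Also in this one ABC99882',
        None,
    ]
})

# str.findall gives a list of every non-overlapping match per row
# (an empty list when nothing matches, NaN for missing values)
df['all_matches'] = df['Column'].str.findall(r'ABC\d{5}')

# str.extractall gives one row per match, indexed by (original row, match number);
# the named group becomes the column name
long_form = df['Column'].str.extractall(r'(?P<code>ABC\d{5})')

# Flatten to a plain Python list of every code found
codes = long_form['code'].tolist()
print(codes)
```

findall keeps the results aligned with the original rows, while extractall is handier if you want a flat table of every code together with the row it came from.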

1 Comment

Thanks so much. I was really stuck on this.
