I have a dataframe column with strings like this:

df.column1:
0 R$ 27.467.522,00 (Vinte e sete milhões, quatro...
1 NaN
2 R$ 35.314.312,12 (Trinta e cinco milhões, trezentos...
3 R$ 1.231,34 (Mil duzentos e trinta e um reais e...

I only want the numbers, discarding the decimals, so that the column looks like this:

df.column1:
0 27467522
1 NaN
2 35314312
3 1231

I'm trying to do that with regex:

df['column1']=df['column1'].str.extract('[REGEX CODE]')

However, I'm not used to regex. I tried solutions like:

df['column1']=df['column1'].str.extract('(.*?,)').str.extract('(\d+)')
df['column1']=df['column1'].str.extract('(\s*,.*)').str.extract('(\d+)')

But I haven't been able to get it right. Can someone help?

1 Answer

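Use str.replace (with regex=False, so the period is treated as a literal character rather than a regex wildcard), then str.extract:

df.column1.str.replace('.', '', regex=False).str.extract(r'(\d+)')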

          0
0  27467522
1       NaN
2  35314312
3      1231

In this format the periods are thousands separators and the comma marks the decimals. Once the periods are removed, str.extract returns the first run of digits, which stops at the comma, so the integer part is matched and the decimals are ignored.
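
For reference, here is a minimal, self-contained sketch of the approach above. The sample frame is rebuilt from the rows shown in the question; the final conversion to a nullable integer dtype is an optional extra step that is not part of the original answer, and the variable name digits is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (the text after the amount is truncated there too).
df = pd.DataFrame({
    "column1": [
        "R$ 27.467.522,00 (Vinte e sete milhões, quatro...",
        np.nan,
        "R$ 35.314.312,12 (Trinta e cinco milhões, trezentos...",
        "R$ 1.231,34 (Mil duzentos e trinta e um reais e...",
    ]
})

# Remove the thousands separators as literal periods, then grab the first run of digits.
# expand=False returns a Series instead of a one-column DataFrame.
digits = (
    df["column1"]
    .str.replace(".", "", regex=False)
    .str.extract(r"(\d+)", expand=False)
)

# Optional (an assumption about what is wanted): store real integers while keeping the missing row.
df["column1"] = pd.to_numeric(digits).astype("Int64")
print(df)
```

Without the last step the column holds digit strings, which matches the output shown in the answer.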

4 Comments

Thanks for your effort, but that doesn't work. It would keep the decimals this way.
Can you explain? It seems to me this matches your desired output.
Sorry, you are correct; the confusion was mine. I don't understand how you made it work, but it works perfectly.
If you want to keep the decimals, change the regex to r'(\d+(?:,\d+)?)'. The optional group is non-capturing so that str.extract still returns a single column, and the pattern covers both integers and decimals. Afterwards, you can replace the ',' with '.'.
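
A rough sketch of that decimal-keeping variant, assuming the same sample rows as the question. The names s, decimals and values are illustrative, and the final pd.to_numeric call is an added convenience rather than something the comment asked for.

```python
import numpy as np
import pandas as pd

# Same sample rows as in the question.
s = pd.Series([
    "R$ 27.467.522,00 (Vinte e sete milhões, quatro...",
    np.nan,
    "R$ 35.314.312,12 (Trinta e cinco milhões, trezentos...",
    "R$ 1.231,34 (Mil duzentos e trinta e um reais e...",
])

decimals = (
    s.str.replace(".", "", regex=False)              # drop thousands separators
     .str.extract(r"(\d+(?:,\d+)?)", expand=False)   # digits plus optional ",dd" part
     .str.replace(",", ".", regex=False)             # decimal comma -> decimal point
)

# Convert the cleaned strings to floats; the missing row stays NaN.
values = pd.to_numeric(decimals)
print(values)
```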