Use regex to extract substring from pandas column

Question

I have columns with values like this:

Col1

1/1/100 'BA1
1/1/102Packe
1/1/102 'to_

I need to extract just the leading part: 1/1/100 from the first row, 1/1/102 from the second, and so on.

I am using:

df['col1'] = df['col1'].str.extract('(\d+)/(\d+)/(\d+)', expand=True)

But I'm getting only 1.

I'm not sure why this isn't working. Is there a problem with the regex, or do I need some kind of mapping?

Wiktor Stribiżew · Accepted Answer · 2019-01-23 10:24:20Z

9

You only need a single capturing group:

df['col1'] = df['col1'].str.extract(r'(\d+/\d+/\d+)', expand=True)
                                      ^           ^

The str.extract method returns one column per capturing group. Your regex has three groups, so the first group captures only the leading 1. Wrapping the whole pattern in a single group captures the full value.

Test:

>>> import pandas as pd
>>> df = pd.DataFrame({"col1":["1/1/100 'BA1", "1/1/102Packe", "1/1/102 'to_"]})
>>> df['col1'].str.extract(r'(\d+/\d+/\d+)', expand=True)
         0
0  1/1/100
1  1/1/102
2  1/1/102
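
To see why the original pattern returned only 1, it may help to compare the two patterns side by side. This is a minimal sketch using the same sample data as the test above; the names `three_groups` and `one_group` are illustrative only, and `expand=False` is used in the second call simply because it returns a Series, which is convenient for assigning back to a single column.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["1/1/100 'BA1", "1/1/102Packe", "1/1/102 'to_"]})

# Three capturing groups -> a DataFrame with three columns, one per group.
three_groups = df["col1"].str.extract(r"(\d+)/(\d+)/(\d+)", expand=True)

# One capturing group around the whole pattern -> a single result.
# With expand=False and one group, str.extract returns a Series.
one_group = df["col1"].str.extract(r"(\d+/\d+/\d+)", expand=False)

print(three_groups.shape)      # (3, 3)
print(three_groups[0].tolist())  # ['1', '1', '1']
print(one_group.tolist())      # ['1/1/100', '1/1/102', '1/1/102']
```

The first column of `three_groups` holds only the first number of each value, which matches the symptom described in the question.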


Mohamed Thasin ah · Answer · 2019-01-23 10:30:36Z

1

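You can also try str.replace, keeping the leading match and dropping the rest of the string:

df['Col1'] = df['Col1'].str.replace(r'^(\d+/\d+/\d+).*', r'\1', regex=True)

Note: str.extract is the more direct tool here. In recent pandas versions, str.replace needs regex=True to treat the pattern as a regular expression.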


Samir · Answer · 2019-01-23 13:13:17Z

1

I suggest this Regex:

df['col1'].str.extract(r'\b(\d+(?:/\d+)+)', expand=True)
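
One thing to watch with patterns of this shape: a repeated capturing group such as `(\d/?)+` keeps only its last repetition, and `'\b'` in a non-raw Python string is a backspace character rather than a word boundary. The sketch below uses the standard `re` module and the sample strings from the question to show the difference; the variable names are illustrative only.

```python
import re

samples = ["1/1/100 'BA1", "1/1/102Packe", "1/1/102 'to_"]

# In a non-raw string, '\b' is the backspace character, not a word boundary.
assert "\b" == "\x08"

# A repeated group keeps only its final repetition.
repeated = re.compile(r"\b(\d/?)+")
# Putting the repetition inside one outer group captures the whole run.
wrapped = re.compile(r"\b(\d+(?:/\d+)+)")

last_only = [repeated.search(s).group(1) for s in samples]
whole = [wrapped.search(s).group(1) for s in samples]

print(last_only)  # ['0', '2', '2']
print(whole)      # ['1/1/100', '1/1/102', '1/1/102']
```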
