Splitting a string into two strings using a regular expression

Question

Good morning. Using web scraping, I extract a piece of information as a string like this:

"Issued May2018No expiration date"

I want to split this string into two strings using a regular expression. My idea is: wherever 4 digits are followed by "No", I want to produce the following string:

"Issued May2018 - No expiration date".

That way, I can apply the "split" method with "-" as the separator and get two strings:

Issued May2018
No expiration date

I was thinking of using a regex such as

\d\d\d\dNo

which should recognise 2018No, but I don't know how to proceed so that I can replace it with

May2018 - No expiration date

and set the stage for using the split function.

Any suggestions? Other approaches are welcome.

The fourth bird · Accepted Answer · 2022-01-10 14:22:47Z

1

You can use a capture group to capture the 4 digits, followed by matching No.

In the replacement, use the value of capture group 1 followed by - No.

import re

s = "Issued May2018No expiration date"
pattern = r"(\d{4})No "
print(re.sub(pattern, r"\1 - No ", s))

Output

Issued May2018 - No expiration date
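To finish the plan described in the question, the substituted string can then be split on the inserted separator. This is a minimal sketch, assuming the scraped text always matches the pattern; the variable names joined, issued and expiration are illustrative.

```python
import re

s = "Issued May2018No expiration date"

# Insert " - " between the 4 digits and "No ", as in the answer above.
joined = re.sub(r"(\d{4})No ", r"\1 - No ", s)

# Split on the full separator, spaces included, so neither part keeps
# stray whitespace. maxsplit=1 stops after the first separator.
issued, expiration = joined.split(" - ", 1)

print(issued)      # Issued May2018
print(expiration)  # No expiration date
```

If the pattern does not match, joined contains no " - " and the two-name unpacking raises ValueError, so real scraped input would probably need a check first.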


edited Jan 10, 2022 at 14:22

answered Jan 10, 2022 at 14:16


5 Comments

JvdV Over a year ago

+ But I think the OP needs to use re.findall(). His intention is to first adjust the string and then split it on the hyphen. It could be done in a single go instead, no?

Iacopo_Biondini Over a year ago

Thanks for your reply. I have another question: is there a way to write an if statement saying that if this regex does not match, another regex should be tried? For instance, the extracted information can sometimes look like "Issued May2018 • No expiration date", so the regex does not match in that case.

The fourth bird Over a year ago

@Iacopo_Biondini You don't really need 2 patterns for that; you can use, for example, (\d{4})\W*No regex101.com/r/KxudVk/1
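As a sketch of how that single pattern might cover both inputs mentioned in this thread: \W* matches zero or more non-word characters, so it consumes an optional separator such as " • " before "No". The helper name normalize is hypothetical.

```python
import re

# 4 digits, then any run of non-word characters (possibly empty), then "No".
pattern = re.compile(r"(\d{4})\W*No")

def normalize(text):
    """Rewrite the boundary between the year and "No" as " - "."""
    return pattern.sub(r"\1 - No", text)

print(normalize("Issued May2018No expiration date"))
# Issued May2018 - No expiration date
print(normalize("Issued May2018 • No expiration date"))
# Issued May2018 - No expiration date
```

Because the separator is optional in the pattern, no if/else fallback between two regexes is needed for these two shapes of input.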

The fourth bird Over a year ago

@JvdV If the OP wants 2 capture groups, then I would use something along the lines of (.*?\d{4})\W*(No .*) regex101.com/r/vA11kc/1
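With two capture groups, both strings can be read straight from the match, without an intermediate replace-and-split step. This is a minimal sketch using re.match, with a guard for text that does not fit the pattern; the function name split_issued is hypothetical.

```python
import re

# Group 1: everything up to and including the first 4-digit run.
# Group 2: "No " and the rest of the line.
pattern = re.compile(r"(.*?\d{4})\W*(No .*)")

def split_issued(text):
    """Return (issued, expiration), or None when the text does not match."""
    m = pattern.match(text)
    if m is None:
        return None
    return m.group(1), m.group(2)

print(split_issued("Issued May2018No expiration date"))
# ('Issued May2018', 'No expiration date')
print(split_issued("Issued May2018 • No expiration date"))
# ('Issued May2018', 'No expiration date')
```

Returning None on a failed match is one possible design choice; raising an exception would work equally well if unmatched input should be treated as an error.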

JvdV Over a year ago

@Thefourthbird Yes, I think that would work too! It would save him another split() operation, since his end goal seems to be those two strings, as per his sample data.

pppig · Accepted Answer · 2022-01-10 14:25:25Z

1

Use re.sub.

In the replacement string passed as the repl argument of re.sub(), \g<1> refers to the text matched by capture group 1.

import re

s = "Issued May2018No expiration date"
print(re.sub(r"(\d{4})(No)", r"\g<1> - \g<2>", s))

# 'Issued May2018 - No expiration date'
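The same \g<...> syntax also accepts group names, which can make the replacement easier to read. A small sketch using named groups; the names year and rest are chosen here for illustration and do not come from the answer.

```python
import re

s = "Issued May2018No expiration date"

# (?P<name>...) defines a named group; \g<name> refers to it in the replacement.
pattern = r"(?P<year>\d{4})(?P<rest>No)"
result = re.sub(pattern, r"\g<year> - \g<rest>", s)

print(result)  # Issued May2018 - No expiration date
```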

edited Jan 10, 2022 at 14:25

answered Jan 10, 2022 at 14:19


Gab · Accepted Answer · 2022-01-10 14:37:50Z

1

import re

string = "Issued May2018No expiration date"

m = re.findall(r"^(.*[0-9]{4})(No.*)$", string)

print(m[0][0] + " - " + m[0][1])

->

Issued May2018 - No expiration date

edited Jan 10, 2022 at 14:37

answered Jan 10, 2022 at 14:21


2 Comments

JvdV Over a year ago

I think this is what OP would be after since it would save him another split operation too. However, you might want to just use re.findall() then. >> print(re.findall(r'^(.*\d{4})(No.*)$', s))

Gab Over a year ago

@JvdV agreed, I have updated the answer
