How do I grab the 123 part of the following string using Python 3's re module?

....XX (a lot of HTML characters)123

Here the .... part denotes a long string consisting of HTML characters, words, and numbers.

The number 123 is a characteristic of XX, so a general method in which XX can be any letters, such as AA or AB, would be more helpful.

Side Note:
I thought of using Perl's \G anchor by first identifying XX in the string and then identifying the first number appearing after XX. But it seems \G isn't supported by Python 3's re module.

My code:

import re
source='abcd XX blah blah 123 more blah blah'
grade=str(input('Which grade?'))
#here the user inputs XX

match=re.search(grade,source)
match=re.search('\G\D+',source)
#Trying to use \G to continue from the location of the last match. Doesn't work.

match=re.search('\G\d+',source)
#Trying to get the next number after XX.
print(match.group())

  • Could you show your attempt so this problem can become clearer? Commented Jun 8, 2013 at 4:52
  • What do you mean by "grab" it? How about just if '123' in text: print('123')? Commented Jun 8, 2013 at 5:03
  • stackoverflow.com/questions/2802168/… Commented Jun 8, 2013 at 5:06
  • You can specify the starting position. match = re.search(grade, source); match = re.compile(r'\d+').search(source, match.end()); print(match.group()) Commented Jun 8, 2013 at 5:18
  • A compiled regular expression's search method accepts an optional pos parameter. docs.python.org/2/library/re.html#re.RegexObject.search Commented Jun 8, 2013 at 5:29
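
The approach suggested in the last two comments can be sketched as follows. This is only a minimal sketch: it assumes the grade entered by the user is literal text rather than a pattern (hence `re.escape`), and the helper name `first_number_after` is hypothetical, not something from the question.

```python
import re

def first_number_after(grade, source):
    """Return the first run of digits after `grade` in `source`, or None."""
    # Locate the user-supplied label; re.escape treats it as literal text.
    label = re.search(re.escape(grade), source)
    if label is None:
        return None
    # A compiled pattern's search() accepts a start position (pos),
    # so the scan for digits begins where the label match ended.
    number = re.compile(r'\d+').search(source, label.end())
    return number.group() if number else None

source = 'abcd XX blah blah 123 more blah blah'
print(first_number_after('XX', source))  # prints 123
```

Passing the end of the first match as the start position plays the role the question hoped \G would play: the second search resumes where the first one stopped.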

1 Answer

Description

This regex matches the literal label XX, which can be replaced with the user's input, and captures the first whitespace-delimited number that follows it. It also requires that XX be preceded by whitespace or the beginning of your sample text and followed by whitespace, which prevents the accidental edge case where XX is found inside a word like EXXON.

(?<=\s|^)\b(xx)\b\s.*?\s\b(\d+)\b(?=\s|$)

[Image: diagram of the regular expression]

Code Example:

I don't know Python well enough to offer a proper Python example, so I'm including a PHP example simply to show how the regex works and which groups it captures.

<?php
$sourcestring="EXXON abcd XX blah blah 123 more blah blah";
preg_match('/(?<=\s|^)\b(xx)\b\s.*?\s\b(\d+)\b(?=\s|$)/im',$sourcestring,$matches);
echo "<pre>".print_r($matches,true);
?>
 
$matches Array:
(
    [0] => XX blah blah 123
    [1] => XX
    [2] => 123
)
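
For readers who want the same idea in Python, here is a hedged sketch rather than a drop-in translation. Python's re module requires fixed-width lookbehinds, so `(?<=\s|^)` is rejected; the sketch swaps in `(?<!\S)` and `(?!\S)` ("not preceded/followed by a non-whitespace character"), which covers the same cases. The function name `find_grade_number` is hypothetical, and `re.escape` assumes the user's input is literal text.

```python
import re

def find_grade_number(grade, source):
    """Return (label, number) for the first number following `grade`, or None."""
    pattern = re.compile(
        r'(?<!\S)'                      # label must not be preceded by a non-space
        r'(' + re.escape(grade) + r')'  # group 1: the literal label, e.g. XX
        r'\s.*?\s'                      # shortest stretch of text up to ...
        r'(\d+)'                        # group 2: a run of digits
        r'(?!\S)',                      # digits must not be followed by a non-space
        re.IGNORECASE,                  # mirrors the /i flag of the PHP example
    )
    match = pattern.search(source)
    return match.groups() if match else None

source = 'EXXON abcd XX blah blah 123 more blah blah'
print(find_grade_number('xx', source))  # prints ('XX', '123')
```

As in the PHP version, the XX inside EXXON is skipped because it is preceded by a non-whitespace character. If the text between the label and the number can span several lines, adding `re.DOTALL` would be needed; that is an assumption beyond what the PHP example does.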

If you need the actual string position, then in PHP that would look like:

$position = strpos($sourcestring, $matches[0]);
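
In Python the match object already carries the offsets, so no second lookup is needed. A minimal sketch, assuming a Python-compatible form of the pattern (`(?<!\S)` and `(?!\S)` in place of the original lookarounds, because re only accepts fixed-width lookbehinds):

```python
import re

source = 'EXXON abcd XX blah blah 123 more blah blah'
match = re.search(r'(?<!\S)(XX)\s.*?\s(\d+)(?!\S)', source)

print(match.start())   # offset where the whole match begins
print(match.span(2))   # (start, end) offsets of the number in group 2
print(source[match.start(2):match.end(2)])  # prints 123
```

Unlike searching for the matched text again, `match.start()` reports the offset of this particular match, even when the same text occurs earlier in the string.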

2 Comments

Just curious. What did you use to generate the image?
@Korylprince, I'm using debuggex.com. Although it doesn't support lookbehinds or atomic groups, it's still handy for understanding the expression flow. There is also regexper.com. They do a pretty good job too, but it's not real time as you're typing.
