0
Input string
---------------
South Africa 109/0 
Australia 100
Sri Lanka 111
Sri Lanka 331/4

Expected Output
---------------
['South Africa', '109', '0']
['Australia', '100']
['Sri Lanka', '111']
['Sri Lanka', '331', '4']

I tried several regexes but couldn't figure out how to write the correct one. A space delimiter doesn't help in this case, as the country names may or may not contain spaces (South Africa, India). Thanks in advance.

5 Answers

2

We could use the regex:

r'(\D+)\s(\d+)(?:/(\d+))?'

("a lot of non-digits, followed by a space, followed by a lot of digits, and then optionally followed by a slash and a lot of digits.")

This will return, e.g.

>>> [re.match(r'(\D+)\s(\d+)(?:/(\d+))?', x).groups() 
...  for x in ['South Africa 109/0', 
...            'Australia 100',
...            'Sri Lanka 111',
...            'Sri Lanka 331/4']]
[('South Africa', '109', '0'), 
 ('Australia', '100', None), 
 ('Sri Lanka', '111', None), 
 ('Sri Lanka', '331', '4')]

Notice the Nones, which you may need to filter out manually.
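
One way to drop the Nones is a list comprehension over groups(). This is only a sketch: the helper name parse_score and the choice to return None for a non-matching line are my own assumptions, not part of the answer above.

```python
import re

# Pattern from the answer above: name, first number, optional "/second number".
PATTERN = re.compile(r'(\D+)\s(\d+)(?:/(\d+))?')

def parse_score(line):
    """Return the captured groups as a list, leaving out the unmatched optional group."""
    match = PATTERN.match(line)
    if match is None:
        return None  # assumption: signal "no match" with None instead of raising
    return [group for group in match.groups() if group is not None]

lines = ['South Africa 109/0', 'Australia 100', 'Sri Lanka 111', 'Sri Lanka 331/4']
results = [parse_score(line) for line in lines]
print(results)
```

Filtering on "is not None" rather than on truthiness keeps a captured '0' safe either way, and makes the intent explicit.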


2 Comments

Shouldn't you use [\w\s] instead of \D in order to fail on 'Au$tralia' ?
@PierreGM: What if OP wants Bishop's Stortford and Xi'an to succeed? And maybe Áŭ$t®å£ià is really considered valid.
1

Try:

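import re
# The second alternative needs a lookbehind, (?<=\d): a lookahead (?=\d) directly followed by \s+ can never match.
re.split(r"(?<=[a-zA-Z])\s+(?=\d)|(?<=\d)\s+(?=[a-zA-Z])|/", "South Africa 109/0")
# ['South Africa', '109', '0']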
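
To see the split idea on all of the sample lines, here is a small sketch. It uses a reduced form of the pattern (only the letter-then-digit alternative plus the slash), which should be enough for this input; the names SPLIT_PATTERN and split_results are mine, and the strip() call is an assumption to cope with stray trailing whitespace.

```python
import re

# Reduced form of the pattern above: split on whitespace that sits between
# an ASCII letter and a digit, or on a slash.
SPLIT_PATTERN = re.compile(r"(?<=[a-zA-Z])\s+(?=\d)|/")

lines = ["South Africa 109/0 ", "Australia 100", "Sri Lanka 111", "Sri Lanka 331/4"]

# strip() removes trailing whitespace such as the space after "109/0 "
split_results = [SPLIT_PATTERN.split(line.strip()) for line in lines]
print(split_results)
```

Because the lookbehind only accepts ASCII letters, a name ending in an accented letter or punctuation would not be split from its number with this version.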


0
re.compile(r"^([\w\s]+)\s(\d+)/?(\d+)?")

gives you the three groups. We can decompose it:

  • A group of word characters and whitespace ([\w\s]+) at the beginning of the line (^); note that \w also matches digits and underscores, not only letters
  • a space
  • a group of digits, at least one (\d+)
  • an optional /
  • an optional group of digits (potentially None)
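
The decomposition above can also be written straight into the pattern with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. A minimal sketch; the names SCORE_RE and groups are mine:

```python
import re

SCORE_RE = re.compile(r"""
    ^([\w\s]+)   # name: word characters and whitespace, from the start of the string
    \s           # a single whitespace separator
    (\d+)        # first number: one or more digits
    /?           # an optional slash
    (\d+)?       # second number; the group is None when it does not take part
""", re.VERBOSE)

groups = [SCORE_RE.match(line).groups()
          for line in ["South Africa 109/0", "Australia 100"]]
print(groups)
```

The first group starts out greedy and swallows the digits too, but backtracking hands them back so that \s(\d+) can match, which is why the name ends up without the number.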

2 Comments

This outputs Australia 100 and Sri Lanka 111 in the first group.
No, that gives you an empty group at the end, just like @KennyTM version.
0

This is the regex you need:

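import re

inputText = "South Africa 109/0\nAustralia 100\nSri Lanka 111\nSri Lanka 331/4"
for match in re.finditer(r"(?m)^(?P<Country>.*?)\s*(?P<Number1>\d+)\s*?/?\s*?(?P<Number2>\d*?)\s*?$", inputText):
    country = match.group("Country")
    number1 = match.group("Number1")
    number2 = match.group("Number2")  # '' (not None) when the line has no second number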

You can see the results here.

And here's the explanation of the pattern:

# ^(?P<Country>.*?)\s*(?P<Number1>\d+)\s*?/?\s*?(?P<Number2>\d*?)\s*?$
# 
# Options: ^ and $ match at line breaks
# 
# Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
# Match the regular expression below and capture its match into backreference with name “Country” «(?P<Country>.*?)»
#    Match any single character that is not a line break character «.*?»
#       Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
# Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*»
#    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
# Match the regular expression below and capture its match into backreference with name “Number1” «(?P<Number1>\d+)»
#    Match a single digit 0..9 «\d+»
#       Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*?»
#    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
# Match the character “/” literally «/?»
#    Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
# Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*?»
#    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
# Match the regular expression below and capture its match into backreference with name “Number2” «(?P<Number2>\d*?)»
#    Match a single digit 0..9 «\d*?»
#       Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
# Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*?»
#    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
# Assert position at the end of a line (at the end of the string or before a line break character) «$»
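
Since Number2 is matched by \d*?, it always takes part in the match and comes back as '' when a line has no second number. If you want lists shaped like the expected output, something along these lines should work; the names LINE_RE, input_text and rows are my own, and the sample text is just the question's input joined with newlines.

```python
import re

LINE_RE = re.compile(
    r"(?m)^(?P<Country>.*?)\s*(?P<Number1>\d+)\s*?/?\s*?(?P<Number2>\d*?)\s*?$"
)

# The question's sample input, including the trailing space on the first line.
input_text = "South Africa 109/0 \nAustralia 100\nSri Lanka 111\nSri Lanka 331/4"

rows = []
for match in LINE_RE.finditer(input_text):
    row = [match.group("Country"), match.group("Number1")]
    # Number2 is '' (falsy) when absent; a captured '0' is a non-empty string, so it is kept
    if match.group("Number2"):
        row.append(match.group("Number2"))
    rows.append(row)
print(rows)
```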

1 Comment

This still gives you three groups on "Australia 101", and your last group is '' rather than None, as it is in @KevinTM's solution and mine.
0

You've got the answers with regex, but I suggest also considering the available built-in str methods (for this use case anyway):

s = 'South Africa 109/0'
country, numbers = s.rsplit(' ', 1)
# ('South Africa', '109/0')
new_list = [country] + numbers.split('/')
# ['South Africa', '109', '0'] 
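
Wrapped in a function and run over all of the sample lines, that might look like the sketch below. The name split_score is hypothetical, and the strip() is an assumption I added because a trailing space (as in the first input line) would otherwise make rsplit return an empty last field.

```python
def split_score(line):
    """Split '<name> <number>[/<number>]' using only str methods."""
    # strip() guards against trailing whitespace before splitting from the right
    country, numbers = line.strip().rsplit(' ', 1)
    return [country] + numbers.split('/')

lines = ['South Africa 109/0 ', 'Australia 100', 'Sri Lanka 111', 'Sri Lanka 331/4']
parsed = [split_score(line) for line in lines]
print(parsed)
```

A line with no space at all would raise ValueError at the unpacking step, so add a check if the input is not guaranteed to be well formed.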
