
This is not for homework!

Hello,

Just a quick question about Regex formatting.

I have a list of different courses.

L = ['CI101', 'CS164', 'ENGL101', 'I-', 'III-', 'MATH116', 'PSY101']

I'm looking for a pattern that finds all the words that start with I, II, or III. Here is what I tried (in Python):

import re

for course in L:
    if re.search("(I?II?III?)*", course):
        L.pop()

I learned that ? in regex means optional, so I tried to make I, II, and III optional and use * to match whatever follows. However, it doesn't work as I intended. What would be a better pattern?

Thanks

  • re.match('^I{1,3}.*$', course), please see regex101.com/r/HDS4TX/1. Commented Mar 20, 2019 at 3:01
  • Aha, thank you! Commented Mar 20, 2019 at 3:15
  • @Yang Do you mind making that an answer so the question can be resolved? Commented Mar 20, 2019 at 3:57
  • Thanks for reminding me, I'd forgotten about this question :) @NiayeshIsky Commented Mar 20, 2019 at 4:19
  • Yang's regex is correct, but note that your Python code won't work as intended: pop() with no argument always removes the last element of the list, not the course that matched, and removing items from a list while iterating over it makes the loop skip elements. Consider using a list comprehension instead: [ c for c in courses if re.match("^I{1,3}.*", c) ] Commented Mar 20, 2019 at 4:34
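Here is a sketch of the difference, assuming the question's list of courses. It first replays the original loop on a copy of the list to show what it actually removes, then builds new lists with comprehensions instead of mutating the list being iterated over. The names buggy, starts_with_i and others are illustrative, not from the thread.

```python
import re

courses = ['CI101', 'CS164', 'ENGL101', 'I-', 'III-', 'MATH116', 'PSY101']

# Replay the question's loop on a copy. The pattern can match the empty
# string, so every iteration calls pop(), which removes the LAST item;
# the shrinking list also ends the loop early.
buggy = list(courses)
for course in buggy:
    if re.search("(I?II?III?)*", course):
        buggy.pop()
print(buggy)  # ['CI101', 'CS164', 'ENGL101']

# Build new lists instead of mutating the one being iterated over.
starts_with_i = [c for c in courses if re.match(r"^I{1,3}.*", c)]
others = [c for c in courses if not re.match(r"^I{1,3}.*", c)]
print(starts_with_i)  # ['I-', 'III-']
print(others)  # ['CI101', 'CS164', 'ENGL101', 'MATH116', 'PSY101']
```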

2 Answers


Here is the regex you should use:

^I{1,3}.*$


^ anchors the match at the start of the string. I{1,3} matches I repeated 1 to 3 times. .* matches any remaining characters. $ anchors the match at the end of the string. So this regex matches every word that starts with I, II, or III.
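To see what the pattern accepts, here is a small sketch that runs it against a few sample words. The sample list (including the made-up 'IIII-') and the variable names are illustrative assumptions, not from the question. Because .* accepts anything, ^I{1,3}.*$ in practice matches any word that starts with I, so 'IIII-' matches too. If a fourth I should be rejected, a negative lookahead such as ^I{1,3}(?!I) is one possible variant, shown at the end.

```python
import re

# The answer's pattern: 1 to 3 I's at the start, then anything
pattern = re.compile(r'^I{1,3}.*$')

# Hypothetical sample words; 'IIII-' is made up to probe the edge case
samples = ['CI101', 'I-', 'III-', 'IIII-', 'MATH116']

results = {word: bool(pattern.match(word)) for word in samples}
print(results)
# {'CI101': False, 'I-': True, 'III-': True, 'IIII-': True, 'MATH116': False}

# Possible stricter variant (an assumption, not from the answer):
# (?!I) rejects the word if another I follows the 1 to 3 leading I's
strict = re.compile(r'^I{1,3}(?!I)')
strict_results = {word: bool(strict.match(word)) for word in samples}
print(strict_results)
# {'CI101': False, 'I-': True, 'III-': True, 'IIII-': False, 'MATH116': False}
```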

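Looking at your regex: first, it has no ^ anchor, so re.search() can match I anywhere in the string. Second, ? only applies to the single token right before it, so in (I?II?III?) the first I is optional, the second is not, the third is optional, the fourth and fifth are not, and the sixth is optional; one pass through the group therefore needs at least 3 I's. Finally, the * after the parentheses lets the group repeat any number of times, including zero, so the whole pattern matches either zero I's or at least three. Because zero repetitions are allowed, the pattern can match the empty string, which means re.search() finds a match in every course and your if condition is always true.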
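The zero-repetition behaviour can be checked directly. This sketch (the variable names are illustrative) runs the question's pattern with re.search() over the course list and records what each search actually matched.

```python
import re

courses = ['CI101', 'CS164', 'ENGL101', 'I-', 'III-', 'MATH116', 'PSY101']
asker_pattern = re.compile(r'(I?II?III?)*')

# search() returns the leftmost match; because the group may repeat zero
# times, that is usually an empty match at position 0
matched = {c: asker_pattern.search(c).group() for c in courses}
print(matched)
# {'CI101': '', 'CS164': '', 'ENGL101': '', 'I-': '', 'III-': 'III', 'MATH116': '', 'PSY101': ''}

# A Match object is truthy even when it matched an empty string,
# so the question's "if re.search(...)" is true for every course
print(all(asker_pattern.search(c) for c in courses))  # True
```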


1 Comment

I see. Thanks for the explanation!

Instead of search(), you can use match(), which only tries the pattern at the beginning of the string, so no ^ anchor is needed:

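import re

l = ['CI101', 'CS164', 'ENGL101', 'I-', 'III-', 'MATH116', 'PSY101']

# match() only tries the pattern at the start of the string
pattern = re.compile(r'I{1,3}')

# keep only the courses that do NOT start with I, II, or III
remaining = [i for i in l if not pattern.match(i)]
print(remaining)
# ['CI101', 'CS164', 'ENGL101', 'MATH116', 'PSY101']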

