Python-Regex, what's going on here?

Question

I recently got a book on Python, and its chapter on regex includes a section of code that I can't really understand. Can someone explain exactly what's going on here? (This section is about regex groups.)

>>> import re
>>> my_regex = r'(?P<zip>Zip:\s*\d\d\d\d\d)\s*(State:\s*\w\w)'
>>> addrs = "Zip: 10010 State: NY"
>>> y = re.search(my_regex, addrs)
>>> y.groupdict('zip')
{'zip': 'Zip: 10010'}
>>> y.group(2)
'State: NY'

Which part don't you understand? Regex in general, or how Python is pulling out the 'zip' group and the second (unnamed) group? Adding more detail to your question will get you better, more targeted answers.
– Ian Varley, Commented Jan 11, 2009 at 18:46
So does it just mean that it creates a group called zip which does what the rest of the line states, as in "Zip:\s*\d\d\d\d\d)\s*(State:\s*\w\w)", and then the rest of it creates a dict called groupdict with the Zip and the State? I think I get it :)
– user33061, Commented Jan 11, 2009 at 18:53

SchaeferFFM · Accepted Answer · 2009-01-11 18:54:18Z

8

regex definition:

(?P<zip>...)

Creates a named group "zip"

Zip:\s*

Match "Zip:" and zero or more whitespace characters

\d

Match a digit

\w

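Match a word character ([A-Za-z0-9_] for ASCII; in Python 3, str patterns also match Unicode letters and digits unless re.ASCII is set)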

y.groupdict('zip')

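The groupdict method returns a dictionary with named groups as keys and their matches as values. Here "zip" is the only named group, so only its match is returned. The 'zip' argument is not a selector: it is the default value used for any named group that did not take part in the match.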

y.group(2)

Returns the match for the second group, which is an unnamed group "(...)"
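
One detail that is easy to misread: the argument to groupdict is a default value, not the name of a group to look up. A minimal sketch, assuming a variation of the pattern in which the state part is named and optional (both changes are for illustration only and are not in the book's code):

```python
import re

# Hypothetical variation of the pattern: the state group is named and optional.
pattern = r'(?P<zip>Zip:\s*\d{5})\s*(?P<state>State:\s*\w\w)?'
m = re.search(pattern, "Zip: 10010")

no_default = m.groupdict()         # unmatched named groups map to None
with_default = m.groupdict('n/a')  # ...or to the default you pass in

print(no_default)    # {'zip': 'Zip: 10010', 'state': None}
print(with_default)  # {'zip': 'Zip: 10010', 'state': 'n/a'}
```

In the original code every named group matches, so the 'zip' default is never used.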

Hope that helps.

tristan · Accepted Answer · 2009-01-11 18:52:39Z

2

The search method returns a match object containing the results of applying your regex pattern (or None if nothing matches).

groupdict returns a dictionary of groups whose keys are the group names defined by (?P<name>...). Here name is a name for the group.

group returns the text matched by the group you ask for. Group 0 is the entire match, group 1 is "Zip: 10010", and group 2 is "State: NY". So, counting the whole match, "State: NY" is the third item.
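
To make the numbering concrete, here is a short sketch using the pattern and string from the question (the variable names on the left are just for illustration):

```python
import re

my_regex = r'(?P<zip>Zip:\s*\d\d\d\d\d)\s*(State:\s*\w\w)'
y = re.search(my_regex, "Zip: 10010 State: NY")

whole = y.group(0)       # group 0 is always the entire match
first = y.group(1)       # first pair of parentheses (the named "zip" group)
second = y.group(2)      # second pair of parentheses (unnamed)
all_groups = y.groups()  # tuple of groups 1..n, without group 0

print(whole)       # Zip: 10010 State: NY
print(first)       # Zip: 10010
print(second)      # State: NY
print(all_groups)  # ('Zip: 10010', 'State: NY')
```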

This was a relatively simple question by the way. I simply looked up the method documentation on google and found this page. Google is your friend.

Teifion · Accepted Answer · 2009-01-11 18:57:17Z

1

import re

# my_regex = r' <= this means that the string is a raw string; in a normal string you'd need to use double backslashes
# ( ... ) this groups something
# (?P<zip> ... ) this is a group named "zip"; a ? right after the opening bracket marks special group syntax, it does not mean "optional" here
# * this means "zero or more of the previous item, as many as you can find"
# \s is whitespace
# \d is a digit, much like [0-9]
# \w is an alphanumeric character or an underscore
my_regex = r'(?P<zip>Zip:\s*\d\d\d\d\d)\s*(State:\s*\w\w)'
addrs = "Zip: 10010 State: NY"

# Runs the regex search on the string
y = re.search(my_regex, addrs)
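
A small sketch of why the raw string prefix matters. The `\b` example is not from the book; it is only an illustration of an escape that Python itself would otherwise interpret:

```python
import re

# A raw string keeps backslashes as-is, so these two spellings are the same pattern.
same = (r'\s*\d' == '\\s*\\d')

# Without the r prefix, Python turns '\b' into a backspace character
# before the regex engine ever sees it.
raw_hit = re.search(r'\bNY\b', "State: NY")   # \b is a word boundary here
plain_hit = re.search('\bNY\b', "State: NY")  # looks for backspace, NY, backspace

print(same)              # True
print(raw_hit.group(0))  # NY
print(plain_hit)         # None
```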

Tim Pietzcker · Accepted Answer · 2009-01-11 18:55:32Z

0

The (?P<identifier>match) syntax is Python's way of implementing named capturing groups. That way, you can access what was matched by match using a name instead of just a sequential number.

Since the first set of parentheses is named zip, you can access its match using the match's groupdict method to get an {identifier: match} pair. Or you could use y.group('zip') if you're only interested in the match (which usually makes sense since you already know the identifier). You could also access the same match using its sequential number (1). The next match is unnamed, so the only way to access it is its number.
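
A quick sketch of the name/number equivalence described above, using the pattern from the question (the variable names are illustrative):

```python
import re

my_regex = r'(?P<zip>Zip:\s*\d\d\d\d\d)\s*(State:\s*\w\w)'
y = re.search(my_regex, "Zip: 10010 State: NY")

by_name = y.group('zip')  # look the group up by its identifier
by_number = y.group(1)    # the same group is also group number 1
unnamed = y.group(2)      # the unnamed group is reachable only by number

# The compiled pattern records which number each name refers to.
name_to_number = dict(re.compile(my_regex).groupindex)

print(by_name == by_number)  # True
print(unnamed)               # State: NY
print(name_to_number)        # {'zip': 1}
```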

Federico A. Ramponi · Accepted Answer · 2009-01-11 19:00:33Z

0

Adding to the previous answers: in my opinion, you're better off choosing one type of group (named or unnamed) and sticking with it. I normally use named groups. For example:

>>> import re
>>> my_regex = r'(?P<zip>Zip:\s*\d\d\d\d\d)\s*(?P<state>State:\s*\w\w)'
>>> addrs = "Zip: 10010 State: NY"
>>> y = re.search(my_regex, addrs)
>>> print(y.groupdict())
{'zip': 'Zip: 10010', 'state': 'State: NY'}
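
As a variation on the same idea, here is a sketch that assumes you only want the values and not the "Zip:" and "State:" labels. This rewrite of the pattern is hypothetical and is not what the book uses:

```python
import re

# Hypothetical rewrite: "Zip:" and "State:" are matched but not captured.
my_regex = r'Zip:\s*(?P<zip>\d{5})\s*State:\s*(?P<state>\w\w)'
y = re.search(my_regex, "Zip: 10010 State: NY")

fields = y.groupdict()
print(fields['zip'])    # 10010
print(fields['state'])  # NY
```

With every group named, the dictionary from groupdict is all you need and no positional numbers are involved.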

Steve Losh · Accepted Answer · 2009-01-12 18:28:12Z

0

strfriend is your friend:

http://strfriend.com/vis?re=(Zip%3A\s*\d\d\d\d\d)\s*(State%3A\s*\w\w)

EDIT: Why the heck is it making the entire line a link in the actual comment, but not the preview?
