Regular expression search usage in Python 3.6

Question

I am using Python 3.6 and have several thousand text documents that I scanned from PDF files into a Python 3 dictionary. Each document is a separate dictionary entry holding a single string. I am trying to use a regular expression search to extract the name and address information from each page. I have identified that the last name is always preceded by “Room #______” and followed by “Last/”. I tried the code below, but it doesn’t seem to work. I am not at all familiar with lookaround constructs. Can anyone tell me what I’m doing wrong? My final code will have several of these searches; this is only the first.

memberRecord = memberData[1]
memberRegex = re.compile(r'''(
    (?<=Room #______)\w+(?=Last)
    $
    )''', re.VERBOSE)
mo = memberRegex.search(memberRecord)

You do not account for any whitespace or non-word characters between Room #______, your word, and Last. Try Room #______(.*?)Last and, when a match is found, grab mo.group(1).
– Wiktor Stribiżew, Commented Apr 23, 2017 at 20:11
Thank you, Wiktor. I was apparently trying to make it too complicated! This worked: memberRegex = re.compile(r'(Room #______)(.*)(Last)'); mo = memberRegex.search(memrec); print(mo.group(2))
– Barry Barron, Commented Apr 24, 2017 at 0:17

Wiktor Stribiżew · Accepted Answer · 2017-04-24 06:46:12Z


Your pattern does not account for any whitespace or non-word characters between Room #______, your word, and Last. Capture that text in a group instead; once a match is found, the value you need is available via mo.group(1):

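import re

# Capture everything between the two markers; DOTALL lets . match newlines
memberRegex = re.compile(r'Room #______(.*?)Last', re.DOTALL)
mo = memberRegex.search(memberRecord)
if mo:  # search() returns None when nothing matches
    print(mo.group(1))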

Note that the re.DOTALL flag allows . to match across lines, and the lazy quantifier *? matches as few characters as possible, stopping at the first Last. If you need to match up to the last occurrence of Last, replace *? with * (the greedy version of the quantifier).
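To make the difference between the lazy and greedy quantifiers (and the effect of re.DOTALL) concrete, here is a minimal sketch. The sample string `page` is invented for illustration and assumes the name sits on its own line between the two markers; the real scanned pages may be laid out differently.

```python
import re

# Hypothetical sample text (an assumption, not taken from the real documents).
page = "Room #______\nSmith\nLast/ First: Ann\nRoom #______\nJones\nLast/"

lazy = re.compile(r'Room #______(.*?)Last', re.DOTALL)    # stops at the first "Last"
greedy = re.compile(r'Room #______(.*)Last', re.DOTALL)   # runs to the final "Last"
no_dotall = re.compile(r'Room #______(.*?)Last')          # . cannot cross a newline

lazy_match = lazy.search(page)
greedy_match = greedy.search(page)
no_dotall_match = no_dotall.search(page)

# strip() removes the newlines captured on either side of the name.
lazy_name = lazy_match.group(1).strip()
greedy_span = greedy_match.group(1).strip()

# findall() returns the captured group for every non-overlapping match.
names = [name.strip() for name in lazy.findall(page)]

print(lazy_name)          # Smith
print(names)              # ['Smith', 'Jones']
print(no_dotall_match)    # None
```

With this sample, the greedy pattern captures everything from the first name through the second one, which is rarely what you want when a page holds more than one record. Without re.DOTALL the search finds nothing here, because each marker is followed by a line break.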
