I am using Python 3.6 and have several thousand text documents that I have read from PDF files into a Python 3 dictionary. Each document is a separate dictionary entry holding a single string. I am trying to use a regular expression search to extract the name and address information from each page. I have identified that the last name is always preceded by “Room #______” and followed by “Last/”. I have tried the code below, but it doesn’t seem to work. I am not at all familiar with lookaround constructs. Can anyone tell me what I’m doing wrong? My final code will have several of these searches; this is only the first.

memberRecord = memberData[1]
memberRegex = re.compile(r'''(
    (?<=Room #______)\w+(?=Last)
    $
    )''', re.VERBOSE)
mo = memberRegex.search(memberRecord)
  • You do not account for any whitespace or non-word characters between Room #______, your word, and Last. Try Room #______(.*?)Last and, when a match is found, grab mo.group(1). Commented Apr 23, 2017 at 20:11
  • Thank you, Wiktor. I was apparently trying to make it too complicated! This worked: memberRegex = re.compile(r'(Room #______)(.*)(Last)'); mo = memberRegex.search(memrec); print(mo.group(2)) Commented Apr 24, 2017 at 0:17
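
One possible contributing factor, not raised in the comments above: the pattern in the question is compiled with re.VERBOSE, and in that mode unescaped whitespace is ignored and an unescaped # starts a comment that runs to the end of the line. The sketch below illustrates this on a made-up string (the sample text and variable names are assumptions, not taken from the question's data), which may explain why dropping re.VERBOSE, as in the comment above, helped.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "Room #12 Smith"

# In verbose mode the space is ignored and "#\d+" is read as a comment,
# so the effective pattern is just "Room".
verbose_naive = re.search(r'Room #\d+', text, re.VERBOSE)

# Escaping the space and the "#" keeps both literal in verbose mode.
verbose_escaped = re.search(r'Room\ \#\d+', text, re.VERBOSE)

print(verbose_naive.group())    # Room
print(verbose_escaped.group())  # Room #12
```

If the verbose layout is worth keeping, escaping the space and the # (or placing them in a character class such as [ ] and [#]) is one way to keep them literal.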

1 Answer

Your pattern does not account for any whitespace or non-word characters between Room #______, the name, and Last. Capture everything between the two markers in a group instead; once a match is found, the value is available via mo.group(1):

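import re

# Capture everything between the two markers; DOTALL lets "." match newlines
memberRegex = re.compile(r'Room #______(.*?)Last', re.DOTALL)
mo = memberRegex.search(memberRecord)
if mo:  # search() returns None when nothing matches
    print(mo.group(1))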

Note that the re.DOTALL flag allows . to match across lines, and the lazy quantifier *? matches as few characters as possible, stopping at the first Last. If you need to match up to the last occurrence of Last, replace *? with * (the greedy version of the quantifier).
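
To make the difference concrete, here is a minimal sketch comparing the lazy and greedy versions, with and without re.DOTALL. The record string is invented for illustration (the real documents may be laid out differently), and the variable names are hypothetical.

```python
import re

# Hypothetical record: the name sits on its own line and "Last" occurs twice.
record = "Room #______\nSmith\nLast/ First/ John\nLast updated 2017"

# Lazy: stops at the first "Last" after the marker.
lazy = re.search(r'Room #______(.*?)Last', record, re.DOTALL)

# Greedy: runs on to the final "Last" in the string.
greedy = re.search(r'Room #______(.*)Last', record, re.DOTALL)

# Without DOTALL, "." cannot cross the newline after the marker,
# so no match is found for this record.
no_dotall = re.search(r'Room #______(.*?)Last', record)

# strip() removes the newlines captured around the name.
lazy_name = lazy.group(1).strip()
greedy_text = greedy.group(1).strip()

print(repr(lazy_name))    # 'Smith'
print(repr(greedy_text))  # 'Smith\nLast/ First/ John'
print(no_dotall)          # None
```

Because the captured group includes any surrounding whitespace, calling strip() on the result is usually worthwhile before storing the name.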
