
So I am scraping data from a webpage, and the data I receive usually looks like this:

233989 001 0 / 49 T R 4:15 PM - 5:30 PM 205 IST Building 01/13/14 - 05/02/14 Controls View (814) 865-8947 266200 002 0 / 43 M W F 10:10 AM - 11:00 AM 110 IST Building 01/13/14 - 05/02/14 Controls View (814) 865-8947

I am trying to split the data into records, each running from the six-digit ID (e.g. 233989) to the phone number that marks the end of the record (e.g. (814) 865-8947). Because I know each record always ends with four digits, I came up with this expression:

(^[0-9]{1,6}$[^[0-9]{1,4}$]*[0-9]{1,4}$+)+

This doesn't seem to work, though. Can anyone lend a hand?
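A minimal sketch of why the expression above fails, assuming Python (the answer below uses it; the question itself does not name a language). Without the MULTILINE flag, ^ and $ anchor to the start and end of the whole string, so ^[0-9]{1,6}$ can only match if the entire input is one to six digits. The trailing $+ also tries to repeat an anchor, which Python's re module rejects outright.

```python
import re

# Sample scraped text from the question (two records on one line).
s = ('233989 001 0 / 49 T R 4:15 PM - 5:30 PM 205 IST Building '
     '01/13/14 - 05/02/14 Controls View (814) 865-8947 '
     '266200 002 0 / 43 M W F 10:10 AM - 11:00 AM 110 IST Building '
     '01/13/14 - 05/02/14 Controls View (814) 865-8947')

# ^ and $ anchor to the start and end of the whole string here,
# so this only matches if the entire input is 1-6 digits.
anchored = re.findall(r'^[0-9]{1,6}$', s)

# Without the anchors, the same class finds every short digit run,
# including pieces of times, dates, and phone numbers.
unanchored = re.findall(r'[0-9]{1,6}', s)

# The full pattern from the question does not compile in Python:
# "$+" tries to repeat an anchor ("nothing to repeat").
try:
    re.compile(r'(^[0-9]{1,6}$[^[0-9]{1,4}$]*[0-9]{1,4}$+)+')
    compiled = True
except re.error:
    compiled = False

print(anchored)        # []
print(unanchored[:4])  # ['233989', '001', '0', '49']
print(compiled)        # False
```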

1 Answer


You could use this:

r'(\d{6}.*?\(\d{3}\) \d{3}-\d{4}) ?'

Then replace each match with $1\n (the captured record followed by a newline), which puts every record on its own line.

Like so: http://regex101.com/r/lG4gG5
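The same substitution can be sketched in Python with re.sub. One difference from the regex101 substitution box: Python's replacement string writes the backreference as \1 rather than $1.

```python
import re

s = ('233989 001 0 / 49 T R 4:15 PM - 5:30 PM 205 IST Building '
     '01/13/14 - 05/02/14 Controls View (814) 865-8947 '
     '266200 002 0 / 43 M W F 10:10 AM - 11:00 AM 110 IST Building '
     '01/13/14 - 05/02/14 Controls View (814) 865-8947')

# Same pattern as above: a six-digit ID, lazily up to the first phone number,
# plus an optional trailing space that is consumed but not captured.
pattern = r'(\d{6}.*?\(\d{3}\) \d{3}-\d{4}) ?'

# Replace each match with the captured record plus a newline.
rebuilt = re.sub(pattern, r'\1\n', s)

# Each record now sits on its own line.
records = rebuilt.splitlines()
for record in records:
    print(record)
```

Because the space between records is swallowed by ` ?`, no line ends up with stray leading or trailing whitespace.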

Python:

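import re

s = '233989 001 0 / 49 T R 4:15 PM - 5:30 PM 205 IST Building 01/13/14 - 05/02/14 Controls View (814) 865-8947 266200 002 0 / 43 M W F 10:10 AM - 11:00 AM 110 IST Building 01/13/14 - 05/02/14 Controls View (814) 865-8947'
# Because the pattern is wrapped in a capture group, re.split returns each
# matched record; the gaps between matches come back as empty strings.
spl = re.split(r'(\d{6}.*?\(\d{3}\) \d{3}-\d{4}) ?', s)
for line in spl:
    if line:  # skip the empty strings between records
        print(line)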
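An alternative sketch, swapping re.split for re.findall (a different function than the answer uses, same pattern). When a pattern has exactly one capture group, re.findall returns only the captured text for each match, so there are no empty strings to filter out.

```python
import re

s = ('233989 001 0 / 49 T R 4:15 PM - 5:30 PM 205 IST Building '
     '01/13/14 - 05/02/14 Controls View (814) 865-8947 '
     '266200 002 0 / 43 M W F 10:10 AM - 11:00 AM 110 IST Building '
     '01/13/14 - 05/02/14 Controls View (814) 865-8947')

# One capture group, so findall yields just the record text per match;
# the optional trailing space sits outside the group and is dropped.
records = re.findall(r'(\d{6}.*?\(\d{3}\) \d{3}-\d{4}) ?', s)
for record in records:
    print(record)
```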

1 Comment

Made it a little simpler; the second capture group wasn't needed at all.
