Python Regular Expression of long complex string

Question

I am scraping data from a webpage, and the received data usually looks as follows:

233989 001 0 / 49 T R 4:15 PM - 5:30 PM 205 IST Building 01/13/14 - 05/02/14 Controls View (814) 865-8947 266200 002 0 / 43 M W F 10:10 AM - 11:00 AM 110 IST Building 01/13/14 - 05/02/14 Controls View (814) 865-8947

I am trying to split the data into records, each running from the 6-digit pattern ###### (e.g. 233989) to the phone number that marks the end of that record (e.g. (814) 865-8947). Because I know each record always ends with 4 digits, I came up with the expression:

(^[0-9]{1,6}$[^[0-9]{1,4}$]*[0-9]{1,4}$+)+

This does not seem to work though. Can anyone lend a helping hand?

brandonscript · Accepted Answer · 2014-01-15 00:22:05Z

You could use this:

r'(\d{6}.*?\(\d{3}\) \d{3}-\d{4}) ?'

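Then rebuild the string by replacing each match with $1\n, i.e. the captured record followed by a newline (in Python's re.sub the backreference is written \1).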

Like so: http://regex101.com/r/lG4gG5

Python:

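import re

s = '233989 001 0 / 49 T R 4:15 PM - 5:30 PM 205 IST Building 01/13/14 - 05/02/14 Controls View (814) 865-8947 266200 002 0 / 43 M W F 10:10 AM - 11:00 AM 110 IST Building 01/13/14 - 05/02/14 Controls View (814) 865-8947'
# The capture group makes re.split return each matched record; the pieces
# between matches are empty strings, so filter them out.
spl = [part for part in re.split(r'(\d{6}.*?\(\d{3}\) \d{3}-\d{4}) ?', s) if part]
for line in spl:
    print(line)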
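
The "rebuild it on $1\n" step can also be sketched directly with re.sub. This is only an illustration of that idea, assuming the input always has the shape shown in the question; the names RECORD, rebuilt and records are made up for the example.

```python
import re

# Sample input copied from the question.
s = ('233989 001 0 / 49 T R 4:15 PM - 5:30 PM 205 IST Building '
     '01/13/14 - 05/02/14 Controls View (814) 865-8947 '
     '266200 002 0 / 43 M W F 10:10 AM - 11:00 AM 110 IST Building '
     '01/13/14 - 05/02/14 Controls View (814) 865-8947')

# Six digits, then as little as possible up to a "(ddd) ddd-dddd" phone
# number, plus an optional trailing space that stays outside the group.
RECORD = re.compile(r'(\d{6}.*?\(\d{3}\) \d{3}-\d{4}) ?')

# Replace each match with the captured record followed by a newline.
rebuilt = RECORD.sub(r'\1\n', s)

# One record per line; splitlines() ignores the final trailing newline.
records = rebuilt.splitlines()

for record in records:
    print(record)
```

If only the list of records is needed, RECORD.findall(s) should give the same two strings without the intermediate rebuilt text.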

edited Jan 15, 2014 at 0:22

answered Jan 15, 2014 at 0:15

brandonscript


1 Comment

brandonscript Over a year ago

Made it a little simpler, didn't need that second capture group at all.
