I'm using Python 2.7, BeautifulSoup4, regex, and requests on Windows 7.
I've scraped some page source from a website and I'm having trouble parsing it, extracting the bits I want, and storing them in a dictionary. The text I'm after appears in the source like this:
@CAD_DTA\">I WANT THIS@G@H@CAD_LBL
There are about 50-60 short strings I want to extract and store. They vary in length, but every one of them is preceded by @CAD_DTA\"> and followed by @G@H@CAD_LBL in the source.
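Because the same pair of delimiters repeats dozens of times on the page, the lazy quantifier .+? matters. The sketch below uses a made-up two-entry sample (an assumption, not the real page) to show that .+? stops at the nearest @G@H@CAD_LBL, while a greedy .+ runs on to the last one. The pattern is written as a raw string with \\ so that it matches the literal backslash shown in the snippet above, assuming that backslash really is part of the scraped text.

```python
import re

# Hypothetical sample: two entries in a row, with a literal backslash-quote
# (\") before each value, as in the snippet from the question.
sample = r'@CAD_DTA\">FIRST@G@H@CAD_LBL junk @CAD_DTA\">SECOND@G@H@CAD_LBL'

# Raw strings: \\ reaches the regex engine as an escaped, i.e. literal, backslash.
lazy = r'@CAD_DTA\\">(.+?)@G@H@CAD_LBL'   # stops at the nearest closing marker
greedy = r'@CAD_DTA\\">(.+)@G@H@CAD_LBL'  # backtracks only to the last closing marker

lazy_match = re.search(lazy, sample).group(1)
greedy_match = re.search(greedy, sample).group(1)

print(lazy_match)    # FIRST
print(greedy_match)  # FIRST@G@H@CAD_LBL junk @CAD_DTA\">SECOND
```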
I've tried:
re.search('@CAD_DTA\">(.+?)@G@H@CAD_LBL',result.text)
where result is the output of s.post(url, data = cookie, headers = {'referer': my_referer}).
I've also tried passing str(result.text), but re.search keeps returning None. It's odd, because if I copy and paste the content of result.text into a string and pass that through re.search, it works fine.
">
I've also tried re.search('@CAD_DTA">(.+?)@G@H@CAD_LBL',result.text), in case the \ was being treated as an escape character. I'm not sure what else to try.
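">
A likely explanation for "pasted text works, response text doesn't" can be checked in a few lines. This is a sketch that assumes the response text contains a literal backslash before the quote, simulated here with a raw string since no real request is made. In an ordinary Python string literal, '\"' is just an escaped quote, so both patterns tried above are the same string; pasting the page into a normal literal silently drops the backslash as well, which would explain why the pasted copy matched.

```python
import re

# In a normal (non-raw) literal, \" is an escaped quote: no backslash survives.
with_backslash = '@CAD_DTA\">(.+?)@G@H@CAD_LBL'
without_backslash = '@CAD_DTA">(.+?)@G@H@CAD_LBL'
print(with_backslash == without_backslash)  # True

# Pasting the source into a normal literal drops the backslash too...
pasted = '@CAD_DTA\">I WANT THIS@G@H@CAD_LBL'
print('\\' in pasted)  # False

# ...whereas the real response (simulated with a raw string) keeps it,
# so a pattern with no backslash in it cannot match.
from_response = r'@CAD_DTA\">I WANT THIS@G@H@CAD_LBL'
print('\\' in from_response)  # True
print(re.search(without_backslash, from_response))  # None
print(re.search(without_backslash, pasted).group(1))  # I WANT THIS
```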
Can someone point me in the right direction?
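(.+?)@G@H@CAD_LBL',result.text)should">
If result.text contains a literal backslash before the quote, then re.search(r'@CAD_DTA\\">(.+?)@G@H@CAD_LBL', result.text) should work. In a raw string, \\ reaches the regex engine as an escaped backslash, which matches one literal \ in the text. In an ordinary string literal, '\"' is just an escaped quote, so '@CAD_DTA\">' and '@CAD_DTA">' are the same pattern, and neither matches text that really contains \".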
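(.+?)@G@H@CAD_LBL',result.text)should">
Since the goal is 50-60 values rather than one, re.findall (which returns every captured group) fits better than re.search (which stops at the first match). A minimal sketch follows; the page string is a hypothetical stand-in for result.text, and because the question doesn't say what the dictionary keys should be, the position of each match is used purely as a placeholder key.

```python
import re

# Raw string, so \\ reaches the regex engine and matches a literal backslash.
PATTERN = re.compile(r'@CAD_DTA\\">(.+?)@G@H@CAD_LBL')


def extract_fields(text):
    """Return every string captured between the two markers, in page order."""
    return PATTERN.findall(text)


# Hypothetical stand-in for result.text.
page = r'@CAD_DTA\">Alpha@G@H@CAD_LBL ... @CAD_DTA\">Beta Gamma@G@H@CAD_LBL'

values = extract_fields(page)
# Placeholder keys: 1, 2, 3, ... in the order the values appear.
fields = dict(enumerate(values, 1))
print(fields)
```

If any value can span a line break, compiling the pattern with re.DOTALL lets . match newlines as well.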