I'm trying to keep the line breaks from a text result when I write its content into an HTML file.
I get the results from boilerpipe this way:
import urllib2

class BottomPipeResult:
    AGENT_ID = "Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"
    BOTTOMPIPE_URL = "http://boilerpipe-web.appspot.com/extract?url={0}&extractor=LargestContentExtractor&output=text"
    #BOTTOMPIPE_URL = "http://boilerpipe-web.appspot.com/extract?url={0}&extractor=ArticleExtractor&output=htmlFragment"
    _myBPPage = ""

    # scrape and get results from bottompipe
    def scrapeResult(self, theURL, user_agent=AGENT_ID):
        request = urllib2.Request(self.BOTTOMPIPE_URL.format(theURL))
        if user_agent:
            request.add_header("User-Agent", user_agent)
        pagefile = urllib2.urlopen(request)
        realurl = pagefile.geturl()
        f = pagefile
        # store the raw response body on the instance and return it
        self._myBPPage = f.read()
        return self._myBPPage
But when I write the results to HTML, I lose all the line breaks.
Here's the code I use to write into the HTML file:
f = open('./../../entries-new.html', 'a')
f.write(BottomPipeResult().scrapeResult(myLinkResult))
f.close()
Here is an example of a boilerpipe text result:
http://boilerpipe-web.appspot.com/extract?url=http%3A%2F%2Fresult.com&extractor=ArticleExtractor&output=text
I tried this, but it doesn't work:
myLinkResult = re.sub('\n','<br />', myLinkResult)
Any suggestions?
Thanks
myLinkResult = re.sub('\n','<br />', myLinkResult) doesn't make sense here. myLinkResult is not the HTML content; it's the URL being requested, and that URL doesn't contain any \n. So the substitution has no effect on the HTML or the output.
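Following that comment, the substitution would presumably need to be applied to the text that scrapeResult returns, not to the URL. Here is a minimal sketch of that idea. The helper name newlines_to_br is hypothetical, and the sample string only stands in for a boilerpipe response:

```python
def newlines_to_br(text):
    """Return text with an HTML <br /> tag inserted before each line break."""
    # Normalize Windows (\r\n) and bare \r line endings to \n first.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # Keep the \n after the tag so the HTML source stays readable.
    return normalized.replace("\n", "<br />\n")


# Stand-in for the string returned by scrapeResult(); not real boilerpipe output.
sample = "First paragraph.\nSecond paragraph.\r\nThird paragraph."
html_fragment = newlines_to_br(sample)
```

In the code from the question, this would mean writing newlines_to_br(result) to the file, where result is the value returned by scrapeResult. The sketch does not escape HTML special characters such as < or &, so that would need handling separately if the extracted text can contain them.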