Python: How to keep breaks reading from a txt file when printing to html

Question

I'm trying to keep the line breaks from a text file when I write its content into an HTML file.

I get results from boilerpipe in this way:

import urllib2  # Python 2 standard library

class BottomPipeResult:

    AGENT_ID = "Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"

    BOTTOMPIPE_URL = "http://boilerpipe-web.appspot.com/extract?url={0}&extractor=LargestContentExtractor&output=text"

    #BOTTOMPIPE_URL = "http://boilerpipe-web.appspot.com/extract?url={0}&extractor=ArticleExtractor&output=htmlFragment"

    _myBPPage = ""

    # scrape and get results from bottompipe
    def scrapeResult(self, theURL, user_agent=AGENT_ID):
        request = urllib2.Request(self.BOTTOMPIPE_URL.format(theURL))
        if user_agent:
            request.add_header("User-Agent", user_agent)
        # fetch the page whether or not a User-Agent header was set
        pagefile = urllib2.urlopen(request)
        self._myBPPage = pagefile.read()
        return self._myBPPage

But when I write them to HTML, I lose all the line breaks.

Here's the code I use to write into HTML:

f = open('./../../entries-new.html', 'a')
# scrapeResult is an instance method, so it needs an instance
f.write(BottomPipeResult().scrapeResult(myLinkResult))
f.close()

Here's an example of a boilerpipe text result:

http://boilerpipe-web.appspot.com/extract?url=http%3A%2F%2Fresult.com&extractor=ArticleExtractor&output=text

I tried this, but it doesn't work:

myLinkResult = re.sub('\n','<br />', myLinkResult)

Any suggestion?

Thanks

"Any suggestion?" Yes. Define the problem you're having. What's not working? "trying to keep breaks" doesn't mean much. Or rather, it could mean almost anything. Word breaks, line breaks, coffee breaks. Please be more specific. Include code. And clearly state what doesn't work in your code.
– S.Lott, Commented Feb 20, 2012 at 22:14
Sorry, you're right. I edited the question. Hope now it's clearer.
– slwr, Commented Feb 20, 2012 at 22:18
Where is the code where you "reprint them to html"? I have a sneaking suspicion that you don't realize "html" ignores whitespace for the most part.
– gfortune, Commented Feb 20, 2012 at 22:25
I added the html part. I'm actually aware that HTML ignores whitespaces, but I thought it would keep linebreaks. But I'm probably wrong.
– slwr, Commented Feb 20, 2012 at 22:31
myLinkResult = re.sub('\n','<br />', myLinkResult ) doesn't make any sense at all. It's not the HTML content. It's the URL being requested. Which doesn't have any \n in the URL. Nor does it have any effect on the HTML or the output.
– S.Lott, Commented Feb 20, 2012 at 23:08

Samuel Fraser · Accepted Answer · 2012-02-20 22:52:43Z

1

You could wrap the text in a <pre> tag. This tells the browser that the text is preformatted, so whitespace and line breaks are rendered exactly as written.

For example:

<pre>Your text
With line feeds
and other things
</pre>
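
A minimal sketch of the <pre> approach in Python 3, assuming the extracted text is already a str. The function name to_pre_block is hypothetical (it is not part of the question's code), and html.escape is added so that any <, > or & in the text is shown literally instead of being read as markup.

```python
import html

def to_pre_block(text):
    """Wrap plain text in a <pre> element, escaping HTML special characters."""
    return "<pre>" + html.escape(text) + "</pre>"

# Hypothetical sample text standing in for the boilerpipe result.
sample = "Your text\nWith line feeds\nand <other> things"
print(to_pre_block(sample))
```

The line breaks stay as plain newline characters in the file; the <pre> element is what makes the browser display them.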

edited Feb 20, 2012 at 22:52

answered Feb 20, 2012 at 22:45

Samuel Fraser


gfortune · Accepted Answer · 2012-02-20 22:50:47Z

0

I modified your code just a touch so it was runnable and it seems to "work" properly for me. The resulting output has line endings where expected. I'm seeing some encoding issues, but no line ending issues.

Code

import urllib2  # Python 2 standard library

class BottomPipeResult:

    AGENT_ID = "Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"
    BOTTOMPIPE_URL = "http://boilerpipe-web.appspot.com/extract?url={0}&extractor=LargestContentExtractor&output=text"
    _myBPPage = ""

    # scrape and get results from bottompipe
    def scrapeResult(self, theURL, user_agent=AGENT_ID):
        request = urllib2.Request(self.BOTTOMPIPE_URL.format(theURL))
        if user_agent:
            request.add_header("User-Agent", user_agent)
        # fetch the page whether or not a User-Agent header was set
        pagefile = urllib2.urlopen(request)
        self._myBPPage = pagefile.read()
        return self._myBPPage


bpr = BottomPipeResult()
myLinkResult = 'http://result.com'

# append the extracted text to the output file
f = open('out.html', 'a')
f.write(bpr.scrapeResult(myLinkResult))
f.close()

Output

Result-Expand.flv
We want to help your company grow. Our Result offices around the world can help you expand your business faster and more cost efficiently. And at the same time bring the experience of having expanded more than 150 companies during the past ten years.
Result can help you grow in your local market, regionally, or globally through our team of experienced business builders, our industry know-how and our know-who.
Our services range from well designed expansion strategies to assuming operational responsibility for turning these strategies into successful business.
We donâ€™t see ourselves as mere consultantsÂ  who give you a strategy presentation and then leave you to your own devices. We prefer to be considered as an extended, entirely practical arm of your management team. Weâ€™re hands-on and heads-on. Weâ€™re business builders.
Weâ€™re co-entrepreneurs. This is also reflected in our compensation structure â€“ a significant part of our compensation is result Â based.
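
The â€™ sequences in the output above are consistent with UTF-8 bytes being displayed as Windows-1252. That is a guess about the cause, not something confirmed in this thread. A minimal Python 3 sketch of the effect, where raw is a hypothetical stand-in for the bytes returned by the service:

```python
# Hypothetical stand-in for the bytes returned by the HTTP response.
raw = "We don\u2019t see ourselves as mere consultants".encode("utf-8")

# Decoding UTF-8 bytes with the wrong codec reproduces the garbled form.
garbled = raw.decode("cp1252")
print(garbled)

# Decoding with the matching codec restores the apostrophe.
text = raw.decode("utf-8")
print(text)

# Declaring the charset in the page asks the browser to decode it as UTF-8 too.
page = '<meta charset="utf-8">\n' + text
```

If this is the cause, writing the file with an explicit UTF-8 encoding and declaring the same charset in the HTML should make the apostrophes display correctly.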

Making the results more "HTML" like

As far as html output is concerned, you probably want to wrap each line in a <p> paragraph tag.

output = bpr.scrapeResult(myLinkResult)
f.write('\n'.join(['<p>' + x + '</p>' for x in output.split('\n')]))
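
A slightly more defensive variant of the one-liner above, sketched in Python 3. It assumes the text is a str, skips empty lines so they do not produce empty paragraphs, and escapes HTML special characters. The name text_to_paragraphs is hypothetical.

```python
import html

def text_to_paragraphs(text):
    """Wrap each non-empty line of text in a <p> element."""
    lines = (line.strip() for line in text.splitlines())
    return "\n".join("<p>" + html.escape(line) + "</p>" for line in lines if line)

# Hypothetical sample text standing in for the boilerpipe result.
sample = "First line\n\nSecond line & more\n"
print(text_to_paragraphs(sample))
```

Using splitlines instead of split('\n') also handles \r\n line endings, which would otherwise leave a stray \r at the end of each paragraph.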

answered Feb 20, 2012 at 22:50

gfortune


2 Comments

slwr Over a year ago

Thanks gfortune, but in order to work in HTML it should have the <br /> tag. As you rendered it, it keeps the breaks only in the source code.

gfortune Over a year ago

The first piece of code indeed only breaks in the textual output and definitely collapses to one line when displayed as HTML. If you replace the second to last line with the code from the bottom of my post, it will wrap each "line" in paragraph tags which should result in reasonable HTML output. As I noted in the comments on your question, you need to be very precise about where you want to see "line breaks." If you want them in the HTML view of the data, you will need to add tags as necessary. If you want the line breaks in the text output, I think you'll find they are already there.
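
If <br /> tags are preferred over paragraphs, the substitution has to be applied to the fetched text, not to the URL, as pointed out in the comments on the question. A minimal Python 3 sketch, assuming the text is a str; text_to_br and the sample values are hypothetical.

```python
import html

def text_to_br(text):
    """Escape text and turn each line break into a <br /> tag."""
    escaped = html.escape(text)
    # Normalize Windows and old Mac line endings before substituting.
    escaped = escaped.replace("\r\n", "\n").replace("\r", "\n")
    return escaped.replace("\n", "<br />\n")

url = "http://result.com"          # the URL contains no line breaks
extracted = "line one\nline two"   # hypothetical stand-in for the fetched text
print(text_to_br(extracted))
```

Applying the same function to url returns it unchanged, which is why the original re.sub call appeared to do nothing.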
