I have a small utility that produces a plain-text readout of an RSS feed. Here is representative code:

#!/usr/bin/python

# /usr/lib/xscreensaver/phosphor -scale 3 -program 'python newsfeed.py | tee /dev/stderr | festival --tts'

import os
import sys
import feedparser

def printLine():
    # Print a rule of dashes spanning the full terminal width
    terminalRows, terminalColumns = os.popen('stty size', 'r').read().split()
    sys.stdout.write("-" * int(terminalColumns))
    print("\n")

feed = feedparser.parse('http://home.web.cern.ch/scientists/updates/feed')

# Print each entry's title and description (still containing HTML) between rules
for post in feed.entries:
    printLine()
    print(post.title + "\n")
    print(post.description + "\n")
printLine()

When this is run, the output looks like this:

-----------------------------------------------------------------------------------------------------

LHC seminar: Higgs boson width

<div class="field-body">
    <p>Constraints on the total Higgs boson width, Gamma_H, are presented using off-shell production and decay to ZZ in the 4l and 2l2nu final states. The analysis is based on data collected in 2012 by the CMS experiment at the LHC, corresponding to an integrated luminosity of L = 19.7/fb at a centre-of-mass energy of 8 TeV. The combined analysis of the 4l and 2l2nu events at high mass with the 4l measurement of the Higgs boson peak at 125.6 GeV leads to an upper limit on the Higgs boson width of Gamma_H &lt; 4.2 x Gamma_H(SM) at the 95% confidence level, assuming Gamma_H(SM) = 4.15 MeV. This result considerably improves over previous experimental constraints from direct measurements at the Higgs resonance peak.</p>
<h2><a href="https://indico.cern.ch/event/313506/">Watch the webcast at 11am CET</a></h2>
  </div>

-----------------------------------------------------------------------------------------------------

Neutrinos and nucleons

<p class="field-byline-taxonomy">
<a href="http://home.web.cern.ch/authors/christine-sutton">Christine Sutton</a></p>
  <div class="field-body">
    <p>On 7 April 1934 the journal <em>Nature</em> published a paper in which Hans Bethe and Rudolf Peierls made a first calculation of the neutrino cross-section and concluded that "it seems highly improbable that, even for cosmic ray energies, the cross-section becomes large enough to allow the process to be observed". Forty years on, neutrino cross-sections were not only being measured with the <a href="http://home.web.cern.ch/about/experiments/gargamelle">Gargamelle</a> bubble chamber at CERN's <a href="http://home.web.cern.ch/about/accelerators/proton-synchrotron">Proton Synchrotron</a>, they were helping to reveal a more fundamental layer to nature - the quarks.</p>
<p><strong>Read more:</strong> "<a href="http://cerncourier.com/cws/article/cern/56605">Neutrinos and nucleons</a>"- <em>CERN Courier</em></p>
  </div>

-----------------------------------------------------------------------------------------------------

What would be a sensible way, ideally one that generalises to most RSS feeds, of turning this into plain text without the HTML markup?

1 Answer


You could try the Python module beautifulsoup4 (available through pip). This question might guide you on how to use it for your purpose.

As a start:

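from bs4 import BeautifulSoup

# Name the parser explicitly; post.description is the HTML string from feedparser
soup = BeautifulSoup(post.description, 'html.parser')
# Collect every text node, discarding the tags around them
texts = soup.find_all(string=True)
print(''.join(texts))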

which shows

Christine Sutton

On 7 April 1934 the journal Nature published a paper in which Hans Bethe and Rudolf Peierls made a first calculation of the neutrino cross-section and concluded that "it seems highly improbable that, even for cosmic ray energies, the cross-section becomes large enough to allow the process to be observed". Forty years on, neutrino cross-sections were not only being measured with the Gargamelle bubble chamber at CERN's Proton Synchrotron, they were helping to reveal a more fundamental layer to nature - the quarks.
Read more: "Neutrinos and nucleons"- CERN Courier
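
If adding a dependency is not an option, the same idea can be sketched with only the standard library. This is not BeautifulSoup's implementation, just a rough Python 3 approximation built on html.parser. The names TextExtractor and html_to_text are made up for this illustration, and the set of tags treated as paragraph breaks is an assumption you may want to adjust for other feeds.

```python
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML fragment and drop the tags."""

    # Assumption: closing any of these tags should start a new output line.
    BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &lt; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_starttag(self, tag, attrs):
        # A <br> has no closing tag, so break the line when it opens.
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")


def html_to_text(html):
    """Return the visible text of an HTML fragment, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    text = "".join(parser.parts)
    # Collapse runs of spaces and tabs, trim each line, drop empty lines.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

In the loop from the question, this would be called as print(html_to_text(post.description)). Unlike joining the raw text nodes, it also removes the indentation and blank lines left behind by the markup.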

1 Comment

Ah, this works very well. Thank you very much for your guidance!
