Yes, yes, I've weighed using an XML parser instead of regular expressions, but this seems like a simple enough case that a regex is suitable:

from BeautifulSoup import BeautifulSoup
from urllib import urlopen

tempSite = 'http://www.sumkindawebsiterighthur.com'
theTempSite = urlopen(tempSite).read()
currentTempSite = BeautifulSoup(theTempSite)
Email = currentTempSite.findAll('tr', valign="top") 
print Email[0] 

This currently prints:

<tr valign="top">
<td><p>Phone Number:</p></td>
<td>&nbsp;</td>
<td><p>706-878-8888</p></td>
</tr>

I'm trying to remove all markup (tr, td, p; stripping &nbsp; would be nice too) and end up with:

Phone Number: 706-878-8888

My attempts so far either strip too much or fail because the markup spans multiple lines. I'm looking for an answer that produces the output on a single line.

  • You don't need an XML parser if you already have a DOM from BeautifulSoup. Surely you can recursively iterate over the subnodes and concatenate the inner text of each? (I've never used BeautifulSoup.) Commented Jan 26, 2012 at 19:17
  • I'm getting an empty list (Email = []). Is that the correct URL? Commented Jan 26, 2012 at 19:18
  • Haha, no, not the correct site. I'm keeping someone's information private. There's got to be a simple solution for this, though. Commented Jan 26, 2012 at 19:22
  • +1 for @Cameron. Don't use regex for this; try a bit further with BeautifulSoup. You'll get a better result and learn "the right way" to do this sort of thing. Commented Jan 26, 2012 at 19:25
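
The idea in the comments (walk the parsed markup and concatenate the text of each node) can be sketched without regular expressions. This is only an illustration under stated assumptions: it targets Python 3 and swaps in the standard library's html.parser in place of BeautifulSoup, and the names TextCollector and inner_text are hypothetical helpers, not part of any library.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers the text found between tags."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &nbsp; before
        # they reach handle_data().
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def inner_text(fragment):
    """Return the text content of an HTML fragment on a single line."""
    collector = TextCollector()
    collector.feed(fragment)
    collector.close()  # flush any text still buffered by the parser
    # split() with no arguments also treats newlines and U+00A0 (&nbsp;)
    # as whitespace, so everything collapses to single spaces.
    return ' '.join(' '.join(collector.chunks).split())


row = """<tr valign="top">
<td><p>Phone Number:</p></td>
<td>&nbsp;</td>
<td><p>706-878-8888</p></td>
</tr>"""
print(inner_text(row))
```

Because the parser only hands over text nodes, nothing depends on how the tags are laid out across lines.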

1 Answer

If your results are really always that simple, the following regex will put 'Phone Number:' in capture group 1 and the number in capture group 2 as long as the re.DOTALL flag is set:

.*(Phone Number:).*?([-\d]+).*

You can then call re.sub() on your string with the replacement \1 \2.

Here is a complete example that returns what you want:

>>> import re
>>> s = """<tr valign="top">
... <td><p>Phone Number:</p></td>
... <td>&nbsp;</td>
... <td><p>706-878-8888</p></td>
... </tr>"""
>>> regex = re.compile(r'.*(Phone Number:).*?([-\d]+).*', re.DOTALL)
>>> regex.sub(r'\1 \2', s)
'Phone Number: 706-878-8888'
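
If the label is not always 'Phone Number:', a more general variant is to strip every tag rather than capture specific text. The following is a minimal sketch, assuming Python 3 and markup as simple as the row above; strip_markup is a hypothetical helper name, and the tag pattern is deliberately naive (it would misfire on a '>' inside an attribute value, for instance).

```python
import html
import re


def strip_markup(fragment):
    """Hypothetical helper: remove tags and return the text on one line."""
    # Replace each tag with a space so neighbouring cells do not run together.
    text = re.sub(r'<[^>]+>', ' ', fragment)
    # Decode entities; &nbsp; becomes U+00A0, which split() treats as whitespace.
    text = html.unescape(text)
    # Collapse every whitespace run, newlines included, into a single space.
    return ' '.join(text.split())


row = """<tr valign="top">
<td><p>Phone Number:</p></td>
<td>&nbsp;</td>
<td><p>706-878-8888</p></td>
</tr>"""
print(strip_markup(row))
```

Matching one tag at a time avoids the over-exclusion that a greedy pattern spanning several tags tends to cause, and the final split/join is what puts the result on a single line.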