How to get html tags?

Question

Say I have a text file like this:

<html><head>Headline<html><head>more words
</script>even more words</script>
<html><head>Headline<html><head>more words
</script>even more words</script>

How would I get just the tags into a list like this:

<html>
<head>
<html>
<head>
</script>
</script>
<html>
<head>
<html>
<head>
</script>
</script>

Is this a continuation of your other question? If it is, you should really edit your other question rather than re-post.
– inspectorG4dget, Commented Dec 14, 2010 at 5:01

inspectorG4dget · Accepted Answer · 2010-12-15 06:34:51Z

6

I think this is what you want:

import re

# input_file is assumed to be an already-open file object
html_string = input_file.read()
matches = re.findall(r'<.*?>', html_string)  # non-greedy: one match per tag
for m in matches:
    print(m)

Hope this helps
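Since the question asks for the tags in a list, here is a minimal sketch of the same regex approach wrapped in a function. The names `TAG_PATTERN`, `extract_tags`, and `sample` are made up for illustration and are not part of the answer above.

```python
import re

# Non-greedy match: everything from a '<' up to the nearest '>'.
TAG_PATTERN = re.compile(r'<.*?>')


def extract_tags(text):
    """Return every <...> run found in text, in order of appearance."""
    return TAG_PATTERN.findall(text)


# Hypothetical sample, taken from the first two lines of the question's file.
sample = (
    "<html><head>Headline<html><head>more words\n"
    "</script>even more words</script>\n"
)

tags = extract_tags(sample)
print(tags)
# ['<html>', '<head>', '<html>', '<head>', '</script>', '</script>']
```

Keep in mind that this is plain pattern matching, not HTML parsing: a `>` inside an attribute value or a comment will end the match early, and because `.` does not match a newline by default, a tag that spans two lines is skipped unless you compile the pattern with `re.DOTALL`.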

edited Dec 15, 2010 at 6:34

answered Dec 14, 2010 at 4:59

inspectorG4dget


2 Comments

Bramble Over a year ago

I think you mean: re.findall('<.*?>', html_string)

inspectorG4dget Over a year ago

@JackNull: You're absolutely right. The extra double quotes were a typo and have been retroactively fixed.

Community · Accepted Answer · 2017-05-23 12:22:43Z

4

Python has an HTMLParser module for this.

Here is some code which does what you want:

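try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2

class MyHTMLParser(HTMLParser):
    # Called once for every opening tag the parser sees
    def handle_starttag(self, tag, attrs):
        print("<%s>" % tag)

    # Called once for every closing tag the parser sees
    def handle_endtag(self, tag):
        print("</%s>" % tag)

parser = MyHTMLParser()
parser.feed("""<html><head>Headline<html><head>more words
        </script>even more words</script>
        <html><head>Headline<html><head>more words
        </script>even more words</script>
        """)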

Pass your own string to parser.feed.

Output:

$ python htmlparser.py 
<html>
<head>
<html>
<head>
</script>
</script>
<html>
<head>
<html>
<head>
</script>
</script>

This discussion on SO should help: Using HTMLParser in Python efficiently
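If you want the tags collected in a list instead of printed, a sketch along the following lines may help. It assumes Python 3 (where the class lives in `html.parser`), and the names `TagCollector` and `collect_tags` are made up for illustration.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Stores each start and end tag as a string instead of printing it."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # tag arrives lowercased; attributes are ignored in this sketch
        self.tags.append("<%s>" % tag)

    def handle_endtag(self, tag):
        self.tags.append("</%s>" % tag)


def collect_tags(text):
    """Feed text through a fresh parser and return the collected tags."""
    parser = TagCollector()
    parser.feed(text)
    parser.close()  # flush anything still buffered
    return parser.tags


if __name__ == "__main__":
    sample = (
        "<html><head>Headline<html><head>more words\n"
        "</script>even more words</script>\n"
    )
    for tag in collect_tags(sample):
        print(tag)
```

Unlike the raw text, the collected strings are normalized: tag names come back lowercased and attributes are dropped, so `<A href="x">` would be stored as `<a>`.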

edited May 23, 2017 at 12:22

CommunityBot


answered Dec 14, 2010 at 5:09

dheerosaur

