0

Let's say you have an HTML block with mixed content like this, where some of the text is inside links and some is bare:

<div class="container">
  <div class="sub-container">
    <a href="example.com">Blue</a>
  </div>
  Black
  </br>
  <div class="sub-container">
    <a href="example.com">Yellow</a>
  </div>
  <div class="sub-container">
    <a href="example.com">Pink</a>
  </div>
  Orange
  </br>
</div>

What would your approach be, using Python, to extract the colours from this HTML block?

  • Why mark as negative without comment? Commented Jun 29, 2018 at 4:48
  • Perhaps the downvoter (not me) thinks that you should explain what the problem is and what your own approach is. Commented Jun 29, 2018 at 5:05

2 Answers

2

You can use BeautifulSoup's .text to get all the colors from your sample HTML.

Example:

from bs4 import BeautifulSoup
s = """<div class="container">
  <div class="sub-container">
    <a href="example.com">Blue</a>
  </div>
  Black
  </br>
  <div class="sub-container">
    <a href="example.com">Yellow</a>
  </div>
  <div class="sub-container">
    <a href="example.com">Pink</a>
  </div>
  Orange
  </br>
</div>"""
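# Parse the snippet with Python's built-in html.parser backend
soup = BeautifulSoup(s, "html.parser")
# .text joins every text node; strip the ends and remove the indentation spaces
print(soup.text.strip().replace(" ", ""))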

Output:

Blue

Black


Yellow


Pink

Orange
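
The blank lines in that output come from the whitespace between tags, and removing every space would also mangle a multi-word colour name. As a possible alternative, here is a minimal sketch using BeautifulSoup's `stripped_strings` generator, which yields each text node with its surrounding whitespace removed and skips whitespace-only nodes. The variable names `html` and `colours` are illustrative only.

```python
from bs4 import BeautifulSoup

# Same sample markup as in the question
html = """<div class="container">
  <div class="sub-container">
    <a href="example.com">Blue</a>
  </div>
  Black
  </br>
  <div class="sub-container">
    <a href="example.com">Yellow</a>
  </div>
  <div class="sub-container">
    <a href="example.com">Pink</a>
  </div>
  Orange
  </br>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# stripped_strings yields each non-empty text node, already stripped
colours = list(soup.stripped_strings)
print(colours)  # ['Blue', 'Black', 'Yellow', 'Pink', 'Orange']
```

This returns the colours as a list, in document order, whether they sit inside a link or directly in the container.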

0

To extract the contents of a tag in HTML with a regex, you might want to try this:

<(\w+)[^>]*>(.*?)</\1\s*>

And then use group 2 to find everything inside of that tag. You could also change the start of the regex to:

<(a) (etc...)

And that will only match a tags.
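
As a rough sketch of how this could look in Python with the `re` module: the pattern below is a variant restricted to `a` tags, with `[^>]*` standing in for the attributes and a non-greedy body, and the names `html`, `link_pattern` and `colours` are assumptions made for illustration. Note that it only finds text inside links, so the bare "Black" and "Orange" in the sample are not matched.

```python
import re

# Same sample markup as in the question
html = """<div class="container">
  <div class="sub-container">
    <a href="example.com">Blue</a>
  </div>
  Black
  </br>
  <div class="sub-container">
    <a href="example.com">Yellow</a>
  </div>
  <div class="sub-container">
    <a href="example.com">Pink</a>
  </div>
  Orange
  </br>
</div>"""

# Group 1 is the tag name (only "a" here), group 2 is the tag's contents.
# [^>]* skips over any attributes; .*? stops at the first closing tag.
link_pattern = re.compile(r"<(a)\b[^>]*>(.*?)</\1\s*>", re.DOTALL)

colours = [match.group(2) for match in link_pattern.finditer(html)]
print(colours)  # ['Blue', 'Yellow', 'Pink']
```

Regexes like this are fragile on real-world HTML (nested tags, `>` inside attribute values), so this is best treated as a quick fix for a known, simple snippet.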
