1

I want to parse HTML and turn them into string templates. In the example below, I seeked out elements marked with x-inner and they became template placeholders in the final string. Also x-attrsite also became a template placeholder (with a different command of course).

Input:

<div class="x,y,z" x-attrsite>
  <div x-inner></div>
  <div>
    <div x-inner></div>
  </div>
</div>

Desired output:

<div class="x,y,z" {attrsite}>{inner}<div>{inner}</div></div>

I know there is HTMLParser and BeautifulSoup, but I am at a loss on how to extract the strings before and after the x-* markers and to escape those strings for templating.


Existing curly braces are handled sanely, like this sample:

<div x-maybe-highlighted> The template string "there are {n} message{suffix}" can be used.</div>

1 Answer 1

2

BeautifulSoup can handle the case:

  • find all div elements with x-attrsite attribute, remove the attribute and add {attrsite} attribute with a value None (produces an attribute with no value)
  • find all div elements with x-inner attribute and use replace_with() to replace the element with a text {inner}

Implementation:

from bs4 import BeautifulSoup

data = """
<div class="x,y,z" x-attrsite>
  <div x-inner></div>
  <div>
    <div x-inner></div>
  </div>
</div>
"""

soup = BeautifulSoup(data, 'html.parser')

for div in soup.find_all('div', {'x-attrsite': True}):
    del div['x-attrsite']
    div['{attrsite}'] = None

for div in soup.find_all('div', {'x-inner': True}):
    div.replace_with('{inner}')

print(soup.prettify())

Prints:

<div class="x,y,z" {attrsite}>
 {inner}
 <div>
  {inner}
 </div>
</div>
Sign up to request clarification or add additional context in comments.

3 Comments

Also, how to escape strings like "{hostname}" in the HTML which would conflict with python templating?
@aitchnyu sounds like a thing that should be handled separately, may be with a regex. Do you have an example input for this case? Thanks.
@aitchnyu in this example you basically want to replace {n} with {{n}} and {suffix} with {{suffix}}, right?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.