
I'm trying to figure out how I can add the attribute id=ID_<number> to every tag in an HTML snippet and remove all the other attributes.

For example:

<div class="...">...</div>

to:

<div id="DIV_1">...</div>

Here DIV is the tag name in uppercase and _1 is the tag's position in the ordering, so if this <div> were the second tag, its id would be DIV_2. The ordering is depth-first (DFS): if <div id="DIV_2">..</div> has a child, as in <div id="DIV_2"><ul class=".." style="..">...</ul></div>, the ul tag gets the id UL_3.
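The numbering described here is a pre-order depth-first traversal, which visits tags in the same order as their opening tags appear in the source. A minimal sketch with the standard library's html.parser, assuming the goal is simply the rewritten markup as a string (IdRenumberer and renumber_ids are hypothetical names, not an existing API). Comments and doctype declarations are dropped because their handlers are not overridden.

```python
from html.parser import HTMLParser
from itertools import count


class IdRenumberer(HTMLParser):
    """Rewrite each opening tag as <tag id="TAG_n"> in document order
    (pre-order DFS) and drop all of its original attributes."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them
        super().__init__(convert_charrefs=False)
        self.counter = count(1)
        self.out = []

    def _new_id(self, tag):
        # HTMLParser passes tag names in lowercase, e.g. "div" -> "DIV_1"
        return '{0}_{1}'.format(tag.upper(), next(self.counter))

    def handle_starttag(self, tag, attrs):
        # attrs is ignored on purpose: only the new id is written
        self.out.append('<{0} id="{1}">'.format(tag, self._new_id(tag)))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> keep their "/>" form
        self.out.append('<{0} id="{1}"/>'.format(tag, self._new_id(tag)))

    def handle_endtag(self, tag):
        self.out.append('</{0}>'.format(tag))

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append('&{0};'.format(name))

    def handle_charref(self, name):
        self.out.append('&#{0};'.format(name))


def renumber_ids(html):
    parser = IdRenumberer()
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)


print(renumber_ids('<div class="a"><div><ul class="b" style="c"><li>x</li></ul></div></div>'))
# <div id="DIV_1"><div id="DIV_2"><ul id="UL_3"><li id="LI_4">x</li></ul></div></div>
```

Using a parser instead of a regex means attribute values containing unusual characters (such as - or >) cannot break the matching.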

I tried finding all tags, removing their attributes and then adding their IDs one by one.

re.findall(r'<([a-z][a-z0-9]*)\b[^>]*>',snippet)
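With one capturing group, re.findall returns only the captured tag names, not their positions in the string, so the list alone is not enough to rewrite the snippet. A quick check on a small hypothetical snippet:

```python
import re

snippet = '<div class="a"><div><ul class="b" style="c"><li>x</li></ul></div></div>'

# findall returns only the captured group (the tag name), in document order.
# Closing tags such as </li> are skipped because [a-z] cannot match "/".
tags = re.findall(r'<([a-z][a-z0-9]*)\b[^>]*>', snippet)
print(tags)  # ['div', 'div', 'ul', 'li']
```

The pattern matches lowercase names only; adding the re.IGNORECASE flag would also catch tags written like <DIV>.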

This finds the names of all opening tags. My idea is something like:

for i, tag in enumerate(tags, start=1):
    new_id = "{}_{}".format(tag.upper(), i)  # e.g. "DIV_1"
    # remove all attributes from this tag, then set id=new_id on it

But I can't figure out how to continue.
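One way to continue, sketched under the assumption that a regex is acceptable for simple markup like this: re.sub accepts a function as the replacement, so the loop body can live in a callback that receives each opening tag in order and returns its rewritten form. The counter comes from itertools.count; renumber_tags is a hypothetical helper name.

```python
import re
from itertools import count

# Opening tags only: closing tags start with "</", which [a-zA-Z] cannot match
TAG_RE = re.compile(r'<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>')


def renumber_tags(html):
    counter = count(1)  # a fresh counter per call, so numbering starts at 1

    def replace(match):
        # Drop every attribute and keep only the new id, e.g. <div id="DIV_1">
        tag = match.group(1)
        return '<{0} id="{1}_{2}">'.format(tag, tag.upper(), next(counter))

    # re.sub calls replace() once per match, left to right,
    # which is the pre-order (DFS) order of the tags
    return TAG_RE.sub(replace, html)


print(renumber_tags('<div id="wtab-8585" class="pd_cont"><div class="pd_colmn"><h4>Display</h4></div></div>'))
# <div id="DIV_1"><div id="DIV_2"><h4 id="H4_3">Display</h4></div></div>
```

Because [^>]* skips everything up to the closing >, values containing - (such as id="wtab-8585") are handled, but a > inside an attribute value or the / of a self-closing tag would not be preserved; an HTML parser is safer for arbitrary input.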

The snippet:

<div id="wtab" class="pd_cont" style="display: table;"><div class="pd_colmn"><h4>Display</h4><span>5.20-inch</span></div><div class="pd_colmn"><h4>Processor</h4><span>2GHz octa-core</span></div><div class="pd_colmn"><h4>Front Camera</h4><span>8-megapixel</span></div><div class="pd_colmn"><h4>Resolution</h4><span>1080x1920 pixels</span></div><div class="pd_colmn"><h4>RAM</h4><span>3GB</span></div><div class="pd_colmn"><h4>OS</h4><span>Android 6.0</span></div><div class="pd_colmn"><h4>Storage</h4><span>32GB</span></div><div class="pd_colmn"><h4>Rear Camera</h4><span>16-megapixel</span></div><div class="pd_colmn"><h4>Battery Capacity</h4><span>2650mAh</span></div></div>

Answer


First, replace the attributes of every tag with an id attribute that contains a placeholder (!id!). Then, in a loop, replace the placeholders one at a time with increasing numbers.

Code

import re

html_orig = '<div id="wtab" class="pd_cont" style="display: table;"><div class="pd_colmn"><h4>Display</h4><span>5.20-inch</span></div><div class="pd_colmn"><h4>Processor</h4><span>2GHz octa-core</span></div><div class="pd_colmn"><h4>Front Camera</h4><span>8-megapixel</span></div><div class="pd_colmn"><h4>Resolution</h4><span>1080x1920 pixels</span></div><div class="pd_colmn"><h4>RAM</h4><span>3GB</span></div><div class="pd_colmn"><h4>OS</h4><span>Android 6.0</span></div><div class="pd_colmn"><h4>Storage</h4><span>32GB</span></div><div class="pd_colmn"><h4>Rear Camera</h4><span>16-megapixel</span></div><div class="pd_colmn"><h4>Battery Capacity</h4><span>2650mAh</span></div></div>'

# Step 1: replace each opening tag, with all of its attributes, by
# <tag id="TAG_!id!">, using the uppercased tag name in the id.
# [^>]* also accepts attribute values containing "-", e.g. id="wtab-8585".
# Closing tags are not matched because they start with "</".
html_edit = re.sub(r'<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>',
                   lambda m: '<{0} id="{1}_!id!">'.format(m.group(1), m.group(1).upper()),
                   html_orig)

# Step 2: replace the placeholders one at a time, left to right
i = 1
while True:
    html_edit, n = re.subn('!id!', str(i), html_edit, count=1)
    if n == 0:  # no placeholder left
        break
    i += 1

re.subn() returns a tuple (new_string, number_of_substitutions); once no placeholder is left, the count is 0 and the loop breaks.

Result

'<div id="DIV_1"><div id="DIV_2"><h4 id="H4_3">Display</h4><span id="SPAN_4">5.20-inch</span></div><div id="DIV_5"><h4 id="H4_6">Processor</h4><span id="SPAN_7">2GHz octa-core</span></div><div id="DIV_8"><h4 id="H4_9">Front Camera</h4><span id="SPAN_10">8-megapixel</span></div><div id="DIV_11"><h4 id="H4_12">Resolution</h4><span id="SPAN_13">1080x1920 pixels</span></div><div id="DIV_14"><h4 id="H4_15">RAM</h4><span id="SPAN_16">3GB</span></div><div id="DIV_17"><h4 id="H4_18">OS</h4><span id="SPAN_19">Android 6.0</span></div><div id="DIV_20"><h4 id="H4_21">Storage</h4><span id="SPAN_22">32GB</span></div><div id="DIV_23"><h4 id="H4_24">Rear Camera</h4><span id="SPAN_25">16-megapixel</span></div><div id="DIV_26"><h4 id="H4_27">Battery Capacity</h4><span id="SPAN_28">2650mAh</span></div></div>'
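The loop in step 2 rescans the whole string once per tag. A sketch of the same step in a single pass, assuming the !id! placeholder produced by step 1: re.sub with a function replacement numbers each match left to right (fill_placeholders is a hypothetical name).

```python
import re
from itertools import count


def fill_placeholders(text, placeholder='!id!'):
    counter = count(1)
    # re.escape keeps the placeholder literal even if it has regex metacharacters;
    # each match is replaced by the next number from the counter
    return re.sub(re.escape(placeholder), lambda m: str(next(counter)), text)


print(fill_placeholders('<div id="DIV_!id!"><h4 id="H4_!id!">Display</h4></div>'))
# <div id="DIV_1"><h4 id="H4_2">Display</h4></div>
```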

Comment

Thanks, this works almost perfectly. Just one thing: if there is, for example, id="wtab-8585" instead of id="wtab", it starts replacing IDs from the second tag: <div id="wtab-8585" class="pd_cont" style="display: table;"><div id="DIV_1><h4 id="DIV_2>Display<...
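The behavior described in this comment comes from a pattern whose attribute part is the character class [\w\d\s=;"_:]*: the class contains no -, so it cannot step over the hyphen in wtab-8585, the first opening tag never matches, and numbering starts at the next tag. A small check, with that narrower pattern written out explicitly and compared with one that accepts anything up to the closing >:

```python
import re

tag = '<div id="wtab-8585" class="pd_cont" style="display: table;">'

# Attribute part limited to a character class without "-":
# it cannot get past "wtab-8585", so this opening tag is not matched
narrow = r'(<[\w\d]+)(\s?[\w\d\s=;"_:]*)(>)'
# Attribute part written as "anything except >": the tag is matched
broad = r'(<[a-zA-Z][a-zA-Z0-9]*)([^>]*)(>)'

print(re.match(narrow, tag))          # None
print(re.match(broad, tag).group(1))  # <div
```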
