I'm trying to figure out how to add an attribute id=<TAG>_<number> to every tag in an HTML snippet and remove all other attributes.
For example:
<div class="...">...</div>
to:
<div id="DIV_1">...</div>
Here DIV is the tag name in uppercase and _1 is the tag's position in the ordering. If this <div> were the second tag, its id would be DIV_2. The ordering is depth-first (DFS), so if <div id="DIV_2">..</div> has a child, as in <div id="DIV_2"><ul class=".." style="..">...</ul></div>, the ul tag would get the id UL_3.
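To illustrate the numbering: a depth-first (pre-order) walk visits elements in the same order in which their opening tags appear in the markup, so counting start tags should be enough. A minimal sketch using the standard library's html.parser; the names IdLister and list_ids and the sample string are made up for illustration:

```python
from html.parser import HTMLParser


class IdLister(HTMLParser):
    """Collects the id each element would get, in document (pre-order DFS) order."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase; numbering is 1-based.
        self.ids.append("{}_{}".format(tag.upper(), len(self.ids) + 1))


def list_ids(html):
    lister = IdLister()
    lister.feed(html)
    lister.close()
    return lister.ids


print(list_ids('<div class="a"><ul style="b"><li>x</li></ul></div><div></div>'))
# ['DIV_1', 'UL_2', 'LI_3', 'DIV_4']
```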
I tried to find all the tags, then remove their attributes and add the ids one by one.
tags = re.findall(r'<([a-z][a-z0-9]*)\b[^>]*>', snippet)
This finds all the tags. My idea is something like this:
for i, tag in enumerate(tags, start=1):
    remove_all_attributes_from_tag
    get the name of the tag and set the attribute id="{}_{}".format(tag_name.upper(), i)
But I can't figure out how to continue.
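One possible way to continue the regex idea is re.sub with a replacement function and a running counter, so each opening tag is rewritten as it is found. This is only a sketch: TAG_RE and renumber_ids are hypothetical names, and the pattern assumes well-formed markup with no ">" inside attribute values and no comments or scripts containing tag-like text. A real HTML parser would be more robust.

```python
import itertools
import re

# Opening tags only: "<", a tag name, any attributes, an optional "/" and ">".
# Closing tags such as </div> start with "</" and are left untouched.
TAG_RE = re.compile(r'<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?(/?)>')


def renumber_ids(snippet):
    counter = itertools.count(1)  # 1-based position in document order

    def replace(match):
        name, slash = match.group(1), match.group(2)
        # Drop every existing attribute and emit only the generated id.
        return '<{0} id="{1}_{2}"{3}>'.format(name, name.upper(), next(counter), slash)

    return TAG_RE.sub(replace, snippet)


print(renumber_ids('<div class="a"><ul class="b" style="c"><li>x</li></ul></div>'))
# <div id="DIV_1"><ul id="UL_2"><li id="LI_3">x</li></ul></div>
```

Because the substitution runs left to right over the opening tags, the counter follows the same depth-first order described earlier.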
The snippet:
<div id="wtab" class="pd_cont" style="display: table;"><div class="pd_colmn"><h4>Display</h4><span>5.20-inch</span></div><div class="pd_colmn"><h4>Processor</h4><span>2GHz octa-core</span></div><div class="pd_colmn"><h4>Front Camera</h4><span>8-megapixel</span></div><div class="pd_colmn"><h4>Resolution</h4><span>1080x1920 pixels</span></div><div class="pd_colmn"><h4>RAM</h4><span>3GB</span></div><div class="pd_colmn"><h4>OS</h4><span>Android 6.0</span></div><div class="pd_colmn"><h4>Storage</h4><span>32GB</span></div><div class="pd_colmn"><h4>Rear Camera</h4><span>16-megapixel</span></div><div class="pd_colmn"><h4>Battery Capacity</h4><span>2650mAh</span></div></div>