Substitute a block in an HTML Snippet using python regex

Question

Given the generic HTML snippet in text, is there a way to replace block 1 with block 2?

Block 1: <br />Text2 <br />
Block 2: <p>Text2 </p>

This is as far as I have got using Python and regex:

import re

text = '<p>Text1</p> <br/ >Text2 <br /> <p> </p> <br/>'
pattern = "<br />(?!<p>|</p>)<br />"
matches = [match for match in re.finditer(pattern, text)]
#matches = [ '<p>Text1</p> <br/ >Text2 <br /> <p> </p> <br/>' ]

It matches the whole text, but I'm only interested in doing the substitution in one go (one line). Is that a good approach, or would you rather capture what's inside, that is, "Text2", and insert it inside a <p> block within the desired final_text?

final_text = '<p>Text1</p> <p>Text2 </p> <p> </p> <br/>'

Your regex does not match anything. Try find (?s)<br\s*/>(.*?)<br\s*/>, replace \1
– user13469682, Commented May 15, 2020 at 18:26
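The capture-group idea from the comment can be sketched with the standard re module as below. This is only an illustration under a few assumptions that are not in the thread: the replacement wraps the captured text in <p>...</p> to produce the asker's final_text, the <br> matcher is loosened so that the oddly spaced "<br/ >" in the sample also matches, and [^<]* is used instead of (.*?) so a match cannot run across another tag.

```python
import re

text = '<p>Text1</p> <br/ >Text2 <br /> <p> </p> <br/>'

# The asker's pattern: the lookahead consumes no characters, so this only
# matches two directly adjacent "<br />" tags, which never occur in the text.
original = re.compile(r"<br />(?!<p>|</p>)<br />")
print(original.findall(text))  # → []

# Tolerant <br> matcher (an assumption): accepts "<br/>", "<br />" and "<br/ >".
BR = r"<br\s*/?\s*>"

# Capture the tag-free text that sits between two consecutive <br> tags.
pattern = re.compile(BR + r"([^<]*)" + BR)

# \1 refers to the captured text; wrapping it in <p>...</p> is an assumption
# based on the asker's desired final_text.
final_text = pattern.sub(r"<p>\1</p>", text)
print(final_text)  # → <p>Text1</p> <p>Text2 </p> <p> </p> <br/>
```

The trailing <br/> is left alone because no second <br> follows it, so the pattern cannot complete a match there.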

dabingsou · Accepted Answer · 2020-05-16 00:21:57Z


The following example should give you an idea that you can implement yourself.

# Requires the third-party simplified_scrapy package
from simplified_scrapy.core.regex_helper import replaceReg, regSearch

html = '''
<p>Text1</p> <br />Text2 <br /> <p> </p> <br/>
<p>Text11</p> <br />Text12 <br /> <p> </p> <br/>
'''
while True:  # Process one block per iteration
    # Find the next "<br />text<br />" block that contains no other tag
    o = regSearch(html, r"<br\s*/>[^<]*<br\s*/>")
    if not o:
        break
    n = replaceReg(o, r"<br\s*/>", "<p>", 1)   # Replace the opening <br />
    n = replaceReg(n, r"<br\s*/>", "</p>", 1)  # Replace the closing <br />
    html = html.replace(o, n)
print(html)

Result:

<p>Text1</p> <p>Text2 </p> <p> </p> <br/>
<p>Text11</p> <p>Text12 </p> <p> </p> <br/>
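For readers who would rather not install an extra package, the same substitution can probably be done in a single pass with the standard re module. The sketch below is not the answerer's code: it reuses the pattern from the loop above, adds a capture group around the text between the tags, and uses a hypothetical helper named wrap_in_paragraph to build each replacement.

```python
import re

html = '''
<p>Text1</p> <br />Text2 <br /> <p> </p> <br/>
<p>Text11</p> <br />Text12 <br /> <p> </p> <br/>
'''

# Same pattern as in the loop above, with the inner text captured as group 1.
pattern = re.compile(r"<br\s*/>([^<]*)<br\s*/>")


def wrap_in_paragraph(match):
    """Hypothetical helper: wrap the captured text in <p>...</p>."""
    return "<p>" + match.group(1) + "</p>"


# subn returns the new string and the number of replacements made.
result, count = pattern.subn(wrap_in_paragraph, html)
print(count)  # → 2
print(result)
```

Because [^<]* cannot cross a tag, the trailing <br/> on each line stays untouched, just as in the result shown above.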
