Given the generic HTML snippet in text below, is there a way to replace block 1 with block 2?

  1. <br /> Text2 <br />
  2. <p> Text2 </p>

This is as far as I have gotten using Python and regex:

import re

text = '<p>Text1</p> <br/ >Text2 <br /> <p> </p> <br/>'
pattern = "<br />(?!<p>|</p>)<br />"
matches = list(re.finditer(pattern, text))
#matches = [ '<p>Text1</p> <br/ >Text2 <br /> <p> </p> <br/>' ]

It matches the whole text, but I'm only interested in doing the substitution in one go (one line). Is that a good approach, or would you rather capture what's inside, that is, "Text2", and insert it into a <p> </p> block in the desired final_text?

final_text = '<p>Text1</p> <p>Text2 </p> <p> </p> <br/>'
  • The regex does not match anything. Try searching for (?s)<br\s*/>(.*?)<br\s*/> and replacing it with <p>\1</p>. Commented May 15, 2020 at 18:26
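
The suggestion in this comment can be tried with the standard re module alone. The following is a minimal sketch, not a definitive solution: it assumes the "<br/ >" in the question's text is a typo for "<br />" (the pattern <br\s*/> would not match "<br/ >"), and the variable names original_pattern and suggested_pattern are only illustrative.

```python
import re

# Assumption: "<br/ >" in the question is a typo, so "<br />" is used here.
text = '<p>Text1</p> <br />Text2 <br /> <p> </p> <br/>'

# The question's pattern requires two "<br />" tags directly next to each
# other, so it finds nothing in this text.
original_pattern = r"<br />(?!<p>|</p>)<br />"
original_matches = re.findall(original_pattern, text)

# Pattern from the comment: lazily capture whatever sits between two <br/>
# tags and wrap the captured group in <p>...</p>.
suggested_pattern = r"(?s)<br\s*/>(.*?)<br\s*/>"
final_text = re.sub(suggested_pattern, r"<p>\1</p>", text)

print(original_matches)
print(final_text)
```

Because (.*?) with (?s) can match across other tags and newlines, this form may pair up <br/> tags that are far apart in a larger document, so it is worth checking the result on real input.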

1 Answer

The following example is meant to give you an idea that you can then implement yourself.

from simplified_scrapy.core.regex_helper import replaceReg, regSearch

html = '''
<p>Text1</p> <br />Text2 <br /> <p> </p> <br/>
<p>Text11</p> <br />Text12 <br /> <p> </p> <br/>
'''
while True:  # Loop to process the matches one by one
    o = regSearch(html, r"<br\s*/>[^<]*<br\s*/>")  # Take out the data to be replaced
    if not o:
        break
    n = replaceReg(o, r"<br\s*/>", "<p>", 1)  # Replace the opening <br />
    n = replaceReg(n, r"<br\s*/>", "</p>", 1)  # Replace the closing <br />
    html = html.replace(o, n)
print(html)

Result:

<p>Text1</p> <p>Text2 </p> <p> </p> <br/>
<p>Text11</p> <p>Text12 </p> <p> </p> <br/>
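
If a third-party helper is not wanted, roughly the same result can be had in a single substitution with the standard re module. This is a sketch under the same input as the example above; the name BR_PAIR is only illustrative. It keeps the [^<]* idea from the loop, so a match cannot run across another tag.

```python
import re

html = '''
<p>Text1</p> <br />Text2 <br /> <p> </p> <br/>
<p>Text11</p> <br />Text12 <br /> <p> </p> <br/>
'''

# Two <br/> tags with only tag-free text between them; the text is captured
# in group 1 so it can be reused in the replacement.
BR_PAIR = re.compile(r"<br\s*/>([^<]*)<br\s*/>")

# re.sub handles every non-overlapping match in one pass, so no loop is needed.
result = BR_PAIR.sub(r"<p>\1</p>", html)
print(result)
```

The trailing <br/> on each line is left alone because the text after it runs into a <p> tag rather than a second <br/>.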