The following is the input string.
July 17, 2007 –<br> September 25, 2009 <br> June 2007 - July 2010
I am trying to add a newline before each <br> tag that does NOT follow a -. Thus, the resulting string should be:
July 17, 2007 –<br> September 25, 2009 \n<br> June 2007 - July 2010
I tried the following regular expression to no avail.
re.sub(r'([^-])(\s*<br)',r'\1\n\2', astring)
gives me
July 17, 2007 –\n<br> September 25, 2009\n <br> June 2007 - July 2010
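For reference, here is a minimal sketch that reproduces this, assuming Python 3 and that the string is exactly as pasted above (the names `wanted` and `result` are only for illustration):

```python
import re

# Input exactly as pasted above; the first dash is copied verbatim from the post.
astring = 'July 17, 2007 –<br> September 25, 2009 <br> June 2007 - July 2010'

# Output I am hoping for: a newline only before the second <br>.
wanted = 'July 17, 2007 –<br> September 25, 2009 \n<br> June 2007 - July 2010'

# My attempt: one non-hyphen character, optional whitespace, then "<br".
result = re.sub(r'([^-])(\s*<br)', r'\1\n\2', astring)

print(repr(result))
print(result == wanted)  # False
```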
What is the solution?
UPDATE:
I am not actually parsing the HTML with regular expressions. I know that the HTML + regex combination would drive me to insanity.
I am using lxml to parse HTML already.
However, what I cannot understand is why the regex does not catch the -\s*< pattern.
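A quick check that may help narrow this down, assuming Python 3 and the string exactly as pasted at the top: it tests the -\s*< pattern on its own and prints the code point of the character sitting right before the first <br>. This is only a diagnostic sketch; the variable names are made up for illustration.

```python
import re
import unicodedata

# Input exactly as pasted at the top of the post.
astring = 'July 17, 2007 –<br> September 25, 2009 <br> June 2007 - July 2010'

# Does an ASCII hyphen followed by optional whitespace and "<" occur anywhere?
hyphen_before_tag = re.search(r'-\s*<', astring)
print(hyphen_before_tag)

# Which character actually sits right before the first "<br"?
ch = astring[astring.index('<br') - 1]
print(repr(ch), hex(ord(ch)), unicodedata.name(ch))
```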