I am trying to create a column in a Snowflake view that replaces everything between the pieces of text I care about with an empty string.
Essentially, the goal is to strip HTML formatting out of text. As an example:
<ul>
<li>Text I care about 1
<li>Text I care about 2</li>
<li>Text I care about 3</li>
</ul>
should end up like this:
Text I care about 1
Text I care about 2
Text I care about 3
Based on the patterns I am seeing, I think that if I can eliminate any string starting with <, and ending with >, I should be able to achieve the result I am looking for.
In testing on different sites, it seems like the expression
REGEXP_REPLACE(originaltext, '<.+?>','') should work, but when I try it in Snowflake it cuts off the last 'Text I care about' in some cases and returns no results at all in others. I am not sure whether there is a syntax difference or something else off in the version of regex Snowflake uses, but any advice would be appreciated.
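For reference, here is a minimal sketch of the kind of view I am building. The table name my_table and the view name cleaned_text_v are placeholders, not my real object names; only originaltext and the first pattern come from my actual attempt. The second column is a variant I have not verified in Snowflake. It uses a negated character class instead of the lazy quantifier, on the assumption that lazy matching may be what behaves differently there:

```sql
-- Hypothetical names: my_table and cleaned_text_v are placeholders.
CREATE OR REPLACE VIEW cleaned_text_v AS
SELECT
    originaltext,
    -- Pattern from my attempt: lazy match from '<' to the next '>'.
    REGEXP_REPLACE(originaltext, '<.+?>', '') AS stripped_lazy,
    -- Variant without a lazy quantifier: '<', then any run of
    -- characters other than '>', then '>'.
    REGEXP_REPLACE(originaltext, '<[^>]*>', '') AS stripped_negated
FROM my_table;
```

Comparing the two columns side by side on the rows that come back truncated or empty should show whether the lazy quantifier is the part that differs.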