0

I need help in removing outer tag from text .Since it is not text but Html tag it is not working as per given solution: original text :

<HIDE><HIDE>Anti-</HIDE></HIDE>Skull

i need to remove outer and tag from text .

which may turn into :

<HIDE>Anti-</HIDE>Skull
9
  • Show what you have tried so far and where you are stuck. So far, it is just a duplicate of: stackoverflow.com/questions/26672269/… Commented May 31, 2017 at 10:05
  • I definately agree what you saying Commented May 31, 2017 at 10:20
  • Select regexp_replace('<HIDE><HIDE>Anti-</HIDE></HIDE>Skull', '/(.*)\1', '/\1') As checktext From dual Commented May 31, 2017 at 10:21
  • and as per your suggestion i tried : Commented May 31, 2017 at 10:21
  • 1
    When I tried your first regex, I removed the slashes and added <> and it worked: '(<.*>)\1', '\1'. Result was <HIDE>Anti-</HIDE>Skull. Does that solve your problem? Commented May 31, 2017 at 11:55

2 Answers 2

1

Just to wrap up what we discussed in the comments:

SELECT regexp_replace('<HIDE><HIDE>Anti-</HIDE></HIDE>Skull', '(<.*>)\1+', '\1') AS checktext FROM dual

regexp_replace replaces part of the input that matches a regular expression with another string. If you use (<.*>)\1+ as the regex, it will match strings that start and end with <>, followed by the same thing again multiple times. Using \1 as the replacement string will then replace the whole thing with just one copy of that string. If you have trouble to understand how the regex works, please read about "capturing groups".

Please read MT0's answer for an approach that is more robust in case your html becomes more complicated.

Sign up to request clarification or add additional context in comments.

Comments

0

You can do this using XML functions:

WITH data ( xmlstring ) AS (
  SELECT '<html><body><HIDE><HIDE>Anti-</HIDE></HIDE>Skull</body></html>' FROM DUAL
)
SELECT UPDATEXML(
         xml,
         '//HIDE/HIDE/..',
         XMLQUERY( '//HIDE/HIDE' PASSING xml RETURNING CONTENT )
       ).getClobVal() AS updatedXML
FROM   ( SELECT XMLTYPE( xmlstring ) AS xml FROM data );

Outputs:

<html><body><HIDE>Anti-</HIDE>Skull</body></html>

You should also be able to use the more generic XPath '//*[name()=../name()]/..'; however, while XMLQUERY parses it successfully, UPDATEXML does not.

Update:

WITH data ( xmlstring ) AS (
  SELECT '<HIDE><HIDE>Anti-</HIDE></HIDE>Skull' FROM DUAL
)
SELECT SUBSTR(
         updatedXML,
         7,
         LENGTH( updatedXML ) - 13
       )
FROM   (
  SELECT UPDATEXML(
           xml,
           '//HIDE/HIDE/..',
           XMLQUERY( '//HIDE/HIDE' PASSING xml RETURNING CONTENT )
         ).getClobVal() AS updatedXML
  FROM   (
    SELECT XMLTYPE( '<root>' || xmlstring || '</root>' ) AS xml
    FROM data
  )
);

Update 2:

Before using a regular expression, you should understand the limitations - a regular expression cannot parse arbitrary HTML but will only work on a limited subset. In this case it will not match attributes or match the correct level of tags (and you should use an XML parser for that):

WITH data ( html ) AS (
  SELECT '<HIDE><HIDE>Anti-</HIDE></HIDE>Skull' FROM DUAL UNION ALL
  SELECT '<HIDE><HIDE>first</HIDE></HIDE>between<HIDE><HIDE>second</HIDE></HIDE>' FROM DUAL UNION ALL
  SELECT '<HIDE><HIDE>before<HIDE><HIDE>inner</HIDE></HIDE>after</HIDE></HIDE>outer' FROM DUAL
)
SELECT REGEXP_REPLACE(
         html,
         '<([A-Za-z][A-Za-z0-9-]*)><\1>(.*)</\1></\1>',
         '<\1>\2</\1>'
       )
FROM   data;

Output:

<HIDE>Anti-</HIDE>Skull
<HIDE>first</HIDE></HIDE>between<HIDE><HIDE>second</HIDE>
<HIDE>before<HIDE><HIDE>inner</HIDE></HIDE>after</HIDE>outer

The regular expression will match the simple case but you will find it difficult (or impossible) to correctly handle both the second and third cases.

4 Comments

sorry for the confusion but cant it be done using simple regex.
@AnimeshSheolikar I'm sure it could be done in the simple example but you should consider that the tags may have attributes and could have further descendants inside those that would complicate the regular expression. Such as '<HIDE L="1"><HIDE L="2">before<HIDE L="3"><HIDE L="4">inner</HIDE></HIDE>after</HIDE></HIDE>other' a regular expression will not match the start and end tags at the same level. You should probably read the second and third posts from this question before you look for a regular expression that will solve you problem.
I understand your concern regarding attribute but its only simple in given text no further descendants will be there.
@AnimeshSheolikar Updated again

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.