0

I have a data stored in as HTML table as CLOB type in a table, I had tried to parse it like below with reference from Parse HTML table with Oracle

with tbl as
(
    SELECT ROW_ID,xmltype(SUBSTR(qe.NOTE, 0, INSTR(qe.NOTE, '<br>')-1)) xml_data
    FROM MY_Table qe
    WHERE EVENT='note' 
)
select
    ROW_ID,rtrim (xmlagg (xmlelement (e, x.data
        || '|')
      ORDER BY x.row_number).extract ('//text()'), '|') AS Data   
from
    tbl
cross join
    xmltable('/table/tr'
        passing tbl.xml_data
        columns
            row_number FOR ORDINALITY,
            data varchar2(2000) path 'td[1]'
            
    ) x
WHERE x.data IS NOT NULL
GROUP BY ROW_ID

the above is working for some values but it is breaking giving the error ORA-06502: PL/SQL: numeric or value error ORA-06512: at "SYS.XMLTYPE", line 272 as said in that reference itself We can try parsing an HTML file using a XML parser, but it’s likely to fail. The reason is that HTML documents can have the following HTML features that XML parsers don’t understand. -- https://stackoverflow.com/a/32573619/266304

Is there a HTML parser in Oracle instead of XML Parser, I had tried to search but nothing worked out

Anyone please help

3
  • 3
    You can use a Java HTML parser and embed the classes into the database using CREATE JAVA commands or the loadjava utility. Commented Jun 16, 2021 at 8:59
  • I have only Read only rights in the database is it possible there ? Commented Jun 16, 2021 at 9:54
  • No, you would need to get the DBA to import the Java classes after you have created them if you were using a Java solution. Commented Jun 16, 2021 at 10:45

1 Answer 1

0

the above is working for some values but it is breaking giving the error ORA-06502: PL/SQL: numeric or value error ORA-06512: at "SYS.XMLTYPE", line 272

It will get that error if you have any NOTE values which do not have a <br> tag, because this:

SUBSTR(qe.NOTE, 0, INSTR(qe.NOTE, '<br>')-1)

will then be null, and xmltype() then throw that error.

If - and it's a big if - all of the notes start with a simple table with no embedded problematic tags, and that may or may not be followed be a line break tag, then you can use a case expression to only do the substr when needed:

with tbl as
(
    SELECT ROW_ID,
      xmltype(
        CASE
          WHEN INSTR(qe.NOTE, '<br>') > 0
          THEN SUBSTR(qe.NOTE, 0, INSTR(qe.NOTE, '<br>')-1)
          ELSE qe.NOTE
        END
      ) xml_data
    FROM MY_Table qe
    WHERE EVENT='note' 
)
...

Or perhaps slightly more robustly, look for and extract a table:

with tbl as
(
    SELECT ROW_ID,
      xmltype(SUBSTR(qe.NOTE, INSTR(qe.NOTE, '<table>'), INSTR(qe.NOTE, '</table>') + 7)) xml_data
    FROM MY_Table qe
    WHERE EVENT='note'
    AND INSTR(qe.NOTE, '<table>') > 0
)
...

db<>fiddle

But as you already know, this kind of approach is fraught with problems.

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks Alex it worked but with errors because of special Characters I had replaced the special characters

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.