1

I had tried to Parse a HTML table in oracle and try to extract the data inside it using the query

with tbl as
(
    select xmltype('<table class="table table-striped " border=''1''>
 <caption>ACO Details</caption>
 <tr><th><b>Record</b></th><th><b>Changed ACO(s)</b></th></tr>
 <tr><td> Added   ACO</td><td>TDE</td></tr></table>
 <br>
 <table style=''width:60%;'' border=''1''>
 <caption>Price Breakdown</caption>
 <tr style=''background-color: darkgray;text-align:right;''>
 <th style=''width:50%;text-align: center;''><b>Price</b></th>
 <th style=''width:30%;text-align: center;''><b>Difference</b></th></tr>
 <tr><td> Base Price</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Extras Price</td><td style=''text-align:right;''>43.98</td></tr>
 <tr><td> Total Discount</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Medical Price</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Postage</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Admin Fee</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Transaction Fee</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Auto Renewal Adjustment</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Stamp Duty</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Competitor Price Matching Discount</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Charge Adjustment</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Total Price</td><td style=''text-align:right;''>43.98</td></tr></table>
 <br/>
 <table style=''width:60%;'' border=''1''>
 <caption>Reason</caption><tr><td>Change Done in Web</td></tr>
 </table>
 <br/>') xml_data from dual
)
select
    x.class, x.caption
from
    tbl
cross join
    xmltable('/table/caption/tr'
        passing tbl.xml_data
        columns
            class varchar2(10) path '@class',
            caption varchar2(3) path 'caption[1]'
    ) x
where
    x.caption is not NULL;

but this throws the error

ORA-31011: XML parsing failed
ORA-19202: Error occurred in XML processing
LPX-00245: extra data after end of document
Error at line 5
ORA-06512: at "SYS.XMLTYPE", line 310
ORA-06512: at line 1

Can someone please point me what I am doing wrong

2
  • 1
    HTML is not XML. this is not valid xml. Commented Mar 14, 2020 at 20:37
  • @OldProgrammer I had refereed this Answer - stackoverflow.com/questions/58076244/… and tried the above query in it it parses the HTML table only Commented Mar 14, 2020 at 20:40

1 Answer 1

2

Your HTML is invalid XML.

  • You have a <br> tag that does not have a matching closing tag; and
  • Your HTML is a fragment that does not have a single root element.

Put a leading <html><body> and training </body></html> and replace <br> with <br /> and the ORA-31011 exception goes away.

with tbl as
(
    select xmltype('<html><body><table class="table table-striped " border=''1''>
 <caption>ACO Details</caption>
 <tr><th><b>Record</b></th><th><b>Changed ACO(s)</b></th></tr>
 <tr><td> Added   ACO</td><td>TDE</td></tr></table>
 <br />
 <table style=''width:60%;'' border=''1''>
 <caption>Price Breakdown</caption>
 <tr style=''background-color: darkgray;text-align:right;''>
 <th style=''width:50%;text-align: center;''><b>Price</b></th>
 <th style=''width:30%;text-align: center;''><b>Difference</b></th></tr>
 <tr><td> Base Price</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Extras Price</td><td style=''text-align:right;''>43.98</td></tr>
 <tr><td> Total Discount</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Medical Price</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Postage</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Admin Fee</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Transaction Fee</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Auto Renewal Adjustment</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Stamp Duty</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Competitor Price Matching Discount</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Charge Adjustment</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Total Price</td><td style=''text-align:right;''>43.98</td></tr></table>
 <br/>
 <table style=''width:60%;'' border=''1''>
 <caption>Reason</caption><tr><td>Change Done in Web</td></tr>
 </table>
 <br/></body></html>') xml_data from dual
)
SELECT x.class,
       x.caption
FROM   tbl
       CROSS JOIN
       XMLTABLE(
         '/html/body/table'
         PASSING tbl.xml_data
         COLUMNS
            class   VARCHAR2(50) PATH './@class',
            caption VARCHAR2(30) PATH './caption/text()'
       ) x;

outputs:

CLASS                | CAPTION        
:------------------- | :--------------
table table-striped  | ACO Details    
null                 | Price Breakdown
null                 | Reason         

db<>fiddle here

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.