
I'm producing XML files from Oracle SQL. When I open one of the files in a web browser, I get the error below, which points at the offending character:

 XML Parsing Error: not well-formed
 Location: file://///data2/data/Download/d7prdv1/prsrepreports/test_error_2.xml
 Line Number 80, Column 29:          <musicTitle>NATURES SHADOW LOUNGE</musicTitle>

The error is caused by the apostrophe in the musicTitle value, which should read "NATURE'S SHADOW LOUNGE".

After running DUMP() on the string, I found that the apostrophe is actually CHR(146), which is a different character from the ordinary apostrophe (CHR(39)).
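To see exactly which bytes are stored, and in which character set, DUMP can return hexadecimal output with the character set name appended (format 1016), and ASCIISTR shows any non-ASCII character as its UTF-16 code unit. A minimal diagnostic sketch; the table name `music_catalogue` is hypothetical and stands in for whatever table the `cc` alias refers to:

```sql
-- Hypothetical table name: music_catalogue stands in for the table aliased as cc.
SELECT cc.title,
       DUMP(cc.title, 1016) AS title_bytes,   -- hex bytes plus character set name
       ASCIISTR(cc.title)   AS title_escaped  -- non-ASCII shown as \xxxx (UTF-16)
FROM   music_catalogue cc
WHERE  cc.title LIKE '%SHADOW LOUNGE';

-- What byte 146 (0x92) means depends on the database character set.
SELECT value
FROM   nls_database_parameters
WHERE  parameter = 'NLS_CHARACTERSET';
```

In a WE8MSWIN1252 database, byte 0x92 is U+2019 RIGHT SINGLE QUOTATION MARK, so ASCIISTR would show NATURE\2019S. If the file is written with those single-byte values but has no encoding declaration, the browser parses it as UTF-8, where a lone 0x92 byte is invalid; that would be consistent with the "not well-formed" error.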

How do I deal with this character in my XML output, given that the same problem could occur in other fields? Would I need to replace the apostrophe with a space? The data is loaded into the database by different companies from around the world, so they could be using different characters. I've included an extract of the SQL below for reference.

    XMLELEMENT ("musicWork",                                     -- start level 6 tag for music title
        XMLFOREST (cc.title AS "musicTitle"),                    -- start level 7 tag for music title
        XMLFOREST (
            XMLFOREST (cc.source_album_title        AS "albumTitle",
                       cc.product_album_promo_title AS "promoTitle",
                       cc.label                     AS "label",
                       cc.catalogue_no              AS "catalogNumber",
                       cc.isrc                      AS "isrc") AS "recordingInfo"
        )                                                        -- end level 7 tag for music title
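On whether to replace the character with a space: an alternative is to map typographic quotes to their plain ASCII equivalents with TRANSLATE. A minimal sketch on a literal, assuming the database character set can represent these characters (WE8MSWIN1252 and AL32UTF8 both can). UNISTR names each character by its Unicode code point, and TO_CHAR converts UNISTR's NVARCHAR2 result to the database character set. This only tidies the characters you list; it does not replace a correct encoding declaration in the output file.

```sql
-- Map U+2018/U+2019 (single quotes) to ' and U+201C/U+201D (double quotes) to ".
-- The to_string literal '''''""' contains exactly four characters: ' ' " "
SELECT TRANSLATE(
         TO_CHAR(UNISTR('NATURE\2019S SHADOW LOUNGE')),
         TO_CHAR(UNISTR('\2018\2019\201C\201D')),
         '''''""'
       ) AS music_title
FROM   dual;
```

In the real query the same call would wrap a column inside XMLFOREST, for example TRANSLATE(cc.title, TO_CHAR(UNISTR('\2018\2019\201C\201D')), '''''""') AS "musicTitle".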

I was thinking of creating the function below, which removes each invalid character (replacing it with NULL). Is this the right approach? Also, how would I reference chr(146) in the function, rather than typing the character itself, to be sure I'm matching the offending character?

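    FUNCTION CONVERT_VALUE (INPUT_STRING IN VARCHAR2)
    RETURN VARCHAR2
    IS
        l_string_converted   VARCHAR2(4000);
    BEGIN
        -- Remove *, £, ~, ^ and _ : inside a bracket expression they are all literals
        -- (outside one, * is a quantifier and ^ is a start-of-string anchor)
        l_string_converted := REGEXP_REPLACE (INPUT_STRING, '[*£~^_]', '', 1, 0);

        RETURN l_string_converted;
    END CONVERT_VALUE;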
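For the chr(146) question, the character can be concatenated into the function instead of being typed. The sketch below is a variant of CONVERT_VALUE under one stated assumption: the database character set is WE8MSWIN1252, where CHR(146) is U+2019 RIGHT SINGLE QUOTATION MARK. In an AL32UTF8 database CHR(146) is not that character, and TO_CHAR(UNISTR('\2019')) is the character-set-independent way to name it. This variant maps the character to a plain apostrophe rather than deleting it, and uses a bracket expression so each listed symbol is matched literally.

```sql
CREATE OR REPLACE FUNCTION convert_value (input_string IN VARCHAR2)
  RETURN VARCHAR2
IS
  -- Assumption: WE8MSWIN1252 database, where CHR(146) is U+2019.
  -- In an AL32UTF8 database use TO_CHAR(UNISTR('\2019')) instead.
  c_smart_quote CONSTANT VARCHAR2(4) := CHR(146);
  l_result      VARCHAR2(4000);
BEGIN
  -- Turn the typographic apostrophe into a plain one (CHR(39)).
  l_result := REPLACE(input_string, c_smart_quote, CHR(39));
  -- Remove the unwanted symbols; with no replace string, matches are deleted.
  l_result := REGEXP_REPLACE(l_result, '[*£~^_]');
  RETURN l_result;
END convert_value;
/
```

Mapping rather than deleting keeps titles such as NATURE'S SHADOW LOUNGE readable, and a plain apostrophe needs no escaping in XML element content.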

The CONVERT_VALUE function would then be called within my SQL/XML query:

    XMLELEMENT ("musicWork",                                     -- start level 6 tag for music title
        XMLFOREST (CONVERT_VALUE(cc.title) AS "musicTitle"),     -- start level 7 tag for music title
        XMLFOREST (
            XMLFOREST (CONVERT_VALUE(cc.source_album_title)        AS "albumTitle",
                       CONVERT_VALUE(cc.product_album_promo_title) AS "promoTitle",
                       cc.label                                    AS "label",
                       cc.catalogue_no                             AS "catalogNumber",
                       cc.isrc                                     AS "isrc") AS "recordingInfo"
        )                                                        -- end level 7 tag for music title

Thanks in advance.

  • Hi Shaun, could you provide the Unicode code point of this character? It doesn't look like the regular apostrophe, which is actually allowed in element content. It may be a character that is not allowed in XML at all, or it may be an encoding issue. Commented Aug 10, 2016 at 15:08
  • An actual apostrophe should be encoded as &apos;. What do you see if you run the query directly in SQL*Plus? What does select dump(title) from... show for that row in the original table? (You can add that as an edit to the question so it can be formatted) Commented Aug 10, 2016 at 15:22
  • ... and in case you are not familiar with dump(): try the query Alex suggested, with the condition WHERE title LIKE '%SHADOW LOUNGE' so that you only get the desired row. If I dump NATURE'S SHADOW LOUNGE I get Typ=96 Len=22: 78,65,84,85,82,69,39,83,32,83,72,65,68,79,87,32,76,79,85,78,71,69. The seventh code is 39, which is the ASCII code for the "normal" single quote. Is that what you get, too? Commented Aug 10, 2016 at 15:26
  • Thanks! Assuming notepad++ opens with the right encoding, it looks like &#x2013;, so it's in the allowed range [#x20-#xD7FF]. However, this is an en dash, not an apostrophe so it may still be a different encoding. Does Notepad++ tell you the encoding it used to open it, and does the XML file produced by Oracle begin with a byte order mark (#xFEFF) and/or declare its encoding on the first line? Also something coming to my mind: does it help to change the encoding used by the browser in which this file is opened to the same as notepad++? With no BOM/encoding declared, it has to be read as UTF-8. Commented Aug 10, 2016 at 15:32
  • When I select the record from the database, the field just looks like a normal apostrophe, which seems strange.
