I'm trying to write a UTF-8 encoded XML file using data from an Oracle 9 database (it should also work on Oracle 11) set up with NLS_CHARACTERSET = US7ASCII and NLS_LANGUAGE = AMERICAN. I use the XMLELEMENT and XMLATTRIBUTES functions to build a CLOB, and then I write that CLOB to a file.

Here's a simple example:

declare
xmlval  clob;
begin
    SELECT XMLELEMENT( "Parent", 
      XMLELEMENT( "Address", xmlattributes( unistr( 'N°27' ) as "Street", unistr( '77800' ) as "PostCode", unistr( 'Paris' ) as "City" ) )
         ).extract('/*').getclobVal()
    INTO xmlval
    FROM DUAL;

    dbms_xslprocessor.clob2file( xmlval, 'DIRXMLTMP', 'file.xml', nls_charset_id('AL32UTF8') );
end;

The tables in the database can contain non-ASCII characters because the client application uses, I think, the Windows-1252 character set.

Currently I have to use the unistr function; otherwise the procedure crashes when a field contains non-ASCII characters.

With this code the XML file is generated, but every non-ASCII character is replaced with '?': 'N°27' becomes 'N?27'.

I've tried using the convert function on the string 'N°27' and on the variable xmlval, for example:

convert( xmlval, 'WE8MSWIN1252', 'US7ASCII' )
convert( 'N°27', 'US7ASCII', 'WE8MSWIN1252' )

But I still get 'N?27' in the resulting file.

Is it possible to display these specific characters in the generated file from a us7ascii database?

  • Instead of UNISTR, try ASCIISTR. It converts characters from the national character set to their '\xxxx' representation, and UNISTR converts '\xxxx' back to the national character set. Commented Oct 24, 2014 at 7:13
  • Thanks for the tip, but for some reason I always get \FFFD for every special character I've tried: 'éè°' -> '\FFFD\FFFD\FFFD' Commented Oct 24, 2014 at 21:11
  • I created a function that encodes the special characters in US7ASCII, similar to what asciistr is supposed to do, and the corresponding decode function. Commented Oct 27, 2014 at 19:56
  • I can finally get my special characters into the clob xmlval! But I haven't managed to write them correctly encoded in UTF-8 to the XML file with the dbms_xslprocessor.clob2file function. I can create a correct Win1252 file, but each time I try to convert the data to UTF-8, all special characters end up encoded as 'EF BF BD'. Commented Oct 27, 2014 at 22:49

1 Answer

I finally found a workaround:

1- create a function that encodes each character above 127 as its hexadecimal code surrounded by specific delimiters: encodeSpecialChars('°') -> '#B0#'

2- create the function that decodes the encoded strings: decodeSpecialChars('#B0#') -> '°'

3- build the XML clob, passing every field through encodeSpecialChars

4- decode the clob with decodeSpecialChars

5- convert the clob's raw data from Windows-1252 to UTF-8

6- save the data to a file as raw bytes using the utl_file and utl_raw packages

declare
    xmlval    clob;
    l_output  utl_file.file_type;   -- file handle, missing from the original snippet
begin
    SELECT XMLELEMENT( "Address", xmlattributes( encodeSpecialChars( 'N°27' ) as "Street", encodeSpecialChars( 'Frébault' ) as "City" )
         ).extract('/*').getclobVal()
    INTO xmlval
    FROM DUAL;

    -- <Address Street="N#B0#27" City="Fr#E9#bault"/>
    xmlval := decodeSpecialChars( xmlval );
    -- <Address Street="N°27" City="Frébault"/>     -- encoded in Windows-1252

    -- Convert the Windows-1252 bytes to UTF-8 and write them out as raw bytes.
    -- A RAW value holds at most 32767 bytes, so a larger clob would have to be written in chunks.
    l_output := utl_file.fopen( 'DIRXMLTMP', 'fff.xml', 'w' );
    utl_file.PUT_RAW( l_output, UTL_RAW.convert( UTL_RAW.CAST_TO_RAW( xmlval ), 'FRENCH_FRANCE.AL32UTF8', 'FRENCH_FRANCE.WE8MSWIN1252' ) );
    utl_file.fclose( l_output );
end;
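
The two helper functions are not shown above. What follows is only a minimal sketch of what they might look like, not the code actually used: the bodies are assumptions. It assumes a single-byte database character set such as US7ASCII (so ASCII() and CHR() work on the stored byte value), that '#' never appears in the data itself, and that the text fits in a VARCHAR2 (32767 bytes in PL/SQL, 4000 when called from SQL). It avoids regular expressions so that it should also compile on Oracle 9.

```sql
-- Hypothetical sketch: replace every byte above 127 with '#HH#' (HH = hex code).
-- Assumes a single-byte database character set and no literal '#' in the data.
CREATE OR REPLACE FUNCTION encodeSpecialChars( p_text IN VARCHAR2 )
  RETURN VARCHAR2
IS
  l_result VARCHAR2(32767);
  l_char   VARCHAR2(1);
  l_code   PLS_INTEGER;
BEGIN
  FOR i IN 1 .. NVL( LENGTH( p_text ), 0 ) LOOP
    l_char := SUBSTR( p_text, i, 1 );
    l_code := ASCII( l_char );
    IF l_code > 127 THEN
      -- e.g. byte 176 becomes '#B0#'
      l_result := l_result || '#' || TO_CHAR( l_code, 'FMXX' ) || '#';
    ELSE
      l_result := l_result || l_char;
    END IF;
  END LOOP;
  RETURN l_result;
END encodeSpecialChars;
/

-- Hypothetical sketch: turn every '#HH#' marker back into the single byte HH.
-- Assumes the input was produced by encodeSpecialChars above.
CREATE OR REPLACE FUNCTION decodeSpecialChars( p_text IN VARCHAR2 )
  RETURN VARCHAR2
IS
  l_result VARCHAR2(32767);
  l_pos    PLS_INTEGER := 1;   -- start of the text not yet copied
  l_hash   PLS_INTEGER;        -- position of the next opening '#'
BEGIN
  LOOP
    l_hash := INSTR( p_text, '#', l_pos );
    EXIT WHEN l_hash = 0 OR l_hash IS NULL;
    -- copy the plain text before the marker, then the decoded byte
    l_result := l_result
                || SUBSTR( p_text, l_pos, l_hash - l_pos )
                || CHR( TO_NUMBER( SUBSTR( p_text, l_hash + 1, 2 ), 'XX' ) );
    l_pos := l_hash + 4;       -- skip past '#HH#'
  END LOOP;
  -- copy whatever follows the last marker
  RETURN l_result || SUBSTR( p_text, l_pos );
END decodeSpecialChars;
/
```

Because decodeSpecialChars takes a VARCHAR2 here, passing the clob relies on implicit conversion and only works while the document stays under 32767 bytes; a larger document would need a version that walks the clob in pieces.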