I am trying to transform XML (UTF-8 encoding) to CSV (win-1251 encoding), but I get an error:

net.sf.saxon.trans.DynamicError: Output character not available in this encoding (decimal 160)

I understand that the XML text contains a character with code 160 that is not in win-1251.

I tried to clean the XML before the transformation, but it doesn't help:

        Charset charset = Charset.forName("windows-1251");
        CharsetDecoder decoder = charset.newDecoder();
        CharsetEncoder encoder = charset.newEncoder();
        encoder.onUnmappableCharacter(CodingErrorAction.REPLACE);
        String result = s;

        try {
            ByteBuffer bbuf = encoder.encode(CharBuffer.wrap(s));
            CharBuffer cbuf = decoder.decode(bbuf);
            result = cbuf.toString();
        } catch (CharacterCodingException cce) {
            log.error("Exception during character encoding/decoding: " + cce.getMessage());
        }

What is the best way to solve this problem?

My XSLT sample:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE csv-style [
<!ENTITY semicolons     ';;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;'>
<!ENTITY commas         ',,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,'>
]>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" >
<xsl:output method="text" indent="no" omit-xml-declaration="yes" encoding="windows-1251"/>

<xsl:param name="delim">semicolon</xsl:param>
<xsl:param name="showHead">yes</xsl:param>
<xsl:variable name="delimStr">
    <xsl:choose>
        <xsl:when test="$delim = 'comma'">&commas;</xsl:when>
        <xsl:otherwise>&semicolons;</xsl:otherwise>
    </xsl:choose>
</xsl:variable>

<xsl:template match="blocks">
    <xsl:apply-templates select="*"/>
</xsl:template>

<xsl:template match="description|pair|foot|body/table/head">
<!-- don't do anything just skip it-->
</xsl:template>

<xsl:template match="table">
    <xsl:apply-templates select="table|head|body"/>
</xsl:template>

<xsl:template match="col">
    <xsl:if test="position()=1">
        <xsl:value-of select="substring($delimStr, 1, @id - 1)"/>
    </xsl:if>
<xsl:choose>
    <xsl:when test="@value">
        <xsl:text>&quot;</xsl:text><xsl:variable name="escape">
        <xsl:call-template name="_replace_string">
            <xsl:with-param name="string" select="@value" />
        </xsl:call-template>
    </xsl:variable>
    <xsl:value-of select="$escape" /><xsl:text>&quot;</xsl:text>

    </xsl:when>
    <xsl:otherwise>
        <xsl:text>""</xsl:text>
        <xsl:apply-templates/>
    </xsl:otherwise>
</xsl:choose>
<xsl:choose>
    <xsl:when test="position()=last()">
        <xsl:value-of select="substring($delimStr, 1, ancestor::table[1]/@colNum - @id)"/>
    </xsl:when>
    <xsl:otherwise>
        <xsl:value-of select="substring($delimStr, 1, following-sibling::col[1]/@id - @id)"/>
    </xsl:otherwise>
</xsl:choose>
</xsl:template> <!-- col -->

<xsl:template match="row">
    <xsl:if test="col[@value][1]">
        <xsl:apply-templates select="col"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:if>
</xsl:template>

<xsl:template match="head">
    <xsl:if test="$showHead = 'yes'">
        <xsl:apply-templates select="*"/>
    </xsl:if>
</xsl:template>

<xsl:template match="body">
    <xsl:apply-templates select="*"/>
</xsl:template>

<xsl:template name="_replace_string">
    <xsl:param name="string" select="''"/>
    <xsl:variable name="find">"</xsl:variable>
    <xsl:variable name="replace">""</xsl:variable>
    <xsl:choose>
        <xsl:when test="contains($string,$find)">
            <xsl:value-of select="concat(substring-before($string,$find),$replace)"/>
            <xsl:call-template name="_replace_string">
                <xsl:with-param name="string" select="substring-after($string,$find)"/>
                <xsl:with-param name="find" select="$find"/>
                <xsl:with-param name="replace" select="$replace"/>
            </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$string"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

</xsl:stylesheet>

My XML sample:

<?xml version="1.0" encoding="UTF-8" ?><blocks type="report"><functions><func num="4" text=" nameOf_10031"></func><func num="5" text="name Of_10071"></func><func num="6" text="name Of_10006"></func></functions><description name="[441] testesttest with 160 "><rows total="44" start="1" end="44" show-data="yes"></rows><columns count="10"><column id="1" type="4" position="1" width="" format="&apos;dd.mm.yyyy&apos;"></column><column id="2" type="4" position="2" width="" format="&apos;dd.mm.yyyy&apos;"></column><column id="3" type="3" position="3" width=""></column><column id="4" type="2" position="4" width=""></column><column id="5" type="2" position="5" width=""></column><column id="6" type="2" position="6" width=""></column><column id="7" type="2" position="7" width=""></column><column id="8" type="2" position="8" width=""></column><column id="9" type="2" position="9" width=""></column><column id="10" type="2" position="10" width=""></column></columns></description><pair name="ReportName" value="test test test "></pair><table colNum="10" id="12561"><head><row><col id="1" value="test test test"></col><col id="2" value=" test test test"></col><col id="3" value="test test test"></col><col id="4" value="test test test"></col><col id="5" value="test test test"></col><col id="6" value="test test test"></col><col id="7" value="test test test"></col><col id="8" value=" test test test"></col><col id="9" value="test test test"></col><col id="10" value="test test test"></col></row></head><body><row num="1"><col id="1" value="01.07.2006"></col><col id="2"></col><col id="3" value="53363"></col><col id="4" value="65187" record-id="65187"></col><col id="5" value="53363" record-id="53368"></col><col id="6" value="test test test" record-id="1974"></col><col id="7"></col><col id="8"></col><col id="9" value="test test test"></col><col id="10"></col></row></body></table></blocks>

When I run

java -cp saxon-9.1.0.8.jar net.sf.saxon.Transform -t -s:myxml.xml -xsl:myxsl.xsl -o:result.csv

I get the same error (160):

Saxon 9.1.0.8J from Saxonica
Java version 1.8.0_333
Warning: at xsl:stylesheet on line 11 column 81 of myxsl.xsl:
  Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor
Stylesheet compilation time: 378 milliseconds
Processing file:/D:/111/myxml2.xml
Building tree for file:/D:/111/myxml2.xml using class net.sf.saxon.tinytree.TinyBuilder
Tree built in 4 milliseconds
Tree size: 46 nodes, 0 characters, 99 attributes
Loading net.sf.saxon.event.MessageEmitter
Error at xsl:value-of on line 46 of myxsl.xsl:
  Output character not available in this encoding (decimal 160)
  at xsl:apply-templates (file:/D:/111/myxsl.xsl#66)
     processing /blocks/table[1]/head[1]/row[1]/col[2]
  at xsl:apply-templates (file:/D:/111/myxsl.xsl#73)
     processing /blocks/table[1]/head[1]/row[1]
  at xsl:apply-templates (file:/D:/111/myxsl.xsl#32)
     processing /blocks/table[1]/head[1]
  at xsl:apply-templates (file:/D:/111/myxsl.xsl#24)
     processing /blocks/table[1]
  in built-in template rule
Transformation failed: Run-time errors were reported

With a newer version, for example Saxon-HE-10.3.jar, there are no problems, but unfortunately I can't upgrade to it.

  • Can you show us your code using Saxon giving the error? And minimal XML and XSLT samples? I am kind of confused because en.wikipedia.org/wiki/Windows-1251 suggests the Unicode character 160 (non breaking space) is part of Windows-1251. Commented Jun 23, 2022 at 13:05
  • It works for me: trying java net.sf.saxon.Query -qs:"'x&#xa0;x'" !method=text !encoding=windows-1251. You'll have to give more detail of exactly what you are doing. Commented Jun 23, 2022 at 13:42
  • @MartinHonnen Yes, I have added them. Commented Jun 23, 2022 at 13:56
  • @Michael Kay How can I use this option in my code? Commented Jun 23, 2022 at 13:58
  • I am afraid I can't tell why you get that error with that very old version of Saxon. As I checked, the character is part of that encoding, and current versions of Saxon have no problem writing it out with that encoding. As a workaround, you could try whether a character map works. But really I am guessing: that version of Saxon is too old for me to know, and I still can't tell why that character gives that error for the named encoding. Commented Jun 23, 2022 at 15:00

2 Answers

A character map that maps, for example, the non-breaking space (160) to a normal space (32) would be:

  <xsl:character-map name="m1">
    <xsl:output-character character="&#160;" string=" "/>
  </xsl:character-map>

  <xsl:output use-character-maps="m1"/>

Character maps have been supported since XSLT 2.0, and I think Saxon 8.9 was the first version to implement the 2.0 standard, so 9.1 should cover that.

1 Comment

In fact, it is not only this character: a number of others that are also in win-1251 are affected. I would rather not list each one, because every new document may bring new unknown characters.

You are using a very old (and unsupported) version of Saxon. In Saxon 9.1 (released in 2009) the software maintained its own data tables for character encoding, rather than getting it all from the JDK. According to the definition of CP1251 used in the Saxon 9.1 data tables, there is no mapping for the Unicode codepoint 160. The relevant source code contains a link to the URI http://www.microsoft.com/globaldev/reference/sbcs/1251.htm as its source of information, but that web page is no longer available.
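The difference between Saxon 9.1's own tables and the JDK's can be checked from the Java side. The following is only a minimal sketch; the class name `Cp1251Check` and the method `jdkCanEncode` are made up for illustration. It asks the JDK's windows-1251 charset whether it can encode a given code point. Since the JDK does map codepoint 160, a round trip through the JDK encoder (as attempted in the question) would be expected to leave that character in place, which would explain why the clean-up step did not help.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class Cp1251Check {

    /** Returns true if the JDK's windows-1251 charset can encode the code point. */
    static boolean jdkCanEncode(int codePoint) {
        CharsetEncoder encoder = Charset.forName("windows-1251").newEncoder();
        // Character.toChars handles code points outside the BMP as well.
        return encoder.canEncode(new String(Character.toChars(codePoint)));
    }

    public static void main(String[] args) {
        // Non-breaking space: the character Saxon 9.1 rejects.
        System.out.println("U+00A0 encodable by JDK: " + jdkCanEncode(0xA0));
        // A CJK character, which windows-1251 cannot represent.
        System.out.println("U+4E2D encodable by JDK: " + jdkCanEncode(0x4E2D));
    }
}
```

This only tells you what the JDK thinks; it says nothing about which other characters the Saxon 9.1 tables reject.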

Sorry we can't help you more, but this kind of thing happens if you don't upgrade your software from time to time.

Your best way forward is probably to output the data in UTF-8 encoding and then use some other utility to convert the CSV file from UTF-8 to CP1251.
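If the "other utility" is to be Java itself, the conversion step could look roughly like the sketch below. It assumes the stylesheet has been changed to `encoding="UTF-8"` and that the intermediate file fits in memory; the class name `Utf8ToCp1251` and the file names in the usage line are hypothetical. Characters that the JDK's windows-1251 charset cannot represent are replaced with the encoder's default replacement (`?`) instead of failing, so no per-character list is needed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8ToCp1251 {

    /** Re-encodes UTF-8 bytes as windows-1251, replacing unmappable characters. */
    static byte[] convert(byte[] utf8Bytes) throws CharacterCodingException {
        // A fresh decoder reports malformed input, so broken UTF-8 fails loudly.
        CharBuffer chars = StandardCharsets.UTF_8.newDecoder()
                .decode(ByteBuffer.wrap(utf8Bytes));

        CharsetEncoder encoder = Charset.forName("windows-1251").newEncoder()
                .onUnmappableCharacter(CodingErrorAction.REPLACE);

        ByteBuffer encoded = encoder.encode(chars);
        byte[] result = new byte[encoded.remaining()];
        encoded.get(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: UTF-8 CSV written by Saxon, args[1]: windows-1251 output file
        byte[] input = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), convert(input));
    }
}
```

A possible invocation after the transformation would be `java Utf8ToCp1251 result-utf8.csv result.csv`. For very large files, a streaming variant with an `InputStreamReader` and an `OutputStreamWriter` constructed from the same encoder would avoid loading everything at once.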

1 Comment

Thanks for the information and ideas, I'll think about how best to proceed
