I have written a method to check my XML strings for &.

I need to modify the method to include the following:

< &lt;

> &gt;

" &quot;

& &amp;

' &apos;

Here is the method

private String xmlEscape(String s) {
    try {
        return s.replaceAll("&(?!amp;)", "&amp;");
    }
    catch (PatternSyntaxException pse) {
        return s;
    }
} // end xmlEscape()

Here is the way I am using it

 sb.append("            <Host>" + xmlEscape(url.getHost()) + "</Host>\n");

How can I modify my method to incorporate the rest of the symbols?

EDIT

I think I must not have phrased the question correctly. In the xmlEscape() method I want to check the string for the characters < > ' " &, and if any are found, replace each one with its corresponding entity.

Example: an & in the string would be replaced with &amp;.

Can you do something as simple as

try {
   s.replaceAll("&(?!amp;)", "&amp;");
   s.replaceAll("<", "&lt;");
   s.replaceAll(">", "&gt;");
   s.replaceAll("'", "&apos;");
   s.replaceAll("\"", "&quot;");
   return s;
}
catch (PatternSyntaxException pse) {
   return s;
}   
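
For what it's worth, the snippet above discards the result of every replaceAll call: Java strings are immutable, so each call returns a new String that has to be kept. A minimal sketch of a chained version follows. The class name XmlEscapeSketch is made up for illustration, and it assumes the input is plain text that has not been escaped yet, so every & is escaped and the lookahead is dropped. It uses String.replace, which does literal replacement, so there is no regex and no PatternSyntaxException to catch.

```java
final class XmlEscapeSketch {

    // Hypothetical helper; assumes s is plain, not-yet-escaped text.
    static String xmlEscape(String s) {
        if (s == null) {
            return null;
        }
        // Each replace() returns a new String, so the calls are chained.
        // '&' goes first so the ampersands introduced by the later
        // replacements are not escaped a second time.
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    public static void main(String[] args) {
        System.out.println(xmlEscape("a<b & \"c\" > 'd'"));
    }
}
```

The order matters: if & were replaced last, the & in an already produced &lt; would be turned into &amp;lt;.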
  • Hm, nothing stops you from calling replaceAll more than once... maybe I just do not understand the question?! Commented Sep 24, 2012 at 16:51
  • This one might help as well (2nd hit on google): stackoverflow.com/questions/439298/… Commented Sep 24, 2012 at 16:53
  • The negative lookahead ((?!amp;)) is a bug. The input is presumably plain text. Suppose the input is "In XML, to get an ampersand you need to write '&amp;'". Your code will incorrectly leave the & as is. Commented Sep 24, 2012 at 16:57
  • You should most definitely NOT be rewriting this yourself. There are several pre-existing libraries that provide fully debugged and RFC-compliant versions of this function. Don't reinvent the wheel. Commented Sep 24, 2012 at 22:51
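
The lookahead point raised in the comments can be seen with a small comparison. This is only an illustrative sketch: the class and method names are invented here, and it assumes the input is plain text rather than text that is already partly escaped.

```java
final class LookaheadDemo {

    // The pattern from the question: skips any '&' that is followed by "amp;".
    static String withLookahead(String s) {
        return s.replaceAll("&(?!amp;)", "&amp;");
    }

    // Escapes every '&', treating the input as plain text.
    static String escapeEveryAmpersand(String s) {
        return s.replace("&", "&amp;");
    }

    public static void main(String[] args) {
        String text = "to get an ampersand write '&amp;'";
        // Left unchanged, so an XML parser would later hand back '&' instead of '&amp;'.
        System.out.println(withLookahead(text));
        // Becomes '&amp;amp;', which parses back to the original text.
        System.out.println(escapeEveryAmpersand(text));
    }
}
```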

2 Answers

Consider using the Apache Commons StringEscapeUtils.escapeXml method, or one of the many other XML escaping utilities available. That gives you correct escaping of XML content, so you don't have to worry about missing something when you later need to escape something other than a host name.

1 Comment

This should be in the form of a statement, or it risks getting marked "not an answer".

Alternatively have you considered using the StAX (JSR-173) APIs to compose your XML document rather than appending strings together (an implementation is included in the JDK/JRE)? This will handle all the necessary character escaping for you:

package forum12569441;

import java.io.*;
import javax.xml.stream.*;

public class Demo {

    public static void main(String[] args) throws Exception {
        // WRITE THE XML
        XMLOutputFactory xof = XMLOutputFactory.newFactory();

        StringWriter sw = new StringWriter();
        XMLStreamWriter xsw = xof.createXMLStreamWriter(sw);
        xsw.writeStartDocument();
        xsw.writeStartElement("foo");
        xsw.writeCharacters("<>\"&'");
        xsw.writeEndDocument();

        String xml = sw.toString();
        System.out.println(xml);

        // READ THE XML
        XMLInputFactory xif = XMLInputFactory.newFactory();
        XMLStreamReader xsr = xif.createXMLStreamReader(new StringReader(xml));
        xsr.nextTag(); // Advance to "foo" element
        System.out.println(xsr.getElementText());
    }

}

Output

<?xml version="1.0" ?><foo>&lt;&gt;"&amp;'</foo>
<>"&'

1 Comment

+1 - if you want XML then use an XML tool that knows all the rules, otherwise it's far too easy to end up with something not well-formed that other XML tools can't parse.
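
Applied to the Host element from the question, the same StAX idea might look roughly like the sketch below. The class and method names are hypothetical, and wrapping the checked XMLStreamException in an unchecked exception is just one way to keep the call site short.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class HostElementSketch {

    // Hypothetical helper: builds <Host>...</Host> and lets StAX escape the text.
    static String hostElement(String host) {
        try {
            StringWriter sw = new StringWriter();
            XMLStreamWriter xsw = XMLOutputFactory.newFactory().createXMLStreamWriter(sw);
            xsw.writeStartElement("Host");
            xsw.writeCharacters(host); // escaping of & and < happens here
            xsw.writeEndElement();
            xsw.close();               // flushes the writer; does not close sw
            return sw.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write Host element", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hostElement("a&b<c"));
    }
}
```

In real code you would more likely write the whole document through one XMLStreamWriter than build fragments and append them to a StringBuilder.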
