The most obvious way is with XPath, which is included in Java, so you don't need any extra libraries. There are many ways to get what you want, but here is a quick test I wrote:
```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class XPathDemo {
    // Sample input; it must be well-formed XML or parse() will throw a SAXException.
    private static final String xmlString = "<a>\n" +
            " <b>\n" +
            " <c>val</c>\n" +
            " <d x=\"x-val\" z=\"z-val\"><e>e-val</e><f>lot of irrelevant fields</f></d>\n" +
            " <g>lot of irrelevant fields</g>\n" +
            " </b>\n" +
            "</a>";

    public static void main(String[] argv) throws IOException, SAXException, ParserConfigurationException, XPathExpressionException {
        // Parse the string into a DOM Document (the whole tree is held in memory).
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document document = db.parse(new ByteArrayInputStream(xmlString.getBytes(StandardCharsets.UTF_8)));

        XPath xpath = XPathFactory.newInstance().newXPath();

        // Text content of an element.
        String c_value = (String) xpath.evaluate("/a/b/c/text()", document, XPathConstants.STRING);
        System.out.println("value of c is \"" + c_value + "\"");

        // Attributes are selected with the @ prefix.
        String x_value = (String) xpath.evaluate("/a/b/d/@x", document, XPathConstants.STRING);
        System.out.println("value of x is \"" + x_value + "\"");

        String z_value = (String) xpath.evaluate("/a/b/d/@z", document, XPathConstants.STRING);
        System.out.println("value of z is \"" + z_value + "\"");

        // Nested elements just extend the path.
        String e_value = (String) xpath.evaluate("/a/b/d/e/text()", document, XPathConstants.STRING);
        System.out.println("value of e is \"" + e_value + "\"");
    }
}
```
Output:
```
value of c is "val"
value of x is "x-val"
value of z is "z-val"
value of e is "e-val"
```
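Two details of the XPath API are worth knowing when you adapt this. A minimal sketch follows; the class and helper names (`XPathMissingDemo`, `parse`, `valueOf`) are made up for illustration. First, the two-argument `XPath.evaluate(String, Object)` overload already returns a `String`, so the cast is optional. Second, a path that matches nothing gives you an empty string rather than `null` or an exception, so wrapping the path in XPath's `boolean()` function is one way to tell "missing" apart from "present but empty".

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathMissingDemo {
    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Hypothetical helper: the two-argument evaluate() returns the string-value directly.
    static String valueOf(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<a><b><c>val</c></b></a>");
        // An element's string-value is its text, so text() is not strictly needed here.
        System.out.println("c = \"" + valueOf(doc, "/a/b/c") + "\"");          // c = "val"
        // No match: an empty string, not null and not an exception.
        System.out.println("missing = \"" + valueOf(doc, "/a/b/nope") + "\""); // missing = ""
        // boolean() distinguishes "absent" from "present but empty".
        System.out.println("exists = " + valueOf(doc, "boolean(/a/b/nope)"));  // exists = false
    }
}
```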
This is a very simple example. It gets harder when the same basic structure is repeated many times (for example, several `<d>` elements), because evaluating to a `STRING` only gives you the first match. I'd read up on XPath syntax: it is very powerful, but getting exactly what you want can sometimes be a pain.
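For repeated structures, one common pattern is to select all the repeating nodes with `XPathConstants.NODESET` and then evaluate relative paths against each node. This is a sketch under assumptions: the input below (three `<d>` elements) and the names `XPathRepeatedDemo`, `parse` and `rows` are hypothetical, not taken from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathRepeatedDemo {
    // Hypothetical input: the <d> element from the question repeated three times.
    static final String XML = "<a><b>"
            + "<d x=\"x1\" z=\"z1\"><e>e1</e></d>"
            + "<d x=\"x2\" z=\"z2\"><e>e2</e></d>"
            + "<d x=\"x3\" z=\"z3\"><e>e3</e></d>"
            + "</b></a>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Returns "x|z|e" for every <d> element, in document order.
    static List<String> rows(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // NODESET returns every match as a NodeList instead of only the first one's text.
        NodeList ds = (NodeList) xpath.evaluate("/a/b/d", doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < ds.getLength(); i++) {
            Node d = ds.item(i);
            // Relative paths ("@x", "e") are evaluated with this <d> as the context node.
            result.add(xpath.evaluate("@x", d) + "|" + xpath.evaluate("@z", d) + "|" + xpath.evaluate("e", d));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        for (String row : rows(doc)) {
            System.out.println(row); // x1|z1|e1, then x2|z2|e2, then x3|z3|e3
        }
        // A predicate selects one occurrence by attribute value; prints "e2".
        XPath xpath = XPathFactory.newInstance().newXPath();
        System.out.println(xpath.evaluate("/a/b/d[@x='x2']/e", doc));
    }
}
```

Evaluating relative paths per node keeps each row's values together, which is harder to guarantee if you run three separate absolute queries and try to line the results up afterwards.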
There are a few caveats that you should know about:
- You need well-formed XML. What you posted is not, so the parser would reject it.
- This reads the entire document into memory as a DOM tree. That's fine if you have a few thousand lines, but if you've got a 10 GB document you may need another way, such as a streaming parser.
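If memory is the problem, one option that also ships with the JDK is the StAX streaming API (`javax.xml.stream`), which hands you one parse event at a time instead of building a tree. The sketch below is an illustration under assumptions, not a drop-in replacement: the class `StaxDemo` and method `collect` are hypothetical, and it only pulls out the `x` attribute and `<e>` text of each `<d>`. You lose XPath's convenience and have to track where you are in the document yourself.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxDemo {
    // Hypothetical helper: returns "x=e" for every <d>, reading one event at a time.
    static List<String> collect(InputStream in) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        List<String> out = new ArrayList<>();
        String currentX = null; // x attribute of the <d> we are currently inside, if any
        try {
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = r.getLocalName();
                    if (name.equals("d")) {
                        // null namespace URI means "match the local name only".
                        currentX = r.getAttributeValue(null, "x");
                    } else if (name.equals("e") && currentX != null) {
                        // getElementText() reads up to </e>; <e> must contain only text.
                        out.add(currentX + "=" + r.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT && r.getLocalName().equals("d")) {
                    currentX = null;
                }
            }
        } finally {
            r.close(); // does not close the underlying stream
        }
        return out;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<a><b><c>val</c><d x=\"x-val\" z=\"z-val\"><e>e-val</e><f>f</f></d><g>g</g></b></a>";
        System.out.println(collect(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))); // [x-val=e-val]
    }
}
```

For a real file you would pass something like `Files.newInputStream(Paths.get("huge.xml"))` (a hypothetical path) inside a try-with-resources block; passing an `InputStream` rather than a `Reader` lets the parser pick up the encoding from the XML declaration.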