I need to process a large KML file (over 3 MiB). To understand its contents I need to read through it, but there are so many Style and StyleMap nodes that manual browsing becomes impossible. I have decided to remove the unnecessary nodes programmatically with Node.js. Parsing an XML file with Node.js is fairly easy, for example with sax or xmldom. The tricky part is excluding certain nodes and their children while keeping all the others. With sax this becomes rather complex: because the output is also XML, every kept node, with its attributes and children, has to be handled and written out again. I feel there should be a simpler and more robust solution. Any suggestions and code snippets?
Search any xml-parser package on npm, include it, read your file, remove certain nodes, save to file and voilà. What exactly are you asking? – xDreamCoding, Oct 7, 2017 at 23:34
@xDreamCoding Thanks, I was looking for the general approach that you briefly described, plus a code snippet, especially for the part where the nodes are removed. I edited the question to be more specific. I found that xpath might be able to do this. If it works well, I guess I will implement an npm module for this. – Akseli Palén, Oct 8, 2017 at 10:37
You want to transform the XML file. XSLT is your friend. – Tomalak, Oct 8, 2017 at 10:49
You can select the nodes you want with camaro. – Anh Thang Bui, Oct 16, 2017 at 16:10
@AnhThangBui Thanks for the tip. However, the problem is a bit different. I do not know all the nodes and props beforehand, because the file I'm trying to process is so large that I cannot inspect it. I just want to remove the matching nodes and keep all the rest, regardless of their names, props, or children. – Akseli Palén, Oct 17, 2017 at 18:21
1 Answer
One way is to use xmldom and xpath. First, select the nodes to remove by passing an XPath expression to xpath.select. It returns an array of xmldom nodes, each of which can then be removed from the DOM tree through its parent. For example, to remove all book nodes:
var xmldom = require('xmldom');
var xpath = require('xpath');

var parser = new xmldom.DOMParser();
var serializer = new xmldom.XMLSerializer();

var xmlIn = '<bookstore>' +
  '<book>Animal Farm</book>' +
  '<book>Nineteen Eighty-Four</book>' +
  '<essay>Reflections on Writing</essay>' +
  '</bookstore>';

// Parse the string into a DOM document.
var root = parser.parseFromString(xmlIn, 'text/xml');

// Select every book element, then detach each one from its parent.
var nodes = xpath.select('//book', root);
nodes.forEach(function (n) {
  n.parentNode.removeChild(n);
});

// Serialize the remaining tree back to an XML string.
var xmlOut = serializer.serializeToString(root);
However, dealing with namespaces, multiple XPath expressions, and indentation preservation is a struggle. Therefore I created an npm module, filterxml, to do the heavy lifting:
var filterxml = require('filterxml');
var patterns = ['//book'];
var namespaces = {};
filterxml(xmlIn, patterns, namespaces, function (err, xmlOut) {
  console.log(xmlOut);
});
This outputs:
<bookstore><essay>Reflections on Writing</essay></bookstore>