I work on a Node.js application that processes large amounts of geospatial data from files and loads it into a JSON document database.
The source data consists of large XML documents (up to tens of GB). I use sax.js to parse the source documents, which gives me JavaScript objects that mirror the XML structure:
{ name: 'gml:featureMember',
  attributes: {},
  isSelfClosing: false,
  parent: null,
  children:
   [ '\r\n ',
     { name: 'AX_BesondereFlurstuecksgrenze',
       attributes: { 'gml:id': 'DEHHALKAn0007s8z' },
       isSelfClosing: false,
       children:
        [ '\r\n ',
          { name: 'gml:identifier',
            attributes: { codeSpace: 'http://...' },
            isSelfClosing: false,
            children: [ 'urn:adv:oid:...' ] },
          '\r\n ',
          { name: 'lebenszeitintervall',
            attributes: {},
            isSelfClosing: false,
            children:
             [ '\r\n ',
               { name: 'AA_Lebenszeitintervall',
                 attributes: {},
                 isSelfClosing: false,
                 children:
                  [ '\r\n ',
                    { name: 'beginnt',
                      attributes: {},
                      isSelfClosing: false,
                      children: [ '2010-03-07T08:32:05Z' ] },
                    '\r\n ' ] },
               '\r\n ' ] },
          ...
However, sax.js apparently gives no access to the XML fragment of the element currently being parsed. So I am looking for a way to get an XML fragment from sax.js or from a different streaming parser. As I am on Windows, I would like to use only modules that don't require native compilation.
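
For illustration, one possible workaround is to re-serialize the object tree shown above back into XML. This is only a minimal sketch: `toXml`, `escapeText` and `escapeAttr` are hypothetical helpers, not part of sax.js, and they assume each node has the `name`, `attributes` and `children` properties from the dump above, with attribute values stored as plain strings.

```javascript
// Hypothetical helpers (not part of sax.js): rebuild an XML string from a
// node tree shaped like { name, attributes, children }.

// Escape characters that are not allowed verbatim in XML text content.
function escapeText(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Attribute values additionally need the double quote escaped.
function escapeAttr(s) {
  return escapeText(s).replace(/"/g, '&quot;');
}

function toXml(node) {
  // Text children are stored as plain strings in the tree.
  if (typeof node === 'string') return escapeText(node);

  const attributes = node.attributes || {};
  const attrs = Object.keys(attributes)
    .map(function (key) {
      return ' ' + key + '="' + escapeAttr(attributes[key]) + '"';
    })
    .join('');

  const children = node.children || [];
  // Elements without children are written in self-closing form.
  if (children.length === 0) return '<' + node.name + attrs + '/>';

  return '<' + node.name + attrs + '>' +
    children.map(toXml).join('') +
    '</' + node.name + '>';
}
```

Calling `toXml(node)` on one `gml:featureMember` object would yield a fragment that is equivalent to the source element but not necessarily byte-identical: comments, CDATA markers and the original quoting or entity style are not preserved, and namespace declarations made on ancestor elements are not copied into the fragment.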