So I'm trying to figure out a way to get the difference between two XML trees (examples below) but can't come up with anything. I need the outcome to be an array of differences, with each element in the array containing the node that was changed, how it was changed (added or deleted), and the path to the node.
Edit: I forgot to mention that the order of the XML nodes must not matter. I tried using npm/dom-compare, but it doesn't quite give the desired result for the examples below: it reports the new tag (dir photos) only as an unexpected tag and gives no further information about it.
1.
<dir name="rootDir">
  <dir name="childDir">
    <file name="hello.jpg"/>
  </dir>
  <file name="linux.txt"/>
  <file name="img.png"/>
</dir>
2.
<dir name="rootDir">
  <dir name="childDir">
    <file name="hello.jpg"/>
    <file name="interesting.brain"/>
  </dir>
  <dir name="photos">
    <file name="me.dng"/>
  </dir>
  <file name="img.png"/>
</dir>
My XML sources will only ever contain <dir> and <file> tags.
For my purposes there is no 'changed' change type: if a file's name changes, it counts as a new file and the old one is treated as removed rather than moved, and dirs are not reported when only their files change. For example, with the two XML docs above, compare(1, 2) should result in:
[
  {node: '<file name="interesting.brain"/>', path: '/rootDir/childDir', change: 'added'},
  {node: '<dir name="photos">', path: '/rootDir', change: 'added'},
  {node: '<file name="linux.txt"/>', path: '/rootDir', change: 'deleted'}
]
My first thought was to parse the XML strings into JS objects using fast-xml-parser, which results in the following objects:
1.
{ dir: [
  {
    name: 'rootDir',
    dir: [
      {
        name: 'childDir',
        file: [
          { name: 'hello.jpg' }
        ]
      }
    ],
    file: [
      { name: 'linux.txt' },
      { name: 'img.png' }
    ]
  }
] }
2.
{ dir: [
  {
    name: 'rootDir',
    dir: [
      {
        name: 'childDir',
        file: [
          { name: 'hello.jpg' },
          { name: 'interesting.brain' }
        ]
      },
      {
        name: 'photos',
        file: [
          { name: 'me.dng' }
        ]
      }
    ],
    file: [
      { name: 'img.png' }
    ]
  }
] }
However, this adds complications because the resulting format mixes arrays and objects, which at the very least increases the mental workload of figuring out how to diff the two. It is also probably quite a bit slower, since the XML string has to be parsed first, and it adds a third-party dependency.
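For reference, a rough sketch of how a diff over the parsed objects above might look. It is only a sketch under some assumptions, not a tested solution: the type names (FileNode, DirNode, Change) and the function diffDir are made up for illustration, names are assumed unique within a directory, both roots are assumed to have the same name, and the parser is assumed to emit the attribute under the key name and to always produce arrays, as in the objects shown.

```typescript
// Hypothetical types mirroring the parsed objects shown above.
interface FileNode { name: string }
interface DirNode { name: string; dir?: DirNode[]; file?: FileNode[] }
interface Change { node: string; path: string; change: 'added' | 'deleted' }

// Names of the nodes in a list (empty when the key is absent).
function names(nodes: { name: string }[] = []): string[] {
  return nodes.map(n => n.name);
}

// Compare two directories assumed to share a name; `parent` is the path above them.
function diffDir(oldDir: DirNode, newDir: DirNode, parent = ''): Change[] {
  const path = `${parent}/${newDir.name}`;
  const changes: Change[] = [];

  // Record names found on only one side. Membership tests ignore sibling order.
  const record = (before: string[], after: string[], render: (n: string) => string) => {
    after.filter(n => before.indexOf(n) < 0)
      .forEach(n => changes.push({ node: render(n), path, change: 'added' }));
    before.filter(n => after.indexOf(n) < 0)
      .forEach(n => changes.push({ node: render(n), path, change: 'deleted' }));
  };

  record(names(oldDir.file), names(newDir.file), n => `<file name="${n}"/>`);
  record(names(oldDir.dir), names(newDir.dir), n => `<dir name="${n}">`);

  // Recurse only into directories present on both sides, so an added or
  // deleted directory is reported once and its contents are not listed.
  for (const oldChild of oldDir.dir || []) {
    const newChild = (newDir.dir || []).filter(d => d.name === oldChild.name)[0];
    if (newChild) changes.push(...diffDir(oldChild, newChild, path));
  }
  return changes;
}
```

Called as diffDir(first.dir[0], second.dir[0]) on the two parsed objects, this yields the same three changes as the desired result, though not in the same order. The membership tests are quadratic in the number of siblings, which is probably fine for small trees but would want a Set for large ones.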
I'm looking for any advice or a pseudocode algorithm that I can use to solve this problem. I should note that I'm using TypeScript and targeting ES6 / Node.js.
Cheers.