
First, let me say that I have enjoyed reading dozens of tips about merging multiple XML files. I've also enjoyed implementing a good number of them. But I still haven't achieved my goal.

I don't want to simply merge XML files so that one is repeated after another in the resulting XML file. I have groups of repeating elements, and each group needs to be merged:

<SAN>
  <EQLHosts>
    <WindowsHosts>
      <WindowsHost>
        more data and structures down here...
      </WindowsHost>
    </WindowsHosts>
    <LinuxHosts>
      <LinuxHost>
        ...and here...
      </LinuxHost>
    </LinuxHosts>
  </EQLHosts>
</SAN>

Each of the individual XML files might have Windows and/or Linux hosts. So if XML file 1 has data for Windows hosts A, B and C, and XML file 2 has data for Windows hosts D, E and F, the resulting XML should look like:

<SAN>
  <EQLHosts>
    <WindowsHosts>
      <WindowsHost>
        <Name>A</Name>
      </WindowsHost>
      <WindowsHost>
        <Name>B</Name>
      </WindowsHost>
      <WindowsHost>
        <Name>C</Name>
      </WindowsHost>
      <WindowsHost>
        <Name>D</Name>
      </WindowsHost>
      <WindowsHost>
        <Name>E</Name>
      </WindowsHost>
      <WindowsHost>
        <Name>F</Name>
      </WindowsHost>
    </WindowsHosts>
    <LinuxHosts>
      <LinuxHost/>
    </LinuxHosts>
  </EQLHosts>
</SAN>

I have used this XSLT, among others, to get this to work:

<?xml version="1.0"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:variable name="file1" select="document('CorralData1.xml')"/>
  <xsl:variable name="file2" select="document('CorralData2.xml')"/>
  <xsl:variable name="file3" select="document('CorralData3.xml')"/>

  <xsl:template match="/">
    <SAN>
      <xsl:copy-of select="/SAN/*"/>
      <xsl:copy-of select="$file1/SAN/*"/>
      <xsl:copy-of select="$file2/SAN/*"/>
      <xsl:copy-of select="$file3/SAN/*"/>
    </SAN>
  </xsl:template>

</xsl:stylesheet>

This stylesheet produces a combined XML file, with all data all the way down the tree included correctly, but with multiple instances of WindowsHosts, which I don't want.

Is there a way to tell XSLT how to do this with a minimum of syntax, or do I need to add each element and sub-element specifically in the XSLT file?


I should have checked first, but I went ahead and used collection() and got a solution that works perfectly with the Saxon HE XSLT processor.

However, I'm running in an InfoPath environment, where only an XSLT 1.0 processor is available. Does anyone have a recommendation for replacing the collection() function in an XSLT 1.0 environment? Can I go back to using document() in some way?


So I now have this file...

<?xml version="1.0" encoding="windows-1252"?>

<files>
    <file name="CorralData1.xml"/>
    <file name="CorralData2.xml"/>
</files>

...which I use with a stylesheet containing...

<xsl:variable name="windowsHosts" select="/SAN/WindowsHosts/WindowsHost"/>
<xsl:variable name="vmwareHosts" select="/SAN/VMwareHosts/VMwareHost"/>
<xsl:variable name="linuxHosts" select="/SAN/LinuxHosts/LinuxHost"/>

<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="/">
    <xsl:for-each select="/files/file">
        <xsl:apply-templates select="document(@name)/SAN"/>
    </xsl:for-each>
    <SAN>
        <EQLHosts>
            <WindowsHosts>
                <xsl:for-each select="$windowsHosts">
                    <xsl:copy-of select="."/>
                </xsl:for-each>
            </WindowsHosts>
            <VMwareHosts>
                <xsl:for-each select="$vmwareHosts">
                    <xsl:copy-of select="."/>
                </xsl:for-each>                 
            </VMwareHosts>
            <LinuxHosts>
                <xsl:for-each select="$linuxHosts">
                    <xsl:copy-of select="."/>
                </xsl:for-each>                 
            </LinuxHosts>
        </EQLHosts>
    </SAN>
</xsl:template>

...but this gets me multiple /SAN roots. I'm close but something's still a little off.

2 Answers


What I would do is use distinct-values() to get each unique host name. You could also use collection() to make it a little easier. (Usage may differ depending on the implementation. I used Saxon 9.4.)

Example...

Input files in the directory "input_dir"...

CorralData1.xml

<SAN>
    <EQLHosts>
        <WindowsHosts>
            <WindowsHost>
                <Name>Windows-A</Name>
            </WindowsHost>
            <WindowsHost>
                <Name>Windows-B</Name>
            </WindowsHost>
        </WindowsHosts>
        <LinuxHosts>
            <LinuxHost>
                <Name>Linux-A</Name>
            </LinuxHost>
            <LinuxHost>
                <Name>Linux-B</Name>
            </LinuxHost>
        </LinuxHosts>
    </EQLHosts>
</SAN>

CorralData2.xml (Windows-A and Windows-B are repeated)

<SAN>
    <EQLHosts>
        <WindowsHosts>
            <WindowsHost>
                <Name>Windows-C</Name>
            </WindowsHost>
            <WindowsHost>
                <Name>Windows-D</Name>
            </WindowsHost>
            <WindowsHost>
                <Name>Windows-A</Name>
            </WindowsHost>
            <WindowsHost>
                <Name>Windows-B</Name>
            </WindowsHost>
        </WindowsHosts>
        <LinuxHosts>
            <LinuxHost>
                <Name>Linux-C</Name>
            </LinuxHost>
            <LinuxHost>
                <Name>Linux-D</Name>
            </LinuxHost>
        </LinuxHosts>
    </EQLHosts>
</SAN>

CorralData3.xml (Windows-A and Windows-B are repeated)

<SAN>
    <EQLHosts>
        <WindowsHosts>
            <WindowsHost>
                <Name>Windows-E</Name>
            </WindowsHost>
            <WindowsHost>
                <Name>Windows-F</Name>
            </WindowsHost>
            <WindowsHost>
                <Name>Windows-A</Name>
            </WindowsHost>
            <WindowsHost>
                <Name>Windows-B</Name>
            </WindowsHost>          
        </WindowsHosts>
        <LinuxHosts>
            <LinuxHost>
                <Name>Linux-E</Name>
            </LinuxHost>
            <LinuxHost>
                <Name>Linux-F</Name>
            </LinuxHost>
        </LinuxHosts>
    </EQLHosts>
</SAN>

XSLT 2.0

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:variable name="collection">
        <xsl:copy-of select="collection('input_dir?strip-space=yes;select=*.xml')/*"/>
    </xsl:variable>
    <xsl:variable name="windowsHosts" select="distinct-values($collection/SAN/EQLHosts/WindowsHosts/WindowsHost/Name)"/>
    <xsl:variable name="linuxHosts" select="distinct-values($collection/SAN/EQLHosts/LinuxHosts/LinuxHost/Name)"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="/">
        <SAN>
            <EQLHosts>
                <WindowsHosts>
                    <xsl:for-each select="$windowsHosts">
                        <xsl:apply-templates select="($collection/SAN/EQLHosts/WindowsHosts/WindowsHost[Name=current()])[1]"/>
                    </xsl:for-each>
                </WindowsHosts>
                <LinuxHosts>
                    <xsl:for-each select="$linuxHosts">
                        <xsl:apply-templates select="($collection/SAN/EQLHosts/LinuxHosts/LinuxHost[Name=current()])[1]"/>
                    </xsl:for-each>                 
                </LinuxHosts>
            </EQLHosts>
        </SAN>
    </xsl:template>

</xsl:stylesheet>

Output

<SAN>
    <EQLHosts>
        <WindowsHosts>
            <WindowsHost>
                <Name>Windows-A</Name>
            </WindowsHost>
            <WindowsHost>
                <Name>Windows-B</Name>
            </WindowsHost>
            <WindowsHost>
                <Name>Windows-C</Name>
            </WindowsHost>
            <WindowsHost>
                <Name>Windows-D</Name>
            </WindowsHost>
            <WindowsHost>
                <Name>Windows-E</Name>
            </WindowsHost>
            <WindowsHost>
                <Name>Windows-F</Name>
            </WindowsHost>
        </WindowsHosts>
        <LinuxHosts>
            <LinuxHost>
                <Name>Linux-A</Name>
            </LinuxHost>
            <LinuxHost>
                <Name>Linux-B</Name>
            </LinuxHost>
            <LinuxHost>
                <Name>Linux-C</Name>
            </LinuxHost>
            <LinuxHost>
                <Name>Linux-D</Name>
            </LinuxHost>
            <LinuxHost>
                <Name>Linux-E</Name>
            </LinuxHost>
            <LinuxHost>
                <Name>Linux-F</Name>
            </LinuxHost>
        </LinuxHosts>
    </EQLHosts>
</SAN>

3 Comments

Why is the variable not simply set up as <xsl:variable name="collection" select="collection('input_dir?strip-space=yes;select=*.xml')/*"/>? Doing a copy-of seems unnecessary and inefficient.
@MartinHonnen - When I did the select originally, I was having trouble with the xpaths on the variable. I was in a rush so I used the copy-of which ends up creating a result tree fragment which made my xpaths work. It should be refactored, but I wasn't too worried about it.
Unlike XSLT 2.0 with collection(), which gave me the correct results, I can't get XSLT 1.0 to quite get where I want. I've added new information to my original question. Any help would be appreciated.
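
The refactoring discussed in these comments might look like the sketch below. It is not the answerer's code: the variable name `docs` is made up for illustration, the collection URI is the Saxon-specific one from the answer, and `xsl:for-each-group` is swapped in for the `distinct-values()` lookup. A likely reason the original paths failed with a plain `select` is that a trailing `/*` makes the variable hold the SAN elements themselves, so a path beginning with `$collection/SAN` selects nothing. Selecting the document nodes instead keeps the `/SAN/...` paths unchanged and avoids the copy.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <!-- Sequence of document nodes, one per file; nothing is copied.
         'docs' is a hypothetical name; the URI syntax is Saxon-specific. -->
    <xsl:variable name="docs" select="collection('input_dir?strip-space=yes;select=*.xml')"/>

    <xsl:template match="/">
        <SAN>
            <EQLHosts>
                <WindowsHosts>
                    <!-- One group per distinct Name; inside the group the
                         context item is the first host with that Name. -->
                    <xsl:for-each-group select="$docs/SAN/EQLHosts/WindowsHosts/WindowsHost" group-by="Name">
                        <xsl:copy-of select="."/>
                    </xsl:for-each-group>
                </WindowsHosts>
                <LinuxHosts>
                    <xsl:for-each-group select="$docs/SAN/EQLHosts/LinuxHosts/LinuxHost" group-by="Name">
                        <xsl:copy-of select="."/>
                    </xsl:for-each-group>
                </LinuxHosts>
            </EQLHosts>
        </SAN>
    </xsl:template>

</xsl:stylesheet>
```

The order in which hosts from different files appear depends on how the processor orders the documents in the collection, so it may not match the file names alphabetically.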

I used two XSLT files for this operation. The first simply appends all the files:

<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="/">
    <SAN>
        <xsl:apply-templates select="document('MainDataSource.xml')/SAN/*"/>
        <xsl:apply-templates select="document('CorralData1.xml')/SAN/*"/>
        <xsl:apply-templates select="document('CorralData2.xml')/SAN/*"/>
    </SAN>
</xsl:template>

and the second merges the data by group:

<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="/">
    <SAN>
        <ClientProfile>
        </ClientProfile>
        <STACKMEMBERS>
            <xsl:for-each select="/SAN/STACKMEMBERS/STACKMEMBER">
                <xsl:copy-of select="."/>
            </xsl:for-each>
        </STACKMEMBERS>
        <Force10StackMembers>
            <xsl:for-each select="/SAN/Force10StackMembers/Force10StackMember">
                <xsl:copy-of select="."/>
            </xsl:for-each>
        </Force10StackMembers>
    </SAN>
</xsl:template>
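
If the intermediate file is not needed, the two steps can probably be folded into one XSLT 1.0 stylesheet. The following is a minimal sketch rather than the code I actually ran: the variable name `sources` is made up, and the file names are the ones from the first stylesheet above. It assumes every input has a SAN root element.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <!-- SAN root elements of all inputs, gathered into one node-set.
         'sources' is a hypothetical name used only in this sketch. -->
    <xsl:variable name="sources"
        select="document('MainDataSource.xml')/SAN
              | document('CorralData1.xml')/SAN
              | document('CorralData2.xml')/SAN"/>

    <xsl:template match="/">
        <!-- A single SAN wrapper with one container per group -->
        <SAN>
            <!-- Left empty, as in the stylesheet above -->
            <ClientProfile/>
            <STACKMEMBERS>
                <xsl:copy-of select="$sources/STACKMEMBERS/STACKMEMBER"/>
            </STACKMEMBERS>
            <Force10StackMembers>
                <xsl:copy-of select="$sources/Force10StackMembers/Force10StackMember"/>
            </Force10StackMembers>
        </SAN>
    </xsl:template>

</xsl:stylesheet>
```

The primary input can be any well-formed XML document, since only the files named in document() are read. Note that XSLT 1.0 leaves the relative order of nodes from different documents up to the processor, so the members may not come out in the order the files are listed.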

2 Comments

I have a novice addendum question. How can you make the document name a variable, to gather all of the *.xml files within a directory?
@LOlliffe If you have an XML file that lists all the files within a directory, you can xsl:for-each over that (having stored the previous select="." in a variable outside xsl:for-each's scope). Otherwise, you're going to have to use something else since XSLT doesn't have file IO built into it.
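
A minimal XSLT 1.0 sketch of the file-list approach, assuming the primary input is a `<files>` list like the one shown in the question and that the hosts sit under /SAN/EQLHosts as in the sample data. The variable name `hosts` is made up for illustration. Passing a node-set to document() loads one document per node, so no xsl:for-each over the list is needed, and building the wrapper elements once avoids the multiple SAN roots described in the question.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <!-- Primary input: the <files> list. Each @name is resolved relative
         to the location of that list. 'hosts' is a hypothetical name. -->
    <xsl:variable name="hosts" select="document(/files/file/@name)/SAN/EQLHosts"/>

    <xsl:template match="/">
        <!-- One SAN root, one container per host type -->
        <SAN>
            <EQLHosts>
                <WindowsHosts>
                    <xsl:copy-of select="$hosts/WindowsHosts/WindowsHost"/>
                </WindowsHosts>
                <VMwareHosts>
                    <xsl:copy-of select="$hosts/VMwareHosts/VMwareHost"/>
                </VMwareHosts>
                <LinuxHosts>
                    <xsl:copy-of select="$hosts/LinuxHosts/LinuxHost"/>
                </LinuxHosts>
            </EQLHosts>
        </SAN>
    </xsl:template>

</xsl:stylesheet>
```

The list itself still has to be produced outside XSLT, for example by a script that writes one `<file>` element per file in the directory.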
