
I am new to XML and XSLT. I want to filter some information from an XML file based on matching tag values within that file. My solution works when the XML file contains only one or two Person elements, but with a bigger file containing more Person elements it fails, and only the last Person is transformed as required.

This is my XML file:

<People>
<Person>
    <required-tag1>some-information</required-tag1>
    <required-tag2>some-information</required-tag2>
    <tag3>not important info</tag3>
    <tag4>not important info</tag4>
    <first-name>Mike</first-name>
    <last-name>Hewitt</last-name>
    <licenses>
        <license>
            <number>938387</number>
            <state xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">TX</state>
            <field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">Health</field>
        </license>
        <license>
            <number>938387</number>
            <state xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">IL</state>
            <field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">Health</field>
        </license>
    </licenses>
    <appointments>
        <appointment-info>
            <code>5124</code>
            <number>14920329324</number>
            <licensed-states>
                <state>TX</state>
            </licensed-states>
        </appointment-info>
    </appointments>
</Person>
<Person>
    <required-tag1>some-information</required-tag1>
    <required-tag2>some-information</required-tag2>
    <tag3>not important info</tag3>
    <tag4>not important info</tag4>
    <first-name>John</first-name>
    <last-name>Jhonny</last-name>
    <licenses>
        <license>
            <number>1762539</number>
            <state xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">TX</state>
            <field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">Health</field>
        </license>
        <license>
            <number>1762539</number>
            <state xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">NY</state>
            <field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">Health</field>
        </license>
    </licenses>
    <appointments>
        <appointment-info>
            <code>5124</code>
            <number>14920329324</number>
            <licensed-states>
                <state>TX</state>
            </licensed-states>
        </appointment-info>
    </appointments>
</Person>
<Person>
    <required-tag1>some-information</required-tag1>
    <required-tag2>some-information</required-tag2>
    <tag3>not important info</tag3>
    <tag4>not important info</tag4>
    <first-name>Danny</first-name>
    <last-name>Hewitt</last-name>
    <licenses>
        <license>
            <number>17294083</number>
            <state xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">IL</state>
            <field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">Health</field>
        </license>
    </licenses>
    <appointments>
        <appointment-info>
            <code>5124</code>
            <number>14920329324</number>
            <licensed-states>
                <state>IL</state>
            </licensed-states>
        </appointment-info>
    </appointments>
</Person>
<Person>
    <required-tag1>some-information</required-tag1>
    <required-tag2>some-information</required-tag2>
    <tag3>not important info</tag3>
    <tag4>not important info</tag4>
    <first-name>Russel</first-name>
    <last-name>Jhonny</last-name>
    <licenses>
        <license>
            <number>840790</number>
            <state xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">TX</state>
            <field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">Health</field>
        </license>
        <license>
            <number>840790</number>
            <state xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">NY</state>
            <field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">Health</field>
        </license>
        <license>
            <number>840790</number>
            <state xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">CA</state>
            <field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">Health</field>
        </license>
    </licenses>
    <appointments>
        <appointment-info>
            <code>5124</code>
            <number>14920329324</number>
            <licensed-states>
                <state>TX</state>
                <state>NY</state>
            </licensed-states>
        </appointment-info>
    </appointments>
</Person>
</People>

What I basically want to do is this: if a Person is licensed in a state (for example TX) and also has appointment information in that same state (TX), that license should be removed from licenses. If it was the Person's only license, the whole Person should be removed.

The new XML should contain only the required elements, only the licenses whose state does not appear in that Person's appointment licensed-states, and no Person whose licenses all matched.

This is what I am expecting as output:

<People>
<Person>
    <required-tag1>some-information</required-tag1>
    <required-tag2>some-information</required-tag2>
    <first-name>Mike</first-name>
    <last-name>Hewitt</last-name>
    <licenses>
        <license>
            <number>938387</number>
            <state xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">IL</state>
            <field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">Health</field>
        </license>
    </licenses>
</Person>
<Person>
    <required-tag1>some-information</required-tag1>
    <required-tag2>some-information</required-tag2>
    <first-name>John</first-name>
    <last-name>Jhonny</last-name>
    <licenses>
        <license>
            <number>1762539</number>
            <state xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">NY</state>
            <field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">Health</field>
        </license>
    </licenses>
</Person>
<Person>
    <required-tag1>some-information</required-tag1>
    <required-tag2>some-information</required-tag2>
    <first-name>Russel</first-name>
    <last-name>Jhonny</last-name>
    <licenses>
        <license>
            <number>840790</number>
            <state xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">CA</state>
            <field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">Health</field>
        </license>
    </licenses>
</Person>
</People>

The third Person (Danny Hewitt), whose licenses all matched an appointment state, is removed. The example mostly uses a single appointment state per Person, but the stylesheet should also work when there are multiple states.

How can I write an XSLT stylesheet to filter this information? I am using XSLT 1.0.

Currently I can apply the XSLT below to get the required tags, but I don't know how to filter the license states correctly: it works on a smaller file but fails on a bigger one. I would really appreciate some guidance, as I don't understand what is going wrong or where it fails.

This is the XSLT I am using:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>

<!--Identity transform (aka identity template). This will match
and copy attributes and nodes (element, text, comment and
processing-instruction) without changing them. Unless a more
specific template matches, everything will get handled by this
template.-->    
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<!--This template will match the "Person" element node. The "xsl:copy"
creates the new "Person" element. The "xsl:apply-templates" tells
the processor to apply templates to any attributes (of Person) or
elements listed in the "select". (Other elements will not be 
processed.) I used the union operator in the "select" so I wouldn't
have to write multiple "xsl:apply-templates".-->
<xsl:template match="Person">
    <xsl:copy>
        <xsl:apply-templates select="@*|first-name|last-name|
            required-tag1|required-tag2|licenses"/>
    </xsl:copy>
</xsl:template>

<!--This template will match any "license" element nodes that have a child 
"state" element whose value matches a "state" element node that is a 
child of "licensed-states". Since the "xsl:template" is empty, nothing 
is output or processed further.-->
<xsl:template match="license[state=//licensed-states/state]"/>

</xsl:stylesheet>

And this is the output I am getting, which is wrong:

<?xml version="1.0" encoding="UTF-8"?>
<People>
<Person>
  <required-tag1>some-information</required-tag1>
  <required-tag2>some-information</required-tag2>
  <first-name>Mike</first-name>
  <last-name>Hewitt</last-name>
  <licenses/>
</Person>
<Person>
  <required-tag1>some-information</required-tag1>
  <required-tag2>some-information</required-tag2>
  <first-name>John</first-name>
  <last-name>Jhonny</last-name>
  <licenses/>
</Person>
<Person>
  <required-tag1>some-information</required-tag1>
  <required-tag2>some-information</required-tag2>
  <first-name>Danny</first-name>
  <last-name>Hewitt</last-name>
  <licenses/>
</Person>
<Person>
  <required-tag1>some-information</required-tag1>
  <required-tag2>some-information</required-tag2>
  <first-name>Russel</first-name>
  <last-name>Jhonny</last-name>
  <licenses>
     <license>
        <number>840790</number>
        <state xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">CA</state>
        <field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">Health</field>
     </license>
  </licenses>
</Person>
</People>

I just don't know what is wrong, because when I delete the last two Person elements from the XML file and run the same XSLT, it works perfectly. I also don't know how to remove a Person whose licenses all matched.

  • possible duplicate of Apply XSLT on XML to filter information based on tag match in same file Commented Sep 28, 2014 at 19:44
  • @michael.hor257k Hello Sir, I am the one who asked that question. I asked a new one because the solution fails when working with bigger XML files. Could you give me any guidance on why it fails? Commented Sep 28, 2014 at 19:47
  • What do you mean by bigger? Is it really the size of the file, or does it contain other characteristics that are not in the small file and cause the XSLT to fail? In my experience XSLT is VERY unlikely to fail due to size. Did you try increasing the size stepwise to find the "culprit"? Commented Sep 28, 2014 at 20:05
  • @MarcusRickert thanks for the prompt reply. By size I mean more information in the file. The XML file above contains 4 Person elements; with only 2 Person elements, the XSLT works perfectly. This is a test file; what I really want to transform is a file with 4000 Person elements. Commented Sep 28, 2014 at 20:13
  • @MarcusRickert I have tried increasing the file stepwise. With one Person it works and with two it works, but with three the first Person's transformation is wrong, the second is as expected, and the third is wrong. I am unable to find the culprit. Commented Sep 28, 2014 at 20:15

3 Answers


One obvious problem:

'state=//licensed-states/state' is going to examine all states in the document, not just the ones belonging to this Person. Rather than searching the entire document from the root (which is what // at the front of the path does), give a relative path from this license to the area you want to examine. At the very least, you need to say that you're looking only within the same Person:

<xsl:template match="license[state=ancestor::Person//licensed-states/state]"/>

Faster performance would be to give the relative path more explicitly:

<xsl:template match="license[state=ancestor::Person/appointments/appointment-info/licensed-states/state]"/>

or, since you know the Person is two levels above the license,

<xsl:template match="license[state=../../appointments/appointment-info/licensed-states/state]"/>

where .. is shorthand for parent::node().
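To see why the original stylesheet only misbehaves once more Persons are added, a small diagnostic sketch (not part of the answer above) can print, for every license, whether its state matches the appointment states of the whole document (the original // pattern) or only those of its own Person (the ../.. pattern). It assumes the XML from the question as input and any XSLT 1.0 processor.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- For every license, report whether its state appears among the
       appointment states of the WHOLE document (//, the original pattern)
       and among those of its OWN Person (../.., the corrected pattern).
       boolean() values are converted to the strings "true"/"false". -->
  <xsl:template match="/">
    <xsl:for-each select="//license">
      <xsl:value-of select="concat(../../first-name, ' ', state,
          ': global=', boolean(state = //licensed-states/state),
          ', same-person=', boolean(state = ../../appointments/appointment-info/licensed-states/state),
          '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

On the question's sample, Mike's IL license and John's NY license should come out as global=true but same-person=false: IL and NY appear only in other Persons' appointments (Danny's and Russel's), which is why those licenses disappeared once the last two Persons were added.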


1 Comment

Thank you for the prompt reply. I am new to XSLT and didn't know that // searches the entire document. Using your faster recommendation I get what I am looking for, but I also get the Persons whose licenses all matched, e.g. <licenses />. How can I exclude them? I only want to keep Persons that still have a non-empty licenses element. Thank you very much for helping me. I appreciate all the help.

If you want to look up items in another part of the XML, consider using an xsl:key. In your case, you want to look up the licensed states for a Person. This takes a little more effort because you need a concatenated key consisting of both a unique identifier for the Person and the state value:

<xsl:key name="state" match="licensed-states/state" use="concat(generate-id(ancestor::Person), '|', .)" />

generate-id() is a function that generates a unique id for a node. (If there is some 'id' attribute or element for Person in the XML, you could use that instead).
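To make the composite key concrete, here is a hypothetical debugging stylesheet that prints the string indexed for each appointment state. The exact id text is generated by the processor and will differ between processors and runs; only the "Person id, then state" structure matters.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Print the composite value that the "state" key would index for each
       appointment state: the generated id of the owning Person, a '|'
       separator, and the state code, one per line.
       The id part is processor-specific. -->
  <xsl:template match="/">
    <xsl:for-each select="//licensed-states/state">
      <xsl:value-of select="concat(generate-id(ancestor::Person), '|', ., '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Russel's two appointment states (TX and NY) should share the same id prefix, while Mike's TX entry has a different one, so a lookup for one Person's license can never hit another Person's appointment.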

Now, you want to exclude Persons where every licensed state has an appointment. To do this, use a double negative: exclude every Person that has no license whose state is missing from its appointments.

<xsl:template match="Person[not(licenses/license[not(key('state', concat(generate-id(ancestor::Person), '|', state)))])]"/>

Excluding licenses whose state has an appointment is slightly simpler:

<xsl:template match="license[key('state', concat(generate-id(ancestor::Person), '|', state))]"/>

Try this XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:key name="state" match="licensed-states/state" use="concat(generate-id(ancestor::Person), '|', .)" />

<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="Person[not(licenses/license[not(key('state', concat(generate-id(ancestor::Person), '|', state)))])]"/>

<xsl:template match="license[key('state', concat(generate-id(ancestor::Person), '|', state))]"/>

<xsl:template match="appointments" />

</xsl:stylesheet>

Also note that I have removed the xsl:apply-templates for specific elements, such as first-name, and instead used <xsl:template match="appointments" /> to exclude appointments, so all child nodes of Person except appointments are copied (including tag3 and tag4).
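If the output must match the question's expected output exactly, tag3 and tag4 have to go as well. A sketch, assuming that goal: it keeps the key-based templates from this answer but replaces the appointments template with the question's own whitelist Person template, so only the listed elements are copied.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Index each appointment state by "owning Person id|state" -->
  <xsl:key name="state" match="licensed-states/state"
           use="concat(generate-id(ancestor::Person), '|', .)"/>

  <!-- Identity transform: copy everything not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Whitelist: copy only these children of Person, in document order;
       tag3, tag4 and appointments are never selected -->
  <xsl:template match="Person">
    <xsl:copy>
      <xsl:apply-templates select="@*|required-tag1|required-tag2|first-name|last-name|licenses"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop a Person when none of its licenses survives the filter -->
  <xsl:template match="Person[not(licenses/license[not(key('state', concat(generate-id(ancestor::Person), '|', state)))])]"/>

  <!-- Drop a license whose state also appears in the same Person's appointments -->
  <xsl:template match="license[key('state', concat(generate-id(ancestor::Person), '|', state))]"/>
</xsl:stylesheet>
```

Because Person[...] has a predicate, its default priority (0.5) is higher than that of the plain Person template (0), so a Person with no remaining licenses is dropped rather than copied.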

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>

<!--Identity transform (aka identity template). This will match
and copy attributes and nodes (element, text, comment and
processing-instruction) without changing them. Unless a more
specific template matches, everything will get handled by this
template.-->    
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<!--This template will match the "Person" element node. The "xsl:copy"
creates the new "Person" element. The "xsl:apply-templates" tells
the processor to apply templates to any attributes (of Person) or
elements listed in the "select". (Other elements will not be 
processed.) I used the union operator in the "select" so I wouldn't
have to write multiple "xsl:apply-templates".-->
<xsl:template match="Person">
    <xsl:copy>
        <xsl:apply-templates select="@*|first-name|last-name|
            required-tag1|required-tag2|licenses"/>
    </xsl:copy>
</xsl:template>

<!--This template will match any "license" element nodes that have a child 
"state" element whose value matches a "state" element node that is a 
child of "licensed-states" in the same Person (../.. from a license). 
This template will also match the "Person" element node if the number of
"state" elements that don't have a corresponding "licensed-state"
is equal to zero. ("filtered person who matched all licenses"
requirement.)
Since the "xsl:template" is empty, nothing 
is output or processed further.-->
<xsl:template match="license[state=../..//licensed-states/state]|
Person[count(licenses/license[not(state=../..//licensed-states/state)])=0]"/>

</xsl:stylesheet>
