Grouping values with same attributes using xslt-1.0

Question

Given this input XML:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<agrisResources xmlns:ags="http://purl.org/agmes/1.1/" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <agrisResource bibliographicLevel="AM" ags:ARN="^aSF17^b00003">
        <dc:subject xml:lang="en">Penaeidae</dc:subject>
        <dc:subject xml:lang="en">Vibrio harveyi</dc:subject>
        <dc:subject xml:lang="en">Vibrio parahaemolyticus</dc:subject>
        <dc:subject>
            <ags:subjectClassification scheme="ags:ASC">ASFA-1</ags:subjectClassification>
            <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Bacterial diseases</ags:subjectThesaurus>
            <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Fish diseases</ags:subjectThesaurus>
            <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Genes</ags:subjectThesaurus>
        </dc:subject>
    </agrisResource>
</agrisResources>

I would like to group items with the same attributes, so the output would be like this:

<dc:subject xml:lang="en">Penaeidae||Vibrio harveyi||Vibrio parahaemolyticus</dc:subject>
<dc:subject>
    <ags:subjectClassification scheme="ags:ASC">ASFA-1</ags:subjectClassification>
    <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Bacterial diseases||Fish diseases||Genes</ags:subjectThesaurus>
</dc:subject>

Basically, my grouping rule is to combine the values of a node when that node occurs multiple times, e.g. dc:subject and ags:subjectThesaurus. I wrote "group values with same attributes" in the title because I'm not sure whether it is possible to group them by tag alone, without using their attributes to differentiate them.

In other words, differentiate

<dc:subject>Penaeidae</dc:subject>

from

<dc:subject>
    <ags:subjectThesaurus>Bacterial diseases</ags:subjectThesaurus>
</dc:subject>

UPDATE

INPUT XML

<?xml version="1.0" encoding="ISO-8859-1" ?>
<agrisResources xmlns:ags="http://purl.org/agmes/1.1/" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <agrisResource bibliographicLevel="AM" ags:ARN="^aSF17^b00003">
        <dc:creator>
            <ags:creatorPersonal>Doe, John</ags:creatorPersonal>
            <ags:creatorPersonal>Smith, Jason T.</ags:creatorPersonal>
            <ags:creatorPersonal>Doe, Jane E.</ags:creatorPersonal>
        </dc:creator>
        <dc:subject xml:lang="en">Penaeidae</dc:subject>
        <dc:subject xml:lang="en">Vibrio harveyi</dc:subject>
        <dc:subject xml:lang="en">Vibrio parahaemolyticus</dc:subject>
        <dc:subject>
            <ags:subjectClassification scheme="ags:ASC">ASFA-1</ags:subjectClassification>
            <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Bacterial diseases</ags:subjectThesaurus>
            <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Fish diseases</ags:subjectThesaurus>
            <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Genes</ags:subjectThesaurus>
        </dc:subject>
    </agrisResource>
</agrisResources>

Desired Output

Rules on grouping: for repeating elements, e.g. <ags:creatorPersonal>, <dc:subject xml:lang="en"> and <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">, combine the values using a double pipe || as the separator. Leave elements that do not meet that rule as they are.

<?xml version="1.0" encoding="ISO-8859-1" ?>
<agrisResources xmlns:ags="http://purl.org/agmes/1.1/" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <agrisResource bibliographicLevel="AM" ags:ARN="^aSF17^b00003">
        <dc:creator>
            <ags:creatorPersonal>Doe, John||Smith, Jason T.||Doe, Jane E.</ags:creatorPersonal>
        </dc:creator>
        <dc:subject xml:lang="en">Penaeidae||Vibrio harveyi||Vibrio parahaemolyticus</dc:subject>
        <dc:subject>
            <ags:subjectClassification scheme="ags:ASC">ASFA-1</ags:subjectClassification>
            <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Bacterial diseases||Fish diseases||Genes</ags:subjectThesaurus>
        </dc:subject>
    </agrisResource>
</agrisResources>

Below is my code, based on this answer:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:dc="http://purl.org/dc/terms/"
            xmlns:ags="http://purl.org/agmes/1.1/"
            xmlns:agls="http://www.naa.gov.au/recordkeeping/gov_online/agls/1.2"
            xmlns:dcterms="http://purl.org/dc/terms/">
    <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="ags:subjectThesaurus|dc:subject">
        <xsl:copy>
            <xsl:apply-templates select="@* | text()"/>
                <xsl:call-template name="NextSibling"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="ags:subjectThesaurus[@scheme = preceding-sibling::*[1][self::ags:subjectThesaurus]/@scheme]|dc:subject[@xml:lang = preceding-sibling::*[1][self::dc:subject]/@xml:lang]"/>

    <xsl:template match="ags:subjectThesaurus|dc:subject" mode="includeSib">
        <xsl:value-of select="concat('||', .)"/>
            <xsl:call-template name="NextSibling"/>
        </xsl:template>

    <xsl:template name="NextSibling">
        <xsl:apply-templates select="following-sibling::*[1][self::ags:subjectThesaurus and @scheme = current()/@scheme]|following-sibling::*[1][self::dc:subject and @xml:lang = current()/@xml:lang]" mode="includeSib"/>
    </xsl:template>
</xsl:stylesheet>

My only problem is that it transforms only the ags:subjectThesaurus nodes, not the dc:subject nodes. My output looks like this:

<dc:subject xml:lang="en">Penaeidae</dc:subject>
<dc:subject xml:lang="en">Vibrio harveyi</dc:subject>
<dc:subject xml:lang="en">Vibrio parahaemolyticus</dc:subject>
<dc:subject>
    <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Bacterial diseases||Fish diseases||Genes</ags:subjectThesaurus>
</dc:subject>

How can I modify my code such that it will also group the dc:subject node with the same xml:lang attribute?

EDIT

Based on the suggestion of michael.hor257k and from this answer to use the Muenchian method, below is what I tried:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:dc="http://purl.org/dc/terms/"
            xmlns:ags="http://purl.org/agmes/1.1/"
            xmlns:agls="http://www.naa.gov.au/recordkeeping/gov_online/agls/1.2"
            xmlns:dcterms="http://purl.org/dc/terms/">
    <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
    <xsl:strip-space elements="*"/>
    <xsl:key name="kNodeSubject" match="dc:subject[@xml:lang]" use="@xml:lang"/>
    <xsl:key name="subjectThesaurus" match="dc:subject/ags:subjectThesaurus" use="@scheme"/>
    <xsl:template match="node() | @*">
        <xsl:copy>
            <xsl:apply-templates select="node() | @*"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="dc:subject[generate-id() = generate-id(key('kNodeSubject', @xml:lang)[1])]">
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <xsl:apply-templates select="key('kNodeSubject', @xml:lang)" mode="concat"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="dc:subject/ags:subjectThesaurus[generate-id() = generate-id(key('subjectThesaurus', @scheme)[1])]">
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <xsl:apply-templates select="key('subjectThesaurus', @scheme)" mode="concat"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="dc:subject|subjectThesaurus" mode="concat">
        <xsl:value-of select="."/>
            <xsl:if test="position() != last()">
                <xsl:text>||</xsl:text>
            </xsl:if>
    </xsl:template>

    <xsl:template match="dc:subject"/>
    <xsl:template match="ags:subjectThesaurus"/>
</xsl:stylesheet>

When I apply the code above, the ags:subjectThesaurus nodes disappear and the values of <dc:subject xml:lang="en"> are not grouped either. I'm not sure I have the match right: I used match="dc:subject[@xml:lang]" for <xsl:key name="kNodeSubject"> because ags:subjectThesaurus is a child of <dc:subject>.

Thanks in advance.

I suggest you pick a better starting point: jenitennison.com/xslt/grouping/muenchian.html
– michael.hor257k, Commented Aug 14, 2017 at 4:13
@michael.hor257k, please see my updated post. I'm not sure what to use to match <dc:subject> and ags:subjectThesaurus. Thanks in advance.
– euler, Commented Aug 17, 2017 at 0:58
Please post a minimal reproducible example. There's no ags:subjectThesaurus in the posted input.
– michael.hor257k, Commented Aug 17, 2017 at 4:00
@michael.hor257k, in my example input there are 3 ags:subjectThesaurus nodes, e.g. <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Bacterial diseases</ags:subjectThesaurus>. The other values of ags:subjectThesaurus are Fish diseases and Genes.
– euler, Commented Aug 17, 2017 at 4:07
But there is no root element and the prefixes are not bound to a namespace. IOW, your code cannot be run as is.
– michael.hor257k, Commented Aug 17, 2017 at 4:28

michael.hor257k · Accepted Answer · 2017-08-17 08:12:00Z

1

Consider the following example:

XML

<root xmlns:dc="http://purl.org/dc/terms/" xmlns:ags="http://purl.org/agmes/1.1/">
  <dc:subject xml:lang="en">Penaeidae</dc:subject>
  <dc:subject xml:lang="en">Vibrio harveyi</dc:subject>
  <dc:subject xml:lang="fr">Franca premier</dc:subject>
  <dc:subject xml:lang="fr">Franca deux</dc:subject>
  <dc:subject xml:lang="en">Vibrio parahaemolyticus</dc:subject>
  <dc:subject>
    <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Bacterial diseases</ags:subjectThesaurus>
    <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Fish diseases</ags:subjectThesaurus>
    <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Genes</ags:subjectThesaurus>
    <ags:subjectThesaurus xml:lang="en" scheme="ags:B">Bees</ags:subjectThesaurus>
    <ags:subjectThesaurus xml:lang="en" scheme="ags:B">Birds</ags:subjectThesaurus>
  </dc:subject>
</root>

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:dc="http://purl.org/dc/terms/"
xmlns:ags="http://purl.org/agmes/1.1/">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:key name="subj-by-lang" match="dc:subject[@xml:lang]" use="@xml:lang"/>
<xsl:key name="thes-by-scheme" match="ags:subjectThesaurus" use="@scheme"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="root">
    <xsl:copy>
        <!-- group subjects by lang -->
        <xsl:for-each select="dc:subject[@xml:lang][count(. | key('subj-by-lang', @xml:lang)[1]) = 1]">
             <dc:subject xml:lang="{@xml:lang}">
                <xsl:for-each select="key('subj-by-lang', @xml:lang)">
                    <xsl:value-of select="."/>
                    <xsl:if test="position() != last()">
                        <xsl:text>||</xsl:text>
                    </xsl:if>
                </xsl:for-each>
             </dc:subject>  
        </xsl:for-each>
        <!-- process other nodes -->
        <xsl:apply-templates select="node()[not(self::dc:subject[@xml:lang])]"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="dc:subject">
    <xsl:copy>
        <!-- group thesauri by scheme    -->
        <xsl:for-each select="ags:subjectThesaurus[count(. | key('thes-by-scheme', @scheme)[1]) = 1]">
             <ags:subjectThesaurus xml:lang="{@xml:lang}" scheme="{@scheme}">
                <xsl:for-each select="key('thes-by-scheme', @scheme)">
                    <xsl:value-of select="."/>
                    <xsl:if test="position() != last()">
                        <xsl:text>||</xsl:text>
                    </xsl:if>
                </xsl:for-each>
             </ags:subjectThesaurus>
        </xsl:for-each>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>

Result

<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:dc="http://purl.org/dc/terms/" xmlns:ags="http://purl.org/agmes/1.1/">
  <dc:subject xml:lang="en">Penaeidae||Vibrio harveyi||Vibrio parahaemolyticus</dc:subject>
  <dc:subject xml:lang="fr">Franca premier||Franca deux</dc:subject>
  <dc:subject>
    <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Bacterial diseases||Fish diseases||Genes</ags:subjectThesaurus>
    <ags:subjectThesaurus xml:lang="en" scheme="ags:B">Bees||Birds</ags:subjectThesaurus>
  </dc:subject>
</root>

Added:

Based on your clarifications, I suspect you want to do something much simpler: just join together some leaf nodes (i.e. nodes with no child elements) and leave the others as is.

Here's an example joining the dc:subject leaf nodes within agrisResource:

<xsl:template match="agrisResource">
    <xsl:copy>
        <!-- keep the attributes of agrisResource itself -->
        <xsl:apply-templates select="@*"/>
        <!-- join subjects with no children -->
        <dc:subject>
            <!-- copy the attributes of the first subject with no children -->
            <xsl:copy-of select="dc:subject[not(*)][1]/@*"/>
            <!-- concat the values of all subjects with no children, whatever their attributes -->
            <xsl:for-each select="dc:subject[not(*)]">
                <xsl:value-of select="."/>
                <xsl:if test="position() != last()">
                    <xsl:text>||</xsl:text>
                </xsl:if>
            </xsl:for-each>
        </dc:subject>
        <!-- process other nodes -->
        <xsl:apply-templates select="node()[not(self::dc:subject[not(*)])]"/>
    </xsl:copy>
</xsl:template>

This could be generalized by using a key based on an element's name.
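
A minimal sketch of that generalization, under some assumptions: the key name leaf-by-name is made up for illustration, a "leaf" is any element with no child elements, and leaves are grouped by their name within the same parent only (the parent's generate-id() is part of the key value). Attributes are ignored when grouping, and the joined element keeps the attributes of the first leaf in its group, which is only reasonable if all members of a group carry the same attributes, as in the question's input.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- hypothetical key: leaf elements indexed by parent id + element name -->
  <xsl:key name="leaf-by-name" match="*[not(*)]"
           use="concat(generate-id(..), '|', name())"/>

  <!-- identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- leaf elements: only the first of each group produces output -->
  <xsl:template match="*[not(*)]">
    <xsl:variable name="group"
        select="key('leaf-by-name', concat(generate-id(..), '|', name()))"/>
    <xsl:if test="generate-id() = generate-id($group[1])">
      <xsl:copy>
        <!-- attributes are taken from the first leaf of the group -->
        <xsl:copy-of select="@*"/>
        <!-- join the values of all leaves in the group -->
        <xsl:for-each select="$group">
          <xsl:value-of select="."/>
          <xsl:if test="position() != last()">
            <xsl:text>||</xsl:text>
          </xsl:if>
        </xsl:for-each>
      </xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Because this sketch names no dc: or ags: elements, it does not depend on which namespace URI a prefix is bound to. That matters for the question's input: there dc is bound to http://purl.org/dc/elements/1.1/, while the question's stylesheets bind dc to http://purl.org/dc/terms/, and a pattern such as dc:subject matches on the namespace URI, not on the prefix. Any stylesheet that does name dc:subject needs to declare the same URI as the input.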

edited Aug 17, 2017 at 8:12

answered Aug 17, 2017 at 4:55


5 Comments

michael.hor257k Over a year ago

P.S. Do note that keys work across the entire document. If you want to group nodes only within their parent element, then you must also include the parent's unique id in the key.

euler Over a year ago

I have updated my input example with root elements and prefixes bound to a namespace. Please note that I only use the xml:lang attribute because I am not sure if it is possible to differentiate dc:subject without child nodes. In my real XML input file, there is no instance where xml:lang is not equal to en. Thanks in advance.

euler Over a year ago

Unfortunately, when I apply your latest XSL code to my sample input XML, the output is this:

<agrisResource xmlns:ags="http://purl.org/agmes/1.1/" xmlns:dc="http://purl.org/dc/elements/1.1/"><dc:subject xmlns:dc="http://purl.org/dc/terms/" xmlns:agls="http://www.naa.gov.au/recordkeeping/gov_online/agls/1.2" xmlns:dcterms="http://purl.org/dc/terms/"/>PenaeidaeVibrio harveyiVibrio parahaemolyticusASFA-1Bacterial diseasesFish diseasesGenes</agrisResource>


euler Over a year ago

Great! By the way, you have this line to process other nodes: <xsl:apply-templates select="node()[not(self::dc:subject[not(*)])]"/>. How can I modify it such that it will NOT process other nodes eg dc:subjectThesaurus and dc:creator? Please see what I've tried to accomplish this: xsltransform.net/ehVYZNr/3. Many thanks again.

michael.hor257k Over a year ago

You can add more conditions to the predicate, e.g. [not(self::a or self::b)]. Comments are not a convenient place to answer follow-up questions.
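
As an illustration of such an extended predicate, here is a sketch that joins the childless dc:subject elements and also skips dc:creator when processing the remaining children. It assumes the namespace binding of the question's input (dc bound to http://purl.org/dc/elements/1.1/) and that "not process" means the skipped elements should be left out of the output altogether.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="agrisResource">
    <xsl:copy>
      <!-- keep the attributes of agrisResource itself -->
      <xsl:apply-templates select="@*"/>
      <!-- join subjects with no children -->
      <dc:subject>
        <xsl:copy-of select="dc:subject[not(*)][1]/@*"/>
        <xsl:for-each select="dc:subject[not(*)]">
          <xsl:value-of select="."/>
          <xsl:if test="position() != last()">
            <xsl:text>||</xsl:text>
          </xsl:if>
        </xsl:for-each>
      </dc:subject>
      <!-- process the remaining children, skipping the joined subjects and dc:creator -->
      <xsl:apply-templates
          select="node()[not(self::dc:subject[not(*)] or self::dc:creator)]"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Each further element to skip is one more "or self::..." term inside not(). Elements that are not direct children of agrisResource, such as ags:subjectThesaurus inside dc:subject, are not affected by this predicate and would need an empty template of their own.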
