I'm trying to write a query which returns a node based on whether a specific string exists in the same parent-node.

Here is an example of my XML, whose root element is "fagbøker", to give you an idea of what it looks like.

Edit: Added a larger piece of the data. Omitted some child-elements of "bok" since they aren't relevant. This is in Norwegian, so in case you're wondering:

fagbøker = textbooks

bok = book

tittel = title

forfatter = author

fornavn, mellomnavn, etternavn = forename, middlename, lastname

forlag = publisher

<?xml version="1.0" encoding="UTF-8"?>

<?xml-model href="Fagb%C3%B8ker.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<fagbøker>
  <bok isbn="9780321197849">
    <tittel>An Introduction to Database Systems</tittel>
    <forfatter>
        <fornavn>Christopher</fornavn>
        <mellomnavn>J.</mellomnavn>
        <etternavn>Date</etternavn>
    </forfatter>
    <fagfelt>
        <felt>Databaser</felt> 
    </fagfelt>
    <forlag>Pearson</forlag>
</bok>

<bok isbn="9780321392794">
    <tittel>Data Structures in Java: From Abstract Data Types to the Java Collections
        Framework</tittel>
    <forfatter>
        <fornavn>Simon</fornavn>
        <etternavn>Gray</etternavn>
    </forfatter>
    <fagfelt>
        <felt>Programmering</felt>
    </fagfelt>
    <forlag>Pearson</forlag>

</bok>

<bok isbn="0321165810">
    <tittel>XQuery: The XML Query Language</tittel>
    <forfatter>
        <fornavn>Brundage</fornavn>
        <etternavn>Michael</etternavn>
    </forfatter>
    <fagfelt>
        <felt>XML</felt>
    </fagfelt>
    <forlag>Addison-Wesley Professional</forlag>
</bok>

<bok isbn="0201730472">
    <tittel>Discrete Mathematics for Computing</tittel>
    <forfatter>
        <fornavn>Rod</fornavn>
        <etternavn>Haggarty</etternavn>
    </forfatter>
    <fagfelt>
        <felt>Matematikk</felt>
    </fagfelt>
    <forlag>Addison-Wesley Professional</forlag>
</bok>

<bok isbn="0321417461">
    <tittel>Prolog Programming for Artificial Intelligence</tittel>
    <forfatter>
        <fornavn>Ivan</fornavn>
        <etternavn>Bratko</etternavn>
    </forfatter>
    <fagfelt>
        <felt>Kunstig intelligens</felt>
    </fagfelt>
    <forlag>Pearson Education Canada</forlag>
</bok>

<bok isbn="3540126899">
    <tittel>A Programming Logic</tittel>
    <forfatter>
        <fornavn>Robert</fornavn>
        <mellomnavn>L.</mellomnavn>
        <etternavn>Constable</etternavn>
    </forfatter>
    <fagfelt>
        <felt>Programmering</felt>
    </fagfelt>
    <forlag>Springer-Verlag London LTD</forlag>
 </bok>
</fagbøker>

The query is supposed to find publishers who haven't published a book in the field "database". I have several other books in my XML which are not about databases, as well as publishers who haven't published anything in the field.

 for $publisher in distinct-values(
 let $doc := doc('../Fagboker.xml')
 where every $field in $doc
    satisfies $field//fieldname != 'Database'
 return
    $doc//publisher)

 return
    <publisher>{$publisher}</publisher>

This returns every publisher I have. The publisher in the XML above is the only publisher I do not want in the result. The publisher in the above XML has published several other books, but only one about databases, and thus it is not supposed to come up in the result.

Anyone got an idea of how I could solve this?

2 Answers

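The problem is that the where clause is evaluated once for the whole document, not once per publisher. If it succeeds (and it does as soon as any field differs from 'Database'), you return every publisher in the document here: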

...
return
  $doc//publisher)

Instead, you can iterate over each distinct publisher name, then keep only those publishers for which none of their books has the field Databaser:

let $doc := doc('../Fagboker.xml')
for $publisher in distinct-values($doc//forlag)
where (
  every $field in $doc//bok[forlag = $publisher]//fagfelt
  satisfies not($field//felt = 'Databaser')
  ) 
return <publisher>{$publisher}</publisher>

Returns:

<publisher>Addison-Wesley Professional</publisher>
<publisher>Pearson Education Canada</publisher>
<publisher>Springer-Verlag London LTD</publisher>

One tricky thing to note about != is that it is a general comparison: it returns true if any item on the left differs from any item on the right. So 1 != (1, 2) is true because 1 != 2. If you want true only when no item on one side equals any item on the other, use = and negate the result with not(). Otherwise you get false positives whenever a publisher has a Databaser field as well as some other field.
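
To see the difference in isolation, here is a minimal sketch that should run in any XQuery processor without an input document. The sequence $fields and the result element names (any-differs, none-equal) are made up for illustration and are not part of the question's data model:

```xquery
(: General comparisons succeed if ANY pair of items satisfies the operator. :)
let $fields := ('Databaser', 'Programmering')
return (
  (: true: 'Programmering' differs from 'Databaser' :)
  <any-differs>{ $fields != 'Databaser' }</any-differs>,
  (: false: one item equals 'Databaser', so the negated test fails :)
  <none-equal>{ not($fields = 'Databaser') }</none-equal>
)
```

The second form is the one the where clause needs, since it rejects a publisher as soon as one of its fields is Databaser.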

4 Comments

Thank you. This query returns the exact same XML as mine, though. Beginning to think that there's something wrong with my XML, just can't see what it could be.
You'll need to share a larger set of test data. It's impossible to say otherwise
@Sebastian When I update the test data and query with your changes, it appears to return the output you are asking for.
Thank you! I had a tiny difference in the let-expression, for some reason. I'm completely new to XML/XQuery, so these little things get past me sometimes.
1

The query is supposed to find publishers who haven't published a book in the field "database".

This is trivial and straightforward in XPath alone. Maybe you are overcomplicating it.

$doc//publisher[not(../field = 'Database')]

And that can be made unique and sorted easily in a second step.
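
A rough sketch of that second step, assuming the Norwegian element names from the question's XML (forlag, fagfelt, felt) rather than the translated ones, and the same file path as in the question:

```xquery
let $doc := doc('../Fagboker.xml')
(: Keep forlag elements whose own book has no Databaser field,
   then reduce them to distinct names and sort. :)
for $publisher in distinct-values(
  $doc//forlag[not(../fagfelt/felt = 'Databaser')]
)
order by $publisher
return <publisher>{$publisher}</publisher>
```

Note that the predicate looks at each book separately. With the sample data, Pearson would still be listed because of its Programmering book, so excluding a publisher entirely needs a per-publisher test across all of its books.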
