If I have three sets of data, say:

<note><from>Me</from><to>someone</to><message>hello</message></note>

<note><from>Me</from><to></to><message>Need milk & eggs</message></note>

<note><from>Me</from><message>Need milk & eggs</message></note>

and I'm using SimpleXML, is there a way to have SimpleXML automatically check whether a tag is empty or absent?

I would like the output to be:

FROM    TO     MESSAGE
Me    someone    hello
Me    NULL    Need milk & eggs
Me    NULL    Need milk & eggs

Right now I'm doing it manually and I quickly realised that it's going to take a very long time to do it for long xml files.

My current sample code:

$xml = simplexml_load_string($string);
$out = "";
if ($xml->from != "") {$out .= $xml->from."\t";} else {$out .= "NULL\t";}
//repeat for all children, checking by name

Sometimes the order is different as well. There might be an XML document like:

<note><message>pick up cd</message><from>me</from></note>

so iterating through the children and checking by index count doesn't work.

The actual XML files I'm working with are thousands of lines each, so I obviously can't just hard-code every tag.

4 Comments
  • I would argue that your XML is not well-formed. Technically it should have minimized tags to represent empty fields. Commented Jun 3, 2011 at 14:37
  • And do you mean absent (missing) tags rather than empty tags? Commented Jun 3, 2011 at 14:38
  • Could you clarify what your problem exactly is? Are you trying to validate, or to load XML and add in defaults (like the text "NULL") fields that aren't specified? Commented Jun 3, 2011 at 16:10
  • I have a lot of XML sheets, one of which is a "template" file that has every single field filled in (attributes as well). I have put all the nodes from the XML file into one line of tab-delimited text so it can easily be exported into a database. The problem is that, after the template file, the other XML files may not have every piece of information (denoted by either absent tags or empty tags), and I want to automatically detect those absent and empty tags and enter a tab-delimited NULL so that the column has a NULL in my db. Commented Jun 3, 2011 at 18:07

2 Answers

It sounds like you need a DTD (Document Type Definition), which will define the required format of the XML file, and specify which elements are required, optional, what they can contain, etc.

DTDs can be used to validate an XML file before you do any processing with it.

Unfortunately, PHP's SimpleXML library doesn't do anything with DTDs, but the DOMDocument library does, so you may want to use that instead.

I'll leave it as a separate exercise for you to research how to create a DTD file. If you need more help with that, I'd suggest asking it as a separate question.
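
As a rough illustration of the validation step, here is a minimal sketch using DOMDocument::validate(). The inline DTD is invented for this example (it assumes <from> and <message> are required and <to> is optional); a real file would normally reference its own external DTD instead.

```php
<?php
// Collect libxml errors ourselves instead of having them emitted as PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical inline DTD: <from> and <message> are required, <to> is optional.
// The sample note below deliberately omits the required <message>.
$xml = <<<XML
<!DOCTYPE note [
  <!ELEMENT note (from, to?, message)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT message (#PCDATA)>
]>
<note><from>Me</from><to>someone</to></note>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// validate() checks the document against the DTD declared in its DOCTYPE.
$isValid = $dom->validate();

if (!$isValid) {
    // Print whatever libxml reported about the failed validation.
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), "\n";
    }
}
libxml_clear_errors();
```

Note that this only reports elements the DTD marks as mandatory. An optional element that is simply absent is still valid, so validation alone will not list it.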


3 Comments

The XML files I'm working with have a DTD URL attached. However, looking at the DOM functions, is there a way to print out the missing/empty optional tags? I think I'm left with the same problem: I don't know whether a tag is there or not without specifically querying the node name. I'll only know whether it's valid for that tag to not be there.
@inTide: I didn't realise you had a DTD for it already. If the DTD specifies that a tag is mandatory, then you'll get an error if the tag is missing. However, if it isn't mandatory in the DTD then there won't be an error, so in that case it may not help you. I guess you could define your own DTD for it with stricter rules, and validate against that, but I'm not sure if that's really the right approach.
DTDs are kind of nasty. You will almost certainly prefer RelaxNG for validation (which PHP also supports).
1

You could use the DOMDocument instead. I have created a quick demo that splits the <note> elements into an array using the XML tag names as keys. You could then iterate the resultant array to create your output.

I corrected the invalid XML by replacing the bare ampersand with its entity equivalent (&amp;).

<?php
    libxml_use_internal_errors(true);
    $xml = <<<XML
<notes>
<note><from>Me</from><to>someone</to><message>hello</message></note>
<note><from>Me</from><to></to><message>Need milk &amp; eggs</message></note>
<note><from>Me</from><message>Need milk &amp; eggs</message></note>
<note><message>pick up cd</message><from>me</from></note>
</notes>
XML;

    function getNotes($nodelist) {
        $notes = array();

        foreach ($nodelist as $node) {
            $noteParts = array();

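            foreach ($node->childNodes as $child) {
                // Skip whitespace/text nodes; only elements have a tagName.
                if ($child->nodeType !== XML_ELEMENT_NODE) {
                    continue;
                }
                $noteParts[$child->tagName] = $child->nodeValue;
            }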

            $notes[] = $noteParts;
        }

        return $notes;
    }

    $dom = new DOMDocument();
    $dom->recover = true;
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);
    $nodelist = $xpath->query("//note");
    $notes = getNotes($nodelist);

    print_r($notes);
?>

Edit: If you change $noteParts = array(); to $noteParts = array('from' => null, 'to' => null, 'message' => null); then it will always create the full set of keys.
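
For files with too many tags to list by hand, the set of keys could be discovered from the data instead. The following is only a sketch under some assumptions: notesToRows is a hypothetical helper name, the columns are taken as the union of element names found under any <note> (a "template" file could be used as the source of column names instead), and the literal string "NULL" stands in for absent or empty elements. Attributes are not handled.

```php
<?php
// Sample data based on the question; the ampersand is already escaped as &amp;.
$xml = <<<XML
<notes>
<note><from>Me</from><to>someone</to><message>hello</message></note>
<note><from>Me</from><to></to><message>Need milk &amp; eggs</message></note>
<note><message>pick up cd</message><from>me</from></note>
</notes>
XML;

/**
 * Hypothetical helper: turn each <note> into a row with one value per column,
 * using the string "NULL" for absent or empty elements.
 * Returns array($columns, $rows).
 */
function notesToRows(string $xml): array
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);

    // Pass 1: collect every element name used under any <note>, in first-seen order.
    $columns = array();
    foreach ($xpath->query('//note/*') as $element) {
        $columns[$element->nodeName] = true;
    }
    $columns = array_keys($columns);

    // Pass 2: start each row as all "NULL", then overwrite with non-empty values.
    $rows = array();
    foreach ($xpath->query('//note') as $note) {
        $row = array_fill_keys($columns, 'NULL');
        foreach ($xpath->query('*', $note) as $element) {
            $value = trim($element->nodeValue);
            if ($value !== '') {
                $row[$element->nodeName] = $value;
            }
        }
        $rows[] = $row;
    }

    return array($columns, $rows);
}

list($columns, $rows) = notesToRows($xml);

// Tab-delimited output: a header line, then one line per <note>.
echo strtoupper(implode("\t", $columns)), "\n";
foreach ($rows as $row) {
    echo implode("\t", $row), "\n";
}
```

Because every row is filled by column name rather than by position, the order of the child elements inside each <note> does not matter. With the sample data, the second and third rows get NULL in the to column.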

3 Comments

From the print_r the output array has missing note tags for the last 2 data sets instead of empty values with note keys, though DOM::Recover seems interesting.
Heh, that will teach me for testing it with only the first 2 notes :-) I have added a new array declaration to the answer above that fixes it. The DOM::Recover actually allowed the invalid XML (the &) to be parsed. I only updated your input XML because when I was writing the demo I forgot it and wondered why it was not working!
Hm... I think DOM is definitely the way to go from both of these answers, gonna play around with it with larger data sets.
