I am having some grief with an XML feed that I am being sent. I know it is invalid, but the sending program's development cycle is such that it is not worth waiting for them to correct the error. So I am looking for a workaround: some way to get PHP to read the XML and merge or drop the invalid attribute entries while keeping all the others.

The fault is that an XML node has duplicate attributes. I have been using SimpleXML to read the files and process them into useful values, but this line breaks it outright. The offending XML looks like this:

<dCategory dec="1102" dup="45" dup="4576" loc="274" mov="31493" prf="23469" unq="240031" xxx="7861" />

What I would really like is the PHP equivalent of C#'s .MoveToNextAttribute() on the XML reader. I can't seem to find anything that doesn't just blow up when presented with the duplicate attribute.

Can anyone help out with this?
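For comparison with the C# method mentioned above: PHP's XMLReader does have moveToNextAttribute(), but it only gets that far if libxml accepts the start tag, so on the feed as sent read() fails at the duplicate. The minimal sketch below therefore runs it on the dCategory line with the second dup removed by hand; collectAttributes is a hypothetical helper name.

```php
<?php
// Sketch only: the input is the dCategory line with ONE dup attribute,
// because XMLReader (like SimpleXML) rejects the redefined attribute.
$xml = '<dCategory dec="1102" dup="45" loc="274" mov="31493" prf="23469" unq="240031" xxx="7861" />';

/**
 * Hypothetical helper: walk every element and collect its attributes
 * as name => value pairs using XMLReader::moveToNextAttribute().
 */
function collectAttributes(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $result = [];
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->hasAttributes) {
            $attrs = [];
            // From the element node, the first call moves to the first attribute.
            while ($reader->moveToNextAttribute()) {
                $attrs[$reader->name] = $reader->value;
            }
            $reader->moveToElement(); // go back to the element before reading on
            $result[] = $attrs;
        }
    }
    $reader->close();
    return $result;
}

print_r(collectAttributes($xml));
```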

The answers in the linked question deal with invalid characters within the XML content, e.g. & not being escaped as &amp;. The problem here is that the structure of the XML is broken, not the content. The answer in that thread returns

 parser error : Attribute attr1 redefined

when presented with the XML

<open-1 attr1="atr1" attr1="atr1">Text</open-1>

That is exactly the kind of input I am trying to parse.
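One thing that may be worth trying before leaving the built-in parsers: DOMDocument exposes libxml's recovery mode through its non-standard recover property, and the result can be handed to SimpleXML with simplexml_import_dom(). A minimal sketch, assuming (not guaranteed across libxml2 versions) that recovery mode skips the redefined attribute instead of aborting the parse; check the output against the real feed.

```php
<?php
// Assumption: libxml2 recovery mode drops the redefined attribute and
// keeps parsing. Behaviour may differ between libxml2 versions.
$broken = '<open-1 attr1="atr1" attr1="atr1">Text</open-1>';

// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->recover = true; // libxml-specific: try to parse non-well-formed input
$loaded = $dom->loadXML($broken);

// Keep the messages (e.g. "Attribute attr1 redefined") for logging.
$messages = array_map(
    fn (LibXMLError $e): string => trim($e->message),
    libxml_get_errors()
);
libxml_clear_errors();

$sxe = null;
if ($loaded && $dom->documentElement !== null) {
    // Hand the recovered tree to SimpleXML so existing code can carry on.
    $sxe = simplexml_import_dom($dom);
    echo $sxe->asXML();
}
```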

  • Do you have XMLReader installed? Commented Jan 19, 2016 at 15:56
  • Yes, I have, although surely it is going to need valid XML as well? Commented Jan 20, 2016 at 9:16
  • Possible duplicate of PHP - Processing Invalid XML Commented Jan 20, 2016 at 9:35
  • @Khainestar any feedback yet? Please let us know. Commented Jan 20, 2016 at 12:55
  • The other thread is about invalid characters in the feed itself; mine is about the feed's structure, so I wouldn't mark it as a duplicate. I think I am going to have to abandon the XML parsers and write my own, so that I can ignore the XML rules and parse the output as given, which is never ideal. The best I have from the built-in parsers returns the XML up to the error. Unfortunately, in some instances the error is in the second node. Commented Jan 20, 2016 at 14:07
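Instead of a full parser, a narrower pre-pass may be enough: strip repeated attributes out of each start tag, then feed the result to SimpleXML as before. A minimal sketch under stated assumptions: dropDuplicateAttributes is a hypothetical helper, it keeps the first occurrence of each attribute name, and because it is regex-based it assumes every attribute value is quoted and that no tag-like text appears inside comments or CDATA sections.

```php
<?php
/**
 * Hypothetical helper: remove repeated attributes from every start tag,
 * keeping the first occurrence of each name. Regex pre-pass, not a parser.
 */
function dropDuplicateAttributes(string $xml): string
{
    // A start tag or empty-element tag: name, attribute list, optional "/".
    $tagPattern = '/<([A-Za-z_][\w.:-]*)((?:\s+[^\s=\/>]+\s*=\s*(?:"[^"]*"|\'[^\']*\'))*)(\s*\/?)>/';
    // One attribute inside that list: name and quoted value.
    $attrPattern = '/\s+([^\s=\/>]+)\s*=\s*("[^"]*"|\'[^\']*\')/';

    $result = preg_replace_callback(
        $tagPattern,
        function (array $tag) use ($attrPattern): string {
            $seen = []; // attribute names already kept for this tag
            $attrs = preg_replace_callback(
                $attrPattern,
                function (array $attr) use (&$seen): string {
                    if (isset($seen[$attr[1]])) {
                        return ''; // repeated name: drop this occurrence
                    }
                    $seen[$attr[1]] = true;
                    return ' ' . $attr[1] . '=' . $attr[2];
                },
                $tag[2]
            );
            return '<' . $tag[1] . ($attrs ?? $tag[2]) . $tag[3] . '>';
        },
        $xml
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $xml;
}

$line = '<dCategory dec="1102" dup="45" dup="4576" loc="274" mov="31493" prf="23469" unq="240031" xxx="7861" />';
$sxe = simplexml_load_string(dropDuplicateAttributes($line));
echo $sxe['dup'], PHP_EOL; // prints 45
```

Keeping the first value is an arbitrary choice here; if the feed's later value is the correct one, skip the isset() check and overwrite a stored value instead.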

1 Answer

You could use Tidy (PHP's tidy extension) to clean up your input:

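<?php

// The feed fragment with the duplicated attribute.
$buffer = '<?xml version="1.0" encoding="UTF-8"?><open-1 attr1="atr1" attr1="atr1">Text</open-1>';

// Treat the input as generic XML (not HTML) and emit XML back out.
$config = [
 'indent' => true,
 'output-xml' => true,
 'input-xml' => true,
];

// Parse with Tidy (requires the tidy extension), repair the markup and print it.
$tidy = tidy_parse_string($buffer, $config, 'utf8');
$tidy->cleanRepair();
echo $tidy;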

This outputs:

 <?xml version="1.0" encoding="utf-8"?>
 <open-1 attr1="atr1">Text</open-1>
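If the tidy extension is available, its output can go straight into SimpleXML so the rest of the processing stays unchanged. A minimal sketch, assuming Tidy drops the repeated attribute in XML mode as the output above shows; cleanFeed is a hypothetical name, and it returns null when tidy is missing or the repaired markup still does not parse.

```php
<?php
/**
 * Hypothetical helper: repair the feed with Tidy, then load it with SimpleXML.
 * Returns null if the tidy extension is missing or parsing still fails.
 */
function cleanFeed(string $buffer): ?SimpleXMLElement
{
    if (!extension_loaded('tidy')) {
        return null; // tidy not installed: caller needs another strategy
    }
    $config = [
        'input-xml'  => true, // parse as generic XML, not HTML
        'output-xml' => true, // emit XML
    ];
    $tidy = tidy_parse_string($buffer, $config, 'utf8');
    if ($tidy === false) {
        return null;
    }
    $tidy->cleanRepair();
    // tidy_get_output() returns the repaired markup as a string.
    $sxe = simplexml_load_string(tidy_get_output($tidy));
    return $sxe === false ? null : $sxe;
}

$sxe = cleanFeed('<?xml version="1.0" encoding="UTF-8"?><open-1 attr1="atr1" attr1="atr1">Text</open-1>');
if ($sxe !== null) {
    echo $sxe['attr1'], PHP_EOL;
}
```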