4

I am processing an external XML document using the method described in "How to use XMLReader in PHP?", but I'm running into this error:

...parser error : Entity 'Atilde' not defined in...

and similar, such as

cent, acirc, not

The error occurs on the $z->expand() function. If I comment that out, it occurs on the $z->next() function.

I know which field causes the problem and have tried to rewrite it with base64_encode before expanding, but it's read-only.

EDIT: the problem string is:

...ââ¬Â...

end edit

Thank you for any help given.

2
  • If you know the errors are caused by HTML entities, then you should not use the XML parser. Try DOMDocument instead (as in the question you linked). Commented Sep 1, 2011 at 15:54
  • It's a large XML document, so I can't afford to load all of it into memory. Commented Sep 1, 2011 at 15:59

4 Answers

2

XML predefines only five entities: lt, gt, amp, apos, and quot. Any other entity reference raises an error unless the document declares it in a DTD. (Note that character references and entity references are not the same.)
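To see the difference in practice, here is a minimal sketch that runs two small strings through XMLReader and collects any libxml errors. The helper name xmlReaderErrors and the sample strings are made up for illustration; only the XMLReader and libxml_* calls come from PHP itself.

```php
<?php
// Hypothetical helper: returns the libxml error messages raised while
// streaming through $xml with XMLReader (an empty array means no errors).
function xmlReaderErrors(string $xml): array
{
    // Collect libxml errors in memory instead of emitting them as warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $reader = new XMLReader();
    $reader->XML($xml);
    // "@" silences PHP's own generic read warning; the details are in libxml.
    while (@$reader->read()) {
        // Just walk the document; we only care whether parsing fails.
    }
    $reader->close();

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

// Only the five predefined entities.
var_dump(xmlReaderErrors('<r>&lt;b&gt; &amp; &quot;q&quot; &apos;a&apos;</r>'));
// An HTML-only entity with no DTD declaring it.
var_dump(xmlReaderErrors('<r>&Atilde;</r>'));
```

The first call should come back empty, while the second should report the same kind of "Entity ... not defined" message as in the question.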

You can use strtr to convert any HTML entity reference that is not also known in XML:

$trans = array_map('utf8_encode', array_flip(array_diff(get_html_translation_table(HTML_ENTITIES), get_html_translation_table(HTML_SPECIALCHARS))));
$output = strtr($input, $trans);

get_html_translation_table returns an array that maps each character to its entity reference. get_html_translation_table(HTML_ENTITIES) returns the mapping for all entities, while get_html_translation_table(HTML_SPECIALCHARS) returns only those mentioned above. array_diff gives the difference: every entity except those mentioned above. array_flip inverts the key/value association, and array_map with utf8_encode converts the values from ISO 8859-1 to UTF-8 (needed only when the table is returned in ISO 8859-1, which was the default before PHP 5.4).
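As a rough sketch of the same idea on current PHP versions, the table can be requested in UTF-8 directly, which makes the utf8_encode step unnecessary. The helper names htmlOnlyEntityMap and replaceHtmlEntities and the sample string are assumptions for illustration. Note that the input has to be the raw XML text (a string read from the file or feed), not a node returned by XMLReader, and that this version holds the whole string in memory.

```php
<?php
// Hypothetical helper: builds an "entity reference => UTF-8 character" map
// for every HTML 4.01 named entity that XML does not predefine.
function htmlOnlyEntityMap(): array
{
    $flags = ENT_QUOTES | ENT_HTML401;
    // character => entity for all HTML 4.01 entities, already in UTF-8
    $all = get_html_translation_table(HTML_ENTITIES, $flags, 'UTF-8');
    // character => entity for the ones that must stay escaped in XML
    $xmlSafe = get_html_translation_table(HTML_SPECIALCHARS, $flags, 'UTF-8');
    // Drop the XML-safe entries, then flip to entity => character.
    return array_flip(array_diff($all, $xmlSafe));
}

// Hypothetical helper: replaces HTML-only entity references in raw XML text.
function replaceHtmlEntities(string $rawXml): string
{
    return strtr($rawXml, htmlOnlyEntityMap());
}

$raw = '<item>&Atilde;&cent;&acirc;&not; &amp; &lt;ok&gt;</item>';
echo replaceHtmlEntities($raw), "\n";
```

The HTML-only references are replaced with literal characters, while &amp;, &lt; and &gt; are left escaped so the markup stays well-formed.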

3 Comments

What should I use as the input please? I tried it with $z->expand() as the input and got the error: "Catchable fatal error: Object of class DOMElement could not be converted to string in..."
Oh, wait a second. Does setting $z->setProperty(XMLReader::SUBST_ENTITIES, true); before $z->open(…) work?
It came up with an "undefined" error, but I found setParserProperty, which I guess you meant. Unfortunately it didn't work. Thank you anyway, I appreciate your time and effort. I guess I could always tell the xml feed supplier to fix it, but they'll just ignore me.
1

Maybe xml_set_external_entity_ref_handler is the solution for your case:

http://php.net/manual/en/example.xml-external-entity.php

http://www.php.net/manual/en/function.xml-set-external-entity-ref-handler.php

1 Comment

I can't get it to work with XMLReader. Do you have an example?
0

I encountered the same problem.

My solution was to open the XML file in Notepad++ and search-and-replace the offending characters with readable ones.

Not a beautiful solution, but it works ;)

0

This is a flaw in the original XML but it's not uncommon. I didn't have much luck with the solutions here (other than Wout van der Vegt's), so here's the "make a new XML that is fixed" approach:

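// Needs PHP 5.4.0+

$file = "xmldata_with_entities.xml";
$file2 = "xmldata_converted.xml";

$handle1 = fopen($file, "r");
$handle2 = fopen($file2, "w");
if ($handle1 && $handle2) {
    while (($line = fgets($handle1)) !== false) {
        // Decode only HTML-specific named entities. The five XML entities
        // (lt, gt, amp, apos, quot) are left alone so the markup stays well-formed.
        $line = preg_replace_callback('/&(?!(?:lt|gt|amp|apos|quot);)[A-Za-z][A-Za-z0-9]*;/', function ($m) {
            return html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        }, $line);
        fwrite($handle2, $line);
    }
    fclose($handle1);
    fclose($handle2);
}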

Obviously you could then use $file2 in XMLReader.
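For completeness, here is a minimal sketch of that last step, following the read/expand/next pattern from the question. The helper name readItems, the element name item, and the temporary stand-in file are assumptions for illustration; a real feed will use its own element names.

```php
<?php
// Hypothetical helper: streams through the file at $path and returns the text
// content of every <$tag> element, one element at a time.
function readItems(string $path, string $tag): array
{
    $z = new XMLReader();
    if (!$z->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }

    // Advance to the first matching element.
    while ($z->read() && $z->name !== $tag) {
        // keep reading
    }

    $values = [];
    while ($z->name === $tag) {
        $node = $z->expand();   // DOM node for the current element only
        if ($node !== false) {
            $values[] = $node->textContent;
        }
        $z->next($tag);         // jump to the next sibling with this name
    }

    $z->close();
    return $values;
}

// Usage with a small stand-in for the converted file.
$file2 = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($file2, "<root><item>\u{C3}\u{A2}</item><item>plain</item></root>");
print_r(readItems($file2, 'item'));
unlink($file2);
```

Because only one element is expanded at a time, memory use stays small even for a large converted file.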
