
The feed in question is: http://api.inoads.com/snowstorm/feed.xml

Here is the PHP code I am using for the generation:

<?php

$database =  'xxxx';
$dbconnect = mysql_pconnect('xxxx', 'xxxx', 'xxxx');
mysql_select_db($database, $dbconnect);

$query = "SELECT * FROM the_queue WHERE id LIKE '%'    ORDER BY id DESC LIMIT 25";
$result = mysql_query($query, $dbconnect);

while ($line = mysql_fetch_assoc($result))
{
    $return[] = $line;
}

$now = date("D, d M Y H:i:s T");

$output = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
            <rss version=\"2.0\">
                <channel>
                    <title>The Queue</title>
                    <link>http://readapp.net</link>
                    <description>A curated reading list.</description>
                    <language>en-us</language>
                    <pubDate>$now</pubDate>
                    <lastBuildDate>$now</lastBuildDate>
            ";

foreach ($return as $line)
{
    $output .= "<item><title>".htmlspecialchars($line['title'])."</title>
    <description>".htmlspecialchars($line['description'])."</description>
                    <link>".htmlspecialchars($line['link'])."</link>
                    <pubDate>".htmlspecialchars($line['pubDate'])."</pubDate>
                </item>";
}
$output .= "</channel></rss>";

$fh = fopen('feed.xml', 'w');
fwrite($fh, $output);
?>

What might be causing the error?

Here's a link from the feed validator: http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fapi.inoads.com%2Fsnowstorm%2Ffeed.xml

  • Welcome to Stack Overflow! You are not doing any error checking in your queries. You need to do that after a mysql_query() call. Otherwise, your script will break if the query fails. How to do this is outlined in the manual on mysql_query() or in this reference question. Commented Dec 10, 2011 at 9:24
  • What string encoding is your data in? You need to specify that in the <?xml> tag. For example <?xml version="1.0" encoding="..."?> Commented Dec 10, 2011 at 9:26
  • @AbhiBeckert UTF-8 - I've revised the post above to reflect this Commented Dec 10, 2011 at 9:41
  • @deceze There are issues with quotes and question marks - I've updated the post to show this. Commented Dec 10, 2011 at 9:43
  • The mysql extension is outdated and on its way to deprecation. New code should use mysqli or PDO, both of which have important advantages, such as support for prepared statements. Commented Dec 10, 2011 at 9:46

5 Answers

3

You said the XML file is UTF-8, but when I download it and open it in my text editor it auto-detects the windows latin1 encoding, and the quotes display perfectly.

If I force my text editor to use UTF-8, it shows an error message because there are illegal characters for the UTF-8 encoding.
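The same check can be reproduced in PHP. This is a minimal sketch, assuming the mbstring extension is available and that any string failing UTF-8 validation is Windows-1252 (the "windows latin1" encoding mentioned above); `toUtf8` is a hypothetical helper name, not something from the original script.

```php
<?php
// Hypothetical helper: returns valid UTF-8, assuming invalid input is Windows-1252.
function toUtf8($text)
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave untouched
    }
    // Assumption: the only other encoding in play is Windows-1252.
    return mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
}

$raw = "Will you stop, Dave?\x94"; // 0x94 is the closing curly quote in Windows-1252
var_dump(mb_check_encoding($raw, 'UTF-8'));   // is the raw string valid UTF-8?

$fixed = toUtf8($raw);
var_dump(mb_check_encoding($fixed, 'UTF-8')); // is the converted string valid UTF-8?
```

Converting at output time only hides the symptom; it is still worth finding where the latin1 bytes enter the pipeline.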

Therefore, your data is not UTF-8; it is latin1. You need to find out exactly where that is happening. It could be any one, or several, of the following:

is the HTML page where the content is typed in by the user set to UTF-8?

If not, the browser will be sending latin1 quotes. To fix this, the first tag in your <head> needs to be:

<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  ...
</head>

is every browser correctly respecting your UTF-8 setting in that page's HTML?

If you specify UTF-8 and the page contains characters illegal in that encoding, some browsers might decide to use a different encoding despite the <meta> tag. How to check this is different in every browser.

is the MySQL connection when inserting into the database set to use UTF-8?

This connection must use UTF-8; otherwise MySQL may try to convert the encoding for you, often corrupting the data. Set the encoding with:

$database =  'xxxx';
$dbconnect = mysql_pconnect('xxxx', 'xxxx', 'xxxx');
mysql_select_db($database, $dbconnect);
mysql_query('SET NAMES utf8', $dbconnect);

is the MySQL table (and individual column) set to use UTF-8?

Again, to avoid MySQL doing its own buggy conversion, make sure the table and each individual column use UTF-8. Do a structure dump of the database and check for:

CREATE TABLE `the_queue` (
  ...
) ... DEFAULT CHARSET=utf8;

And also make sure there isn't something like this on any of the columns:

`description` varchar(255) CHARACTER SET latin1,

is the MySQL connection when reading the database set to use UTF-8?

Your read connection also needs to be utf8. So double check that.

are you doing anything in the PHP that cannot handle UTF-8?

PHP has some functions that corrupt UTF-8 strings when used carelessly. One of them is htmlentities(), which before PHP 5.4 assumes ISO-8859-1 unless you pass a charset, so make sure you always use htmlspecialchars(). The easiest way to test this is to start commenting out big chunks of your code to see where the encoding is breaking.
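A minimal sketch of the difference between the two functions, assuming the input is UTF-8 and the charset is passed explicitly (the sample text is made up for illustration). Even with the correct charset, htmlentities() emits HTML named entities such as `&ldquo;`, and XML itself predefines only `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&apos;`, so htmlspecialchars() is the safer choice for a feed.

```php
<?php
$text = "“Dave, my mind is going,” HAL says & sighs."; // UTF-8 source file assumed

// Escapes only the markup-significant characters; curly quotes pass through as UTF-8.
$special = htmlspecialchars($text, ENT_NOQUOTES, 'UTF-8');

// Replaces every character that has a named HTML entity, including the curly quotes.
$entities = htmlentities($text, ENT_NOQUOTES, 'UTF-8');

echo $special, "\n";
echo $entities, "\n";
```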


1

There is one problem here:

$output = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
...

The string contains "?>", which is PHP's closing tag. It will give you an error.

You can avoid these problems this way:

$output = "<?xml version=\"1.0\" encoding=\"UTF-8\"?".">
...

1 Comment

The PHP parser is perfectly capable of handling PHP close tags embedded in strings, whether single-quoted, double-quoted, nowdoc or heredoc.
1

The point of htmlentities is to replace all characters that have defined HTML character entities with those entities. If you really don't want any character entities (as your desired result suggests), don't use htmlentities.

By default, htmlentities uses the latin-1 charset, so it chokes on the smart quotes (indeed, on all multibyte characters), which is where you see the question marks. One fix is to use htmlspecialchars, which converts a much more limited set of characters (&, <, > and ", plus ' when ENT_QUOTES is set). This will still convert the double quotes because, well, that's the point of htmlspecialchars, unless you specify ENT_NOQUOTES as the second argument. Another fix is to specify the character set as the third argument (this isn't exclusive of using htmlspecialchars).

The fourth argument to either function specifies whether or not to encode already-encoded characters. Whether or not to double-encode depends on the source data.

$line['description'] = '"Dave, stop. Stop, will you? Stop, Dave. Will you stop, Dave?” ... “Dave, my mind is going,” HAL says, forlornly. “I can feel it. I can feel it.”';

echo "<description>" . htmlspecialchars($line['description'], ENT_NOQUOTES, 'UTF-8', false) . "</description>";
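To illustrate the fourth argument on its own, here is a small sketch; the `$stored` value is a hypothetical string that already contains one entity.

```php
<?php
$stored = 'Tom &amp; Jerry <live>'; // hypothetical value that already contains an entity

// double_encode = true (the default): the existing &amp; is escaped again.
$twice = htmlspecialchars($stored, ENT_NOQUOTES, 'UTF-8', true);

// double_encode = false: existing entities are left alone, everything else is escaped.
$once = htmlspecialchars($stored, ENT_NOQUOTES, 'UTF-8', false);

echo $twice, "\n"; // Tom &amp;amp; Jerry &lt;live&gt;
echo $once, "\n";  // Tom &amp; Jerry &lt;live&gt;
```

If the database holds raw text, the default is usually what you want; passing false only makes sense when the stored data is already partly escaped.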


8 Comments

I have tried this, but I'm getting the following error: This feed does not validate. 'utf8' codec can't decode byte 0x94 in position 606: unexpected code byte (maybe a high-bit character?)
I receive the following warning: expects parameter 2 to be long
What would be the reason if this code resulted in only this particular item's description in the feed being empty?
The error message about decoding the byte stream might be a bug in the validator. Note if you paste the feed document into the "Validate by direct input" form, no invalid character error is generated. Alternatively, use htmlspecialchars to replace the smart quotes with named character entities (ENT_NOQUOTES only applies to plain single and double quotes). However, some feed readers are reported to have problems with named character entities; it might be better to ignore the validator error message.
Now we get to the real issue. This question suffers from the XY problem. What you should do is create a minimal test case. Forget the code that generates the feed and create the smallest possible, static RSS file that causes your app to crash. Then create the smallest possible app that crashes on the static RSS feed. With that focused example, you might see the actual reason why it's crashing. If not, create a new question asking why the parser crashes on your feed, using your minimal sample.
0

The problem, I assume, is that you are storing this string with its quotes in the database. If so, PHP is removing the quotes (which is correct) to avoid bugs such as SQL injection. So remove the quotes from the database and add them back when generating the XML file; that is the simplest approach in my opinion. Also try to avoid double quotes (") and use single quotes (') instead, because the PHP parser additionally checks the contents of double-quoted strings. That should help.

1 Comment

No, the replacement of "Dave with &quot;Dave is just a matter of how escaping is done for the XML, and the two are equivalent. The replacement of Dave?” with Dave?? is more likely an encoding matter and, since ” isn't treated specially by either PHP or SQL, not a matter of any injection avoidance.
0

Another error is the format of the date. The date must be in RFC 822 format, like "Wed, 02 Oct 2002 08:00:00 EST", not "July/August 2008".
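One way to produce such a date in PHP is the built-in DATE_RSS format constant. This is a sketch only: `rssDate` is a hypothetical helper, and it assumes the stored value is something strtotime() can parse.

```php
<?php
// Hypothetical helper: formats a stored date string for <pubDate>, or returns null.
function rssDate($stored)
{
    $timestamp = strtotime($stored);
    if ($timestamp === false) {
        return null; // not parseable; the caller decides whether to omit <pubDate>
    }
    return gmdate(DATE_RSS, $timestamp); // DATE_RSS is "D, d M Y H:i:s O"
}

echo rssDate('2002-10-02 08:00:00 UTC'), "\n"; // Wed, 02 Oct 2002 08:00:00 +0000
```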

1 Comment

The date was manually entered, but thanks for pointing it out.
