0

I'm scraping RSS feed data from RSS XML. Some of the strings contain quotes. I run the strings through htmlentities() before I store them in the database. When I then display that same information in the browser, the quotes show up as "â??". The character is stored as "& acirc; ??s" (no spaces) in the database.

The header of my page:

<!DOCTYPE HTML>
<html>
<head>
    <meta charset="utf-8">

I'm sure other entities are not displaying correctly either. How should I go about correcting this?

An example feed with the quotes around "Agawi": http://feeds.feedburner.com/TechCrunch/gaming

1
  • Your database should also use a UTF-8 charset. Also, I think the proper tag is <meta http-equiv="Content-Type" content="text/html; charset=utf-8">. Commented Aug 25, 2012 at 23:45

2 Answers

1

If you use PHP, this routine could be useful.

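It extends the standard get_html_translation_table() with the codes of the characters that MS Word typically substitutes into typed text (the Windows-1252 codes 130 to 159).
Without these additions, those characters are not converted to entities and may not display correctly in HTML output, which is the trouble you describe.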

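// Build an entity table for Windows-1252 (CP1252) text: the standard HTML
// entities plus the 0x80-0x9F punctuation that Word inserts.
function get_html_translation_table_CP1252() {
    // Ask for the single-byte Latin-1 table explicitly so its keys match the chr() keys below
    $trans = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT, 'ISO-8859-1');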
    $trans[chr(130)] = '&sbquo;';    // Single Low-9 Quotation Mark
    $trans[chr(131)] = '&fnof;';    // Latin Small Letter F With Hook
    $trans[chr(132)] = '&bdquo;';    // Double Low-9 Quotation Mark
    $trans[chr(133)] = '&hellip;';    // Horizontal Ellipsis
    $trans[chr(134)] = '&dagger;';    // Dagger
    $trans[chr(135)] = '&Dagger;';    // Double Dagger
    $trans[chr(136)] = '&circ;';    // Modifier Letter Circumflex Accent
    $trans[chr(137)] = '&permil;';    // Per Mille Sign
    $trans[chr(138)] = '&Scaron;';    // Latin Capital Letter S With Caron
    $trans[chr(139)] = '&lsaquo;';    // Single Left-Pointing Angle Quotation Mark
    $trans[chr(140)] = '&OElig;';    // Latin Capital Ligature OE
    $trans[chr(145)] = '&lsquo;';    // Left Single Quotation Mark
    $trans[chr(146)] = '&rsquo;';    // Right Single Quotation Mark
    $trans[chr(147)] = '&ldquo;';    // Left Double Quotation Mark
    $trans[chr(148)] = '&rdquo;';    // Right Double Quotation Mark
    $trans[chr(149)] = '&bull;';    // Bullet
    $trans[chr(150)] = '&ndash;';    // En Dash
    $trans[chr(151)] = '&mdash;';    // Em Dash
    $trans[chr(152)] = '&tilde;';    // Small Tilde
    $trans[chr(153)] = '&trade;';    // Trade Mark Sign
    $trans[chr(154)] = '&scaron;';    // Latin Small Letter S With Caron
    $trans[chr(155)] = '&rsaquo;';    // Single Right-Pointing Angle Quotation Mark
    $trans[chr(156)] = '&oelig;';    // Latin Small Ligature OE
    $trans[chr(159)] = '&Yuml;';    // Latin Capital Letter Y With Diaeresis
    ksort($trans);
    return $trans;
}

$trans = get_html_translation_table_CP1252();
$feed = strtr($feed, $trans);
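
The table above is keyed on single bytes, so it fits text that really is Windows-1252. If the feed is UTF-8 instead (an assumption: the question does not state the feed's encoding, although "â" followed by two unprintable characters is what a UTF-8 curly quote looks like when its bytes are read as Latin-1), naming the encoding in the htmlentities() call may be enough. A minimal sketch, where encode_feed_text() is a hypothetical helper name and the \u{...} escapes need PHP 7 or later:

```php
<?php
// Hypothetical helper: entity-encode text that is assumed to be UTF-8,
// stating the encoding explicitly instead of relying on the default.
function encode_feed_text(string $text): string {
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

// Curly double quotes (U+201C, U+201D) around the word from the example feed
$sample = "\u{201C}Agawi\u{201D}";

// Correct: each quote is decoded as one UTF-8 character
$encoded = encode_feed_text($sample);

// For comparison: the same bytes read as Latin-1. The lead byte 0xE2
// becomes &acirc; and the two trailing bytes are left as raw bytes.
$misread = htmlentities($sample, ENT_QUOTES, 'ISO-8859-1');

echo $encoded, "\n";   // &ldquo;Agawi&rdquo;
```

The comparison value $misread is only there to reproduce the symptom from the question; it is not something to store.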

0

Yes, because those are not regular quotes; they are more like Microsoft Word "smart" quotes. You should follow Feedburner's example and transform them into &ldquo; and &rdquo; manually.

For example:

$feed = str_replace('“', '&ldquo;', $feed);
$feed = str_replace('”', '&rdquo;', $feed);
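
The two str_replace() calls above can also be folded into a single pass with strtr(). This is only a sketch: it assumes the feed text is UTF-8 and PHP 7 or later (for the \u{...} escapes, which avoid depending on the encoding of the source file), the function name replace_curly_quotes() is hypothetical, and it also covers the single curly quotes, which the answer above does not mention.

```php
<?php
// Hypothetical helper: replace UTF-8 curly quotes with named HTML entities.
function replace_curly_quotes(string $feed): string {
    return strtr($feed, [
        "\u{2018}" => '&lsquo;',  // left single quotation mark
        "\u{2019}" => '&rsquo;',  // right single quotation mark
        "\u{201C}" => '&ldquo;',  // left double quotation mark
        "\u{201D}" => '&rdquo;',  // right double quotation mark
    ]);
}

// Usage: curly double quotes around a word, followed by a curly apostrophe
echo replace_curly_quotes("\u{201C}Agawi\u{201D}\u{2019}s"), "\n";
// prints: &ldquo;Agawi&rdquo;&rsquo;s
```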
