
From a PHP script, I'm downloading an RSS feed like this:

$fp = fopen('http://news.google.es/news?cf=all&ned=es_ve&hl=es&output=rss','r') 
 or die('Error reading RSS data.'); 

The feed is a Spanish news feed. After downloading it, I parse all the info into one variable that holds only the content of the <description> tag of every <item>. The issue is that when I echo the variable, all the accented characters come out with a strange encoding, like:

echo($result); // this prints: el ministerio pãºblico investigarã¡ la publicaciã³n en la primera pã¡gina

I could write a HUGE switch statement that searches for every such sequence and replaces it with the corresponding character (ã¡ with á, and so on), but isn't there a way to do this with a single function? Or, even better, is there a way to download the content into $fp without this encoding problem? Thanks!

Actual code:

<?php
$acumula="";
$insideitem = false; 
$tag = ''; 
$title = ''; 
$description = ''; 
$link = ''; 

function startElement($parser, $name, $attrs) { 
 global $insideitem, $tag, $title, $description, $link; 
 if ($insideitem) { 
  $tag = $name; 
 } elseif ($name == 'ITEM') { 
  $insideitem = true; 
 } 
} 

function endElement($parser, $name) { 
 global $insideitem, $tag, $title, $description, $link, $acumula; 
 if ($name == 'ITEM') { 
  $acumula = $acumula . (trim($title)) . "<br>" . (trim($description)); 
  $title = ''; 
  $description = ''; 
  $link = ''; 
  $insideitem = false; 
 } 
} 

function characterData($parser, $data) {
 global $insideitem, $tag, $title, $description, $link;
 if ($insideitem) {
  switch ($tag) {
   case 'TITLE':
    $title .= $data;
    break;
   case 'DESCRIPTION':
    $description .= $data;
    break;
   case 'LINK':
    $link .= $data;
    break;
  }
 }
}

$xml_parser = xml_parser_create(); 
xml_set_element_handler($xml_parser, 'startElement', 'endElement'); 
xml_set_character_data_handler($xml_parser, "characterData"); 
$fp = fopen('http://news.google.es/news?cf=all&ned=es_ve&hl=es&output=rss','r') 
or die('Error reading RSS data.'); 
while ($data = fread($fp, 4096)) { 
 xml_parse($xml_parser, $data, feof($fp)) 
  or die(sprintf('XML error: %s at line %d', 
 xml_error_string(xml_get_error_code($xml_parser)), 
 xml_get_current_line_number($xml_parser))); 
} 
//echo $acumula;
fclose($fp); 
xml_parser_free($xml_parser); 
echo($acumula); // THIS IS $RESULT!
?>
  • What exactly do you want to do with this conversion? Is this a string encoding conversion? Commented Aug 15, 2010 at 14:10

2 Answers


EDIT

Since you're already using PHP's XML parser, the text it hands to your handlers is encoded in UTF-8 (the parser's default output encoding).
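To illustrate, here is a minimal sketch (PHP 7+, with the bundled xml and iconv extensions; parse_text() is a hypothetical helper, not a PHP function). It shows that the xml_* parser returns UTF-8 by default, that XML_OPTION_TARGET_ENCODING can request ISO-8859-1 instead, and what happens when UTF-8 bytes are read as ISO-8859-1: á (bytes C3 A1) becomes Ã¡, the same pattern as in the question. The lowercase ã in the question presumably means the text was also lowercased somewhere, but that is only a guess.

```php
<?php
// Sketch: collect all character data from a small XML document with
// PHP's event-based xml_* parser, asking for a given output charset.
// parse_text() is a hypothetical helper, not part of PHP.
function parse_text(string $xml, string $targetEncoding = 'UTF-8'): string
{
    $parser = xml_parser_create();
    // Charset of the data passed to handlers: 'UTF-8' (the default),
    // 'ISO-8859-1' or 'US-ASCII'.
    xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, $targetEncoding);

    $text = '';
    xml_set_character_data_handler($parser, function ($p, string $data) use (&$text) {
        $text .= $data; // character data may arrive in several chunks
    });

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(sprintf(
            'XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }
    return $text;
}

// UTF-8 output: correct for a page served with
// header('Content-Type: text/html; charset=UTF-8');
$utf8 = parse_text('<title>Página</title>');

// ISO-8859-1 output: correct for a page that must stay Latin-1.
$latin1 = parse_text('<title>Página</title>', 'ISO-8859-1');

// What a browser expecting Latin-1 makes of the UTF-8 bytes.
$garbled = iconv('ISO-8859-1', 'UTF-8', $utf8);
```

Either keep the parser's UTF-8 output and declare UTF-8 to the browser, or ask the parser for the charset your page already uses; mixing the two is what produces sequences like Ã¡.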

If your page is encoded in ISO-8859-1 (or even ASCII), you can convert the UTF-8 text to HTML entities:

$result = mb_convert_encoding($result, "HTML-ENTITIES", "UTF-8");
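On PHP 8.2 and later, the HTML-ENTITIES target of mb_convert_encoding() is deprecated. An equivalent sketch, assuming the mbstring extension as the line above already does, uses mb_encode_numericentity(). It turns every non-ASCII character into a numeric entity and leaves ASCII untouched, including any HTML markup inside the descriptions. The function name to_ascii_entities() is hypothetical.

```php
<?php
// Sketch: replace every non-ASCII character of a UTF-8 string with a
// numeric HTML entity (&#N;), leaving ASCII, including tags, unchanged.
// Requires the mbstring extension. to_ascii_entities() is hypothetical.
function to_ascii_entities(string $utf8): string
{
    // Conversion map [start, end, offset, mask]: code points
    // U+0080..U+10FFFF become &#N; where N is the code point.
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8, $convmap, 'UTF-8');
}

echo to_ascii_entities('<b>el ministerio público investigará</b>'), "\n";
// prints: <b>el ministerio p&#250;blico investigar&#225;</b>
```

Because the result is pure ASCII, it displays correctly whether the page declares ISO-8859-1, ASCII or UTF-8.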

Alternatively, use a library that handles this for you, e.g. the DOM extension or SimpleXML. Example with DOM:

$d = new DOMDocument();
$d->load('http://news.google.es/news?cf=all&ned=es_ve&hl=es&output=rss');
//now all the data you get will be encoded in UTF-8
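A fuller sketch along the same lines, mirroring the question's loop over <item> elements. rss_items() and child_text() are hypothetical helpers. The sample feed is an inline string declared as ISO-8859-1, so the example runs offline and shows DOM returning UTF-8 regardless; for the real feed, call $doc->load($url) as in the snippet above.

```php
<?php
// Sketch: read each <item>'s <title> and <description> with DOMDocument.
// DOM hands back UTF-8 text whatever encoding the feed declares.
// rss_items() and child_text() are hypothetical helpers.
function rss_items(string $xml): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException('Could not parse RSS data.');
    }
    $items = [];
    foreach ($doc->getElementsByTagName('item') as $item) {
        $items[] = [
            'title'       => child_text($item, 'title'),
            'description' => child_text($item, 'description'),
        ];
    }
    return $items;
}

// Trimmed text of the first descendant element named $name, or ''.
function child_text(DOMElement $parent, string $name): string
{
    $node = $parent->getElementsByTagName($name)->item(0);
    return $node === null ? '' : trim($node->textContent);
}

// Inline sample: the byte \xE1 is "á" in ISO-8859-1.
$feed = '<?xml version="1.0" encoding="ISO-8859-1"?>'
      . "<rss><channel><item><title>P\xE1gina</title>"
      . '<description>Primera</description></item></channel></rss>';
$items = rss_items($feed);
```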

Example with SimpleXML:

$url = 'http://news.google.es/news?cf=all&ned=es_ve&hl=es&output=rss';
if ($sxml = simplexml_load_file($url)) {
    echo htmlspecialchars($sxml->channel->title); //UTF-8
}
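The question's $acumula loop could be rewritten with SimpleXML roughly as follows. This is a sketch: build_acumula() is a hypothetical name, and the Content-Type header in the usage comment assumes the page is meant to be served as UTF-8.

```php
<?php
// Sketch: the question's $acumula loop rewritten with SimpleXML, which
// (like DOM) returns UTF-8 text. build_acumula() is a hypothetical name.
function build_acumula(SimpleXMLElement $rss): string
{
    $acumula = '';
    foreach ($rss->channel->item as $item) {
        // Casting an element to string yields its UTF-8 text content.
        $acumula .= trim((string) $item->title) . '<br>'
                  . trim((string) $item->description);
    }
    return $acumula;
}

// Usage with the live feed (not run here). Declaring UTF-8 tells the
// browser how to read the bytes, so "á" is not displayed as "Ã¡":
//   header('Content-Type: text/html; charset=UTF-8');
//   $sxml = simplexml_load_file('http://news.google.es/news?cf=all&ned=es_ve&hl=es&output=rss');
//   if ($sxml !== false) { echo build_acumula($sxml); }
```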

4 Comments

How can I make this new code compatible with my old one? Check the question; I edited it.
@Dom The code you posted doesn't show how you obtain $result.
Yes, I implemented the code you posted, but I'm still getting wrongly encoded characters. Here is the code: maracay.dyndns.org/contar/code.txt and here is the page: maracay.dyndns.org/contar/index.php
@Dom Works here partially with encoding ISO-8859-1: codepad.viper-7.com/0HCLc3 If you don't do the conversion, there are many more wrong cases (see this codepad.viper-7.com/CKIaPD). It seems the problem is that the RSS feed is itself corrupted (has mixed encodings). Not much you can do about that.

You can use PHP's DOMDocument to strip the HTML tags, and PHP's encoding conversion functions to change the encoding of the string.
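A minimal sketch of this suggestion, assuming the <description> text is UTF-8 (as PHP's XML parsers return it). For brevity it uses strip_tags() instead of DOMDocument, and iconv() for the optional charset conversion; description_text() is a hypothetical helper. Tags are stripped before entities are decoded, so an escaped &lt;b&gt; in the text stays literal text rather than being removed as markup.

```php
<?php
// Sketch: strip HTML from a feed description, decode entities such as
// &aacute;, then optionally convert the UTF-8 result to another charset.
// strip_tags() stands in for DOMDocument; description_text() is hypothetical.
function description_text(string $html, string $charset = 'UTF-8'): string
{
    // Strip first, decode second, so decoded "<" characters survive as text.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    if (strcasecmp($charset, 'UTF-8') === 0) {
        return $text;
    }
    // iconv() returns false if a character has no equivalent in $charset.
    $converted = iconv('UTF-8', $charset, $text);
    if ($converted === false) {
        throw new RuntimeException("Cannot represent the text in $charset");
    }
    return $converted;
}
```

For a page served as ISO-8859-1, call description_text($description, 'ISO-8859-1'); otherwise keep the default and declare UTF-8.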

1 Comment

@DomingoSL: You can use the simpler strip_tags (php.net/manual/en/function.strip-tags.php) to strip tags ;-)
