
I have a string like the one below:

<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>

I want to extract the text from the HTML above as: Hello World, this is StackOverflow's question details page. Note that I want to remove the &nbsp; as well.

How can we achieve this in PHP? I tried a few functions (strip_tags, html_entity_decode, etc.), but they all fail under some conditions.

Please help, Thanks!

Edit: the code I am trying is below, but it's not working :( It leaves the &nbsp; and &#39; entities in the output.

$TMP_DESCR = trim(strip_tags($rs['description']));
  • What conditions, don't leave us guessing!? Commented Feb 2, 2011 at 11:45
  • As @jakenoble says, it would help if you posted your sample code, output, and errors. Commented Feb 2, 2011 at 11:46
  • If the shown string is part of a full HTML page or a larger snippet containing additional markup, please see Best Methods to parse HTML Commented Feb 2, 2011 at 11:47
  • Guys, I added my code, please check! Commented Feb 2, 2011 at 11:53
  • @Gordon it's not a big HTML document; I just want to do it with simple methods :( Commented Feb 2, 2011 at 11:54

4 Answers

1

The code below worked for me, though I had to do a str_replace on the non-breaking space.

$string = "<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>";
echo htmlspecialchars_decode(trim(strip_tags(str_replace('&nbsp;', '', $string))), ENT_QUOTES);

1 Comment

Yes, that's working for me as well. If there is no solution for &nbsp;, then it's fine; we can go with the replace. Thanks for the help!
0

strip_tags() will get rid of the tags, and trim() should get rid of the whitespace. I'm not sure if it will work with non-breaking spaces though.
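The doubt about non-breaking spaces can be checked directly. The following is a minimal sketch, assuming UTF-8 strings; the helper name trim_nbsp() is made up for illustration and is not part of PHP. It shows that trim() with its default character list leaves a decoded non-breaking space in place, and one way to strip it with a Unicode-aware pattern.

```php
<?php
// Assumes UTF-8 strings throughout.
$decoded = html_entity_decode('&nbsp;Hello World', ENT_QUOTES, 'UTF-8');

// By default trim() strips only ASCII whitespace and NUL bytes, so the
// decoded non-breaking space (U+00A0, bytes 0xC2 0xA0) is left in place.
$afterTrim = trim($decoded);
var_dump($afterTrim === $decoded); // bool(true): nothing was removed

// Hypothetical helper: strip whitespace and U+00A0 from both ends.
// The /u modifier makes the pattern match UTF-8 code points, not bytes.
function trim_nbsp(string $text): string
{
    // preg_replace() returns null on failure (e.g. invalid UTF-8),
    // in which case the input is returned unchanged.
    return preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $text) ?? $text;
}

echo trim_nbsp($decoded); // Hello World
```

A pattern is used rather than passing "\xC2\xA0" as trim()'s second argument because trim() works byte by byte and could cut a neighbouring multibyte character in half.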


0

First, you'll have to call trim() on the HTML to remove the white space. http://php.net/manual/en/function.trim.php

Then strip_tags, then html_entity_decode.

So: html_entity_decode(strip_tags(trim($html)), ENT_QUOTES);
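Put together, the steps might look like the sketch below. It assumes UTF-8 input; the function name html_to_text() is hypothetical. Tags are stripped before entities are decoded so that escaped markup such as &lt;b&gt; stays as literal text, and the non-breaking space is turned into a regular space after decoding so that trim() can remove it.

```php
<?php
// Hypothetical helper, not a built-in. Assumes UTF-8 input.
function html_to_text(string $html): string
{
    // 1. Remove the tags first, while entities are still encoded.
    $text = strip_tags($html);

    // 2. Decode entities. ENT_QUOTES is needed for &#39; on PHP versions
    //    where it is not part of the default flags.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // 3. &nbsp; is now U+00A0; map it to a regular space so trim() sees it.
    $text = str_replace("\u{00A0}", ' ', $text);

    // 4. Strip leading and trailing whitespace.
    return trim($text);
}

echo html_to_text('<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>');
// Hello World, this is StackOverflow's question details page
```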


0

Probably the nicest and most reliable way to do this is with genuine (X|HT)ML parsing functions like the DOMDocument class:

<?php

$str = "<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>";

$dom = new DOMDocument;
$dom->loadXML(str_replace('&nbsp;', ' ', $str));

echo trim($dom->firstChild->nodeValue);
// "Hello World, this is StackOverflow's question details page"

This is probably slight overkill for this problem, but using the proper parsing functionality is a good habit to get into.


Edit: You can reuse the DOMDocument object, so you only need two lines within the loop:

$dom = new DOMDocument;
while ($rs = mysql_fetch_assoc($result)) { // or whatever
    $dom->loadHTML(str_replace('&nbsp;', ' ', $rs['description']));
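    // After loadHTML() the first child is normally the doctype node,
    // so read the text from the root element instead.
    $TMP_DESCR = trim($dom->documentElement->textContent);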

    // do something with $TMP_DESCR
}
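The loop body can also be wrapped in a small helper. This is only a sketch under stated assumptions: the name dom_text() is hypothetical, and the input is assumed to be ASCII plus entities (loadHTML() treats undeclared input as ISO-8859-1, so raw UTF-8 bytes would need an encoding hint). It reads the text through documentElement->textContent, since after loadHTML() the document's first child is normally the doctype node rather than the content.

```php
<?php
// Hypothetical helper; reuses one DOMDocument across calls.
// Assumes ASCII input plus entities (see the note on encoding above).
function dom_text(DOMDocument $dom, string $html): string
{
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    // The <div> wrapper keeps the source non-empty even for empty input.
    $dom->loadHTML('<div>' . $html . '</div>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // textContent is returned as UTF-8; &nbsp; arrives as U+00A0,
    // so map it to a regular space before trimming.
    $text = str_replace("\u{00A0}", ' ', $dom->documentElement->textContent);
    return trim($text);
}

$dom = new DOMDocument;
echo dom_text($dom, '<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>');
```

Because the parser decodes &nbsp; itself, no str_replace on the raw HTML is needed in this variant.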

1 Comment

This seems like a long method, and since I am running it in a loop, I think it will be expensive.
