9

I have a string that might look like this:

$str = '<p>Me & Mrs Jones <br /> live in <strong style="color:#FFF;">España</strong></p>';
htmlentities($str,ENT_COMPAT,'UTF-8',false);

How can I convert the text to HTML entities without converting the HTML tags?

Note: I need to keep the HTML intact.

1

5 Answers

8

Disclaimer: I would not encode any entities, except for <, > and &. That said, if you really want this, do this:

$str = '...';
$str = htmlentities($str,ENT_NOQUOTES,'UTF-8',false);
$str = str_replace(array('&lt;','&gt;'),array('<','>'), $str);
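
For illustration, here is a minimal sketch of those two steps applied to the string from the question. The helper name encode_text_keep_tags is made up for this example, the type declarations assume PHP 7 or later, and the whole thing assumes the input is already valid HTML with no bare < or > in the text.

```php
<?php
// Hypothetical helper; wraps the two steps from the answer above.
function encode_text_keep_tags(string $html): string
{
    // Encode everything except quotes and leave existing entities alone.
    $encoded = htmlentities($html, ENT_NOQUOTES, 'UTF-8', false);
    // Turn the encoded angle brackets back into real tag delimiters.
    return str_replace(array('&lt;', '&gt;'), array('<', '>'), $encoded);
}

$str = '<p>Me & Mrs Jones <br /> live in <strong style="color:#FFF;">España</strong></p>';
echo encode_text_keep_tags($str), "\n";
// <p>Me &amp; Mrs Jones <br /> live in <strong style="color:#FFF;">Espa&ntilde;a</strong></p>
```

The tags and the style attribute come through untouched, while & and ñ in the text are encoded.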

3 Comments

I would go with this too; most of the time there is no need to encode " and '. Characters like €, á and é should already be handled by Unicode.
@TravisO: I expect the input to be valid HTML already. If he has 2 > 5 in his markup, this won't be handled, just as <<<img src="" &&&& /> isn't handled correctly.
Your third line should read: $str = str_replace(array('&lt;','&gt;'),array('<','>'), $str);
2

The problem you face is that your text may already contain encoded '<' and '>' characters, so you have to deal with them separately after the conversion.

This is similar to Evert's answer, but adds one more step to allow for content like 1 < 2 in your markup:

$str = htmlentities($str,ENT_NOQUOTES,'UTF-8',false);
$str = str_replace(array('&lt;','&gt;'),array('<','>'), $str);
$str = str_replace(array('&amp;lt;','&amp;gt;'),array('&lt;','&gt;'), $str);
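
One detail worth checking: as far as I can tell, the third replacement only has something to match when htmlentities is allowed to double-encode existing entities. The sketch below runs the same three steps with both settings. The helper name encode_three_steps and its $doubleEncode parameter are invented for this comparison, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper; the same three steps, with the double_encode flag exposed.
function encode_three_steps(string $html, bool $doubleEncode): string
{
    $html = htmlentities($html, ENT_NOQUOTES, 'UTF-8', $doubleEncode);
    // Restore the tag delimiters.
    $html = str_replace(array('&lt;', '&gt;'), array('<', '>'), $html);
    // Restore brackets that were already escaped in the input.
    return str_replace(array('&amp;lt;', '&amp;gt;'), array('&lt;', '&gt;'), $html);
}

$input = '<p>1 &lt; 2</p>';
echo encode_three_steps($input, false), "\n"; // <p>1 < 2</p>
echo encode_three_steps($input, true), "\n";  // <p>1 &lt; 2</p>
```

With double encoding switched on, the escaped bracket survives, but any other entity already present in the input (for example &amp;) is encoded a second time, so this is a trade-off rather than a drop-in fix.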

1

A good answer was posted by Pascal MARTIN.

See this SO topic

To summarize, you can use this piece of code to retrieve a list of character => entity mappings:

$list = get_html_translation_table(HTML_ENTITIES);
unset($list['"']);
unset($list['<']);
unset($list['>']);
unset($list['&']);
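
The snippet above only builds the table. One way to apply it, sketched here under a few assumptions, is strtr. The helper name encode_with_table is hypothetical, and the flags and encoding are passed explicitly because the defaults of get_html_translation_table differ between PHP versions.

```php
<?php
// Hypothetical helper; applies the trimmed translation table with strtr().
function encode_with_table(string $html): string
{
    // ENT_NOQUOTES keeps both quote characters out of the table.
    $table = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES | ENT_HTML401, 'UTF-8');
    // Drop the characters that make up the markup itself.
    unset($table['<'], $table['>'], $table['&']);
    // Replace every remaining character that has a named entity.
    return strtr($html, $table);
}

$str = '<p>Me & Mrs Jones <br /> live in <strong style="color:#FFF;">España</strong></p>';
echo encode_with_table($str), "\n";
// <p>Me & Mrs Jones <br /> live in <strong style="color:#FFF;">Espa&ntilde;a</strong></p>
```

Note that the bare & in the text is left as it is, because & was removed from the table along with the tag characters.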

0

I haven't used htmlentities before, but it seems like a somewhat more robust version of urlencode (which I use a lot). You might want to try:

htmlentities(strip_tags($str), ENT_COMPAT, 'UTF-8', false);

Just as a little nugget, if you want to preserve <br> as plain newlines, you could do this:

htmlentities(strip_tags(str_replace("<br>", "\n", $str)), ENT_COMPAT, 'UTF-8', false);

I know that's something I sometimes like to do.

Good Luck.

1 Comment

urlencode and htmlentities do different things: urlencode makes the string valid to put in a URL (e.g. turning & into %26), htmlentities escapes a string for use in HTML (e.g. turning < into &lt;).
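
A quick side-by-side makes the difference concrete. This is only a small sketch using two arbitrary sample strings:

```php
<?php
// urlencode: makes a string safe to embed in a URL query component.
echo urlencode('Me & Mrs Jones'), "\n";                // Me+%26+Mrs+Jones
// htmlentities: makes a string safe to embed in HTML text.
echo htmlentities('1 < 2', ENT_COMPAT, 'UTF-8'), "\n"; // 1 &lt; 2
```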
-1

If you mean to convert only text, then try this:

$orig = '<p>Me & Mrs Jones <br /> live in <strong style="color:#FFF;">España</strong></p>';
$str = strip_tags($orig);

$str = htmlentities($str,ENT_COMPAT,'UTF-8',false);
