
I have a string:

$source = '&
<script type="text/javascript">&</script>
&
<script type="text/javascript">&</script>
&';

The desired result is:

&amp;
<script type="text/javascript">&</script>
&amp;
<script type="text/javascript">&</script>
&amp;

I tried:

echo preg_replace("#&(?!amp;)(?!<\/script>)(?![^<]script.*?>)#i",
                  "&amp;", $source);

But either only the first "&" gets replaced, or all of them do.

How can I get this result?

Edit 1:

Now suppose the string is:

$source = '&
<script type="text/javascript">text&text</script>
&
<script type="text/javascript">&</script>
&';

The desired result is:

&amp;
<script type="text/javascript">text&text</script>
&amp;
<script type="text/javascript">&</script>
&amp;

2 Comments

  • Why do you need to encode things that might contain a <script> tag? If that is user input, you're wide open to all sorts of XSS nastiness. Commented Jan 28, 2010 at 21:12
  • I use Yahoo's YUI library, and POST requests sent through XMLHttpRequest for datasources don't work. Commented Jan 28, 2010 at 21:16

3 Answers

2

Try this:

$output = preg_replace("/&(?!amp;)(?!<\/script>)(?![^<]script.*?>)/", "&amp;", $source);

3 Comments

@Kevin - I tried it on my server and it works as you would expect. What version are you using?
I found the case where it doesn't work and have updated my question: I added "text&text". When "&" sits between other characters, the regex doesn't work.
OK, I found the answer to my last comment. It's "/^&(?!amp;)(?![^<]script(.*?)>)(?!<\/script>)/"
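
For the "text&text" case from the edited question, here is a minimal sketch of a different approach (not the pattern from this answer): split the string on whole script blocks, then encode ampersands only in the pieces between them. The function name encodeAmpersandsOutsideScripts is hypothetical, and the sketch assumes script blocks are not nested and that preg_split() succeeds.

```php
<?php
// Hypothetical helper, not from the answers above: encode bare ampersands
// everywhere except inside <script>...</script> blocks.
function encodeAmpersandsOutsideScripts(string $source): string
{
    // Split on whole script blocks. PREG_SPLIT_DELIM_CAPTURE keeps the
    // blocks themselves in the result, at the odd indexes.
    $parts = preg_split(
        '#(<script\b[^>]*>.*?</script>)#is',
        $source,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Even indexes hold the text between script blocks.
            // The lookahead skips ampersands that already start "&amp;".
            $parts[$i] = preg_replace('/&(?!amp;)/', '&amp;', $part);
        }
    }

    return implode('', $parts);
}
```

Calling encodeAmpersandsOutsideScripts($source) on the string from Edit 1 should leave "text&text" inside the script untouched while encoding the three ampersands outside it. Other entities such as "&lt;" would still be double-encoded; the lookahead only protects "&amp;".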
1

Stop it with the regexes already. Please. I can't take it anymore. My head hurts, but only because I'm banging it on my desk.

I would suggest using DOMDocument or SimpleXMLElement to parse the string, then looping through each non-script node and encoding the ampersands in it.

5 Comments

I totally understand what you mean. I plan to use XSLT, but for now I'm stuck with this approach... sorry for your head ;)
@Christina Toma Why not? If it's as small a document as he shows, then it will require minimal processing for parsing. If, however, the string grows (likelihood of which is inversely proportional to how much the dev insists it won't happen), then this solution will scale well. And what dev wants to come in later and maintain that regex?
@Lucas - But why not use the regex provided in the accepted answer, which is faster than all the DOMDocument processing? What would be the advantages of using DOMDocument, in your opinion?
@Christina Toma I've already listed some good reasons in my previous comment, but here are a couple of specific examples: What if he decides, later, that he also wants to escape angle brackets? Or what if he decides he also wants to skip embed tags? In a large application, maintainability and scalability are far more important than negligible performance improvements.
@Lucas - You are absolutely right about maintainability and scalability, but speed is also very important in a large application.
0

Using the g modifier replaces your match globally (every occurrence).

echo preg_replace("#&(?!amp;)(?!<\/script>)(?![^<]script.*?>)#ig",
                  "&amp;", $source);

1 Comment

Doesn't work: preg_replace() [function.preg-replace]: Unknown modifier 'g'
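
The warning appears because PCRE in PHP has no "g" modifier. preg_replace() already replaces every match by default, and its optional fourth argument limits the number of replacements. A small sketch with a made-up input string:

```php
<?php
// Made-up input for illustration only.
$source = "& & &";

// No "g" flag is needed: preg_replace() replaces every match by default.
$all = preg_replace('/&(?!amp;)/', '&amp;', $source);

// The optional fourth argument caps the number of replacements.
$firstOnly = preg_replace('/&(?!amp;)/', '&amp;', $source, 1);
```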
