My regex skills are weak (the pattern is run from PHP), and I'm struggling to isolate JavaScript within blocks of HTML. The following regex partially works, but it runs into a problem when the word "on" appears in the text rather than inside a < tag >.

$regex = "/<script.*?>.*?<\/script.*?>(*SKIP)(*F)|((\\bon(.*?=)(.*?))(\'|\")(.*?)(\\5))/ism";

$html = preg_replace_callback($regex,
           function ($matches) {
               $mJS = $matches[2] . $matches[5] . myFunction($matches[6]) . $matches[5];
               return $mJS;
           },
           $html);

I think the issue is that the \bon.... part needs to be qualified to be inside a < tag > before being considered, but I just don't know how.

Running the following test...

$html= "<div id='content' onClick='abc()'>Lorem On='abc' ipsum on to</div>
<input id='a' type='range'>
<input id='b' type='range'>
<script>abc();</script>";

Returns...

<div id='content' onClick='****abc()****'>Lorem On='****abc****' ipsum on to</div>
<input id='****a****' type='range'>
<input id='b' type='range'>
<script>abc();</script>

but I wanted...

<div id='content' onClick='****abc()****'>Lorem On='abc' ipsum on to</div>
<input id='a' type='range'>
<input id='b' type='range'>
<script>****abc();****</script>

I have a sandbox running this if you want to have a play: https://onlinephp.io/c/a43b1

Does anyone have any suggestions?

  • You skip the <script>...</script>, but your desired output shows <script>****abc();****</script>. That is hard to understand; can you clarify or recheck your desired output? Commented Nov 14, 2022 at 13:44
  • By the way, it does not look like you need a callback. Try this PHP demo at tio.run (the regex is explained at regex101). I'm still guessing that this is what you intended. Commented Nov 14, 2022 at 13:56
  • Thanks BB. I didn't mean to skip <script>...</script>; yes, I DID want <script>****abc();****</script>. I think I do need the callback, because I need to call another PHP function once I've isolated the code (I've adjusted the code sample above to show this). Commented Nov 14, 2022 at 14:15
  • Hmm, why use (*SKIP)(*F) then? Have a look at this regex101 demo. Commented Nov 14, 2022 at 14:49
  • Thanks BB, I think that's working for me. I've placed a working PHP example at onlinephp.io/c/a249d. Commented Nov 14, 2022 at 15:06

1 Answer

With help from Bobble Bubble, I've been able to get this working...

(Edit note, Jan '23: the following is a revised version of the answer. The earlier version did not account for escaped quotes, or for unescaped quotes inside calls such as .replace(/'/g.)

<?php

const regex = <<<'PATTERN'
/(<script\b[^><]*>)(.*?)(<\/script>)|\bon\w+\s*=\s*\K(?|(')([^'\\]*(?:(?:\\.|'(?=[^)(]*\)))[^'\\]*)*)'|(")([^"\\]*(?:(?:\\.|"(?=[^)(]*\)))[^"\\]*)*)")/ism
PATTERN;

const html=<<<'PATTERN'
<div id='content' onClick='abc()'>Lorem On='abc' ipsum on to</div>
<input id='a' type='range'>
<input id='b' type='range'>
<script>abc();</script>

<div id='content'
         onClick='yyy("ere\'xyz\'").value=\'ewew\'; yyy("jhrhej")'
    >Lorem On='abc' ipsum on to</div>


    <input id='a' type='range'
           onPress="xxx(document.getElementById(\"abc\"))"
           onSomething="yyy(\'fehrje\')"
           onSomethingElse="document.getElementById('content').innerHTML.replace(/"/g, \"dq\")">
    <input id='b' type='range'>

PATTERN;

function myFunction($tx) {
    return "****$tx****";
}


$regex = regex;
$html  = html;

$result = preg_replace_callback($regex,
        function ($matches) {
            // Groups 1-3: <script> opening tag, script body, closing tag.
            // Groups 4-5: attribute quote character and attribute value.
            // Trailing groups that did not take part in the match are
            // missing from $matches, so default each one to "".
            $m1 = $matches[1] ?? "";
            $m2 = $matches[2] ?? "";
            $m3 = $matches[3] ?? "";
            $m4 = $matches[4] ?? "";
            $m5 = $matches[5] ?? "";
            $mJS = $m1 . $m4 . myFunction($m2 . $m5) . $m3 . $m4;
            return $mJS;
        }, $html);


echo "Result=$result";
echo "\n\n";
?>

See https://onlinephp.io/c/ca781 for a running executable.
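
To illustrate why the callback above can handle both branches with one expression, here is a minimal sketch. It assumes a simplified pattern with the same group layout but no escaped-quote handling; DEMO_PATTERN and markJs are hypothetical names used only for this illustration. The \K discards the attribute name from the match, and the branch reset (?|...) makes the single-quote and double-quote alternatives share groups 4 and 5.

```php
<?php
// Simplified stand-in for the pattern above (assumption: no escaped quotes).
const DEMO_PATTERN = <<<'PATTERN'
/(<script\b[^><]*>)(.*?)(<\/script>)|\bon\w+\s*=\s*\K(?|(')([^']*)'|(")([^"]*)")/is
PATTERN;

// Hypothetical helper: wraps each isolated piece of JavaScript in [ ].
function markJs(string $html): string
{
    return preg_replace_callback(DEMO_PATTERN, function (array $m): string {
        // Script branch: groups 1-3 are set, groups 4-5 are absent.
        // Attribute branch: groups 1-3 are empty, 4 is the quote, 5 the value.
        $open  = ($m[1] ?? '') . ($m[4] ?? '');
        $code  = ($m[2] ?? '') . ($m[5] ?? '');
        $close = ($m[3] ?? '') . ($m[4] ?? '');
        return $open . '[' . $code . ']' . $close;
    }, $html);
}

$sample = <<<'HTML'
<p onClick='abc()'>Lorem on to</p><input onPress="xyz()"><script>run();</script>
HTML;

echo markJs($sample), "\n";
```

Because \bon\w+ requires at least one word character after "on", the plain word "on" in the text is left alone in this sketch, while text such as onfoo='x' outside a tag would still match.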

6 Comments

I've just played with the above demo (link too long to write more). Great that you got it going! :)
Hi, I've run into a new problem with this solution: escaped quotes are not handled. I've updated the test at regex101.com/r/sRNTVI/2. Is there an easy fix to the regex so that each portion it finds continues until its corresponding closing quote? For example, onClick='yyy("ere\'xyz\'").value=\'ewew\'; yyy("jhrhej")' should find yyy("ere\'xyz\'").value=\'ewew\'; yyy("jhrhej"). There is also a second problem: a portion may contain something like .replace(/" where the quote is not escaped, and that needs to be handled too.
You would need to use a pattern that can deal with escaped quotes: $regex = '/(<script\b[^><]*>)(.*?)(<\/script>)|\bon\w+\s*=\s*\K(?|(\')([^\'\\\\]*(?:\\\\.[^\'\\\\]*)*)\'|(")([^"\\\\]*(?:\\\\.[^"\\\\]*)*)")/is'; like this regex101 demo
Your second requirement with the .replace( is more difficult. You can try to treat those quotes inside (...) like the escaped ones? For a very experimental idea see this updated demo: $regex = '/(<script\b[^><]*>)(.*?)(<\/script>)|\bon\w+\s*=\s*\K(?|(\')([^\'\\\\]*(?:(?:\\\\.|\'(?=[^)(]*\)))[^\'\\\\]*)*)\'|(")([^"\\\\]*(?:(?:\\\\.|"(?=[^)(]*\)))[^"\\\\]*)*)")/is';
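
As a rough illustration of the escaped-quote idea from the comments above, the sketch below isolates just the single-quote branch. QUOTED and firstQuoted are hypothetical names, and the pattern is written as a nowdoc so the backslashes do not need the extra PHP-level doubling seen in the single-quoted strings above. It does not cover the experimental lookahead for unescaped quotes inside (...).

```php
<?php
// Single-quote branch only: runs of ordinary characters, interleaved
// with backslash escapes (a backslash followed by any one character).
const QUOTED = <<<'PATTERN'
/'([^'\\]*(?:\\.[^'\\]*)*)'/s
PATTERN;

// Hypothetical helper: contents of the first single-quoted string, or null.
function firstQuoted(string $text): ?string
{
    return preg_match(QUOTED, $text, $m) === 1 ? $m[1] : null;
}

$attr = <<<'TXT'
onClick='yyy("ere\'xyz\'").value=\'ewew\'; yyy("jhrhej")'
TXT;

// Prints the attribute value with its escaped quotes intact.
echo firstQuoted($attr), "\n";
```

The match only ends at a quote that is not preceded by a backslash escape, which is why the value is captured up to its real closing quote.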