php : parse html : extract script tags from body and inject before </body>? [duplicate]

Question

I don't care which library is used, but I need a way to extract <script> elements from the <body> of a page (held as a string). I then want to insert the extracted <script>s just before </body>.

Ideally, I'd like to split the <script>s into two types:
1) External (those that have the src attribute)
2) Embedded (those with code between <script> and </script>)

So far I've tried with phpDOM, Simple HTML DOM and Ganon.
I've had no luck with any of them (I can find links and remove/print them - but fail with scripts every time!).

Alternative to
https://stackoverflow.com/questions/23414887/php-simple-html-dom-strip-scripts-and-append-to-bottom-of-body (Sorry to repost, but it's been 24 Hours of trying and failing, using alternative libs, failing more etc.).

Based on the lovely RegEx answer from @alreadycoded.com, I managed to botch together the following:

$output = "<html><head></head><body><!-- Your stuff --></body></html>";
$content = '';
$js = '';

// 1) Grab <body>
preg_match_all('#(<body[^>]*>.*?<\/body>)#ims', $output, $body);
$content = implode('',$body[0]);

// 2) Find <script>s in <body>
preg_match_all('#<script(.*?)<\/script>#is', $content, $matches);
foreach ($matches[0] as $value) {
    $js .= '<!-- Moved from [body] --> '.$value;
}

// 3) Remove <script>s from <body>
$content2 = preg_replace('#<script(.*?)<\/script>#is', '<!-- Moved to [/body] -->', $content); 

// 4) Add <script>s to bottom of <body>
$content2 = preg_replace('#<body(.*?)</body>#is', '<body$1'.$js.'</body>', $content2);

// 5) Replace <body> with new <body>
$output = str_replace($content, $content2, $output);

This does the job and isn't that slow (a fraction of a second).

Shame none of the DOM stuff was working (or I wasn't up to wading through naffed objects and manipulating).
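
The code above moves every <script> in the body but does not separate external scripts from embedded ones, which the question also asks for. A minimal sketch of that split, staying with the regex approach, might look like the following. The function name splitScriptsBySrc and the sample markup are hypothetical, and the pattern assumes no attribute value contains a ">" character.

```php
<?php
// Hypothetical helper (not from the original post): split the <script>
// blocks found in an HTML fragment into external and embedded ones.
function splitScriptsBySrc(string $html): array
{
    $external = [];
    $embedded = [];

    // Group 1 captures the attributes of the opening tag, group 2 the body.
    // Assumption: no attribute value contains a '>' character.
    preg_match_all('#<script\b([^>]*)>(.*?)</script>#is', $html, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        if (preg_match('#\ssrc\s*=#i', $m[1])) {
            // The opening tag carries a src attribute: external script.
            $external[] = $m[0];
        } elseif (trim($m[2]) !== '') {
            // No src, but code between the tags: embedded script.
            $embedded[] = $m[0];
        }
        // Empty scripts without src fall into neither group.
    }

    return ['external' => $external, 'embedded' => $embedded];
}

// Illustrative input only.
$body = '<body><script src="a.js"></script><p>Hi</p><script>var x = 1;</script></body>';
$result = splitScriptsBySrc($body);
```

Each list holds the full matched tags, so either one can be joined with implode() and placed before </body> in the same way as $js in step 4.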

"... This question may already have an answer here: ..." NO, it doesn't! That's why I posted THIS ONE! (Maybe if you focused more on answering than policing, things would be better???)
– theclueless1, Commented May 2, 2014 at 13:34
If you are going to downvote, at least have the stones to leave a comment explaining the reason.
– theclueless1, Commented May 2, 2014 at 18:43
This is NOT a duplicate. // This is a post about "any" PHP library/method, whereas the "other" post was about a specific library being used at that time. // Unfortunately, as the title was changed........ :sigh:
– theclueless1, Commented May 8, 2014 at 14:17
Because it had been around a day, in which I'd tried various snippets etc. Then I opted to consider >>different<< libraries. The other post was about [Specific], this post is about [Any]. // Worse, it got pointed to a topic with No Answers (hardly helpful to anyone).
– theclueless1, Commented May 8, 2014 at 14:30

Rangad · Answer · 2014-05-02 13:35:33Z

8

To select all script nodes with a src attribute:

$xpathWithSrc = '//script[@src]';

To select all script nodes with content:

$xpathWithBody = '//script[string-length(text()) > 1]';

Basic usage (replace the query with your actual XPath query):

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

foreach($xpath->query('//body//script[string-length(text()) > 1]') as $queryResult) {
    // access the element here. Documentation:
    // http://www.php.net/manual/de/class.domelement.php
}
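
The loop body above is left open. One way it could be filled in to move the scripts to the end of the body is sketched below. The function name moveBodyScriptsToEnd and the sample markup are hypothetical, and the sketch assumes input that libxml parses without complaint. It relies on the DOM rule that appending a node already in the tree moves it instead of copying it.

```php
<?php
// Hypothetical helper (illustrative, not part of the answer): move every
// <script> found inside <body> to the end of <body>, keeping document order.
function moveBodyScriptsToEnd(string $html): string
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html; // nothing to do without a <body>
    }

    $xpath = new DOMXPath($doc);

    // Snapshot the matches in document order before moving anything.
    $scripts = iterator_to_array($xpath->query('.//script', $body));

    foreach ($scripts as $script) {
        // Appending a node that is already in the tree moves it,
        // so each script ends up just before </body>.
        $body->appendChild($script);
    }

    return $doc->saveHTML();
}

// Illustrative input only.
$html = '<html><head></head><body><script src="a.js"></script><p>Hi</p><script>var x = 1;</script></body></html>';
$result = moveBodyScriptsToEnd($html);
```

Swapping './/script' for './/script[@src]' or './/script[string-length(text()) > 1]' would handle the two types separately, though moving one group before the other changes the order in which the scripts execute.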

edited May 2, 2014 at 13:35

answered May 2, 2014 at 13:29

Rangad


3 Comments

theclueless1 Over a year ago

And the library is? (I'm going to assume "XPath"???). How does it handle possibly malformed HTML? (Thanks for the answer Amal Murali - just a bit peeved with the Overflow-Police and stressed with wasting 24 hours on parsing that doesn't do squat with script tags.)

Rangad Over a year ago

It's just PHP's default DOM representation. It should be present in almost any PHP 5 installation (as long as libxml was present in any form at compile time). Handling of malformed HTML is possible, but it depends. If possible you should avoid it, or sanitize your HTML beforehand.

theclueless1 Over a year ago

LOL - I cannot even get it to run without throwing errors. Whereas the RegEx (I know, "yuk") above actually works!
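
The thread does not say which errors were thrown. If they were the parse warnings that DOMDocument::loadHTML() raises on markup libxml does not like, one common approach is to have libxml collect them internally. The sketch below assumes that was the cause, and the sample markup is illustrative only.

```php
<?php
// Illustrative input with an HTML5 element and an unclosed <p>.
$html = '<html><body><section><p>Hi<script>var x = 1;</script></section></body></html>';

// Ask libxml to collect parse problems instead of raising PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTML($html);

// Fetch whatever the parser reported, then clear the buffer and
// restore the previous error-handling setting.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

foreach ($errors as $error) {
    // Each entry is a LibXMLError with level, code, line and message.
    echo trim($error->message), ' (line ', $error->line, ")\n";
}

// The document is still usable for XPath queries after loading.
$scriptCount = $doc->getElementsByTagName('script')->length;
```

Which messages appear, if any, depends on the libxml version PHP was built against.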

alreadycoded.com · Accepted Answer · 2014-05-02 13:47:25Z

5

$js = "";
$content = file_get_contents("http://website.com");
preg_match_all('#<script(.*?)</script>#is', $content, $matches);
foreach ($matches[0] as $value) {
    $js .= $value;
}
$content = preg_replace('#<script(.*?)</script>#is', '', $content); 
echo $content = preg_replace('#<body(.*?)</body>#is', '<body$1'.$js.'</body>', $content);

answered May 2, 2014 at 13:47

alreadycoded.com


4 Comments

theclueless1 Over a year ago

That looks like it will grab the JS from the entire document, rather than only the scripts contained in the <body>?

theclueless1 Over a year ago

I've accepted this as the answer as it was the only "complete" provision, and the only thing I managed to get working. I've appended the "working" version (limited to the <body> only) to the bottom of my question. // Thank You!

pguardiario Over a year ago

It's messy, and it parses html with regex (which we all know is a no-no).

theclueless1 Over a year ago

@pguardiario - Yes, it's messy ... but It Works!!! That's more than I can say for my attempts with the DOM libraries, not to mention it doesn't involve includes and additional code etc. You don't like it? Then SHOW ME a library being included and doing the same job as that code does!

pguardiario · Answer · 2014-05-03 10:08:28Z

1

If you're really looking for an easy lib for this, I can recommend this one:

$dom = str_get_html($html);
$scripts = $dom->find('script')->remove;
$dom->find('body', 0)->after($scripts);
echo $dom;

There's really no easier way to do things like this in PHP.

answered May 3, 2014 at 10:08

pguardiario
