1

I have searched but cannot find a solution that works. I tried using DOM, but the result is not identical to the source: there are minor differences in whitespace and tag elements, and I need an exact copy for further pattern searches on the source. Hence I would like to try regex. Is this possible? (I know it isn't the best solution, but I would like to try it.) For example, is it possible to return the entire div with class "want-this-entire-div-class", including its inner content:

$html = '<div class="not-want">
        <div class="also-not-want">
    <div class="want-this-entire-div-class">
<button class="dropdown-toggle search-trigger" data-toggle="dropdown"></button>
<div class="dropdown-menu j-dropdown">
<div class="header-search">
        <input type="text" name="search" value="" placeholder="Search entire site here..." 
class="search-input" data-category_id=""/>
  <button type="button" class="search-button" data-search-url="https://www.xxxxcom/index.php? 
route=product/search&amp;search="></button>
</div>
</div>
</div>
<div class="not-want-this-also">
<div class="or-this">';

The following stops after the first </div>:

preg_match('/<div class="want-this-entire-div-class"(.*?)<\/div>/s', $html, $match);

Thanks

1
  • Maybe something like: for every '<div', skip one '</div>', but I don't know how to do this with regex. Commented Nov 29, 2020 at 2:59

3 Answers

1

One way to tackle this is with a state machine: you enumerate all the possible states, then take action depending on which state you are in. In this case the states are:

  1. line to ignore
  2. target open div
  3. line to add
  4. extra open div
  5. extra close div
  6. target close div

I don't expect this is robust, but it does work for the given example:

<?php
// Requires PHP 8+ for str_contains().
function inner_div(string $html_s, string $cont_s): string {
   $html_a = explode("\n", $html_s);
   $out_a = [];      // lines collected for the result
   $div_b = false;   // true once the target div has been seen
   $div_n = 0;       // current div nesting depth inside the target
   foreach ($html_a as $tok_s) {
      # state 2: target open div
      if (str_contains($tok_s, $cont_s)) {
         $div_b = true;
      }
      # state 1: line to ignore
      if (! $div_b) {
         continue;
      }
      # state 3: line to add
      $out_a[] = $tok_s;
      # state 4: extra open div (count every opening tag on the line)
      $div_n += substr_count($tok_s, '<div');
      # state 5: extra close div
      $div_n -= substr_count($tok_s, '</div>');
      # state 6: target close div
      if ($div_n <= 0) {
         break;
      }
   }
   return implode("\n", $out_a);
}
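
The line-based version above depends on each tag sitting on its own line. As a rough sketch of the same depth-counting idea applied to character offsets instead, the hypothetical helper `outer_div` below (the name and signature are my own, not from the question) scans `<div` and `</div>` tokens with `preg_match_all` and returns the untouched source substring. It assumes double-quoted class attributes and that no div tags appear inside comments, scripts, or attribute values.

```php
<?php
// Hypothetical helper: returns the exact source substring of the first <div>
// whose class attribute contains $class_s, or null if it is missing or unclosed.
function outer_div(string $html_s, string $class_s): ?string {
   // Locate the opening tag of the target div (assumes class="..." with double quotes).
   $open_re = '~<div\b[^>]*\bclass="[^"]*' . preg_quote($class_s, '~') . '[^"]*"[^>]*>~i';
   if (! preg_match($open_re, $html_s, $m, PREG_OFFSET_CAPTURE)) {
      return null;
   }
   $start_n = $m[0][1];
   // Collect every <div ...> and </div> token from the target onwards, with offsets.
   preg_match_all('~<(/?)div\b[^>]*>~i', $html_s, $tok_a,
      PREG_SET_ORDER | PREG_OFFSET_CAPTURE, $start_n);
   $depth_n = 0;
   foreach ($tok_a as $tok) {
      // Group 1 is "/" for a closing tag and empty for an opening tag.
      $depth_n += ($tok[1][0] === '/') ? -1 : 1;
      if ($depth_n === 0) {
         $end_n = $tok[0][1] + strlen($tok[0][0]);
         return substr($html_s, $start_n, $end_n - $start_n);
      }
   }
   return null; // the target div was never closed
}
```

Because the result is cut out with `substr`, whitespace and attribute formatting stay byte-for-byte as in the source, which is what the question asks for.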

0

Have you thought of using an off-the-shelf HTML parsing library? For context on why parsing HTML with regex is discouraged, see the Stack Overflow question "RegEx match open tags except XHTML self-contained tags".
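
That said, PHP's PCRE engine supports recursive subpatterns, so balanced divs can be matched with a regex on simple, well-formed input. The following is only a sketch: `match_div_by_class` is a hypothetical helper name, and it assumes the class attribute comes first in the tag, is double-quoted, and holds a single class name. On large pages a pattern like this may run into PCRE's backtracking or stack limits.

```php
<?php
// Hypothetical helper: returns the whole target div, nested divs included,
// exactly as it appears in the source, or null when nothing balanced matches.
function match_div_by_class(string $html_s, string $class_s): ?string {
   $re = '~<div\s+class="' . preg_quote($class_s, '~') . '"[^>]*>   # target opening tag
      (?<body>                                # group that nested levels recurse into
         (?:
            (?:(?!</?div\b).)++               # any text that does not start a div tag
          |
            <div\b[^>]*> (?&body) </div>      # one complete nested div, matched recursively
         )*
      )
      </div>~sx';
   return preg_match($re, $html_s, $m) === 1 ? $m[0] : null;
}
```

The `(?&body)` call re-enters the named group for each nested `<div>`, so every opening tag consumes exactly one matching `</div>` before the outer one is considered.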

0

Input

$html = '<div class="not-want">
        <div class="also-not-want">
    <div class="want-this-entire-div-class">
<button class="dropdown-toggle search-trigger" data-toggle="dropdown"></button>
<div class="dropdown-menu j-dropdown">
<div class="header-search">
        <input type="text" name="search" value="" placeholder="Search entire site here..." 
class="search-input" data-category_id=""/>
  <button type="button" class="search-button" data-search-url="https://www.xxxxcom/index.php? 
route=product/search&amp;search="></button>
</div>
</div>
</div>
<div class="not-want-this-also">
<div class="or-this">';

Code

$document   = new DOMDocument();            // Create DOM object
libxml_use_internal_errors(true);           // Collect parser warnings instead of printing them
$document->loadHTML($html);                 // Load html into object
$class_name = "want-this-entire-div-class"; // Set class name to be found
$xpath      = new DOMXPath($document);      // Create XPath object
$node = $xpath->query("//div[@class='{$class_name}']")->item(0); // First matching div, or null
if ($node !== null) {
    echo $document->saveHTML($node);        // Print result to page
}

Output

<div class="want-this-entire-div-class">
<button class="dropdown-toggle search-trigger" data-toggle="dropdown"></button>
<div class="dropdown-menu j-dropdown">
<div class="header-search">
        <input type="text" name="search" value="" placeholder="Search entire site here..." class="search-input" data-category_id=""><button type="button" class="search-button" data-search-url="https://www.xxxxcom/index.php? 
route=product/search&amp;search="></button>
</div>
</div>
</div>
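
Note that `@class='...'` only matches when the attribute is exactly that string. If the target div may carry several classes, a common XPath idiom is to pad the attribute with spaces and test for the class as a whole token. The sketch below wraps this in a hypothetical helper, `div_by_class_token` (my naming), and assumes the class name contains no quotes. As with the code above, the serialised output is normalised by the parser, so it is not guaranteed to be byte-identical to the source.

```php
<?php
// Hypothetical helper: serialises the first <div> that has $class_s among its classes.
function div_by_class_token(string $html_s, string $class_s): ?string {
   $doc = new DOMDocument();
   libxml_use_internal_errors(true);   // collect parser warnings instead of emitting them
   $doc->loadHTML($html_s);
   libxml_clear_errors();
   $xpath = new DOMXPath($doc);
   // Surround the class list with spaces so only whole class tokens match.
   $query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' {$class_s} ')]";
   $node = $xpath->query($query)->item(0);
   return $node === null ? null : $doc->saveHTML($node);
}
```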
