PHP Split html string into array

Question

I hope I can get some help from you guys.

Here is what I'm struggling with: I have a string of HTML that looks like this:

<h4>Some title here</h4>
<p>Lorem ipsum dolor</p>
(some other HTML here)

<h4>Some other title here</h4>
<p>Lorem ipsum dolor</p>
(some other HTML here)

I need to separate every <h4> from the rest of the content while keeping each one associated with what follows it. For example, the content after the first <h4> and before the second <h4> belongs to the first <h4>, something like this:

Array {
       [0] => <h4>Some title here</h4>
       [1] => <p>Lorem ipsum dolor</p>
}

Array {
       [0] => <h4>Some other title here</h4>
       [1] => <p>Lorem ipsum dolor</p>
}

This is to build an accordion (it's quite difficult to explain why I'm doing it this way, but it has to be this way). Each <h4> will be an accordion panel heading, and clicking it will expand the panel and show the content associated with it.

I hope I've made my problem clear. Let me know your thoughts and whether there is a better way to do this.

I was looking into DOMDocument, and I also tried explode(), but with no success.

I have this working in JavaScript, but I need to achieve the same thing in PHP, and working with the DOM in PHP is quite complicated.

Thank you in advance.

Yes, it will always be an h4 followed by any kind of HTML except another h4; the only h4 elements are the titles. So yes, I'm sure it will always be like that.
– Hugo Carneiro, Commented Nov 12, 2014 at 18:25
Have a look at this question: stackoverflow.com/questions/18156164/…
– Derek, Commented Nov 12, 2014 at 18:49
@DerekS thanks, this helped and put me on the right track. I just need to modify the code a bit to make it work the way I want.
– Hugo Carneiro, Commented Nov 12, 2014 at 19:06

Community · Accepted Answer · 2017-05-23 12:16:25Z

6

I was able to do what I wanted following the example that Derek S gave me.

This was the result:

$html_string = 'HTML string';
$dom = new DOMDocument();
$dom->loadHTML($html_string);

$content = array();
foreach ($dom->getElementsByTagName('h4') as $node) {
   $title = $dom->saveHTML($node);
   $content[$title] = array();

   // Collect every sibling up to the next <h4> (or the end of the parent).
   while (($node = $node->nextSibling) && $node->nodeName !== 'h4') {
      $content[$title][] = $dom->saveHTML($node);
   }
}

This saves each title in $title and its corresponding content in $content[$title].
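
For anyone who wants the exact two-element arrays from the question, here is a minimal sketch of the same sibling-walking idea wrapped in a function. The name splitByH4 and the sample string are my own assumptions for illustration, not part of the original answer. It also silences the warnings libxml tends to raise on imperfect markup.

```php
<?php
// Hypothetical helper (name is an assumption): pairs each <h4> with the markup after it.
function splitByH4(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $panels = [];
    foreach ($dom->getElementsByTagName('h4') as $heading) {
        $body = '';
        // Walk the following siblings until the next <h4> or the end of the parent.
        for ($n = $heading->nextSibling; $n !== null && $n->nodeName !== 'h4'; $n = $n->nextSibling) {
            $body .= $dom->saveHTML($n);
        }
        // [0] is the heading, [1] is everything that belongs to it.
        $panels[] = [trim($dom->saveHTML($heading)), trim($body)];
    }
    return $panels;
}

// Sample input modelled on the question.
$panels = splitByH4(
    "<h4>Some title here</h4>\n<p>Lorem ipsum dolor</p>\n" .
    "<h4>Some other title here</h4>\n<p>Lorem ipsum dolor</p>"
);
print_r($panels);
```

Using a separate loop variable instead of reassigning $node keeps the foreach variable untouched, which I find easier to follow. This only groups siblings of the heading, so it assumes every <h4> sits at the same level as its content.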

edited May 23, 2017 at 12:16

CommunityBot


answered Nov 13, 2014 at 16:06

Hugo Carneiro


zack.lore · Accepted Answer · 2014-11-12 19:11:38Z

2

You could try something like:

preg_split('#<h4>.+?</h4>#i', $html);

edited Nov 12, 2014 at 19:11

answered Nov 12, 2014 at 18:35

zack.lore


4 Comments

DarkBee Over a year ago

Your pattern is missing the start and end delimiters. Better to add the case-insensitive flag as well.

zack.lore Over a year ago

Oops! Must have been in a hurry... fixed!

Hugo Carneiro Over a year ago

Thank you @zack.lore I appreciate it, but found the solution with the previous code.

A Friend Over a year ago

preg_split is definitely not the desired function to use here, preg_match is way more suitable. It took me over an hour to realize that preg_split actually removes all matches from the source, whereas what I (and the OP) needed was an array of all matches.
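
On the point about preg_split dropping the matches: a possible workaround, sketched below, is to wrap the pattern in a capture group and pass PREG_SPLIT_DELIM_CAPTURE so the headings stay in the result. The sample string and the $panels variable are assumptions for illustration, and like any regex on HTML this is only reasonable for simple markup you control.

```php
<?php
// Sample input modelled on the question (an assumption, not real data).
$html = "<h4>Some title here</h4><p>Lorem ipsum dolor</p>"
      . "<h4>Some other title here</h4><p>Lorem ipsum dolor</p>";

// '#' as the delimiter avoids escaping the slash in </h4>.
// The capture group plus PREG_SPLIT_DELIM_CAPTURE keeps each heading in the output;
// PREG_SPLIT_NO_EMPTY drops the empty string before the first heading.
$parts = preg_split(
    '#(<h4\b[^>]*>.*?</h4>)#is',
    $html,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

$panels = [];
foreach ($parts as $part) {
    if (preg_match('#^<h4\b#i', $part)) {
        // A heading opens a new panel with empty content.
        $panels[] = [$part, ''];
    } elseif ($panels) {
        // Anything else belongs to the most recent heading.
        $panels[count($panels) - 1][1] .= $part;
    }
    // Content before the first heading is ignored.
}
print_r($panels);
```

Checking each part instead of pairing them blindly means a heading with no content after it still gets its own panel.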

DragonYen · Accepted Answer · 2014-11-12 18:36:47Z

1

This should do what you want -- though I'm sure there are other (and possibly better) ways

$aHTML = explode("<h4>", $cHTML);
foreach ($aHTML AS $nPos => $cPanel) {
  if ($nPos > 0) {
    $aPanel = explode("</h4>", $cPanel);
    $cHeader = "<h4>" . $aPanel[0] . "</h4>";
    $cPanelContent = $aPanel[1];
  }
}

It doesn't put it in the array format you stipulated -- though you could do that yourself inside the loop. Otherwise your content could be output/constructed inside the loop.

Edit: Added the h4 and /h4 back in for completeness
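
Building the array inside the loop could look roughly like the sketch below. The $aPanels name and the sample $cHTML value are assumptions added for illustration. It is still plain string splitting, so it relies on lowercase, attribute-free <h4> tags.

```php
<?php
// Sample input modelled on the question (an assumption, not real data).
$cHTML = "<h4>Some title here</h4>\n<p>Lorem ipsum dolor</p>\n"
       . "<h4>Some other title here</h4>\n<p>Lorem ipsum dolor</p>";

$aPanels = array();
foreach (explode("<h4>", $cHTML) as $nPos => $cPanel) {
    if ($nPos === 0) {
        continue; // whatever precedes the first <h4> is not part of any panel
    }
    // A limit of 2 splits only on the first closing tag.
    $aPanel = explode("</h4>", $cPanel, 2);
    $aPanels[] = array(
        "<h4>" . $aPanel[0] . "</h4>",              // [0] heading
        isset($aPanel[1]) ? trim($aPanel[1]) : "",  // [1] content, empty if the tag was never closed
    );
}
print_r($aPanels);
```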

edited Nov 12, 2014 at 18:36

answered Nov 12, 2014 at 18:28

DragonYen


8 Comments

UnskilledFreak Over a year ago

other ways is an regexp split on /<h4>.*</h4>/ for example

DragonYen Over a year ago

This won't work if your "h4" is actually "H4" (uppercase) or if you had a stray space (like "/h4 "). In other words, it works on good clean HTML that you can control.

UnskilledFreak Over a year ago

i know, it was only a hint for other ways, but thanks for that ;)

DragonYen Over a year ago

@UnskilledFreak Sorry, that was a comment on my post -- not your post which I hadn't seen yet. I agree regexp split could work too.

UnskilledFreak Over a year ago

Ah well, but uppercase HTML tags are deprecated, if I'm not wrong?


abhishek jain · Accepted Answer · 2021-03-24 08:57:00Z

0

You can use the same code with a few small changes, and it will work for all kinds of HTML, not just simple ones.

$html_string = 'HTML string';
$dom = new DOMDocument();
$dom->loadHTML($html_string);

$content = [];

foreach ($dom->getElementsByTagName('h4') as $node) {
   $title = $dom->saveHTML($node);
   $value = ''; // reset for each heading so content does not accumulate

   while (($node = $node->nextSibling) && $node->nodeName !== 'h4') {
      $value .= $dom->saveHTML($node);
   }

   // Append one key/value pair per heading.
   $content[] = ['key' => $title, 'value' => $value];
}

echo '<pre>';
print_r($content);
die;

answered Mar 24, 2021 at 8:57

abhishek jain

