PHP splitting a string of html into an array with class name of tag as key

Question

I need to take a string of html text like:

<p>This is a line with no spans<br>
This is a line <span class="second">This is secondary</span><br>  
This is another line <span class="third">And this is third</span> <span class="four">this is four</span></p>

And have it end up as an array in PHP like:

array(
    "This is a line with no spans",
    array(
        "This is a line",
        "second" => "This is secondary",
    ),
    array(
        "This is another line",
        "third" => "And this is third",
        "four" => "this is four"
    )
);

Getting each line into its own value was easy: I just split the text on <br> and that works fine. What I can't quite get is splitting each line so that the span text is keyed by its class name. I feel like PHP's preg_split may hold the key, but I'm not good with regular expressions and I can't figure it out.

Any ideas?

JAAulde · Accepted Answer · 2011-08-13 22:10:38Z

3

You should not attempt to parse HTML with regex or other ad hoc string handling. It gets complicated quickly and leads to serious maintenance problems.

I highly recommend you look into how to read a chunk of markup into a DOM document [docs] and then use DOM methods to work with it just like you would browser side.
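As a rough sketch of that first step (not part of the original answer), the markup from the question could be loaded with DOMDocument::loadHTML and the paragraph fetched with getElementsByTagName. The variable names here are illustrative assumptions, and libxml_use_internal_errors is only used to keep parser warnings from being printed.

```php
<?php
// Illustrative input: the markup from the question.
$html = '<p>This is a line with no spans<br>'
      . 'This is a line <span class="second">This is secondary</span><br>'
      . 'This is another line <span class="third">And this is third</span> '
      . '<span class="four">this is four</span></p>';

// Collect libxml warnings instead of printing them while parsing.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);   // parsed as HTML, so the unclosed <br> is accepted
libxml_clear_errors();

// loadHTML wraps a fragment in <html><body>, so look the paragraph up by tag name.
$p = $doc->getElementsByTagName('p')->item(0);

// Walk the direct children and report the class and name of each node.
foreach ($p->childNodes as $child) {
    echo get_class($child), ': ', $child->nodeName, "\n";
}
```

From here, the child nodes of $p can be inspected one by one with the usual DOM properties (nodeName, nodeValue, getAttribute).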


3 Comments

aron.duby Over a year ago

I've been using DOMDocument to get as far as the p tags, but I couldn't figure out a way to split on the line breaks without everything becoming plain text.

JAAulde Over a year ago

I wouldn't split on line breaks. Walk the nodes, checking their type and name (Do I have a text node? Do I have a BR element?), and make decisions with that info.

aron.duby Over a year ago

I could have sworn I had tried that and it didn't work, but it did this time. Thanks man!

Community · Accepted Answer · 2017-05-23 12:11:56Z

1

It's not a good idea to use regular expressions to parse HTML (cite). It's just not a suitable tool; see @JAAulde's answer.

The best way is to do it purely with the DOM. Loop through all child nodes (including text nodes) to format the array the way you want. Like this:

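// Assumes $doc is a DOMDocument that already holds the markup (see @JAAulde's answer).
$p = $doc->getElementsByTagName('p')->item(0);
$lines = array();
$line = array(); // declared outside the loop so it accumulates until the next <br>
foreach ($p->childNodes as $child) {
    if ($child instanceof DOMText) {
        // Skip whitespace-only text nodes, such as the gap between two spans.
        $text = trim($child->wholeText);
        if ($text !== '') {
            $line[] = $text;
        }
    } elseif ($child instanceof DOMElement) {
        if (strtolower($child->tagName) == 'br') {
            // A <br> ends the current line.
            $lines[] = $line;
            $line = array();
        } elseif (strtolower($child->tagName) == 'span' && $child->hasAttribute('class')) {
            // Key the span's text by its class name.
            $line[$child->getAttribute('class')] = $child->nodeValue;
        }
    }
}
// The last line has no trailing <br>, so add it after the loop.
if ($line) {
    $lines[] = $line;
}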

Warning: treat the above as pseudo-code. It has not been tested at all; I'm just going from experience and the manual.
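For later readers, here is one way the whole approach might look end to end, as a sketch rather than the answerer's own code. The function name splitParagraph is hypothetical, and two details are assumptions taken from the array shape the question asks for: whitespace-only text nodes are skipped, and a line that contains only plain text is collapsed to a bare string.

```php
<?php
/**
 * Hypothetical helper: split a <p> element into lines at each <br>,
 * keying span text by the span's class attribute.
 */
function splitParagraph(DOMElement $p): array
{
    $lines = [];
    $line = [];
    foreach ($p->childNodes as $child) {
        if ($child instanceof DOMText) {
            // Ignore whitespace-only text, e.g. the space between two spans.
            $text = trim($child->nodeValue);
            if ($text !== '') {
                $line[] = $text;
            }
        } elseif ($child instanceof DOMElement) {
            $tag = strtolower($child->tagName);
            if ($tag === 'br') {
                // A <br> closes the current line.
                $lines[] = $line;
                $line = [];
            } elseif ($tag === 'span' && $child->hasAttribute('class')) {
                $line[$child->getAttribute('class')] = trim($child->nodeValue);
            }
        }
    }
    if ($line !== []) {
        $lines[] = $line; // the last line has no trailing <br>
    }
    // Assumption from the question: a text-only line becomes a bare string.
    return array_map(function (array $l) {
        return (count($l) === 1 && isset($l[0])) ? $l[0] : $l;
    }, $lines);
}

// Usage with the markup from the question.
$html = "<p>This is a line with no spans<br>\n"
      . "This is a line <span class=\"second\">This is secondary</span><br>\n"
      . "This is another line <span class=\"third\">And this is third</span> "
      . "<span class=\"four\">this is four</span></p>";

libxml_use_internal_errors(true); // keep parser warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
$result = splitParagraph($doc->getElementsByTagName('p')->item(0));
print_r($result);
```

Note that a span with several classes (class="a b") would be keyed by the full attribute string, and spans nested inside other elements are not visited, since only the direct children of the paragraph are walked.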

answered Aug 13, 2011 at 22:42

Jonah


3 Comments

aron.duby Over a year ago

I just finished writing this and came back and saw your answer. Almost identical.

JAAulde Over a year ago

For those who come along later with the same question: I do not dispute this being the correct answer. However, it is important to point out that the missing step between what the OP had and what was accepted as an answer was reading the markup into a PHP DOM document. See my answer for links to docs on that.

Jonah Over a year ago

@JAAulde: excellent point, I'll allude to that and refer to your answer.

Cydonia7 · Accepted Answer · 2011-08-13 22:08:34Z

1

Maybe you can use an XML parser? Here's the doc.

