12

I've got a string with HTML attributes:

$attribs = ' id= "header " class = "foo   bar" style ="background-color:#fff; color: red; "';

How can I transform that string into an associative array, like:

array(
  'id' => 'header',
  'class' => array('foo', 'bar'),
  'style' => array(
    'background-color' => '#fff',
    'color' => 'red'
  )
)

so I can use the PHP array_merge_recursive() function to merge 2 sets of HTML attributes.
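
For reference, here is a minimal sketch of what array_merge_recursive() does with two attribute sets that are already in the nested shape above. The second set, `$extra`, is invented purely for illustration.

```php
<?php
// Two attribute sets already in the nested shape described above.
// $extra is a hypothetical second set, made up for this example.
$base = [
    'id'    => 'header',
    'class' => ['foo', 'bar'],
    'style' => ['background-color' => '#fff', 'color' => 'red'],
];
$extra = [
    'class' => ['lorem'],
    'style' => ['margin' => '0'],
];

// Arrays under the same string key are merged recursively;
// numerically indexed values (the class names) are appended.
$merged = array_merge_recursive($base, $extra);

print_r($merged);
```

One caveat: when both sets define the same scalar key (two `id` values, or the same CSS property twice), array_merge_recursive() collects both values into an array instead of overwriting, so such conflicts probably need separate handling.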

8 Answers

24

Use SimpleXML:

<?php
$attribs = ' id= "header " class = "foo   bar" style ="background-color:#fff; color: red; "';

$x = new SimpleXMLElement("<element $attribs />");

print_r($x);

?>

This assumes that the attributes are always name/value pairs...
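
print_r() on the element shows an object, not an array. A possible way to get a flat array of strings out of it is to loop over attributes(), as sketched below. The trim() call and the `$flat` name are additions for this example; the result is still one-dimensional.

```php
<?php
$attribs = ' id= "header " class = "foo   bar" style ="background-color:#fff; color: red; "';

// SimpleXML needs well-formed XML, so every attribute must have a quoted value.
$element = new SimpleXMLElement("<element $attribs />");

$flat = [];
foreach ($element->attributes() as $name => $value) {
    // Each $value is a SimpleXMLElement; cast it to get the plain string.
    $flat[$name] = trim((string) $value);
}

print_r($flat);
```

Because the input is parsed as XML, valueless attributes such as `disabled` or unquoted values make the constructor throw an exception, so this only suits attribute strings that are known to be strict name="value" pairs.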


1 Comment

This answer does not produce a multidimensional array as desired.
8

You could use a regular expression to extract that information:

$attribs = ' id= "header " class = "foo   bar" style ="background-color:#fff; color: red; "';
$pattern = '/(\w+)\s*=\s*("[^"]*"|\'[^\']*\'|[^"\'\s>]*)/';
preg_match_all($pattern, $attribs, $matches, PREG_SET_ORDER);
$attrs = array();
foreach ($matches as $match) {
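    // strip matching surrounding quotes; the length check avoids reading offset 0 of an empty value
    if (strlen($match[2]) > 1 && ($match[2][0] == '"' || $match[2][0] == "'") && $match[2][0] == $match[2][strlen($match[2])-1]) {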
        $match[2] = substr($match[2], 1, -1);
    }
    $name = strtolower($match[1]);
    $value = html_entity_decode($match[2]);
    switch ($name) {
    case 'class':
        $attrs[$name] = preg_split('/\s+/', trim($value));
        break;
    case 'style':
        // parse CSS property declarations
        break;
    default:
        $attrs[$name] = $value;
    }
}
var_dump($attrs);

The class value is already split at whitespace above. What remains is parsing the property declarations of style, which is a little harder because the value can contain comments and URLs with ; in them.
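
A naive sketch of the style step, for the simple cases only. `parse_style_declarations` is a hypothetical helper name, and the function deliberately ignores the hard cases mentioned above (CSS comments, values containing `;`).

```php
<?php
/**
 * Naive inline-style parser (hypothetical helper, not part of the answer above).
 * Splits on ';' and then on the first ':' of each declaration.
 * It does NOT handle CSS comments or values that contain ';'.
 */
function parse_style_declarations(string $style): array
{
    $declarations = [];
    foreach (explode(';', $style) as $declaration) {
        // A limit of 2 keeps colons inside the value, e.g. url(http://...).
        $parts = explode(':', $declaration, 2);
        if (count($parts) !== 2) {
            continue; // empty segment or no colon at all
        }
        $property = trim($parts[0]);
        $value = trim($parts[1]);
        if ($property !== '' && $value !== '') {
            $declarations[$property] = $value;
        }
    }
    return $declarations;
}

var_dump(parse_style_declarations('background-color:#fff; color: red; '));
```

In the switch statement this could be wired in as `$attrs[$name] = parse_style_declarations($value);` inside the 'style' case.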

3 Comments

Thank you Gumbo, your regex is cool. The only problem is that $attrs['class'] and $attrs['style'] return strings, so it will be difficult to merge them with another $attribs string. For example, merging these two sets of attribs: $attribs1 = 'class="foo bar"'; $attribs2 = 'class="lorem"'; into 'class="foo bar lorem"'. That's why I would like $attrs['class'] to return an array: array('foo', 'bar'). Do you have an idea how to enhance this?
I've just written an alternative regex which also parses HTML5 style boolean attributes (without an = sign) and uses a back reference for the quotes: (\w+)\s*(=\s*(["'])(.*?)\3)?
I will admit, I loathe regex. But something like this really does need a properly built function that parses things the good old way. The regex is simply too hard to read and will invariably make assumptions that are not correct. Can it handle single quotes within a double quote? Can it handle special characters? What about attributes without quotes? That's valid HTML too. Can it do multiple CSS classes within the same attribute? Extra spaces here and there? Self-closing tags? If it fails with one of them, how much of a pain would it be to fix it? Naaah, wrong approach here, mate.
8

Easy way could be also:

$atts_array = current((array) new SimpleXMLElement("<element $attribs />"));

2 Comments

This answer does not produce a multidimensional array as desired. 3v4l.org/beYWA
Why would it? It's not php's job to know the semantics of the attributes, further processing can be done on the resulting array via explode if necessary to get the desired output
6

You can't use a regular expression to parse HTML attributes. This is because the syntax is contextual. You can use regular expressions to tokenize the input, but you need a state machine to parse it.

If performance isn't a big deal, the safest way to do it is probably to wrap the attributes in a tag and then send it through an HTML parser, e.g.:

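function parse_attributes($input) {
  $dom = new DOMDocument();
  // Collect libxml warnings about sloppy markup instead of emitting them.
  $previous = libxml_use_internal_errors(true);
  $dom->loadHTML("<div " . $input . "></div>");
  libxml_clear_errors();
  libxml_use_internal_errors($previous);
  $attributes = array();
  // loadHTML() wraps the fragment in <html><body>, so $dom->documentElement
  // is <html>; look the wrapper element up by tag name instead.
  foreach ($dom->getElementsByTagName('div')->item(0)->attributes as $attr) {
    $attributes[$attr->name] = $attr->value;
  }
  return $attributes;
}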

You could probably optimize the above by reusing the parser, or by using XMLReader or the SAX parser.
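
To illustrate the state-machine idea mentioned at the top of this answer, here is a simplified hand-written scanner. It is a sketch, not the HTML specification's tokenizer: `parse_attribute_string` is a hypothetical name, storing null for valueless attributes is an arbitrary choice, and the input is assumed to be the attribute string only (no tag name or angle brackets).

```php
<?php
/**
 * Simplified attribute scanner (hypothetical helper). Handles double-quoted,
 * single-quoted, unquoted and valueless attributes. Valueless attributes are
 * stored with null as their value.
 */
function parse_attribute_string(string $input): array
{
    $whitespace = " \t\r\n\f";
    $attributes = [];
    $length = strlen($input);
    $i = 0;

    while ($i < $length) {
        // State 1: skip whitespace and stray slashes between attributes.
        $i += strspn($input, $whitespace . '/', $i);
        if ($i >= $length) {
            break;
        }

        // State 2: read the name up to whitespace, '=', '/' or '>'.
        $nameLength = strcspn($input, $whitespace . '=/>', $i);
        if ($nameLength === 0) {
            $i++; // stray '=' or '>': skip it so the loop always advances
            continue;
        }
        $name = strtolower(substr($input, $i, $nameLength));
        $i += $nameLength;

        // State 3: look for '=' after optional whitespace.
        $j = $i + strspn($input, $whitespace, $i);
        if ($j >= $length || $input[$j] !== '=') {
            if (!array_key_exists($name, $attributes)) {
                $attributes[$name] = null; // valueless (boolean) attribute
            }
            continue;
        }
        $i = $j + 1;
        $i += strspn($input, $whitespace, $i);

        // State 4: read the value, quoted or unquoted.
        if ($i < $length && ($input[$i] === '"' || $input[$i] === "'")) {
            $end = strpos($input, $input[$i], $i + 1);
            if ($end === false) {
                $end = $length; // unterminated quote: take the rest of the input
            }
            $value = substr($input, $i + 1, $end - $i - 1);
            $i = $end + 1;
        } else {
            $valueLength = strcspn($input, $whitespace . '>', $i);
            $value = substr($input, $i, $valueLength);
            $i += $valueLength;
        }

        // The first occurrence of a name wins; later duplicates are ignored.
        if (!array_key_exists($name, $attributes)) {
            $attributes[$name] = html_entity_decode($value, ENT_QUOTES | ENT_HTML5, 'UTF-8');
        }
    }

    return $attributes;
}

var_dump(parse_attribute_string('foo=\'bar\' cuux="O\'Reiley" disabled data-x=1'));
```

Values are returned untrimmed, so the question's `id= "header "` comes back as `'header '` with its trailing space. Splitting class and style into sub-arrays would still be a separate step.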

5 Comments

Parse this: foo='bar' cuux="O'Reiley" zip="\"zap\""
@troelskn: The third attribute value declaration is invalid. The " need to be represented by character references.
You're right - I wasn't aware of that. I would still suggest using an xml/html parser, to account for all sorts of odd edge cases.
Warning: DOMDocument::loadHTML(): Tag foo invalid in Entity This answer doesn't seem to work after silencing parsing errors. 3v4l.org/0QaWI
None of the answers here can handle newer HTML5 tags or lone attributes, making them unsuitable for a lot of use cases.
3

Maybe this helps you. What it does:

  • An HTML DOM parser written in PHP 5+ that lets you manipulate HTML in a very easy way!
  • Requires PHP 5+.
  • Supports invalid HTML.
  • Find tags on an HTML page with selectors just like jQuery.
  • Extract contents from HTML in a single line.

http://simplehtmldom.sourceforge.net/

1 Comment

Note that one reason I ended up here is that DOMProcessingInstruction has a data field which is the text within the <?name and ?>. In case of a tag such as: <?xml-stylesheet type="text/xsl" href="https://sms.m2osw.com/sitemap.xsl"?> you get a plain string like: type="text/xsl" href="https://sms.m2osw.com/sitemap.xsl" which you need to parse as attributes.
2

A simple and effective function to solve this:

function attrString2Array($attr) {
  $atList = [];

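  // Alternative 1 captures name="value" pairs (groups 1 and 2); alternative 2 captures valueless attributes (group 3).
  if (preg_match_all('/\s*(?:([a-z0-9-]+)\s*=\s*"([^"]*)")|(?:\s+([a-z0-9-]+)(?=\s*|>|\s+[a-z0-9]+))/i', $attr, $m)) {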
    for ($i = 0; $i < count($m[0]); $i++) {
      if ($m[3][$i])
        $atList[$m[3][$i]] = null;
      else
        $atList[$m[1][$i]] = $m[2][$i];
    }
  }

  return $atList;
}

print_r(attrString2Array('<li data-tpl-classname="class" data-tpl-title="innerHTML" disabled nowrap href="#" hide src = "images/asas.gif">'));
print_r(attrString2Array('data-tpl-classname="class" data-tpl-title="innerHTML" disabled nowrap href="#" hide src = "images/asas.gif"'));

//Array
//(
//    [data-tpl-classname] => class
//    [data-tpl-title] => innerHTML
//    [disabled] => 
//    [nowrap] => 
//    [href] => #
//    [hide] => 
//    [src] => images/asas.gif
//)

2 Comments

Welcome to StackOverflow! Please edit your answer to provide an explanation of your code. This will improve your answer's quality and make it more likely for it to get upvoted :)
Did you notice that the OP's question is seeking a multi-dimensional result?
0

DOMDocument is the answer.

Take your original list of attributes and wrap it in some kind of tag, then feed it through DOMDocument, and import it into SimpleXML.

If an array is what you want, you can cast the SimpleXMLElement as an array and access its @attributes property as an array key.

$attribs = ' id= "header " class = "foo   bar" style ="background-color:#fff; color: red; "';

$params = new DOMDocument;
$params->loadHTML('<br '.$attribs.'/>');
$paramsXML = simplexml_import_dom($params->documentElement);
$paramsArray = ((array)$paramsXML->body->br->attributes())['@attributes'];
print_r($paramsArray);
/*
Array
(
    [id] => header 
    [class] => foo   bar
    [style] => background-color:#fff; color: red; 
)
*/

/* parse the style elements into an array, collapse it back into a string
 * and feed it back into DOMDocument again to achieve your 2nd dimension.
 */

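// trim each declaration and drop empty ones (e.g. after a trailing ';')
$styles = array_filter(array_map('trim', explode(';', $paramsArray['style'])), 'strlen');
foreach($styles as $key=>$value) {
    // split on the first ':' only, so values containing ':' stay intact
    list($prop, $val) = array_map('trim', array_pad(explode(':', $value, 2), 2, ''));
    $styles[$key] = $prop.'="'.$val.'"';
}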
$style = new DOMDocument;
$style->loadHTML('<br '.implode(' ',$styles).'/>');
$styleXML = simplexml_import_dom($style->documentElement);
$paramsArray['style'] = ((array)$styleXML->body->br->attributes())['@attributes'];
print_r($paramsArray);
/*
Array
(
    [id] => header 
    [class] => foo   bar
    [style] => Array
               (
                   [background-color] => #fff
                   [color]            => red
               )
)
*/

I find DOMDocument useful for a lot of really weird applications. YMMV

2 Comments

The asked question is seeking a multidimensional array. Are you able to expand your solution to parse out the style declaration? I don't mean to be this guy: richeyweb.com/blog/personal/…
You meant it, and I deserve to live up to my own expectations.
0

Here is my approach using DOMDocument to isolate the attribute declarations, then splitting the style declarations into a subarray. I've expanded the input string to show how fringe cases will be stored in the result array.

$attribs = ' id= "header " class = "foo   bar" style ="background-color:#fff; color: red; " disabled readonly checked hidden required data-test="just checking"';
$result = [];

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML("<p $attribs>");
foreach ($dom->getElementsByTagName('p')->item(0)->attributes as $attr) {
    $name = $attr->name;
    if (isset($result[$name])) {
        continue;  // as HTML, JS, and CSS will behave, ignore subsequent duplicate attributes
    }
    $value = trim($attr->value);
    if ($name === 'style') {
        $value = array_reduce(
            explode(';', $attr->value),
            function ($res, $pair) {
                if (sscanf($pair, '%[^:]:%[^;]', $k, $v) === 2) {
                    $res[trim($k)] = trim($v);
                }
                return $res;
            },
            [] // initial value, so an empty style attribute yields [] instead of null
        );
    }
    $result[$name] = $value;
}
var_export($result);

Output:

array (
  'id' => 'header',
  'class' => 'foo   bar',
  'style' => 
  array (
    'background-color' => '#fff',
    'color' => 'red',
  ),
  'disabled' => 'disabled',
  'readonly' => 'readonly',
  'checked' => 'checked',
  'hidden' => '',
  'required' => '',
  'data-test' => 'just checking',
)
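
Once two parsed sets have been merged, the result usually has to go back into markup, as in the 'class="foo bar lorem"' example from the comments further up. A minimal sketch of that last step follows. `attributes_to_string` is a hypothetical helper, and it assumes 'class' holds a list of class names (which the code above would first need to produce, for example with preg_split()) and 'style' holds property => value pairs.

```php
<?php
/**
 * Turn a nested attribute array back into an attribute string.
 * Hypothetical helper: assumes list-style arrays (e.g. class) should be
 * joined with spaces and 'style' is a property => value map.
 */
function attributes_to_string(array $attributes): string
{
    $parts = [];
    foreach ($attributes as $name => $value) {
        if ($name === 'style' && is_array($value)) {
            $declarations = [];
            foreach ($value as $property => $propertyValue) {
                $declarations[] = $property . ': ' . $propertyValue;
            }
            $value = implode('; ', $declarations);
        } elseif (is_array($value)) {
            // e.g. class: drop duplicates and join with single spaces
            $value = implode(' ', array_unique($value));
        }
        // Escape the value so quotes and ampersands cannot break the markup.
        $parts[] = $name . '="' . htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8') . '"';
    }
    return implode(' ', $parts);
}

$merged = array_merge_recursive(
    ['class' => ['foo', 'bar']],
    ['class' => ['lorem']]
);
echo attributes_to_string($merged); // class="foo bar lorem"
```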
