I'm working on a regular expression to extract the tag name and attributes from an HTML element, but I'm having trouble matching the attributes: only the last attribute is stored in the matches array.

Here is the code:

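<?php
    $subject = '<font face="arial" size="1" color="red">hello world!</font>';
    // The repeated (...)* group is meant to match one attribute per iteration.
    $find = '/<(?P<tag>\w+)\s+((?P<attr>\w+)=(?P<value>[^\s"\'>]+|"[^"]*"|\'[^\']*\')\s*)*\/?>/si';

    preg_match_all( $find, $subject, $matches );
    // Problem: $matches['attr'] and $matches['value'] contain only the last attribute (color).
?>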

Can someone help me out?

Many thanks

  • Drop that and use XPath instead. Commented Jul 12, 2010 at 15:55
  • You can't reliably parse HTML with regular expressions. See the awesome rant on this subject here: stackoverflow.com/questions/1732348/… Commented Jul 12, 2010 at 15:57
  • But what if I want to convert HTML to XHTML? I read that XPath is XHTML-compatible. Commented Jul 12, 2010 at 16:11

1 Answer

Some important points:

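  • You shouldn't use regex to parse HTML. PHP has many excellent HTML parsing libraries, including the built-in DOM extension (DOMDocument).
  • When a capturing group is repeated within a single match (as with your (...)* group), each iteration overwrites the previous one, so only the last capture is kept. That is why you only see the last attribute.
    • One notable exception is .NET regex, which keeps every capture of a group in its Captures collection.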
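
If you do stay with regex for simple, well-formed tags, one possible workaround is to match in two passes: first capture the tag name and the whole attribute string, then run preg_match_all over that string so each attribute is its own match instead of a repeated capture. This is only a rough sketch; the variable names ($tagPattern, $attrPattern, $attributes) are mine, not from the question, and it will not cope with arbitrary real-world HTML.

```php
<?php
$subject = '<font face="arial" size="1" color="red">hello world!</font>';

// Pass 1: capture the tag name and the raw attribute string as one group.
// The inner repeated group is non-capturing, so nothing is overwritten.
$tagPattern = '/<(?P<tag>\w+)(?P<attrs>(?:\s+\w+=(?:"[^"]*"|\'[^\']*\'|[^\s"\'>]+))*)\s*\/?>/i';

// Pass 2: matches exactly one name=value pair per match.
$attrPattern = '/(?P<attr>\w+)=(?P<value>"[^"]*"|\'[^\']*\'|[^\s"\'>]+)/';

$attributes = [];
if (preg_match($tagPattern, $subject, $tagMatch)) {
    // PREG_SET_ORDER gives one array per attribute found.
    preg_match_all($attrPattern, $tagMatch['attrs'], $attrMatches, PREG_SET_ORDER);
    foreach ($attrMatches as $m) {
        // Strip the surrounding quotes, if any, from the value.
        $attributes[$m['attr']] = trim($m['value'], '"\'');
    }
}

print_r($attributes);
```

For the sample input, $tagMatch['tag'] holds the tag name and $attributes maps each attribute name to its unquoted value. For anything beyond simple snippets, a real HTML parser remains the safer choice.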

1 Comment

This is a better read: regular-expressions.info/captureall.html - Capturing a repeated group vs. repeating a capturing group.
