4

I have HTML like this:

  <ul id="video-tags">
      <li><em>Tagged: </em></li>
      <li><a href="/tags/sports">sports</a>, </li>
      <li><a href="/tags/entertain">entertain</a>, </li>
      <li><a href="/tags/funny">funny</a>, </li>
      <li><a href="/tags/comedy">comedy</a>, </li>
      <li><a href="/tags/automobile">automobile</a>, </li>
      <li>more <a href="/tags/"><strong>tags</strong></a>.</li>
  </ul>

How can I extract the tag names (sports, entertain, funny, comedy, automobile) into a string?

My PHP preg_match_all call looks like this:

preg_match_all('/<a href\="\/tags\/(.*?)\">(.*?)<\/a>, <\/li>/', $this->page, $matches);
echo var_dump($matches);    
echo implode(' ', $tags);  

It does not work.

  • How does it 'not work'? What are you getting? Errors? A different string than you expect? What IS it doing (or not doing)? What is $tags supposed to be, where is it set? Commented Dec 25, 2012 at 18:29
  • My var_dump output looks like this: array(3) { [0]=> array(0) { } [1]=> array(0) { } [2]=> array(0) { } } Commented Dec 25, 2012 at 18:31
  • I'm expecting something like: sports, entertain, funny, comedy, automobile, returned as an array or a string. Commented Dec 25, 2012 at 18:31
  • stackoverflow.com/questions/1732348/… Commented Dec 25, 2012 at 19:08

3 Answers

4

I'm not sure where you're getting $this->page from; however, the following should work as you expect:

http://ideone.com/KhWkEg

<?php
$page = 'subject string ...';

preg_match_all('/<a href\="\/tags\/(.*?)\">(.*?)<\/a>, <\/li>/', $page, $matches);

echo implode(', ', $matches[1]);  
?>

Replace the $page variable with your $this->page, as long as it still holds the HTML as a string.

However, I'd suggest not trying to parse HTML with regular expressions. Instead, use a library such as PHP's DOMDocument or Simple HTML DOM to parse the HTML properly.
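
As a rough sketch of the DOM-based approach, assuming the markup from the question is available as a string (the variable names $html, $links and $tags are illustrative, not from the original code), DOMDocument together with DOMXPath can select only the links that have something after /tags/:

```php
<?php
// Sample markup copied from the question; $html is an illustrative name.
$html = '<ul id="video-tags">
    <li><em>Tagged: </em></li>
    <li><a href="/tags/sports">sports</a>, </li>
    <li><a href="/tags/entertain">entertain</a>, </li>
    <li><a href="/tags/funny">funny</a>, </li>
    <li><a href="/tags/comedy">comedy</a>, </li>
    <li><a href="/tags/automobile">automobile</a>, </li>
    <li>more <a href="/tags/"><strong>tags</strong></a>.</li>
</ul>';

$doc = new DOMDocument();
// Real pages are often not valid HTML; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Links inside #video-tags whose href starts with /tags/, excluding the bare /tags/ link.
$links = $xpath->query('//ul[@id="video-tags"]//a[starts-with(@href, "/tags/") and @href != "/tags/"]');

$tags = [];
foreach ($links as $link) {
    $tags[] = trim($link->textContent);
}

echo implode(', ', $tags);
```

Selecting by the list's id and the href prefix means the result does not depend on the exact whitespace or punctuation around each link, which is where the regex approach tends to break.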


2

This small regex does the same thing too.

preg_match_all('|tags/[^>]*>([^<]*)|', $str, $matches);
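
One caveat worth checking with this shorter pattern: the final "more tags" link also matches, but with an empty capture, because its text is wrapped in <strong>. The sketch below uses a shortened copy of the question's markup (the $str contents and the $tags name are assumptions for illustration) and drops the empty entry:

```php
<?php
// Shortened sample based on the question's markup.
$str = '<ul id="video-tags">
    <li><a href="/tags/sports">sports</a>, </li>
    <li><a href="/tags/funny">funny</a>, </li>
    <li>more <a href="/tags/"><strong>tags</strong></a>.</li>
</ul>';

preg_match_all('|tags/[^>]*>([^<]*)|', $str, $matches);

// $matches[1] ends with an empty string from the "more tags" link;
// array_filter with strlen removes it, array_values reindexes the result.
$tags = array_values(array_filter($matches[1], 'strlen'));

echo implode(', ', $tags);
```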

Alternatively, using DOMDocument:

$d = new DOMDocument();
$d->loadHTML($str);
$as = $d->getElementsByTagName('a');
$result = array();
// Stop one short of the end to skip the final "more tags" link.
for ($i = 0; $i < $as->length - 1; $i++) {
    $result[] = $as->item($i)->textContent;
}

echo implode(' ', $result);


1

This worked perfectly for me:

preg_match_all('/<a href\="\/tags\/(.*?)\">.*?<\/a>, <\/li>/', $str, $matches);
echo implode(',', $matches[1]);

Prints: sports,entertain,funny,comedy,automobile

$this->page is probably empty, which would explain why you are not getting any data.
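
One way to test that suspicion is to check the input before matching. This is only a sketch: the helper name extractTags is hypothetical, and it reuses the pattern from this answer unchanged.

```php
<?php
// Hypothetical helper (not from the original post): fail loudly on empty input
// instead of silently returning empty match arrays.
function extractTags($page)
{
    if (!is_string($page) || trim($page) === '') {
        throw new InvalidArgumentException('Page content is empty; nothing to match.');
    }

    $count = preg_match_all('/<a href\="\/tags\/(.*?)\">.*?<\/a>, <\/li>/', $page, $matches);
    if ($count === false) {
        // preg_match_all returns false when the regex engine itself fails.
        throw new RuntimeException('Regex error code: ' . preg_last_error());
    }

    return $matches[1];
}

$tags = extractTags('<li><a href="/tags/funny">funny</a>, </li>');
echo implode(',', $tags);
```

If the exception fires with your real data, the problem is in how $this->page is populated rather than in the pattern.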

Why do you use two capture groups in the regexp? The same word appears in both the URL and the link text, so one group is enough.
