Extracting multiple strong tags using PHP Simple HTML DOM Parser

Question

I have over 500 static pages whose content is structured like this:

<section>
Some text 
<strong>Dynamic Title (Different on each page)</strong> 
<strong>Author name (Different on each page)</strong> 
<strong>Category</strong>
(<b>Content</b> <b>MORE TEXT HERE</b>)
</section>

I need to extract the data in the format below, using PHP Simple HTML DOM Parser:

$title = <strong>Dynamic Title (Different on each page)</strong> 
$author = <strong>Author name (Different on each page)</strong>
$category = <strong>Category</strong>
$content = (<b>Content</b> <b>MORE TEXT HERE</b>)

So far I have failed and can't get my head around it. I would appreciate any advice or a code snippet to help me move forward.

EDIT 1: I have now solved the part with the strong tags using:

$html = file_get_html($url);
$content = array();
foreach ($html->find('strong') as $a) {
    $content[] = $a->innertext;
}

$title = $content[0];
$author = $content[1];

The only remaining issue: how do I extract the content within the parentheses? Can a similar method be used?

What code have you used so far that is failing? There might be a chance you almost had it. If you post it, folks here might be able to troubleshoot it or point out the problem.
– Fluffeh, Commented Jun 10, 2014 at 12:42
The first problem is how to loop through those strong tags. I have this code, but it selects a random one: $html = file_get_html($url); foreach($html->find('strong') as $e) $field = $e->outertext; echo $field;
– D_Guy13, Commented Jun 10, 2014 at 12:51
Don't post code in comments... Include it in your question!
– Enissay, Commented Jun 10, 2014 at 12:59

wbinky · Accepted Answer · 2014-06-10 14:00:37Z

2

OK, first you want to get all of the <section> tags. Then you want to search through those again for the <strong> tags and <b> tags. Something like this:

// Create DOM from URL or file
$html = file_get_html('http://www.example.com/');

// Find all <section> elements
foreach ($html->find('section') as $section) {

    // Collect the inner HTML of each <strong> tag in this <section>
    $strong = array();
    foreach ($section->find('strong') as $tag) {
        $strong[] = $tag->innertext;
    }

    $title = $strong[0];
    $author = $strong[1];
    $category = $strong[2];

}

To get the parts in parentheses, just get the b tag text and then add the () brackets. Or, if you're asking how to get the part between the brackets in a string such as the title, use explode and then remove the closing bracket:

$pieces = explode("(", $title);
$different_on_each_page = str_replace(")","",$pieces[1]);
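
As a rough illustration of the explode approach above, here is a minimal sketch run on a hard-coded sample string. The sample value and the names $before and $inside are assumptions made for this example; on the real pages the string would come from the first strong tag.

```php
<?php
// Hypothetical sample value standing in for the first <strong> tag's text.
$title = 'Dynamic Title (Different on each page)';

// Split at the first opening bracket: [0] is the text before it, [1] is the rest.
$pieces = explode('(', $title, 2);

// Guard against strings that contain no "(" at all.
$inside = isset($pieces[1]) ? str_replace(')', '', $pieces[1]) : '';
$before = trim($pieces[0]);

echo $before, "\n"; // Dynamic Title
echo $inside, "\n"; // Different on each page
```

The limit of 2 passed to explode keeps any later "(" inside the second piece, so only the first bracket is used as the split point.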

edited Jun 10, 2014 at 14:00

answered Jun 10, 2014 at 13:00

wbinky


1 Comment

D_Guy13 Over a year ago

Thank you. Just before you posted your answer I ended up with the code now posted in my question above.

Sesertin · Accepted Answer · 2014-06-10 13:10:46Z

0

$html_code = 'html'; // substitute your own HTML string here
$dom = new \DOMDocument();
$dom->loadHTML($html_code);
// Pass the document itself; $this->dom would only exist inside a class
$xpath = new \DOMXPath($dom);
$nodelist = $xpath->query("//strong");
for ($i = 0; $i < $nodelist->length; $i++) {
    $nodelist->item($i)->nodeValue; // gives you the text inside
}
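
For anyone who wants to try the DOMDocument route on its own, the following is a minimal sketch under some assumptions: the sample markup is a placeholder modelled on the question, and the libxml error suppression is added here only because older libxml versions may warn about the HTML5 section tag.

```php
<?php
// Placeholder markup modelled on the structure in the question.
$html_code = '<section>Some text
<strong>Dynamic Title</strong>
<strong>Author name</strong>
<strong>Category</strong>
(<b>Content</b> <b>MORE TEXT HERE</b>)
</section>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html_code);
libxml_clear_errors();

// Query every <strong> element and keep its text content.
$xpath = new DOMXPath($dom);
$strong = array();
foreach ($xpath->query('//strong') as $node) {
    $strong[] = trim($node->nodeValue);
}

list($title, $author, $category) = $strong;
print_r($strong);
```

This relies on the three strong tags always appearing in the same order, which is the same assumption the index-based code in the question makes.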

edited Jun 10, 2014 at 13:10

answered Jun 10, 2014 at 13:02

Sesertin


2 Comments

wbinky Over a year ago

that isn't PHP Simple HTML DOM

Sesertin Over a year ago

That is PHP. It uses the DOMDocument class in PHP: php.net/manual/en/class.domdocument.php. Just put it into a PHP file, substitute your own string for the html, and put an echo in front of $nodelist->item($i)->nodeValue;. You will see it echo all the strong contents onto the screen.

D_Guy13 · Accepted Answer · 2014-06-10 14:01:35Z

0

My final code that works now looks like this.

$html = file_get_html($url);
$content = array();
foreach ($html->find('strong') as $a) {
    $content[] = $a->innertext;
}

$title = $content[0];
$author = $content[1];
$category = $content[2];

// Match every "(...)" group in the page's plain text
$input = $html->plaintext;
preg_match_all("/\(.*?\)/", $input, $matches);
print_r($matches[0]);
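
To see what the regular expression does without fetching a page, here is a small self-contained sketch. It assumes a placeholder markup string and uses PHP's built-in strip_tags as a stand-in for the parser's plaintext property; the capture group is an addition that returns the text without the surrounding brackets.

```php
<?php
// Placeholder markup; strip_tags stands in for the parser's ->plaintext here.
$markup = '<section>Some text <strong>Dynamic Title</strong> '
        . '<strong>Author name</strong> <strong>Category</strong> '
        . '(<b>Content</b> <b>MORE TEXT HERE</b>)</section>';
$plain = strip_tags($markup);

// Non-greedy match of each "(...)" group; the inner group captures the text only.
preg_match_all('/\((.*?)\)/s', $plain, $matches);

print_r($matches[0]); // full matches, brackets included
print_r($matches[1]); // captured text, brackets removed
```

Note that the pattern matches every bracketed group in the text, so if a title or author name itself contains parentheses, those will appear in the results as well.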

answered Jun 10, 2014 at 14:01

D_Guy13
