I have this code. It gets the info, but along with the data I also get the link, for example:

require_once('simple_html_dom.php');
set_time_limit(0);

$url = 'www.domain.com';
$html = file_get_html($url);
// read the container div
foreach ($html->find('#content') as $element) {
    // read each paragraph inside it
    foreach ($element->find('p') as $phone) {
        echo $phone;
    }
}

Mobile Pixel 2 - google   << this is the link

But I need to remove these links. The problem is the following; this is the markup I scrape:

<p>the info that i really need is here<p>
     <p class="text-right"><a class="btn btn-default espbott aplus" role="button"
      href="brand/google.html">Google</a></p>

I read this: Simple HTML Dom: How to remove elements? But I can't find the answer there.

Update: if I use this:

foreach ($element->find('p[class="text-right"]') as $phone)

it selects the links, but I can't remove them from the scraped data.

2 Answers

1

You can use file_get_contents with str_get_html and strip the element out of the raw HTML:

include 'simple_html_dom.php';

$content = file_get_contents($url);
$html = str_get_html($content);

// read the container div
foreach ($html->find('#content') as $element) {
    // find the link paragraphs and strip each one from the raw HTML
    foreach ($element->find('p[class="text-right"]') as $phone) {
        $content = str_replace($phone, '', $content);
    }
}
print $content;
die;

1 Comment

Hi, with this code I get back the complete website that I'm trying to scrape.
0

Or here is a native version using PHP's built-in DOMDocument:

PHP-CODE

$sHtml = '<p>the info that i really need is here<p>
 <p class="text-right"><a class="btn btn-default espbott aplus" role="button"
  href="brand/google.html">Google</a></p>';

$sHtml = '<div id="wrapper">' . $sHtml . '</div>';
echo "org:\n";
echo $sHtml;

echo "\n\n";

$doc = new DOMDocument();
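$doc->loadHTML($sHtml);

// Copy the live node list first: removing nodes while iterating it directly can skip elements.
foreach (iterator_to_array($doc->getElementsByTagName('a')) as $element) {
    $element->parentNode->removeChild($element);
}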

echo "res:\n";
echo $doc->saveHTML($doc->getElementById('wrapper'));

Output

org:
<div id="wrapper"><p>the info that i really need is here<p>
     <p class="text-right"><a class="btn btn-default espbott aplus" role="button"
      href="brand/google.html">Google</a></p></div>

res:
<div id="wrapper">
<p>the info that i really need is here</p>
<p>
     </p>
<p class="text-right"></p>
</div>

https://3v4l.org/RhuEU
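
The version above removes every link but leaves the empty <p class="text-right"> behind. If the goal is only the text of the remaining paragraphs, a minimal sketch along the same lines could remove the whole paragraph with DOMXPath instead. It assumes the scraped markup sits inside an element with id="content", as in the question; the inline sample string and the $lines variable are only there for illustration.

```php
<?php
// Sample markup from the question, wrapped in the #content container (assumed structure).
$sHtml = '<div id="content">
<p>the info that i really need is here<p>
<p class="text-right"><a class="btn btn-default espbott aplus" role="button"
  href="brand/google.html">Google</a></p>
</div>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them; scraped markup is rarely clean.
libxml_use_internal_errors(true);
$doc->loadHTML($sHtml);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Match <p> elements inside #content whose class list contains "text-right".
$query = '//*[@id="content"]//p[contains(concat(" ", normalize-space(@class), " "), " text-right ")]';

// Copy the result into an array first so that removing nodes cannot disturb the iteration.
foreach (iterator_to_array($xpath->query($query)) as $node) {
    $node->parentNode->removeChild($node);
}

// Keep the text of the remaining paragraphs, skipping the empty ones
// that the parser creates from the unclosed <p> in the sample.
$lines = [];
foreach ($xpath->query('//*[@id="content"]//p') as $p) {
    $text = trim($p->textContent);
    if ($text !== '') {
        $lines[] = $text;
    }
}

echo implode("\n", $lines), "\n";
// prints: the info that i really need is here
```

For a real page, the sample string would be replaced by the downloaded HTML, for example the result of file_get_contents($url). Because only the collected paragraph text is echoed, the rest of the page is not printed.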

2 Comments

I already edited the code, but I see only the plain text of the URL.
Take a look at the HTML source (Ctrl+U).
