How to remove a link from Simple HTML DOM data

Question

I have this code. It gets the info, but along with the data I also get the link. For example:

require_once('simple_html_dom.php');
set_time_limit(0);

$url = 'www.domain.com';
$html = file_get_html($url);
// read the first div
foreach ($html->find('#content') as $element) {
    // read the paragraphs inside it
    foreach ($element->find('p') as $phone) {
        echo $phone;
    }
}

Mobile Pixel 2 - google << this part is the link

But I need to remove these links. The problem is that this is the markup I scrape:

<p>the info that i really need is here<p>
     <p class="text-right"><a class="btn btn-default espbott aplus" role="button"
      href="brand/google.html">Google</a></p>

I read this: "Simple HTML Dom: How to remove elements?", but I can't find the answer there.

Update: if I use this:

foreach ($element->find('p[class="text-right"]') as $link)

it selects the links, but I can't remove them from the scraped data.

HamzaNig · Accepted Answer · 2018-06-29 16:58:56Z

1

You can use file_get_contents with str_get_html and strip the matches out of the raw content:

include 'simple_html_dom.php';

$content = file_get_contents($url);

$html = str_get_html($content);
// i read the first div
foreach ($html->find('#content') as $element) {
    // i read the second
    foreach ($element->find('p[class="text-right"]') as $phone) {
        $content = str_replace($phone, '', $content);
    }
}
print $content;
die;


1 Comment

AndrewPP Over a year ago

Hi, with this code I get the same complete website that I'm trying to scrape.

SirPilan · Accepted Answer · 2018-06-29 17:34:17Z

0

Or here is a native version:

PHP-CODE

$sHtml = '<p>the info that i really need is here<p>
 <p class="text-right"><a class="btn btn-default espbott aplus" role="button"
  href="brand/google.html">Google</a></p>';

$sHtml = '<div id="wrapper">' . $sHtml . '</div>';
echo "org:\n";
echo $sHtml;

echo "\n\n";

$doc = new DOMDocument();
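$doc->loadHTML($sHtml);

// Copy the live node list first; removing nodes while iterating it directly can skip elements
foreach( iterator_to_array( $doc->getElementsByTagName( 'a' ) ) as $element ) {
    $element->parentNode->removeChild( $element );
}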

echo "res:\n";
echo $doc->saveHTML($doc->getElementById('wrapper'));

Output

org:
<div id="wrapper"><p>the info that i really need is here<p>
     <p class="text-right"><a class="btn btn-default espbott aplus" role="button"
      href="brand/google.html">Google</a></p></div>

res:
<div id="wrapper">
<p>the info that i really need is here</p>
<p>
     </p>
<p class="text-right"></p>
</div>

https://3v4l.org/RhuEU
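
If the goal is only the text of the paragraphs, a variation on the same native approach would be to drop the whole p.text-right wrapper rather than just the anchor, which also avoids the empty paragraph left in the output above. This is a minimal sketch, assuming the scraped block sits inside an element with the id content as in the question; paragraphsWithoutLinks and $sample are hypothetical names used for illustration, not part of either answer.

```php
<?php
// Hypothetical helper: returns the text of every <p> inside #content,
// skipping paragraphs whose class list contains "text-right".
function paragraphsWithoutLinks(string $html): array
{
    $doc = new DOMDocument();
    // Scraped markup is often malformed, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Snapshot the matching paragraphs, then detach each one from the tree.
    $linkParagraphs = iterator_to_array($xpath->query(
        '//*[@id="content"]//p[contains(concat(" ", normalize-space(@class), " "), " text-right ")]'
    ));
    foreach ($linkParagraphs as $p) {
        $p->parentNode->removeChild($p);
    }

    // Collect the text of the remaining paragraphs, ignoring empty ones.
    $texts = [];
    foreach ($xpath->query('//*[@id="content"]//p') as $p) {
        $text = trim($p->textContent);
        if ($text !== '') {
            $texts[] = $text;
        }
    }
    return $texts;
}

// Sample markup taken from the question, wrapped in an assumed #content div.
$sample = '<div id="content"><p>the info that i really need is here<p>
 <p class="text-right"><a class="btn btn-default espbott aplus" role="button"
  href="brand/google.html">Google</a></p></div>';

print_r(paragraphsWithoutLinks($sample));
```

For a real page, the string passed in would be whatever file_get_contents($url) returns; the class test in the XPath expression is written so that it still matches when the paragraph carries additional classes.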


2 Comments

AndrewPP Over a year ago

I have already edited the code, but I only see the plain text of the URL.

SirPilan Over a year ago

Take a look at the HTML source (Ctrl+U).
