I want to remove the class "refs" from the section that holds the references. The page (http://www.sacred-destinations.com/mexico/palenque) I am getting the content from looks like this:

 <div class="col-sm-6 col-md-7" id="essay">
    <section class="refs">
    </section>
    </div><!-- end #essay -->

I can't work out how to remove this "refs" class, since it sits on a <section> element. Here is what I have done so far:

<?php
$url="http://www.sacred-destinations.com/mexico/palenque";
 $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
    $html = curl_exec($ch);
    curl_close($ch);
    $newDom = new domDocument;
    libxml_use_internal_errors(true);
    $newDom->loadHTML($html);
    libxml_use_internal_errors(false);
    $newDom->preserveWhiteSpace = false;
    $newDom->validateOnParse = true;
    $sections = $newDom->saveHTML($newDom->getElementById('essay'));
$text=$sections->find('<section class="refs">');
$result=removeClass($text);
echo $result;
?>

1 Answer

DOMDocument has no find() method. Use DOMXPath::evaluate() with an XPath expression instead.

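// $html is the page source fetched with curl in the question
$dom = new DOMDocument();
// collect libxml parse warnings (e.g. for HTML5 tags) instead of printing them
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors(false);
$dom->preserveWhiteSpace = false;
// XPath queries run against the loaded document
$xpath = new DOMXPath($dom);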

$expression = 
  '//div[
     @id="essay"
   ]
   /section[
     contains(
       concat(" ", normalize-space(@class), " "), " refs "
     )
   ]';

// remove the class attribute from every matching <section>
foreach ($xpath->evaluate($expression) as $section) {
  $section->removeAttribute('class');
}
// output the modified document
echo $dom->saveHTML();

A class attribute can contain multiple values, such as classOne classTwo. normalize-space() trims leading and trailing whitespace and collapses internal whitespace runs to single spaces. concat() then adds a space at the start and end, so searching for " refs " matches only the whole class name and not a longer class name that merely contains it.
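
To make the difference concrete, here is a minimal sketch that compares a plain substring test with the padded token test. The markup and the class names "xrefs" and "note" are made up for illustration and are not taken from the page in the question.

```php
<?php
// Hypothetical markup: the first and third <section> carry the class token "refs",
// the second only has "xrefs", which contains "refs" as a substring.
$html = '<div id="essay">'
      . '<section class="refs">a</section>'
      . '<section class="xrefs note">b</section>'
      . '<section class="  note   refs ">c</section>'
      . '</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep libxml warnings about HTML5 tags quiet
$dom->loadHTML($html);
libxml_use_internal_errors(false);
$xpath = new DOMXPath($dom);

// Substring test: matches any class attribute that contains the characters "refs".
$naive = $xpath->evaluate('//section[contains(@class, "refs")]');

// Token test: pad the normalized attribute with spaces, then look for " refs ".
$exact = $xpath->evaluate(
    '//section[contains(concat(" ", normalize-space(@class), " "), " refs ")]'
);

$naiveCount = $naive->length;
$exactCount = $exact->length;

echo "substring test: $naiveCount, token test: $exactCount\n";
```

The substring test also picks up the "xrefs" section, while the token test only matches sections where "refs" is a complete class name, regardless of surrounding whitespace.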

The example removes the whole class attribute. To change it instead, read it with DOMElement::getAttribute(), edit the value with string functions, and write it back with DOMElement::setAttribute().
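
As a rough sketch of that approach, the following removes a single class name and keeps the others. The helper name removeClassToken() and the sample markup are hypothetical; the helper is not part of the DOM extension.

```php
<?php
// Hypothetical helper (not a DOM API): drop one token from an element's class list.
function removeClassToken(DOMElement $element, string $token): void
{
    // Split on whitespace runs; PREG_SPLIT_NO_EMPTY discards empty pieces.
    $classes = preg_split('/\s+/', $element->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);
    $remaining = array_values(array_diff($classes, [$token]));

    if ($remaining === []) {
        // Nothing left, so drop the attribute entirely.
        $element->removeAttribute('class');
    } else {
        $element->setAttribute('class', implode(' ', $remaining));
    }
}

// Made-up markup with two class names on the section.
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML('<div id="essay"><section class="refs small">x</section></div>');
libxml_use_internal_errors(false);

$section = $dom->getElementsByTagName('section')->item(0);
removeClassToken($section, 'refs');
$classAfter = $section->getAttribute('class');

echo $classAfter, "\n";
```

Splitting into tokens rather than using str_replace() avoids accidentally cutting "refs" out of a longer class name.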

There are several DOM-based libraries that can make HTML manipulation easier.

3 Comments

The class is not removed. The 'refs' class still appears in the output.
There is still a problem in your code. I have tried but couldn't fix it. Can you please fix it?
The source removes the class attribute, the demo shows that. Requests with curl are a different topic, and you did not ask about that part. Validate if that part works. If not, try to debug it. After you get the HTML you can edit it. At the moment you just say it doesn't work. I can't help without a real error description.
