I want to get a specific tag from a URL. For example:

If I have this content:

<div id="hey">
   <div id="bla"></div>
</div>

<div id="hey">
   <div id="bla"></div>
</div>

I want to get all divs with the id "hey" (I think this is done with preg_match_all). How can I do that?

  • The content inside the tag can be changed.
  • @Daniel just a note that id values should be unique. Commented Aug 23, 2011 at 22:47
  • I'm not exactly sure what you're asking for here. Are you saying you want to pass in an ID via a query string parameter and then search for that parameter in the page content? Or are you passing that HTML string via a query parameter and you need to parse that? Also, your HTML markup is invalid so you'll likely have a tough time programmatically parsing it under any conditions. Commented Aug 23, 2011 at 22:47
  • To be clear "from URL", it appears you mean "Given a webpage (whose URL I know) how do I scrape the contents from inside a particular HTML tag?" As it is, it is hard to tell what is being asked, particularly because "GET" (in all caps) in relation to URLs normally refers to a method of form-data encoding. (e.g. http://example.org/?field1=value1 is a URL which could result from a GET form) Commented Aug 23, 2011 at 22:47
  • FYI, ids are supposed to be single use. If you want to apply styles to multiple elements, you should be defining them to have the same class. Having multiple elements with the same ID can cause issues with JavaScript, forms, etc. Commented Aug 23, 2011 at 22:48
  • I want to fetch a URL (for example, $url) and print only the content inside the divs I choose (such as "hey"). Commented Aug 23, 2011 at 22:57

1 Answer

I recommend using the DOMDocument class instead of regular expressions (it uses fewer resources and is clearer, IMHO).

$content = '<div id="hey">
   <div id="bla"></div>
</div>

<div id="hey">
   <div id="bla"></div>
</div>';

$doc = new DOMDocument();
@$doc->loadHTML($content); // @ suppresses warnings caused by non-standard HTML
$xpath = new DOMXPath($doc);
$elements = $xpath->query("//div[@id='hey']");

/** @var DOMNodeList $elements */
foreach ($elements as $curr_element) {
    /** @var DOMElement $curr_element */

    // Do whatever you need with the element here
    var_dump($curr_element);
}
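
Since the question asks to print the content inside the matching divs, here is a minimal sketch of one way to do that, assuming "content" means the inner HTML of each div. The function name innerHtmlOfDivsById is hypothetical and not part of the answer above; it relies on DOMDocument::saveHTML() accepting a node argument.

```php
<?php
// Hypothetical helper (an illustration, not the original answer's code):
// returns the inner HTML of every <div> whose id attribute equals $id.
function innerHtmlOfDivsById(string $html, string $id): array
{
    $doc = new DOMDocument();
    // @ suppresses parser warnings, e.g. about duplicate ids.
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $results = [];
    // Assumption: $id contains no single quote, because it is embedded
    // directly in the XPath expression.
    foreach ($xpath->query("//div[@id='" . $id . "']") as $div) {
        $inner = '';
        foreach ($div->childNodes as $child) {
            // saveHTML($node) serializes only that node and its children.
            $inner .= $doc->saveHTML($child);
        }
        $results[] = trim($inner);
    }
    return $results;
}

$content = '<div id="hey"><div id="bla">first</div></div>
<div id="hey"><div id="bla">second</div></div>';

foreach (innerHtmlOfDivsById($content, 'hey') as $inner) {
    echo $inner, PHP_EOL;
}
```

Concatenating the serialized child nodes gives the markup between the opening and closing tags of each match; if only the text is needed, $div->textContent would be a simpler alternative.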

If you want to get the content from a URL, you can use this line instead to fill the $content variable:

$content = file_get_contents('http://yourserver/urls/page.php');
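
file_get_contents() returns false when the read fails, and fetching remote URLs with it requires allow_url_fopen to be enabled, so it may be worth checking the result before parsing. A minimal sketch, where loadPage is a hypothetical wrapper name and the demonstration reads a local temporary file rather than a real server:

```php
<?php
// Hypothetical wrapper: reads a URL or local path and fails loudly
// instead of passing false on to DOMDocument::loadHTML().
function loadPage(string $source): string
{
    // @ suppresses the PHP warning; the failure is reported as an exception.
    $content = @file_get_contents($source);
    if ($content === false) {
        throw new RuntimeException('Could not read ' . $source);
    }
    return $content;
}

// Demonstration with a temporary local file; a URL such as
// 'http://yourserver/urls/page.php' would be passed the same way.
$path = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($path, '<div id="hey">demo</div>');

$content = loadPage($path);
echo $content, PHP_EOL;

unlink($path);
```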

4 Comments

You should probably also suppress errors on the loadHTML() call, otherwise the DOMDocument will complain loudly about the multiple elements with the same id.
@AgentConundrum, I tested with HTML containing duplicate ids and, surprisingly, no problems arose, but just in case I added the @ to that line.
What are your error reporting settings? It issues an E_WARNING for me: Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: ID hey already defined in Entity, line: 5
I set error_reporting(E_ALL); as the first line and nothing appears. I will check again.
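
As an alternative to the @ operator discussed in these comments, libxml can be told to collect parser errors internally so they can be inspected instead of hidden. This is a sketch of that swapped-in technique, not what the answer does; whether a duplicate-id message is actually reported may depend on the PHP and libxml versions, which would also be consistent with the differing results described above.

```php
<?php
$content = '<div id="hey"><div id="bla"></div></div>
<div id="hey"><div id="bla"></div></div>';

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($content);

// Gather whatever the parser reported (possibly nothing).
$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = trim($error->message) . ' (line ' . $error->line . ')';
}
libxml_clear_errors();
// Restore the previous error-handling mode.
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);
$count = $xpath->query("//div[@id='hey']")->length;

echo 'Matched divs: ', $count, PHP_EOL;
foreach ($messages as $message) {
    echo 'Parser message: ', $message, PHP_EOL;
}
```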
