
I am using the PHP Simple HTML DOM Parser to scrape website data, but unfortunately I am not able to extract the data I want. I have searched Google and read the documentation but could not solve the issue. The markup I am trying to scrape looks something like this:

<div id="section1">
   <h1>Some content</h1>
   <p>Some content</p>
   <!-- ... a variable number of other elements ... -->
   <script> /* some script */ </script>
   <video>
     <source src="www.exmple.com/34/exmple.mp4">
   </video>
</div>

With JavaScript in the browser I can get the value like this:

document.getElementById("section1").getElementsByTagName("source")[0].getAttribute("src");

But when I try the same thing with the PHP DOM parser, I don't get any data. Here is my code:

require $_SERVER['DOCUMENT_ROOT'] . '/../lib/simplehtmldom/simple_html_dom.php';

$html_content = get($url); // get() is my own cURL function that returns the page content
$obj_content = str_get_html($html_content);
// Same path as the JavaScript above: the first <source> inside #section1
$linkURL = $obj_content->getElementById('section1')->find('source', 0)->getAttribute('src');
var_dump($linkURL);

This results in an empty string. I have tried changing the code in various ways, but every attempt comes back blank. However, if I var_dump $obj_content, I get a lot of DOM elements.

I have followed these similar Stack Overflow posts, but they did not help:

  1. How do I get the HTML code of a web page in PHP?
  2. PHP Simple HTML DOM
  3. PHP Simple HTML DOM Parser Call to a member function children() on a non-object
  4. The Simple HTML DOM manual: http://simplehtmldom.sourceforge.net/manual.htm

Can anyone please help me?

Thank you

  • Is that part of the HTML added dynamically after page load? Commented Aug 13, 2018 at 16:48
  • No, the page loads once. Nothing is added dynamically after that. Commented Aug 13, 2018 at 16:52
  • So if you var_dump whatever is returned from your cURL request, do you see this source tag with a value in the src attribute? Commented Aug 13, 2018 at 16:55
  • OK then - look at the HTML from the var_dump, find the #section1 > source[0] path, and see if there's a value in the src attribute. Commented Aug 13, 2018 at 17:11
  • @WillardSolutions, you were correct. The source file URL I am trying to fetch is actually injected by the JS script above the video tag. By extracting the content of the script tag and stripping out the surrounding code, I got the URL I wanted. Commented Aug 21, 2018 at 8:29
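A minimal sketch of the check suggested in the comments: does the HTML exactly as cURL returned it (before any JavaScript runs) contain a src on the first <source> inside #section1? It uses PHP's built-in DOM extension (DOMDocument and DOMXPath) instead of Simple HTML DOM so it runs without extra libraries; the function name source_src_in_raw_html is hypothetical. A null result suggests the value is added later by a script.

```php
<?php
// Hypothetical helper: look for the first <source> inside #section1 in the
// raw, server-sent HTML and return its src, or null if it is not there.
// Uses PHP's built-in DOM extension, not simple_html_dom.

function source_src_in_raw_html(string $html): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // tolerate HTML5 tags like <video>
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    // Same path as document.getElementById("section1").getElementsByTagName("source")[0]
    $nodes = $xpath->query('//*[@id="section1"]//source');
    if ($nodes === false || $nodes->length === 0) {
        return null; // no <source> element in the server-sent HTML
    }
    $src = $nodes->item(0)->getAttribute('src');
    return $src === '' ? null : $src; // '' means the attribute is missing or empty
}
```

In the question's setup, the argument would be the string returned by the cURL helper, e.g. source_src_in_raw_html(get($url)).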

1 Answer


The code snippet is fine as it is. The problem was that the URL I was targeting is not in the HTML the server sends. It is added by the <script> tag after the page loads, and since cURL does not run JavaScript, the parser never sees it.

Thank you @WillardSolutions
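A minimal sketch of the workaround described in the last comment: read the text of the inline <script> elements inside #section1 and pull the URL out with a regular expression. The real script is not shown in the question, so the pattern is an assumption: it expects the URL as a quoted string literal ending in .mp4. The function name extract_video_url is hypothetical, and the sketch uses PHP's built-in DOMDocument rather than Simple HTML DOM.

```php
<?php
// Hypothetical helper: the <source> src is set by an inline <script> at
// runtime, so it is not in the HTML that cURL returns. Instead, search the
// text of the <script> elements inside #section1 for the URL.
// ASSUMPTION: the script holds the URL as a quoted string ending in ".mp4".

function extract_video_url(string $html): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // tolerate HTML5 tags like <video>
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@id="section1"]//script') as $script) {
        // textContent is the raw JavaScript source of the inline script.
        if (preg_match('/["\']([^"\'\s]+\.mp4)["\']/', $script->textContent, $m)) {
            return $m[1]; // the URL without its surrounding quotes
        }
    }
    return null; // no matching URL in any inline script
}

$sample = '<div id="section1"><script>var player = {src: "www.exmple.com/34/exmple.mp4"};</script><video></video></div>';
var_dump(extract_video_url($sample)); // string(28) "www.exmple.com/34/exmple.mp4"
```

Matching the script text with a regex is brittle: if the script builds the URL from several pieces or uses a different extension, the pattern has to be adapted to the actual script.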
