
I'm trying to add a dynamic web-scraping function to my website that automatically gathers data from another website. Both websites have the same URL structure, and I use a JS script on my website to generate the correct target URL.

<script type="text/javascript">
  document.getElementById("demo").innerHTML = "https://www.website2.com" + window.location.pathname;
</script>

Website 1. www.website1.com/test-123

Website 2. www.website2.com/test-123


I found the Simple HTML DOM Parser, which allows me to load a specific web page and extract HTML elements from it.
However, it requires a target URL. Is it possible to use the result of the script directly as the URL?

Example: $html = file_get_html("#demo");

The code looks like this:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>


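    <?php include("simple_html_dom.php");
    // file_get_html() needs a full URL including the scheme; without it the string is treated as a local file path
    $html = file_get_html("https://www.website2.com/test-123");?>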


</head>
<body>
<h1>Company</h1>
<?php echo $html->find("h1",0)->plaintext;?>

<h5><?php echo $html->find("h1",0)->plaintext;?></h5>

<?php
echo $html->find("h1",0)->plaintext;
echo $html->find("p",0)->plaintext;
echo $html->find("p",1)->plaintext;
echo $html->find("p",2)->plaintext;
?>


<?php
    echo "<div id='demo'></div>";
?>




<script type="text/javascript">
  document.getElementById("demo").innerHTML = "https://www.bolagsfakta.se" + window.location.pathname;
</script>
</body>
</html>
  • Use an existing headless browser. This stuff is very complicated to get right, largely because web pages are almost infinitely complex. Don't reinvent the wheel (especially one which realistically is likely to have half the spokes missing). That's just my advice anyway. If you're only intending to target a specific site with a known structure then it might be simpler, of course. Commented Oct 7, 2021 at 17:39
  • Anyway, you could generate the URL structure with PHP just as easily as with JavaScript. Commented Oct 7, 2021 at 17:41
  • The target website has the same structure on all of its pages. Without an API, is there an easier way to display the info on my website? What would this string look like if I added the PHP code? Commented Oct 8, 2021 at 4:24
  • Well, you can get the path of the request easily in PHP, see stackoverflow.com/a/16198831/5947043. Then you can append that to the URL. No need for JavaScript. Commented Oct 8, 2021 at 7:56
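
The suggestion in the last comment could be sketched roughly as follows. PHP runs on the server before the page reaches the browser, so it can never see what the JavaScript later writes into #demo; the request path, however, is already available in $_SERVER['REQUEST_URI']. This is only a minimal sketch under a few assumptions: buildTargetUrl() is a hypothetical helper name, not part of Simple HTML DOM, and https://www.website2.com stands in for the real target site.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): builds the target URL
// from the path of the current request.
function buildTargetUrl(string $requestUri, string $targetBase = 'https://www.website2.com'): string
{
    // Keep only the path component, dropping any query string.
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        $path = '/';
    }
    // Join base and path with exactly one slash between them.
    return rtrim($targetBase, '/') . '/' . ltrim($path, '/');
}

// During a web request REQUEST_URI holds e.g. "/test-123";
// fall back to "/" when the script is run from the command line.
$targetUrl = buildTargetUrl($_SERVER['REQUEST_URI'] ?? '/');
echo $targetUrl, PHP_EOL;

// With simple_html_dom.php available, the URL can then be passed on
// (left commented out so the sketch runs on its own):
// include 'simple_html_dom.php';
// $html = file_get_html($targetUrl);
// if ($html !== false) {
//     $heading = $html->find('h1', 0);
//     if ($heading !== null) {
//         echo $heading->plaintext;
//     }
// }
```

With this in place the JavaScript snippet is only needed if the URL should also be shown in the browser; the scraping itself uses the PHP value.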
