I am a beginner at scraping and I am using PHP simple_html_dom to scrape data from a website. My current code does not display any results; maybe I am not targeting the proper HTML tag. Second, if there are no results for the searched query, I need the code to display a message such as "Results not found". Any help is appreciated.

Here are sample queries:

3lnhl2gc9br764854 1J4FF28SXXL550156

<?php

require "simple_html_dom.php";

$trazi = $_POST['trazi'];
$url = "http://lookupvin.com/check/";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "VIN=$trazi");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$server_output = curl_exec($ch);
curl_close($ch);

$html = str_get_html($server_output);

foreach ($html->find('p.nmar') as $element) {
    echo $element->innerText();
}

?>

index.php
<form action="vin.php" method="POST">
    <input type="text" name="trazi">
    <input type="submit">
</form>
  • What is the POST value trazi here? Commented Jan 27, 2016 at 17:28
  • Sorry, I've updated the script. Commented Jan 27, 2016 at 17:31
  • Can you figure it out? Commented Jan 27, 2016 at 17:40
  • What kind of value do you input in trazi? Post an example of it. Commented Jan 27, 2016 at 17:43
  • You can take this one for example: 1J4FF28SXXL550156 Commented Jan 27, 2016 at 17:43

2 Answers

<?php
include "simple_html_dom.php";

$trazi = "1J4FF28SXXL550157";
$url = "http://lookupvin.com/check/";

// Send the VIN to the lookup page as a POST request.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "VIN=$trazi");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$server_output = curl_exec($ch);
curl_close($ch);

// Parse the response and collect every element with class "nmar".
$html = new simple_html_dom();
$html->load($server_output);
$items = $html->find('.nmar');

if (count($items) != 0) {
    foreach ($items as $post) {
        // Print the first child of each match.
        echo $post->children(0);
        echo "<br>";
    }
} else {
    // No matching elements were found for this query.
    echo "Wrong Input";
}

I found a better class online for HTML parsing. You can download it from here: http://code.tutsplus.com/tutorials/html-parsing-and-screen-scraping-with-the-simple-html-dom-library--net-11856

Here is the result I received: [screenshot]


13 Comments

I've removed it and it still doesn't show anything. Can you please check whether the HTML tag is properly set?
Actually, the server returns an error if the user requests the same URL from the same IP, something like this: ERROR: Too many attempts. Please contact us in order to get access to the VIN decoder API.
OK, let's try another site, where I have actually been able to pull results, but I haven't managed to figure out how to print a "not found" message if there is no record for that query. I've updated the script.
Can you figure it out?
Yup, trying it, but the new site accepts POST requests. Trying with those aspects now.

Try removing the CURLOPT_POST and CURLOPT_POSTFIELDS options. The server responds to GET requests, so there is no need to POST; you already pass the parameter properly as part of the query string. Also check out http://php.net/manual/en/function.curl-error.php, which may help you check whether your request was successful.
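A minimal sketch of that suggestion, assuming the site accepts the VIN as a query-string parameter named `VIN` (the name is taken from the POST body in the question; whether the site really answers this GET form is an assumption). `buildLookupUrl` and `fetchPage` are hypothetical helper names, not part of cURL or simple_html_dom.

```php
<?php
// Hypothetical helper: append the VIN to the base URL as a query string.
function buildLookupUrl(string $baseUrl, string $vin): string
{
    // http_build_query() URL-encodes the value.
    return $baseUrl . '?' . http_build_query(['VIN' => $vin]);
}

// Hypothetical helper: plain GET request (no CURLOPT_POST / CURLOPT_POSTFIELDS)
// that reports cURL failures instead of silently returning false.
function fetchPage(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('Curl error: ' . $error);
    }
    curl_close($ch);
    return $body;
}
```

Calling `fetchPage(buildLookupUrl("http://lookupvin.com/check/", $trazi))` would then either return the page body or throw with the cURL error message, which makes a failed request easy to tell apart from a selector that matches nothing.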

10 Comments

Are you getting any response whatsoever? Print $server_output and check for errors like this: if (curl_exec($ch) === false) { echo 'Curl error: ' . curl_error($ch); }
It seems that the HTML tag is not properly set.
Let's make sure your selector works. Instead of echo $element->innerText(); do something like echo 'found'. If it works, it should display 'found' several times. Maybe the 'table' element itself has no inner text, so you need to loop through its children.
It still prints out the entire page. Can you run the script on your machine to check it out? You need index.php to accept $trazi and vin.php to perform the search and print the results.
`foreach ($html->find('table.table-striped') as $element) { foreach ($element->find('td') as $row) { echo $row->innerText(); } break; }` Sorry, not sure how to format. As I suspected, the table itself has no contents, only its children do. The break is there because there are 4 tables with the same class and you only need the first.
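The nested loop from the last comment can also be written with PHP's built-in DOM extension. The following is a minimal sketch, not the commenter's code: it swaps simple_html_dom for DOMDocument and DOMXPath so it runs without extra files, `firstTableCells` is a hypothetical helper name, and the sample markup is made up for illustration. An empty result is the place for the "Results not found" message the question asks about.

```php
<?php
// Hypothetical helper: return the trimmed text of every <td> in the first
// table whose class attribute contains $class; empty array if none is found.
function firstTableCells(string $html, string $class = 'table-striped'): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Scraped markup is rarely valid; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // "[1]" keeps only the first matching table, like the break in the loop.
    $query = sprintf(
        '(//table[contains(concat(" ", normalize-space(@class), " "), " %s ")])[1]//td',
        $class
    );

    $cells = [];
    foreach ($xpath->query($query) as $td) {
        $cells[] = trim($td->textContent);
    }
    return $cells;
}

// Made-up sample markup with two tables sharing the same class.
$sample = '<table class="table table-striped"><tr><td>Make</td><td>Jeep</td></tr></table>'
        . '<table class="table table-striped"><tr><td>ignored</td></tr></table>';

$cells = firstTableCells($sample);
echo $cells ? implode(' | ', $cells) : 'Results not found';
```

With the sample markup only the cells of the first table are printed; for a page without such a table the fallback message is printed instead.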