
I am trying to scrape the complete list of items from a URL so that I can store them in a database and use them later. The problem is that I only get the first item. I want to get every item on the page, then go on to page 2, then 3, then 4, and so on, scraping all the links if possible.

For each post I want the link (http:..............html) and the title. Then I want to move on to the next page and repeat until all pages are covered, storing everything in the database.

Here is the code I used:

$url = 'http://newyork.craigslist.org/search/jjj?addFour=part-time';

$timeout = 10;
$ch = curl_init($url);

curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);
curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$data = curl_exec($ch);
curl_close($ch);

function get_matched($pattern, $data)
{
    preg_match($pattern, $data, $match);
    return $match[1];
}

$pattern = "/<p>(.*?)<\/p>/";
$caty = get_matched($pattern, $data);

echo "$caty";

How can I do this?

2 Answers

  1. Wrong use of preg_*

    preg_match only looks for the first match and then returns. Since you want more than one match, you are looking for preg_match_all.

  2. Where is the loop/recursion?

    To do this properly you'll need some sort of loop or recursive function that keeps fetching data from the new links it finds. The data on those pages should be fetched following the same pattern.

    There are many resources online for how to write a simple scraper, among them are:
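As an illustration of the two points above (this is not one of the linked resources), here is a minimal sketch. It assumes each post appears in the HTML as an `<a href="....html">Title</a>` link and that result pages are selected with an `s` offset query parameter, 100 items per page. Both are guesses about the site's markup and URL scheme, and the helper names `fetch_page`, `extract_posts` and `scrape_all` are made up for this example, so adjust them to what the real pages look like.

```php
<?php
// Hypothetical helper: fetch one page with cURL. Returns null on failure.
function fetch_page(string $url, int $timeout = 10): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $data = curl_exec($ch);
    curl_close($ch);

    return $data === false ? null : $data;
}

// Hypothetical helper: collect every post link and title found in one page.
// ASSUMPTION: posts look like <a href="....html">Title</a>.
function extract_posts(string $html): array
{
    $pattern = '/<a[^>]+href="([^"]+\.html)"[^>]*>(.*?)<\/a>/s';

    // PREG_SET_ORDER returns one entry per match: [full match, url, title].
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $posts = [];
    foreach ($matches as $m) {
        $posts[] = ['url' => $m[1], 'title' => trim(strip_tags($m[2]))];
    }
    return $posts;
}

// Walk through the result pages until a page fails, is empty,
// or $maxPages is reached.
// ASSUMPTION: the "s" offset parameter and 100 items per page are guesses.
function scrape_all(string $baseUrl, int $perPage = 100, int $maxPages = 5): array
{
    $all = [];
    for ($page = 0; $page < $maxPages; $page++) {
        $html = fetch_page($baseUrl . '&s=' . ($page * $perPage));
        if ($html === null) {
            break; // request failed
        }
        $posts = extract_posts($html);
        if (!$posts) {
            break; // no more results
        }
        $all = array_merge($all, $posts);
    }
    return $all;
}
```

Calling `scrape_all('http://newyork.craigslist.org/search/jjj?addFour=part-time')` would then return an array of `['url' => ..., 'title' => ...]` entries that you can loop over with `foreach` and insert into your database. A regex is the simplest thing that works for a first attempt; if the markup turns out to be irregular, an HTML parser such as PHP's DOMDocument is usually more robust.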


6 Comments

If I use preg_match_all like this: function get_matched($pattern,$data) { preg_match_all($pattern,$data,$match); return $match[1]; } then echo prints "Array" instead of the items.
Thanks for the starting point. I will try your advice and let you know!
@samanta you wanted links to the manual, they are in the post. If you want to find more than one item you will get them back as an array, and you'll need to iterate this to get the values. Haven't worked with arrays before? php.net/manual/en/language.types.array.php
@samanta foreach ($match[1] as $val) { echo $val; } should be sufficient. Try it out, and then try to understand it using the links provided.
Do you know the regex to get the http links?

this is the best link:

http://php.net/manual/en/book.curl.php

4 Comments

I would like something clearer. I have been trying to do this for a week in my spare time, without any success.
I'm not getting an error, but the result is only 1 item from what I try to scrape. I want the whole page, which is about 100 items per page, and then I want to go to the second page and do the same, and so on.
Any advice on how I should echo the result?
Try echo "<pre>"; print_r($data); and see what comes back.
