PHP and curl result for screen scraping

Question

I want to fetch the complete list of items from a URL, store them in a database, and use them later. The problem is that I only get the first item. I want the whole list on this page, then go to page 2, then 3, then 4, ... and scrape all the links if possible.

For each post I want its URL (http:..............html) and its title. Then I want to go to the next page, and so on through all the pages, storing everything in a database.

Here is the code I used:

<?php
$url = 'http://newyork.craigslist.org/search/jjj?addFour=part-time';

$timeout = 10;
$ch = curl_init($url);

curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);
curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the page instead of printing it

$data = curl_exec($ch); // page HTML, or false on failure
if ($data === false) {
    die('curl error: ' . curl_error($ch));
}
curl_close($ch);

function get_matched($pattern, $data)
{
    preg_match($pattern, $data, $match); // preg_match stops at the first match
    return $match[1] ?? null;            // null when nothing matched
}

$pattern = "/<p>(.*?)<\/p>/";
$caty = get_matched($pattern, $data);

echo $caty;

How can I do this?
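For the "store them in a database" part of the question, one possible approach is PDO with prepared statements. This is a minimal sketch, not the asker's code: it assumes SQLite (the `sqlite::memory:` DSN), a hypothetical `posts` table, and a hypothetical `save_posts()` helper that takes rows shaped like `['url' => ..., 'title' => ...]`. For MySQL you would change the DSN and the SQLite-specific `INSERT OR IGNORE`.

```php
<?php
// Hypothetical helper: store scraped posts, skipping URLs already saved.
// Assumes SQLite; INSERT OR IGNORE is SQLite syntax.
function save_posts(PDO $db, array $posts): void
{
    // The URL is the primary key, so the same post is never stored twice.
    $db->exec('CREATE TABLE IF NOT EXISTS posts (url TEXT PRIMARY KEY, title TEXT)');

    // A prepared statement keeps scraped text from breaking the SQL.
    $stmt = $db->prepare('INSERT OR IGNORE INTO posts (url, title) VALUES (:url, :title)');
    foreach ($posts as $post) {
        $stmt->execute([':url' => $post['url'], ':title' => $post['title']]);
    }
}

// Usage with an in-memory database (swap the DSN for a real file or server).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
save_posts($db, [['url' => 'http://example.com/1.html', 'title' => 'Example post']]);
```

Keying on the URL means a crawler that revisits a page does not create duplicate rows.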

Community · Accepted Answer · 2017-05-23 11:48:06Z

Wrong use of preg_*

preg_match stops after the first match and returns; since you want every match, you are looking for preg_match_all.
- PHP: preg_match - Manual
- PHP: preg_match_all - Manual
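A small sketch of the difference, run against a made-up HTML string rather than the real page. `get_all_matched()` is a hypothetical name for a variant of the question's helper that returns every capture instead of only the first.

```php
<?php
// Made-up sample HTML with three <p> items.
$html = "<p>first</p>\n<p>second</p>\n<p>third</p>";
$pattern = '/<p>(.*?)<\/p>/';

// preg_match fills $one with the FIRST match only.
preg_match($pattern, $html, $one);
echo $one[1], "\n";                 // first

// preg_match_all returns the number of matches; with the default
// PREG_PATTERN_ORDER, $all[1] holds every capture of group 1.
$count = preg_match_all($pattern, $html, $all);
echo $count, "\n";                  // 3
echo implode(', ', $all[1]), "\n";  // first, second, third

// Hypothetical helper: an array of all group-1 captures (empty if none).
function get_all_matched(string $pattern, string $data): array
{
    preg_match_all($pattern, $data, $match);
    return $match[1];
}
```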
Where is the loop/recursion?

If you'd like to do this right, you'll need some sort of loop or recursive function that keeps fetching data from the new links it finds, and the data on those pages should be extracted using the same pattern.
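A rough sketch of that loop: fetch a page, collect its items, find the "next page" link, and repeat until there is none. The markup is an assumption for illustration only, not Craigslist's actual HTML: items are taken to look like `<a class="item" href="...">title</a>` and the next-page link like `<a class="next" href="...">`. `fetch_url()` and `crawl_pages()` are hypothetical names; the fetcher is passed in as a callable so it can be replaced (for example, with a stub when testing).

```php
<?php
// Hypothetical fetcher built from the question's curl options.
// Returns the body, or null on failure. The handle is freed when $ch
// goes out of scope.
function fetch_url(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Follows "next" links from $startUrl, collecting [url, title] items.
// $fetch is any callable(string $url): ?string.
function crawl_pages(string $startUrl, callable $fetch, int $maxPages = 50): array
{
    $items = [];
    $visited = [];   // guards against fetching the same page twice
    $url = $startUrl;

    while ($url !== null && count($visited) < $maxPages && !isset($visited[$url])) {
        $visited[$url] = true;
        $html = $fetch($url);
        if ($html === null) {
            break;   // stop on a failed request
        }

        // Every item on this page; each $match is [full match, href, title].
        preg_match_all('/<a class="item" href="([^"]+)">(.*?)<\/a>/', $html, $m, PREG_SET_ORDER);
        foreach ($m as $match) {
            $items[] = ['url' => $match[1], 'title' => $match[2]];
        }

        // Follow the next-page link if there is one, otherwise stop.
        $url = preg_match('/<a class="next" href="([^"]+)">/', $html, $next) ? $next[1] : null;
    }
    return $items;
}
```

A real run would be `crawl_pages($url, 'fetch_url')`. The `$maxPages` cap and the visited set keep a misbehaving "next" link from looping forever.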

There are many resources online for how to write a simple scraper, among them are:
- How do I make a simple web-crawler in PHP?
- Build a basic web crawler to pull information off a page

edited May 23, 2017 at 11:48

CommunityBot


answered Dec 24, 2011 at 7:53

Filip Roséen


6 Comments

samanta Over a year ago

If I use preg_match_all like this: function get_matched($pattern,$data) { preg_match_all($pattern,$data,$match); return $match[1]; } then echo prints "Array", not the items?!

samanta Over a year ago

Thanks for the starting point; I will try your advice and let you know!

Filip Roséen Over a year ago

@samanta You wanted links to the manual; they are in the post. If you want to find more than one item, you will get them back as an array, and you'll need to iterate over it to get the values. Haven't worked with arrays before? php.net/manual/en/language.types.array.php

Filip Roséen Over a year ago

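@samanta Your get_matched() already returns $match[1], which is an array of the captured items, so foreach (get_matched($pattern, $data) as $val) { echo $val; } should be sufficient. Try it out and then try to understand it using the links provided.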
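A short sketch of why `echo` on the returned array prints "Array" and how to print the items instead. It reuses the `get_matched()` shape from the comment above (with preg_match_all) on a made-up HTML string.

```php
<?php
// Same shape as the commenter's helper: returns an ARRAY of captures.
function get_matched($pattern, $data)
{
    preg_match_all($pattern, $data, $match);
    return $match[1];
}

$items = get_matched('/<p>(.*?)<\/p>/', '<p>one</p><p>two</p>');

// echo $items; would print "Array" (with a notice in PHP 7, a warning in
// PHP 8), because echo converts the array itself to a string.

// Print one item per line instead.
foreach ($items as $i => $item) {
    echo $i, ': ', $item, "\n";
}

// For debugging, print_r shows the whole structure.
print_r($items);
```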

samanta Over a year ago

Do you know the regex to get the http links?
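On the regex question: a pattern like `#href="(https?://[^"]+)"#` with preg_match_all grabs double-quoted absolute links on simple pages, but it breaks on single quotes, extra attributes between tags, or nested markup. The sketch below swaps in PHP's DOM extension (`DOMDocument` plus `DOMXPath`) instead of a regex; `extract_links()` is a hypothetical name and the test HTML is made up.

```php
<?php
// Hypothetical helper: every absolute http(s) link on a page with its text.
function extract_links(string $html): array
{
    if ($html === '') {
        return [];   // loadHTML rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; silence libxml's parse warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ((new DOMXPath($doc))->query('//a[@href]') as $a) {
        $href = $a->getAttribute('href');
        // Keep only absolute http/https URLs; skip relative ones.
        if (preg_match('#^https?://#i', $href)) {
            $links[] = ['url' => $href, 'title' => trim($a->textContent)];
        }
    }
    return $links;
}
```

`textContent` includes the text of nested tags, so `<a href="...">Job <b>B</b></a>` yields the title "Job B".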

Community · Accepted Answer · 2020-06-20 09:12:55Z

This is the best link:

http://php.net/manual/en/book.curl.php

edited Jun 20, 2020 at 9:12

CommunityBot


answered Dec 24, 2011 at 7:37

xkeshav


4 Comments

samanta Over a year ago

I would like something clearer. I have been trying to do this for a week in my spare time, without any success.

samanta Over a year ago

I'm not getting an error, but the result is only 1 item of what I try to scrape. I want the whole page (about 100 items per page), then go to the second page and do the same, and so on!

samanta Over a year ago

Any advice on how I should echo the result?

xkeshav Over a year ago

Do: echo "<pre>"; print_r($data); and see what comes back.
