5

I have a PHP script that needs to run for quite some time.

What the script does:

  • connects to MySQL
  • initiates anywhere from 100 to 100,000 cURL requests
  • each cURL request returns compact-decoded data of 1 to 2,000 real estate listings - I use preg_match_all to get all the data and do one MySQL INSERT per listing. Each query never exceeds 1 MB of data.

So there are a lot of loops, MySQL inserts, and cURL requests going on. PHP safe mode is off and I am able to successfully ini_set the max_execution_time to something ridiculous to allow my script to run all the way through.
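In case it helps, the flow is roughly this (heavily simplified; the credentials, table, and pattern are made up):

$db = mysqli_connect('localhost', 'user', 'pass', 'mls');        // hypothetical credentials

foreach ($requestUrls as $url) {                                  // 100 to 100,000 of these
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $compact = curl_exec($ch);                                    // compact-decoded listing data
    curl_close($ch);

    preg_match_all('/<listing>(.*?)<\/listing>/s', $compact, $listings);   // made-up pattern
    foreach ($listings[1] as $listing) {
        mysqli_query($db, "INSERT INTO listings (raw) VALUES ('"
            . mysqli_real_escape_string($db, $listing) . "')");   // one insert per listing
    }
}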

Well, my problem is that the script, or Apache, or something is having a stroke in the middle of the run and the screen goes to the "connection to the server has been reset" page.

Any ideas?

6 Answers

4

Well, disregarding the fact that attempting 100,000 cURL requests is absolutely insane, you're probably hitting the memory limit.

Try setting the memory limit to something more reasonable:

ini_set('memory_limit', '256M');

And as a side tip, don't set the execution time to something ludicrous; chances are you'll eventually find a way to hit that with a script like this. ;]

Instead, just set it to 0; it's functionally equivalent to turning the execution time limit off completely:

ini_set('max_execution_time', 0);

4 Comments

Yes, I see now that I need to increase the memory limit, but is this a bad idea?
@John: Yes and no. Don't leave it set higher than you need all the time, since the limit is what keeps a buggy script from running forever. Imagine if you turned off the script execution time limiter and the memory limit and accidentally ran a script with an infinite loop! Moral of the story: use it sparingly, for situations like this where nothing else would really work short of rewriting the job to be distributed or executed over time. By the way, I second timdev's comment about setting up a job queueing system, that really is the way to do this.
This was a better answer than mine -- I forgot you could override memory_limit using ini_set
Just wanted to add that not only can you set memory_limit at runtime, you can adjust it multiple times in the same call. Coupled with memory_get_peak_usage/memory_get_usage, you can actually dynamically increase your memory limit as needed throughout the execution.
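A rough sketch of that dynamic-adjustment idea (the helper name and the 64M step are invented; it assumes a finite limit like "256M", not -1):

function ensure_memory_headroom($headroom = 33554432) {          // ~32 MB of slack, arbitrary
    $limitMb = (int) ini_get('memory_limit');                    // crude parse; assumes a value like "256M"
    if (memory_get_usage(true) + $headroom > $limitMb * 1024 * 1024) {
        ini_set('memory_limit', ($limitMb + 64) . 'M');          // grow in 64M steps
    }
}

// call it inside the big loop, e.g. once per cURL request
ensure_memory_headroom();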
3

Lots of ideas:

1) Don't do it inside an HTTP request. Write a command-line PHP script to drive it. You can use a web-bound script to kick it off, if necessary.

2) You should be able to set max_execution_time to zero (or call set_time_limit(0)) to ensure you don't get shut down for exceeding a time limit.

3) It sounds like you really want to refactor this into something more sane. Think about setting up a little job queueing system, and having a PHP script that forks several children to chew through all the work (see the rough sketch below).

As Josh says, look at your error_log and see why you're being shut down right now. Try to figure out how much memory you're using -- that could be a problem. Try setting the max_execution_time to zero. Maybe that will get you where you need to be quickly.

But in the long run, it sounds like you've got way too much work to do inside of one http request. Take it out of http, and divide and conquer!
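A very rough sketch of the fork-several-children idea from point 3 (CLI only, pcntl extension required; the file name, worker count, and credentials are all made up):

$urls   = file('urls.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);   // the work list
$chunks = array_chunk($urls, (int) ceil(count($urls) / 4));                  // 4 workers, arbitrary

foreach ($chunks as $chunk) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        exit("fork failed\n");
    }
    if ($pid === 0) {                                            // child process
        $db = mysqli_connect('localhost', 'user', 'pass', 'mls'); // each child opens its own connection
        foreach ($chunk as $url) {
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            $body = curl_exec($ch);
            curl_close($ch);
            // ... preg_match_all() + one INSERT per listing, as in the question ...
        }
        exit(0);                                                 // child is done
    }
}

while (pcntl_wait($status) > 0) {                                // parent waits for all children
}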

1 Comment

Didn't know about that 0 trick, good to know. Not sure how to go about doing this outside of having it in a PHP script.
1

You can set the timeout to be indefinite by modifying your php.ini and setting the max_execution_time directive.

But you may also want to consider a slight architecture change. First, consider a "launch and forget" approach to getting 100,000 cURL requests done. Second, consider using "wget" instead of cURL.

You can issue a simple "wget URL -O UniqueFileName &". This will retrieve a web page, save it to a "unique" filename, and do it all in the background.

Then you can iterate over a directory of files, grepping (preg_match-ing) the data and making your DB calls. Move each file to an archive as you process it, and continue to iterate until there are no more files.

Think of the directory as a "queue" and have one process just process the files. Have a second process simply go out and grab the web-page data. You could add a third process that acts as a "monitor", which works independently and simply reports snapshot statistics. The other two can just be "web services" with no interface.
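A sketch of those two pieces (the directory layout and the pattern are invented):

// Fetcher process: fire off background downloads into a "queue" directory.
foreach ($listingUrls as $i => $url) {
    $file = sprintf('queue/listing_%06d.html', $i);
    exec('wget ' . escapeshellarg($url) . ' -O ' . escapeshellarg($file) . ' > /dev/null 2>&1 &');
}

// Processor process: treat the directory as the queue.
foreach (glob('queue/*.html') as $file) {
    $html = file_get_contents($file);
    preg_match_all('/some-listing-pattern/s', $html, $matches);   // made-up pattern
    // ... one INSERT per match ...
    rename($file, 'archive/' . basename($file));                  // archive when done
}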

This type of multi-processing is really powerful and greatly under-utilized, IMHO. To me this is the true power of the web.


1

I had the same problem when getting data from MySQL via PHP that contained special characters like umlauts (ä, ö, ü), ampersands, etc. The connection was reset and I found no errors in either the Apache log or the PHP logs. First I made sure in PHP that I accessed the character set on the DB correctly with:

mysql_query("SET NAMES 'latin1' COLLATE 'latin1_german2_ci'");

mysql_query("SET CHARACTER SET 'latin1'");

Then, finally, I resolved the problem with this line in PHP:

mysql_query("SET character_set_connection='latin1'");


0

100,000 cURL requests??? You are insane. Break that data up!

7 Comments

Every time the client adds a new MLS it has to get anywhere from 1,000 to 10,000 listings - I can get all of the listings in about 5 cURL requests, but I have to do 1 cURL request per listing to get the images for it.
@John: What about writing a class then which contains the functionality to retrieve one item at a time. You could loop over all the listings and instantiate the class once for each one, ensuring in the process that when the class is destroyed the cURL memory gets freed too.
@John: Basically, you just want to make sure that you're not retrieving the same data over and over, wasting cycles and bandwidth in the process. By setting up a job queue of some description, and storing each retrieved page in a database, you can prevent this easily.
That would make the script run even longer, because right now it does one cURL request to log in, then all the cURL requests in a loop, then one cURL request to log out. So instead of login (loop 20 times) logout // 22 cURL requests, it would do 20*3 // 60 cURL requests. Your suggestion would definitely help with the memory problem, though :( There's got to be a way to free up the part of the memory I don't need anymore, isn't there? After it does one thing, I don't know why PHP tries to remember it until the end; it seems excessive.
I never retrieve the same data twice.
0

What's in the apache error_log? Are you reaching the memory limit?

EDIT: Looks like you are reaching your memory limit. Do you have access to php.ini? If so, you can raise the memory_limit there. If not, try running the curl or wget binaries using the exec or shell_exec functions; that way they run as separate processes and don't use PHP's memory.
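A sketch of that exec() idea (the temp-file scheme is made up, and $url is assumed to come from your request loop); the download lives on disk, so PHP only holds one response in memory at a time:

$tmp = tempnam(sys_get_temp_dir(), 'listing_');
exec('curl -s -o ' . escapeshellarg($tmp) . ' ' . escapeshellarg($url));
$data = file_get_contents($tmp);
preg_match_all('/some-pattern/s', $data, $matches);   // made-up pattern
// ... inserts ...
unlink($tmp);
unset($data);                                          // free the response before the next URL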

7 Comments

Yes. I'm a noob sorry: Allowed memory size of 100663296 bytes exhausted (tried to allocate 2975389 bytes)
This might sound even more noobish, but can't I just ob_flush/flush throughout the script, or at certain parts of it?
@John: No, the buffer is only one part of the memory being used. The cURL functions use quite a bit of memory all to themselves.
Well why does it keep every request in the memory? Wouldn't it get rid of the old request in the memory once it starts a new one?
@John: No. Because cURL is an outside library, it makes it very tricky for the memory management model of PHP to dispose of correctly. Often this means that if the cURL calls are not enclosed inside a block (a class, even a function), they will not be disposed of correctly.
