
I have a PHP script that pulls down a number of RSS feeds. To avoid overloading the publishers' servers, I use PHP's sleep() function between requests to slow things down.

The entire script can take a couple of hours to run.

If I run this from a cron job on GoDaddy, it works happily for 5–10 minutes and then returns a server error. I checked, and the PHP maximum execution time is 30 seconds; since the script survives well past that, I'm not sure the limit is actually the cause of the problem.

If I run the job on my Mac, my local PHP also reports a default maximum execution time of 30 seconds, yet the script works fine when I run it from the terminal, and I don't understand why.

How do I run a looping script that exceeds 30 seconds without running into reliability problems?

Help appreciated.

1 Answer


The short answer is to call set_time_limit(0) to allow a long-running script. Your terminal (CLI) PHP probably already has the limit set to 0, which is why it works there. You could also be running out of memory, especially on PHP 5.2 or older. Log all errors to a file and inspect it.
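As a minimal sketch of that setup at the top of the script (the log file path here is just an illustration):

    <?php
    // Lift the 30-second limit for this run. Some shared hosts disable
    // this call, in which case it silently has no effect.
    set_time_limit(0);

    // Send all errors to a log file so the long run can be inspected afterwards.
    error_reporting(E_ALL);
    ini_set('log_errors', '1');
    ini_set('display_errors', '0');
    ini_set('error_log', __DIR__ . '/rss-fetch.log'); // hypothetical path

    // Optionally raise the memory ceiling if the feeds are large.
    ini_set('memory_limit', '256M');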

You could also rewrite your program so that each run works on a subset of the data. The benefit of that approach is that you can run it 24/7 or every five minutes, depending on what the PHP environment supports. You could even run multiple instances at a time, each working on its own data.
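A rough sketch of that batching idea (the feeds table and its columns are made up for illustration): each cron run fetches only the handful of feeds that were refreshed longest ago.

    <?php
    // Fetch only the N feeds that were refreshed longest ago on each run.
    // The "feeds" table and its columns are hypothetical.
    $batchSize = 20;

    $pdo = new PDO('mysql:host=localhost;dbname=rss', 'user', 'pass');
    $stmt = $pdo->query(
        'SELECT id, url FROM feeds ORDER BY last_fetched ASC LIMIT ' . (int) $batchSize
    );

    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $feed) {
        $xml = @file_get_contents($feed['url']); // download the feed
        if ($xml !== false) {
            // ... parse and store the items here ...
        }
        $pdo->prepare('UPDATE feeds SET last_fetched = NOW() WHERE id = ?')
            ->execute(array($feed['id']));
        sleep(2); // still be polite to the publisher's server
    }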


5 Comments

Thanks Matt. Trouble is, GoDaddy (and, I understand, most other hosts) doesn't allow changes to the time limit. Your idea about subsets is a good one, as each RSS download is very quick. But how do I create a schedule to download one at a time over, say, two hours? Do I have to create hundreds of cron jobs?
@Jeremy Maybe you can keep track of what's been downloaded and what's left in a database, run your script at a fixed interval, and let the script decide which feeds to download based on their previous download time. Just an idea, though it might seem a bit vague :)
@Jeremy, a simple approach is to put a list in the database and add a status/pid column. On start, UPDATE job SET pid=$pid WHERE status IS NULL LIMIT 50. Then select the rows with a matching pid and update their status when complete. Run that every X minutes. You don't really want overlapping jobs, although overlap won't hurt as long as you are keeping track of pid (or job number, etc.). More robust solutions exist, but this is simple and effective (sketched below, after these comments).
Intriguing idea. Thanks Matt. I'll try that :)
Note that the query above is missing something like SET status='pending' to prevent concurrent scripts from claiming the same records.
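A minimal sketch of that claim-then-process pattern, folding in the status='pending' fix from the last comment (the job table and its columns are hypothetical):

    <?php
    // Claim a batch of unprocessed rows for this process, then work through them.
    // Table/column names are hypothetical.
    $pdo = new PDO('mysql:host=localhost;dbname=rss', 'user', 'pass');
    $pid = getmypid();

    // Mark up to 50 unclaimed rows as pending for this pid so a concurrent
    // run of the script cannot grab the same records.
    $claim = $pdo->prepare(
        "UPDATE job SET pid = ?, status = 'pending' WHERE status IS NULL LIMIT 50"
    );
    $claim->execute(array($pid));

    // Process only the rows this pid claimed.
    $rows = $pdo->prepare("SELECT id, url FROM job WHERE pid = ? AND status = 'pending'");
    $rows->execute(array($pid));

    foreach ($rows->fetchAll(PDO::FETCH_ASSOC) as $job) {
        // ... download and store the feed here ...
        $pdo->prepare("UPDATE job SET status = 'done' WHERE id = ?")
            ->execute(array($job['id']));
        sleep(2); // throttle between requests
    }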
