
I am developing an application in PHP in which I need to implement a big file handler. Reading and writing the file is not a problem, but checking the contents of the file is.

I built a recursive function which checks whether or not a variable is already used in the same document.

private function val_id($id){
    if(!isset($this->id)){
        $this->id = array();
    }
    if(in_array($id, $this->id)){
        return $this->val_id($id+1);
    }else{
        $this->id[] = $id;
        return $id;
    }
}

When in_array($id, $this->id) returns FALSE, $id is added to $this->id (the array containing all used ids) and the function returns a valid id.

When it returns TRUE, the function calls itself with parameter $id+1.

Since we are talking about over 300,000 records at a time, PHP does not seem to be able to store such big arrays. It appears to stop writing lines to the documents I generate once the array gets too big, but I don't receive any error messages about it.

Since the generated documents are SQL files with multi-row INSERT statements, another solution could be to check whether the id already exists in the database. Can MySQL catch these duplicate-key errors and retry the entries with the id increased by 1? If so, how?

How do you think I need to solve this problem?

Kind regards,

Wouter

  • If you are searching for a more compact way for PHP to store arrays, check out PHP Judy. The results are impressive space-wise; however, it appears to be about twice as slow as the regular array implementation (to fill it in). Commented Jan 15, 2013 at 15:20
  • Could you show some more of your code? Where are you writing to the file? It's hard to tell what you're trying to do here; I have a feeling it could be solved by using MySQL's AUTO_INCREMENT. Commented Jan 15, 2013 at 15:20
  • I just read something that may help you: blog.webspecies.co.uk/2011-05-31/lazy-evaluation-with-php.html Commented Jan 15, 2013 at 15:21
  • 30K entries in an array is nothing, you've likely got some other issue going on. Commented Jan 15, 2013 at 15:23

3 Answers

  1. Make error messages appear (enable error_reporting and display_errors).
  2. Increase memory_limit.
  3. Instead of storing the values, store the parameter in the key; then you can use isset($this->id[$id]) instead of in_array().
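The key-based lookup in point 3 can be sketched like this; a hypothetical rewrite of the asker's val_id() helper (the class and property names here are invented for illustration), which also loops instead of recursing to avoid deep call stacks:

```php
<?php
// Sketch: store used ids as array *keys* so lookups are hash-based,
// instead of scanning all values with in_array().
class IdAllocator
{
    private $used = array(); // id => true

    public function valId($id)
    {
        while (isset($this->used[$id])) { // average O(1), unlike in_array()
            $id++;
        }
        $this->used[$id] = true;
        return $id;
    }
}

$a = new IdAllocator();
echo $a->valId(5), "\n"; // 5
echo $a->valId(5), "\n"; // 6 (5 is already taken)
echo $a->valId(5), "\n"; // 7
```

Note the integer values stored as keys take no extra space for the `true` payload worth worrying about; the win is that the duplicate check no longer scans the whole array on every call.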

3 Comments

Just to explain: looking up a key is O(1) on average, because the keys are stored in a hash table, whereas in_array() needs O(n) (it scans the whole array).
@FabianBlechschmidt - why don't you finish your comment explaining the reasoning behind what you posted? This way it looks like e-peen slapping without actual intent to help anyone.
Sorry, too many things to do; I just spent a few seconds writing it down, but you are right. O is a function from theoretical computer science (en.wikipedia.org/wiki/Big_O_notation). O(n) means you have to touch every key<->value pair in the array once to achieve what you want (find a value); a hash lookup on the key means the hash leads you straight to the entry, so you save a lot of time by never touching the entries you don't need.

Use INSERT IGNORE to make MySQL skip rows that would violate a duplicate-key check, and remove the key check from your PHP. Your statement could look like this:

INSERT IGNORE INTO tbl_name SET key1 = 1, col1 = 'value1'

If, instead, you want to react to the duplicate, you can use ON DUPLICATE KEY UPDATE; the example below increments column c by one whenever the key already exists:

INSERT INTO table (a,b,c) VALUES (1,2,3)
    ON DUPLICATE KEY UPDATE c=c+1;
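Since the asker is generating SQL files from PHP, such statements can be assembled as a single multi-row INSERT IGNORE; a minimal sketch (the table name, column names, and helper function are invented for illustration, and real code should use prepared statements or proper escaping rather than addslashes()):

```php
<?php
// Sketch: build one multi-row INSERT IGNORE statement so MySQL skips
// duplicate ids, instead of PHP tracking every used id in memory.
function buildInsertIgnore($table, array $rows)
{
    $values = array();
    foreach ($rows as $row) {
        $parts = array();
        foreach ($row as $v) {
            // NOTE: addslashes() is only for this sketch; use real escaping.
            $parts[] = is_int($v) ? (string)$v : "'" . addslashes($v) . "'";
        }
        $values[] = '(' . implode(', ', $parts) . ')';
    }
    return "INSERT IGNORE INTO {$table} (id, col1) VALUES "
         . implode(', ', $values) . ';';
}

echo buildInsertIgnore('tbl_name', array(array(1, 'value1'), array(2, 'value2'))), "\n";
// INSERT IGNORE INTO tbl_name (id, col1) VALUES (1, 'value1'), (2, 'value2');
```

With IGNORE, duplicate rows are silently dropped rather than erroring out, which matches the "check if the id already exists" idea from the question without any PHP-side bookkeeping.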



Why should 30,000 records be a problem? Each record in a standard PHP array takes about 144 bytes; for 30,000 records that would mean roughly 4218.75 kB. No big deal.
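If in doubt, this kind of estimate is easy to verify empirically; a quick sketch (the exact figure varies by PHP version, since per-entry array overhead dropped substantially in PHP 7):

```php
<?php
// Sketch: measure how much memory 30,000 integer array entries really take.
$before = memory_get_usage();

$ids = array();
for ($i = 0; $i < 30000; $i++) {
    $ids[] = $i;
}

$after = memory_get_usage();
printf("%.2f kB for %d entries\n", ($after - $before) / 1024, count($ids));
```

On any modern PHP this prints a figure in the low thousands of kilobytes at most, nowhere near a default memory_limit of 128M.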

Otherwise, Your Common Sense's idea of using the array key is worth a thought, because it's faster.

