
Before the code runs: the nested loop iterates at least 143,792,640,000 times and must produce a table of at least 563,760 rows without duplicates. How can I speed this up, or could some form of parallel computing (Hadoop, for example) accelerate the work between PHP and MySQL?

Code below:

MySQL connection

$link=mysql_connect($servername,$username,$password);
mysql_select_db($dbname);
$sql= "INSERT INTO EM (source,target) VALUES ";

The for loop reads the data into MySQL; the check function detects duplicates, and instead of inserting a duplicate it updates count = count + 1:

for ($i = 0; $i < $combine_arr_size; $i++) {
    for ($j = 0; $j < $combine_arr_size; $j++) {

        // check() treats a pair like (a, b) and (b, a) as the same thing
        if (check($combine_words_array[$i], $combine_words_array[$j])) {
            $update_query = "UPDATE EM SET count = count+1 WHERE (source='$combine_words_array[$i]' AND target='$combine_words_array[$j]') OR (source='$combine_words_array[$j]' AND target='$combine_words_array[$i]');";
            mysql_query($update_query);
        } else {
            if (!$link) {
                die("Connection failed: " . mysql_error());
            }

            // otherwise append to the INSERT INTO ... VALUES string and run it
            $sql .= "('$combine_words_array[$i]','$combine_words_array[$j]'),";
            mysql_query(substr($sql, 0, -1)); // strip the trailing comma
            $sql = "INSERT INTO EM (source,target) VALUES ";
        }
    }
}

The loop aligns every element of $combine_words_array[] against every other element of $combine_words_array[].

Below is the check function; it returns 1 if the pair already exists in the table (in either order):

function check($src, $trg) {
    $query = mysql_query("SELECT * FROM EM WHERE (source='$src' AND target='$trg') OR (source='$trg' AND target='$src');");
    if (mysql_num_rows($query) > 0) {
        return 1;
    } else {
        return 0;
    }
}

Table schema:

+--------+--------------+------+-----+---------+-------+
| Field  | Type         | Null | Key | Default | Extra |
+--------+--------------+------+-----+---------+-------+
| source | varchar(255) | YES  |     | NULL    |       |
| target | varchar(255) | YES  |     | NULL    |       |
| count  | int(11)      | NO   |     | 0       |       |
| prob   | double       | NO   |     | 0       |       |
+--------+--------------+------+-----+---------+-------+

At the moment the PHP code only affects the source, target and count columns.

  • 143B rows, phew! How long does this take on your production hardware presently? :-) I imagine that a lot of this could be converted to a stored procedure, and so would run a lot faster. Try that first, maybe? Commented Jul 21, 2015 at 7:44
  • Also, can you add to your question an explanation of the pseudocode of this algorithm and what it is doing? Maybe you are doing something really inefficiently, and there is a better/faster way to do it. Commented Jul 21, 2015 at 7:45
  • (Correction: 143B iterations, not rows. Still a lot of work though!) Commented Jul 21, 2015 at 7:51
  • Please also provide your MySQL schema. You are doing lookups on quite a large set, so I do hope you have index fields. You could also consider different queries (like REPLACE) using some KEYS, which would also clean up your code. Also, you treat source and target as interchangeable values in your checks and updates, but not in your inserts. If the fields are indeed interchangeable, you could try inserting values where source and target are automatically assigned the lesser or greater value. (Actually, are you dealing with a graph of some sort?) Commented Jul 21, 2015 at 8:48
  • Oh, and actually before going to the SQL bit, you could preprocess the data directly in PHP. I mean, try to reduce the actual datasets by eliminating duplicates (which you treat as counts) by directly computing the counts and doing the combination afterwards, which you could also simplify, I guess... and knowing about the actual problem space of your data might help reduce the complexity a bit more. Commented Jul 21, 2015 at 9:01
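The preprocessing idea in the comments above can be sketched in plain PHP: canonicalise each pair so (a, b) and (b, a) map to the same key, then count duplicates in memory before touching MySQL at all. This is a minimal sketch, not the asker's code; the sample array stands in for the real data.

```php
<?php
// Sketch only: canonicalise each pair so (a, b) and (b, a) share one
// key, then count occurrences in PHP instead of querying MySQL per pair.
// $combine_words_array is from the question; the sample data is a stand-in.

$combine_words_array = ['a', 'b', 'c'];
$counts = [];

$n = count($combine_words_array);
for ($i = 0; $i < $n; $i++) {
    for ($j = 0; $j < $n; $j++) {
        // order the pair so (a, b) and (b, a) produce the same key
        $src = min($combine_words_array[$i], $combine_words_array[$j]);
        $trg = max($combine_words_array[$i], $combine_words_array[$j]);
        $key = $src . "\x00" . $trg;  // NUL byte cannot appear in a word
        $counts[$key] = ($counts[$key] ?? 0) + 1;
    }
}

// $counts now holds every distinct pair with its frequency; a handful of
// multi-row INSERTs can then replace billions of single-row queries.
```

The inner loops still run over every combination, but all duplicate detection happens in a PHP hash map rather than via per-pair SELECT/UPDATE round trips to MySQL.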

2 Answers


It is difficult to know exactly what you want to do with duplicate combinations. For example, you are generating every combination of the array, which will produce many duplicates that you will then count twice.

However, I would be tempted to load the words into a table (possibly a temporary table) and then do a cross join of the table against itself to get every combination, and use this to do an INSERT with an ON DUPLICATE KEY clause.

Very crudely, something like this:

<?php

$link = mysql_connect($servername,$username,$password);
mysql_select_db($dbname);

$sql = "CREATE TEMPORARY TABLE words
        (
            word varchar(255),
            PRIMARY KEY (`word`)
        )";
mysql_query($sql);

$sql = "INSERT INTO words (word) VALUES ";
$sql_parm = array();

foreach($combine_words_array AS $combine_word)
{
    $sql_parm[] = "('".mysql_real_escape_string($combine_word)."')";
    if (count($sql_parm) > 500)
    {
        mysql_query($sql.implode(',', $sql_parm));
        $sql_parm = array();
    }
}

if (count($sql_parm) > 0)
{
    mysql_query($sql.implode(',', $sql_parm));
    $sql_parm = array();
}

$sql = "INSERT INTO EM(source, target)
        SELECT w1.word, w2.word
        FROM words w1
        CROSS JOIN words w2
        ON DUPLICATE KEY UPDATE `count` = `count` + 1
        ";

mysql_query($sql);

This does rely on having a unique key covering both the source and target columns.

But whether this is an option depends on the details of the records. For example, with your current code, if there were two words (say A and B) you would find both the combination A / B and the combination B / A, and both combinations would update the same record.
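The A / B versus B / A problem can be handled in SQL itself by canonicalising the pair at insert time, as one of the question comments suggests. A rough sketch, assuming the EM and words tables from this page (the key name uq_pair is arbitrary):

```sql
-- A unique key over the pair makes ON DUPLICATE KEY UPDATE possible.
ALTER TABLE EM ADD UNIQUE KEY uq_pair (source, target);

-- LEAST/GREATEST order each pair, so (a, b) and (b, a) hit the same
-- row and the duplicate-key clause does the counting.
INSERT INTO EM (source, target, `count`)
SELECT LEAST(w1.word, w2.word), GREATEST(w1.word, w2.word), 1
FROM words w1
CROSS JOIN words w2
ON DUPLICATE KEY UPDATE `count` = `count` + 1;
```

With this shape, the per-pair check() SELECT disappears entirely; the unique index and the duplicate-key clause replace it.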




Put a better processor in your server and increase the RAM, then go to your php.ini settings and raise the maximum allocated memory in the various memory- and processor-related configuration options.

This will give the server more headroom and improve running efficiency.

If you cannot find your php.ini file, create a new PHP file with the following contents and open it in the browser:

<?php phpinfo(); ?>

Make sure you delete this file after finding out where php.ini is, as an unwanted user (hacker) could find it, and it would give them detailed information about vulnerabilities in your server configuration.

Once you've found php.ini, do some research online to determine the settings that are not obvious, and increase the memory allocations in the various areas.
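The directives this answer has in mind would look roughly like the following in php.ini. The values are illustrative only, not recommendations; tune them to the machine:

```ini
; Illustrative values only -- tune to the hardware.
memory_limit = 2048M        ; per-script memory ceiling (-1 = unlimited)
max_execution_time = 0      ; 0 = no time limit, sensible for batch jobs
```

For a long-running batch job like this one, running the script from the CLI (where max_execution_time defaults to 0) is usually simpler than raising web-server limits.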

3 Comments

memory_limit has been set to -1, so there is no limit, but it still runs for more than 3 months
"an unwanted user (hacker) could find this file [phpinfo script]" - I don't imagine such a script would be web-accessible. It should be run from the console, since the OP's script would also be console-based.
You'll be surprised how many people leave a phpinfo.php file on their webroot by accident. I haven't run it in the console before, but will take a look at it. Thanks for the advice.
