Before the code: the nested for loops have to run at least 143,792,640,000 times, and the resulting table should contain at least 563,760 rows with no duplicates. I want to know how to speed this up, or whether some kind of parallel computing such as Hadoop could accelerate the work between PHP and MySQL.
Code below:
MySQL connection
$link = mysql_connect($servername, $username, $password);
if (!$link) {
    die("Connection failed: " . mysql_error());
}
mysql_select_db($dbname);
$sql = "INSERT INTO EM (source,target) VALUES ";
The nested for loop reads the data into MySQL. If check() finds that the pair already exists, the row is not inserted again and its count is updated instead (count = count + 1):
for ($i = 0; $i < $combine_arr_size; $i++) {
    for ($j = 0; $j < $combine_arr_size; $j++) {
        // Escaped copies for the queries, so a quote in a word cannot break the SQL.
        $src = mysql_real_escape_string($combine_words_array[$i]);
        $trg = mysql_real_escape_string($combine_words_array[$j]);
        // Check for a duplicate; a,b and b,a are treated as the same pair.
        if (check($combine_words_array[$i], $combine_words_array[$j])) {
            $update_query = "UPDATE EM SET count = count+1 WHERE (source='$src' AND target='$trg') OR (source='$trg' AND target='$src');";
            mysql_query($update_query);
        } else {
            // Otherwise append the pair to the INSERT ... VALUES string and run it.
            mysql_query($sql . "('$src','$trg')");
        }
    }
}
The loops pair every element of combine_words_array[] with every element of combine_words_array[].
Below is the check function. It returns 1 if the pair is already in the table (in either order) and 0 otherwise:
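function check($src, $trg) {
    // Escape the words so a quote cannot break the query.
    $src = mysql_real_escape_string($src);
    $trg = mysql_real_escape_string($trg);
    $query = mysql_query("SELECT 1 FROM EM WHERE (source='$src' AND target='$trg') OR (source='$trg' AND target='$src') LIMIT 1;");
    // 1 if the pair exists in either order, 0 otherwise (or if the query failed).
    return ($query && mysql_num_rows($query) > 0) ? 1 : 0;
}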
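One way to avoid a SELECT per index pair is to derive the pair counts from word frequencies in PHP. This is a minimal sketch, not a drop-in replacement: `count_unordered_pairs` is a hypothetical helper, it compares words case-sensitively (a MySQL collation may not), and it reports the total number of (i, j) hits per pair, whereas the loop above leaves `count` at its default of 0 on the first insert. If a word occurs f(a) times, the full i/j loop visits the pair a,b (in either order) 2·f(a)·f(b) times and the pair a,a f(a)² times, so only distinct words need to be paired.

```php
<?php
// Hypothetical helper: returns one [source, target, hits] row per unordered
// pair of distinct words, without looping over every (i, j) index pair.
function count_unordered_pairs(array $words): array
{
    $freq = array_count_values($words);   // word => number of occurrences
    $distinct = array_keys($freq);
    sort($distinct, SORT_STRING);          // fixed order, so a,b never reappears as b,a
    $n = count($distinct);

    $pairs = [];
    for ($i = 0; $i < $n; $i++) {
        for ($j = $i; $j < $n; $j++) {
            $a = (string) $distinct[$i];   // numeric-looking keys come back as ints
            $b = (string) $distinct[$j];
            // (a,a) is hit f(a)^2 times; (a,b) is hit once as a,b and once as b,a.
            $hits = ($i === $j)
                ? $freq[$a] * $freq[$a]
                : 2 * $freq[$a] * $freq[$b];
            $pairs[] = [$a, $b, $hits];
        }
    }
    return $pairs;
}
```

For example, `count_unordered_pairs(['a', 'b', 'a'])` yields the rows `['a','a',4]`, `['a','b',4]` and `['b','b',1]`, which add up to the 9 iterations the 3×3 loop would make.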
Table structure:
+--------+--------------+------+-----+---------+-------+
| Field  | Type         | Null | Key | Default | Extra |
+--------+--------------+------+-----+---------+-------+
| source | varchar(255) | YES  |     | NULL    |       |
| target | varchar(255) | YES  |     | NULL    |       |
| count  | int(11)      | NO   |     | 0       |       |
| prob   | double       | NO   |     | 0       |       |
+--------+--------------+------+-----+---------+-------+
For now the PHP code only affects the source, target and count columns.
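Since only source, target and count are written, rows whose counts are already known could be sent in batches rather than one query per pair. The sketch below assumes a connected PDO instance and rows shaped as [source, target, count]; `build_batch_insert` and `insert_pairs` are hypothetical names, and the batch size of 1000 is an arbitrary starting point rather than a tuned value.

```php
<?php
// Hypothetical helper: turn [source, target, count] rows into one multi-row
// INSERT with placeholders, so values are never spliced into the SQL string.
function build_batch_insert(array $rows): array
{
    $placeholders = implode(',', array_fill(0, count($rows), '(?,?,?)'));
    $sql = "INSERT INTO EM (source, target, `count`) VALUES $placeholders";

    $params = [];
    foreach ($rows as [$source, $target, $count]) {
        array_push($params, $source, $target, $count);
    }
    return [$sql, $params];
}

// Hypothetical driver: send the rows in chunks inside a single transaction.
// $pdo is assumed to be an already connected PDO instance.
function insert_pairs(PDO $pdo, array $pairs, int $batchSize = 1000): void
{
    $pdo->beginTransaction();
    foreach (array_chunk($pairs, $batchSize) as $chunk) {
        [$sql, $params] = build_batch_insert($chunk);
        $pdo->prepare($sql)->execute($params);
    }
    $pdo->commit();
}
```

With 563,760 rows and 1000 rows per statement this comes to roughly 564 INSERT statements. Whether the transaction helps depends on the storage engine, which the table description does not show.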
:-) I imagine that a lot of this could be converted to a stored procedure, which would probably run a lot faster. Try that first, maybe?