I'm having problems debugging a failing mysql 5.1 insert under PHP 5.3.4. I can't seem to see anything in the mysql error log or php error logs.

Based on a Yahoo presentation on efficient pagination, I was adding order numbers to posters on my site (order rank, not order sales).

I wrote a quick test app and asked it to create the order numbers on one category. There are 32,233 rows in that category, and every time I run it I get exactly 23,304 rows updated. I've increased the memory limit, I've put ini settings in the script, and I've run it from both the PHP CLI and PHP-FPM. Each time it doesn't get past 23,304 rows updated.

Here's my script, which I've added massive timeouts to.

include 'common.inc'; //database connection stuff
ini_set("memory_limit","300M");
ini_set("max_execution_time","3600");
ini_set('mysql.connect_timeout','3600');
ini_set('mysql.trace_mode','On');
ini_set('max_input_time','3600');
$sql1="SELECT apcatnum FROM poster_categories_inno LIMIT 1";
$result1 = mysql_query($sql1);
while ($cats = mysql_fetch_array($result1)) {
    $sql2="SELECT poster_data_inno.apnumber,poster_data_inno.aptitle FROM poster_prodcat_inno, poster_data_inno WHERE poster_prodcat_inno.apcatnum ='$cats[apcatnum]' AND poster_data_inno.apnumber = poster_prodcat_inno.apnumber ORDER BY aptitle ASC";
    $result2 = mysql_query($sql2);
    $ordernum=1;
    while ($order = mysql_fetch_array($result2)) {
        $sql3="UPDATE poster_prodcat_inno SET catorder='$ordernum' WHERE apnumber='$order[apnumber]' AND apcatnum='$cats[apcatnum]'";
        $result3 = mysql_query($sql3);
        $ordernum++;
    } // end of 2nd while
}

I'm at a head-scratching loss. Just did a test on a smaller category and only 13,199 out of 17,662 rows were updated. For the two experiments only 72-74% of the rows are getting updated.

  • You don't seem to be using either error_reporting or mysql_error()? Commented Dec 24, 2010 at 14:31
  • Are there some rows with invalid apcatnum values (ones not in the categories table)? Also, you're limiting the first query to 1, so there's no need for your first loop. But I think you want more than 1, so why not adjust your limit...?? Commented Dec 24, 2010 at 14:36
  • it's only limited to 1 because there are several thousand categories and I just wanted to initially test one. The limit will be stripped off for production. Commented Dec 24, 2010 at 14:43
  • Ahh, fair enough. Are you sure both tables (prodcat and data) have all the corresponding rows as well (since you're doing a full join on them)? Commented Dec 24, 2010 at 14:48
  • @ircmaxell I think you hit the nail squarely on the head. I checked a couple of the apnumbers (item #) that didn't get updated, and they were listed in poster_prodcat_inno (category & item #) but not in the list of items (poster_data_inno). The datafeed's coming from an outside company. So I guess the question becomes how to strip the apnumbers in poster_prodcat_inno that don't exist in poster_data_inno. Commented Dec 24, 2010 at 14:52
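
For the orphan cleanup raised in the last comment, an anti-join is one possible approach. This is only a sketch, assuming apnumber is never NULL in poster_data_inno and is indexed in both tables; the table and column names are the ones from the question. Running the COUNT first shows how many rows the DELETE would remove.

```sql
-- Count prodcat rows whose apnumber has no matching row in poster_data_inno.
SELECT COUNT(*)
FROM poster_prodcat_inno p
LEFT JOIN poster_data_inno d ON d.apnumber = p.apnumber
WHERE d.apnumber IS NULL;

-- Delete those orphaned rows (MySQL multi-table DELETE syntax).
DELETE p
FROM poster_prodcat_inno p
LEFT JOIN poster_data_inno d ON d.apnumber = p.apnumber
WHERE d.apnumber IS NULL;
```

On a table with millions of rows it is probably worth backing up first and running the DELETE during a quiet period.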

2 Answers

I'd say your problem lies with your 2nd query. Have you done an EXPLAIN on it? Because of the ORDER BY clause a filesort will be required. If you don't have appropriate indices that can slow things down further. Try this syntax and sub in a valid integer for your apcatnum variable during testing.

SELECT d.apnumber, d.aptitle 
FROM poster_prodcat_inno p JOIN poster_data_inno d
  ON poster_data_inno.apnumber = poster_prodcat_inno.apnumber 
WHERE p.apcatnum ='{$cats['apcatnum']}' 
ORDER BY aptitle ASC; 

Secondly, since catorder is just an integer version of the combination of apcatnum and aptitle, it's a denormalization for convenience's sake. This isn't necessarily bad, but it does mean that you have to update it every time you add a new title or category. Perhaps it might be better to partition your poster_prodcat_inno table by apcatnum and just do the JOIN with poster_data_inno when you actually need the catorder.

2 Comments

That syntax gave me an "ERROR 1054 (42S22): Unknown column 'poster_data_inno.apnumber' in 'on clause'" even though there is a poster_data_inno.apnumber. Also, yes, the catorder is a bit of denormalization as suggested in Yahoo's Percona presentation "Efficient Pagination Using MySQL" to get around the performance hit with pagination by limits in huge tables. I think my main performance goal right now is to work out how to check what's in the 17 million prodcat row combinations that is no longer in the 500,000 poster_data listing. Eeks.
Just for reference, I had to rewrite the statement as "SELECT d.apnumber, d.aptitle FROM (poster_prodcat_inno p) JOIN poster_data_inno d ON (d.apnumber = p.apnumber) WHERE p.apcatnum ='{$cats['apcatnum']}' ORDER BY aptitle ASC;" to get it to work.

Please escape your query input, even if it does come from your own database (quotes and other characters will get you every time). Your SQL statement is incorrect because you're not using the variables correctly, please use hints, such as:

while ($order = mysql_fetch_array($result2)) {
    $order = array_filter($order, 'mysql_real_escape_string');
    $sql3 = "UPDATE poster_prodcat_inno SET catorder='$ordernum' WHERE apnumber='{$order['apnumber']}' AND apcatnum='{$cats['apcatnum']}'";
}

6 Comments

Also, he should explicitly JOIN ON rather than relying on ,
First, that might not be the issue since it is updating some rows (but without error checking, it could be). Second, don't do it like that. Integers should be neither quoted nor escaped. And when m_fetch_array returns false after the last row, the array_filter call will error out. Don't try to squeeze it all into one line...
@dnagirl: that's perfectly valid syntax. While most people like to explicitly specify their joins, it's not "wrong" not to...
@ircmaxell - You're right about array_filter, my bad, edited. Also, I don't see any considerable harm in quoting integers or filtering them.
@andre: it's not standard SQL, and can alter the behavior of how MySQL interprets the number (rounding will be different for quoted vs unquoted numbers for example). it's not "wrong", but it's also not best practice. The best practice is to cast the variable to an int (int) $number prior to appending it to the query...
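
The int-cast approach from the last comment could look something like the sketch below. The helper name build_catorder_update is hypothetical, and it assumes catorder, apnumber and apcatnum are all integer columns, which the thread does not confirm.

```php
<?php
// Hypothetical helper (not from the original thread): builds the UPDATE
// statement with every value cast to int, so nothing is quoted or escaped.
// Assumes catorder, apnumber and apcatnum are all integer columns.
function build_catorder_update($ordernum, $apnumber, $apcatnum)
{
    return sprintf(
        "UPDATE poster_prodcat_inno SET catorder=%d WHERE apnumber=%d AND apcatnum=%d",
        (int) $ordernum,
        (int) $apnumber,
        (int) $apcatnum
    );
}
```

Inside the inner loop this would replace the string-interpolated $sql3, e.g. $sql3 = build_catorder_update($ordernum, $order['apnumber'], $cats['apcatnum']);. If apnumber turns out to be a string key, the cast would not be appropriate and escaping would be needed instead.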