I'm having trouble understanding how string comparisons work in MySQL (I am using MySQL Community Server 8.0.14).
I have an elmah_error table and a query that filters rows based on an "application" column.
"application" is defined as a varchar(60) column with a collation of utf8_general_ci (in MySQL 8.0, utf8 is an alias for utf8mb3).
If I run the query below, I get the following results:
SELECT application, COUNT(*) FROM elmah.elmah_error GROUP BY application;
/LM/W3SVC/3/ROOT 3330
/LM/W3SVC/4/ROOT 350
This indicates that there are only two distinct values for application, with a total row count of 3,680 (3,330 + 350). Just in case, I've double-checked this with the following queries.
SELECT DISTINCT application FROM elmah.elmah_error;
/LM/W3SVC/3/ROOT
/LM/W3SVC/4/ROOT
and
SELECT COUNT(*) FROM elmah.elmah_error;
3680
However, if I run the following queries, I do not get the results I expect.
SELECT COUNT(*) FROM elmah.elmah_error
where application = '/LM/W3SVC/3/ROOT';
984
SELECT COUNT(*) FROM elmah.elmah_error
where application <> '/LM/W3SVC/3/ROOT';
350
I would expect the first query to return 3,330 and the two queries to add up to 3,680, but they do not: the first returns 984, and 984 + 350 is only 1,334.
However, if I run any of the following queries, I get the expected result.
SELECT COUNT(*) FROM elmah.elmah_error
WHERE UPPER(application) = '/LM/W3SVC/3/ROOT';
SELECT COUNT(*) FROM elmah.elmah_error
WHERE TRIM(application) = '/LM/W3SVC/3/ROOT';
SELECT COUNT(*) FROM elmah.elmah_error
WHERE application LIKE '%/LM/W3SVC/3/ROOT';
3330 (each of the three queries returns this count)
Because the TRIM and LIKE variants work, I initially suspected that there might be a hidden character before the leading '/LM'.
However, SELECT DISTINCT LENGTH(application) FROM elmah.elmah_error shows that every value is 16 bytes long, which is exactly the length of '/LM/W3SVC/3/ROOT'. Together with the results of the GROUP BY application and DISTINCT application queries, this suggests that invisible characters are probably not the cause.
Could anyone please shed light on what's going on here?
SELECT BINARY application, COUNT(*) FROM elmah.elmah_error GROUP BY BINARY application;?
application is returned as a blob. According to the MySQL Workbench value viewer, the blob contents are 2f-4c-4d-2f-57-33-53-56-43-2f-33-2f-52-4f-4f-54 and 2f-4c-4d-2f-57-33-53-56-43-2f-34-2f-52-4f-4f-54, respectively.
SELECT COUNT(*) FROM elmah.elmah_error where BINARY application = '/LM/W3SVC/3/ROOT'; or SELECT COUNT(*) FROM elmah.elmah_error where BINARY application = '/LM/W3SVC/4/ROOT';? I see no reason those should differ from the GROUP BY count.
Adding BINARY to the left side seems to return the expected results. I just tried SELECT COUNT(*) FROM elmah.elmah_error WHERE application = CONVERT(CAST('/LM/W3SVC/3/ROOT' AS BINARY) USING utf8mb4) and that works as expected too, although changing USING utf8mb4 to USING utf8mb3 causes it to return 984 instead.
SELECT COUNT(*) FROM elmah.elmah_error WHERE CONVERT(CAST('/LM/W3SVC/3/ROOT' AS BINARY) USING utf8mb3) = '/LM/W3SVC/3/ROOT';
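As a sanity check outside MySQL, the two hex dumps from the Workbench value viewer can be decoded to confirm that they hold exactly the 16 printable ASCII bytes of the literals, with no hidden or multibyte characters. This is a minimal Python sketch that only works on the dumps copied from the comments; the helper names decode_dump and is_printable_ascii are hypothetical, and the script says nothing about how MySQL's collations or optimizer treat the values.

```python
def decode_dump(dump: str) -> bytes:
    """Turn a dash-separated hex dump like '2f-4c-...' into raw bytes."""
    # bytes.fromhex() skips whitespace but rejects '-', so drop the separators first.
    return bytes.fromhex(dump.replace("-", ""))


def is_printable_ascii(data: bytes) -> bool:
    """True if every byte is a visible ASCII character or a space (0x20-0x7E)."""
    return all(0x20 <= b <= 0x7E for b in data)


# Hex dumps as reported by the MySQL Workbench value viewer (copied from the comments).
DUMPS = {
    "/LM/W3SVC/3/ROOT": "2f-4c-4d-2f-57-33-53-56-43-2f-33-2f-52-4f-4f-54",
    "/LM/W3SVC/4/ROOT": "2f-4c-4d-2f-57-33-53-56-43-2f-34-2f-52-4f-4f-54",
}

if __name__ == "__main__":
    for literal, dump in DUMPS.items():
        stored = decode_dump(dump)
        print(
            f"{literal}: {len(stored)} bytes, "
            f"printable ASCII: {is_printable_ascii(stored)}, "
            f"equals literal: {stored == literal.encode('ascii')}"
        )
```

For both values this prints 16 bytes, printable ASCII: True, equals literal: True, which agrees with the LENGTH() result. Since ASCII characters are encoded identically in utf8mb3 and utf8mb4, the stored bytes by themselves do not account for the differing counts.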