
I have trouble understanding how string comparisons work in MySQL. (I am using MySQL Community Server version 8.0.14)

I have an elmah_error table and a query that filters rows based on an "application" column. "application" is defined as a varchar(60) column with a collation of utf8_general_ci.

If I run the query below, I get the following results:

SELECT application, COUNT(*) FROM elmah.elmah_error GROUP BY application;

/LM/W3SVC/3/ROOT    3330
/LM/W3SVC/4/ROOT    350

This indicates that there are only two distinct values for application, with a total row count of 3,680. Just in case, I've double checked this with the following queries.

SELECT DISTINCT application FROM elmah.elmah_error;

/LM/W3SVC/3/ROOT
/LM/W3SVC/4/ROOT

and

SELECT COUNT(*) FROM elmah.elmah_error;

3680

However, if I run the following queries, I do not get the results I expect.

SELECT COUNT(*) FROM elmah.elmah_error 
where application = '/LM/W3SVC/3/ROOT';

984


SELECT COUNT(*) FROM elmah.elmah_error 
where application <> '/LM/W3SVC/3/ROOT';

350

I would expect the first query to return 3,330, and the two queries to add up to 3,680, but they do not.

However, if I were to run any of the following queries, I get the expected results.

SELECT COUNT(*) FROM elmah.elmah_error 
WHERE UPPER(application) = '/LM/W3SVC/3/ROOT';

SELECT COUNT(*) FROM elmah.elmah_error 
WHERE TRIM(application) = '/LM/W3SVC/3/ROOT';

SELECT COUNT(*) FROM elmah.elmah_error 
WHERE application LIKE '%/LM/W3SVC/3/ROOT';

3330

Based on the TRIM and LIKE variants working, I initially suspected that there might be a hidden character before the '/LM'.

However, SELECT DISTINCT LENGTH(application) FROM elmah.elmah_error indicates that all values have a length of 16. This, together with the results of the GROUP BY application and DISTINCT application queries, seems to suggest that invisible characters are probably not the cause.
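The hidden-character check can also be done at the byte level in a single query. This is a minimal sketch that uses only the table and column named above; HEX, LENGTH and CHAR_LENGTH are standard MySQL functions, and the column aliases are just illustrative names.

```sql
-- Group by the raw bytes so that values differing only in invisible
-- characters or letter case land in separate rows.
SELECT HEX(application)         AS raw_bytes,
       LENGTH(application)      AS byte_len,   -- length in bytes
       CHAR_LENGTH(application) AS char_len,   -- length in characters
       COUNT(*)                 AS cnt
FROM elmah.elmah_error
GROUP BY HEX(application), LENGTH(application), CHAR_LENGTH(application);
```

If this returns only two rows, with byte_len equal to char_len on both, stray or multi-byte characters can be ruled out with more confidence than LENGTH alone gives.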

Could anyone please shed light on what's going on here?

  • Can you try SELECT BINARY application, COUNT(*) FROM elmah.elmah_error GROUP BY BINARY application; ? Commented May 26, 2021 at 7:52
  • I get the same two rows (count 3,330 and 350) although this time application is returned as a blob. According to the MySQL Workbench value viewer, the blob contents are 2f-4c-4d-2f-57-33-53-56-43-2f-33-2f-52-4f-4f-54 and 2f-4c-4d-2f-57-33-53-56-43-2f-34-2f-52-4f-4f-54, respectively. Commented May 26, 2021 at 7:56
  • and SELECT COUNT(*) FROM elmah.elmah_error where BINARY application = '/LM/W3SVC/3/ROOT'; - or SELECT COUNT(*) FROM elmah.elmah_error where BINARY application = '/LM/W3SVC/4/ROOT'; ? I see no reason those should differ from the GROUP BY count. Commented May 26, 2021 at 8:14
  • The first returns 3,330, the latter returns 350, so adding BINARY to the left side seems to return the expected results. Just tried SELECT COUNT(*) FROM elmah.elmah_error WHERE application = CONVERT(CAST('/LM/W3SVC/3/ROOT' as BINARY) USING utf8mb4) and that works as expected too, although changing USING utf8mb4 to USING utf8mb3 causes it to return 984 instead. Commented May 27, 2021 at 1:47
  • Strangely, casting the column side works regardless of whether it's using utf8mb4 or utf8mb3, but casting the right side only gives the expected results when using utf8mb4. (SELECT COUNT(*) FROM elmah.elmah_error WHERE CONVERT(CAST('/LM/W3SVC/3/ROOT' as BINARY) USING utf8mb3) = '/LM/W3SVC/3/ROOT'; ) Commented May 27, 2021 at 1:49
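
The pattern in these comments (wrapping the column in a function, or forcing a different character set, gives 3,330, while the bare comparison gives 984) is also what one would see if the bare comparison were answered from an index whose contents disagree with the table data. That is only a hypothesis: the thread never says whether application is indexed. A sketch for testing it, using only built-in MySQL statements:

```sql
-- List the indexes on the table; is application part of any of them?
SHOW INDEX FROM elmah.elmah_error;

-- Show which index, if any, the problem query uses.
EXPLAIN SELECT COUNT(*) FROM elmah.elmah_error
WHERE application = '/LM/W3SVC/3/ROOT';

-- Same query with index use disabled: USE INDEX () with an empty
-- list tells MySQL to use no indexes for this table.
SELECT COUNT(*) FROM elmah.elmah_error USE INDEX ()
WHERE application = '/LM/W3SVC/3/ROOT';

-- Ask the storage engine to verify the table and its indexes.
CHECK TABLE elmah.elmah_error;
```

If the count changes once the index is bypassed, the index rather than the string comparison is the thing to investigate.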

1 Answer

I suspect that your table is using a case-insensitive collation. If so, the value /LM/W3SVC/3/ROOT might have variants in which not all characters are uppercase. To test this, try your first aggregation query with a case-sensitive collation:

SELECT application COLLATE SQL_Latin1_General_CP1_CS_AS, COUNT(*) AS cnt
FROM elmah.elmah_error
GROUP BY application COLLATE SQL_Latin1_General_CP1_CS_AS;

2 Comments

Does this account for the value of 350 when searching for application <> '/LM/W3SVC/3/ROOT'?
I tried the first aggregation query again using utf8_bin but got the same results. (I'm using MySQL rather than MSSQL)
