I have a website connected to a MySQL database of songs, and am working to improve full-text search on the site. A simple query with MATCH() works as expected, but as soon as I add an aggregate function to the SELECT statement, MATCH() no longer returns relevant results.

The query I'm using is somewhat complicated, but I was able to boil it down to a minimal example:

  1. Create a table with an FTS index, and insert three songs:
CREATE TABLE `Song` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `title` varchar(300) DEFAULT NULL,
  PRIMARY KEY (`id`),
  FULLTEXT KEY `FTS_Songs` (`title`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO Song (`title`)
VALUES ('The Morning Breaks'), ('The Spirit of God'), ('Now Let Us Rejoice');
  2. Run a simple query with MATCH(). This works correctly, bringing the most relevant song (“The Spirit of God”) to the top:
SELECT id, title,
  MATCH(title) AGAINST ('"the spirit of god"' IN BOOLEAN MODE) AS exactTitleRelevance,
  MATCH(title) AGAINST ('+spirit* +god*' IN BOOLEAN MODE) AS titleRelevance
FROM Song
GROUP BY id
ORDER BY exactTitleRelevance DESC, titleRelevance DESC;

Correct result: The row for "The Spirit of God" has a high relevance score, and other rows have a relevance of 0.

  3. Add any aggregate function (my real query has a GROUP_CONCAT() to aggregate data from a joined table; in this example, I’m using COUNT() without any joins for simplicity). This does not work correctly. As soon as I add an aggregate function, MATCH() breaks and returns inaccurate results:
SELECT id, title,
  MATCH(title) AGAINST ('"the spirit of god"' IN BOOLEAN MODE) AS exactTitleRelevance,
  MATCH(title) AGAINST ('+spirit* +god*' IN BOOLEAN MODE) AS titleRelevance,
  COUNT(*) AS aggregateColumn
FROM Song
GROUP BY id
ORDER BY exactTitleRelevance DESC, titleRelevance DESC;

Incorrect result: The row for "The Spirit of God" has a relevance score of 0, and a different row that doesn't match the query has a high relevance score.

What am I doing wrong? Is it possible to use MATCH() and aggregate functions like COUNT() or GROUP_CONCAT() together in the same query?

I’ve tried and gotten the same results on two different servers – one with MySQL 8.0.37, and the other with MySQL 8.3.0.

  • Why are you using aggregation on the primary key? The "groups" will by definition be exactly 1 row. Commented Sep 15, 2024 at 2:34
  • @BillKarwin, my original query (a complicated 40-line query) has JOINs and uses GROUP_CONCAT() to get aggregated values from the joined tables. I simplified the query to a minimum reproducible example using only one table for this question. The same problem occurs in my original 40-line query and in this simple example query. Commented Sep 15, 2024 at 2:56
  • This bug report appears to be the same problem: bugs.mysql.com/bug.php?id=114666. The MySQL verification team has reproduced the bug using MySQL 8.0.36, but there is no fix mentioned. I suggest you log into that bug site and click the "Affects Me" button. Commented Sep 15, 2024 at 3:04
  • What is your expected result, @SamuelBradshaw? Commented Sep 15, 2024 at 16:03
  • @ArtBindu, I expect to see "The Spirit of God" (hymn 2) have a high relevance score from the MATCH() function. Instead, the MATCH() function gives it a score of 0. Commented Sep 15, 2024 at 18:41

1 Answer

You can use EXPLAIN to check how MySQL executes the query:

EXPLAIN SELECT id, title,
  MATCH(title) AGAINST ('"the spirit of god"' IN BOOLEAN MODE) AS exactTitleRelevance,
  MATCH(title) AGAINST ('+spirit* +god*' IN BOOLEAN MODE) AS titleRelevance,
  COUNT(*) AS aggregateColumn
FROM Song
GROUP BY id
ORDER BY exactTitleRelevance DESC, titleRelevance DESC;

This will give:

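+----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+---------------------------------+
| id | select_type | table | partitions | type  | possible_keys     | key     | key_len | ref  | rows | filtered | Extra                           |
+----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+---------------------------------+
| 1  | SIMPLE      | Song  | NULL       | index | PRIMARY,FTS_Songs | PRIMARY | 4       | NULL | 3    | 100.00   | Using temporary; Using filesort |
+----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+---------------------------------+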

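Adding the aggregate function COUNT() together with GROUP BY changes how MySQL processes the query. As the EXPLAIN output shows, MySQL reads the table through the PRIMARY index on id rather than the FULLTEXT index FTS_Songs, which appears to be why MATCH() no longer returns the expected relevance scores.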
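For comparison, it may help to run EXPLAIN on a variant that has no aggregate and puts MATCH() in the WHERE clause, then check whether the key column names FTS_Songs. This is only a sketch against the Song table from the question. The plan can differ by MySQL version and data, so no expected output is shown here.

```sql
-- Comparison query: no aggregate, full-text predicate in WHERE.
-- Inspect the `type` and `key` columns of the resulting plan.
EXPLAIN SELECT id, title,
  MATCH(title) AGAINST ('+spirit* +god*' IN BOOLEAN MODE) AS titleRelevance
FROM Song
WHERE MATCH(title) AGAINST ('+spirit* +god*' IN BOOLEAN MODE);
```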

To make sure MATCH() calculates relevance accurately:

  • Use CTE/subquery to perform the full-text search first.
  • Then apply any aggregate functions.

For example:

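WITH lookup AS (
  -- Run the full-text search first, without any aggregation.
  SELECT
    id,
    title,
    MATCH(title) AGAINST ('"the spirit of god"' IN BOOLEAN MODE) AS exactTitleRelevance,
    MATCH(title) AGAINST ('+spirit* +god*' IN BOOLEAN MODE) AS titleRelevance
  FROM Song
  -- Note: this filter keeps only rows that match the exact phrase.
  WHERE MATCH(title) AGAINST ('"the spirit of god"' IN BOOLEAN MODE)
)
-- Then aggregate over the rows the search returned.
SELECT
  l.id,
  l.title,
  l.exactTitleRelevance,
  l.titleRelevance,
  COUNT(*) AS aggregateColumn
FROM lookup l
GROUP BY l.id, l.title, l.exactTitleRelevance, l.titleRelevance
ORDER BY l.exactTitleRelevance DESC, l.titleRelevance DESC;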
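The question mentions that the real query uses GROUP_CONCAT() over a joined table. A related way to keep the full-text search separate from the aggregation is to aggregate inside a derived table and join it to Song, so the query that evaluates MATCH() has no GROUP BY of its own. The sketch below assumes a hypothetical child table SongTag (with columns song_id and tag) that is not part of the original question. It has not been verified against the bug discussed here, so treat it as a starting point rather than a confirmed fix.

```sql
-- Hypothetical child table, introduced only for illustration.
CREATE TABLE `SongTag` (
  `song_id` int unsigned NOT NULL,
  `tag` varchar(100) NOT NULL,
  PRIMARY KEY (`song_id`, `tag`)
) ENGINE=InnoDB;

SELECT
  s.id,
  s.title,
  MATCH(s.title) AGAINST ('"the spirit of god"' IN BOOLEAN MODE) AS exactTitleRelevance,
  MATCH(s.title) AGAINST ('+spirit* +god*' IN BOOLEAN MODE) AS titleRelevance,
  t.tags
FROM Song s
-- The aggregation happens here, isolated from the MATCH() expressions.
LEFT JOIN (
  SELECT song_id, GROUP_CONCAT(tag ORDER BY tag SEPARATOR ', ') AS tags
  FROM SongTag
  GROUP BY song_id
) t ON t.song_id = s.id
ORDER BY exactTitleRelevance DESC, titleRelevance DESC;
```

Because the outer query keeps every row of Song, songs with no tags appear with tags set to NULL. Running EXPLAIN on this query is a reasonable way to check which index MySQL picks on a given server.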

1 Comment

Learn new thing in MySQL, EXPLAIN statement ...
