SELECT SQL_NO_CACHE link.stop, stop.common_name, locality.name, stop.bearing, stop.latitude, stop.longitude
FROM service
JOIN pattern ON pattern.service = service.code
JOIN link ON link.section = pattern.section
JOIN naptan.stop ON stop.atco_code = link.stop
JOIN naptan.locality ON locality.code = stop.nptg_locality_ref
GROUP BY link.stop

The above query takes roughly 800–1000 ms to run.

If I append a GROUP_CONCAT() expression, the query then takes 8–10 seconds:

SELECT SQL_NO_CACHE link.stop, stop.common_name, locality.name, stop.bearing, stop.latitude, stop.longitude, group_concat(service.line) lines
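
In full, that slower query is just the first query with the expanded SELECT list:

SELECT SQL_NO_CACHE link.stop, stop.common_name, locality.name, stop.bearing, stop.latitude, stop.longitude, group_concat(service.line) lines
FROM service
JOIN pattern ON pattern.service = service.code
JOIN link ON link.section = pattern.section
JOIN naptan.stop ON stop.atco_code = link.stop
JOIN naptan.locality ON locality.code = stop.nptg_locality_ref
GROUP BY link.stop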

How can I change this query so that it runs in less than 2 seconds with the group_concat statement?

SQL Fiddle: http://sqlfiddle.com/#!9/414fe

EXPLAIN statements for both queries: https://i.sstatic.net/lN176.png

  • Can you please post the result of EXPLAIN for this query? (I noticed that one of your tables is in MyISAM and has a geospatial index.) You might want to read this: use-the-index-luke.com Commented Mar 20, 2015 at 12:48
  • @OllieJones Hi Ollie, I have added an EXPLAIN to the question; apologies for not mentioning the MyISAM table. Commented Mar 20, 2015 at 12:50
  • To my mind, and to a rough approximation, there is no problem in SQL for which GROUP_CONCAT is the solution. Commented Mar 20, 2015 at 12:51
  • 1
    When you're optimizing a GROUP BY query, all the columns of the result set are relevant. That's because compound indexes can make a big performance difference. Commented Mar 20, 2015 at 12:55
  • 1
    @OllieJones Based on that information I have made sure that all of the selected fields are now in the query Commented Mar 20, 2015 at 12:59

2 Answers


How long does this query take?

SELECT p.section, GROUP_CONCAT(s.line)
FROM pattern p join
     service s
     ON p.service = s.code
GROUP BY p.section

I am thinking that you can do the GROUP_CONCAT() in a subquery, so the outer query does not need an aggregation. This can speed up queries when there is only one table in the subquery; in your case, there are two.

The final result would be something like the query below, with the subquery correlated on link.section = pattern.section:

SELECT SQL_NO_CACHE . . .,
       (SELECT GROUP_CONCAT(s.line)
        FROM pattern p join
             service s
             ON p.service = s.code
        WHERE p.section = link.section
       ) as lines
FROM link JOIN
     naptan.stop
     ON stop.atco_code = link.stop JOIN
     naptan.locality
     ON locality.code = stop.nptg_locality_ref;

For this query, you want the following additional indexes: pattern(section, service) and service(code, line).
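
In MySQL DDL, creating them would look something like this (the index names are my own; pick whatever names you like):

-- Covers the correlated lookup on p.section and the join to service.
ALTER TABLE pattern ADD INDEX pattern_section_service (section, service);
-- Covers the join on s.code and supplies s.line for the GROUP_CONCAT().
ALTER TABLE service ADD INDEX service_code_line (code, line);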

I don't know if this will work, but it is worth a try.

Note: this is assuming that you really don't need the group by for the rest of the columns.


2 Comments

The first query you posted takes 0.064 seconds; the whole query now takes around 1.5 seconds on average! Thanks very much :-)
@jskidd3 . . . Cool. I don't like the fact that MySQL will not use indexes efficiently for GROUP BY (perhaps it will in some future release). I wish it executed your original version as if it were written like the above, but I'm happy that this works for you.

A remark: You're using the nonstandard MySQL extension to GROUP BY. It happens to work for you because link.stop is joined to stop.atco_code, which itself is a primary key. But you need to be very careful with this.
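
To illustrate, a standards-compliant version of your original query (one that would also run with ONLY_FULL_GROUP_BY enabled) groups by every non-aggregated column. This is just a sketch of the safer form, not necessarily a faster one:

SELECT link.stop, stop.common_name, locality.name, stop.bearing, stop.latitude, stop.longitude,
       GROUP_CONCAT(service.line) AS lines
FROM service
JOIN pattern ON pattern.service = service.code
JOIN link ON link.section = pattern.section
JOIN naptan.stop ON stop.atco_code = link.stop
JOIN naptan.locality ON locality.code = stop.nptg_locality_ref
GROUP BY link.stop, stop.common_name, locality.name, stop.bearing, stop.latitude, stop.longitude;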

I suggest you add some compound indexes. You join in to pattern on service and join out based on section. So add this index.

ALTER TABLE pattern ADD INDEX service_section (service, section, line);

This will let the query use just the index, without having to hit the table itself to retrieve the information needed for the JOIN or your GROUP_CONCAT() operation. (You might also delete the index on just service; this new index makes it redundant.)

Similarly, you want to create an index (section, stop) on the link table, and get rid of the index on just section.
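
Concretely, that could be done like this (I am assuming your existing single-column index is literally named section; check SHOW INDEX FROM link for its real name):

ALTER TABLE link ADD INDEX link_section_stop (section, stop);
-- Assumption: the old single-column index is named `section`.
ALTER TABLE link DROP INDEX section;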

On stop, you're using most of the columns, and you already have an index (PK) on atco_code, so let this one be.

Finally, on locality, put an index on (code, name).
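
For example (again, the index name is arbitrary):

ALTER TABLE naptan.locality ADD INDEX locality_code_name (code, name);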

All this indexing monkey business should cut down the amount of work MySQL must do to satisfy your query.

Now look, as soon as you add WHERE anything = anything to the query, you may need to add a column to one or more of these indexes. You definitely should read up on multi-column indexing and grouping; good indexing is a critical success factor for your kind of data.
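
For instance, suppose you later added a hypothetical filter such as WHERE service.line = 'X1'. To serve that filter from an index, you would want one with line as its leading column, something like:

-- Hypothetical: only worthwhile if you actually filter on service.line.
ALTER TABLE service ADD INDEX service_line_code (line, code);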

You should also run ANALYZE TABLE xxxx on each of your tables after inserting lots of rows, to make sure the query optimizer can see appropriate information about the content of the table and indexes.
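
For example:

ANALYZE TABLE pattern, service, link, naptan.stop, naptan.locality;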

1 Comment

Thanks for your answer Ollie. I will read up on what you have suggested. Regarding your suggestions, the query now takes roughly 10 seconds to execute and 1.5 seconds to fetch. For what it's worth, there are 4,000 rows being returned. I have created another SQL Fiddle to show the changes: sqlfiddle.com/#!9/e0db4
