
When querying a non-partitioned table, the query optimizer can use an index to satisfy the ORDER BY and read only as many rows as the LIMIT clause requires. For example, given a non-partitioned table my_table with a single-column index idx_a on column a, consider the following query:

SELECT *  
FROM my_table  
ORDER BY a DESC  
LIMIT 100;

This query can scan the idx_a index from the end and stop after reading 100 rows, regardless of the total number of rows in my_table.

Now, consider a partitioned table my_partition_table partitioned by the primary key. Suppose the same query is run:

SELECT *  
FROM my_partition_table  
ORDER BY a DESC  
LIMIT 100;

In this case, the EXPLAIN plan confirms that the query does not use filesort. However, since the table is partitioned by the primary key, the values of column a are spread across all partitions.

How does MySQL handle sorting in this situation? Specifically, how does it retrieve and merge the data from all partitions to produce the sorted result for column a efficiently?

  • I don't know about MySQL, but if you have N partitions and each of them has its own index on a, then a merge algorithm (as in the merge step of merge sort) over the per-partition index scans would be able to produce a globally sorted result. What does the EXPLAIN say? Commented Nov 30, 2024 at 8:01
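
The merge idea raised in the comment above can be sketched in a few lines of Python. This is only an illustration of the general technique under stated assumptions, not a description of what MySQL does internally: the function name top_n_desc, the 4-way partitioning by pk % 4, and the sample data are all hypothetical.

```python
import heapq
from itertools import islice

def top_n_desc(partition_indexes, n):
    """Return the n largest (a, pk) index entries across all partitions.

    Each element of partition_indexes is a list of (a, pk) tuples sorted
    ascending, standing in for one partition's local index on column a.
    """
    # Scan every partition's index from the end (largest a first).
    desc_scans = [reversed(idx) for idx in partition_indexes]
    # heapq.merge lazily combines inputs that are already sorted;
    # reverse=True says each input runs from largest to smallest.
    merged = heapq.merge(*desc_scans, reverse=True)
    # Stop after n entries: only about n entries plus one per partition
    # are ever pulled from the underlying scans.
    return list(islice(merged, n))

# Hypothetical table: 4 partitions chosen by pk % 4, with a = (pk * 7) % 50.
partitions = [[] for _ in range(4)]
for pk in range(1, 41):
    partitions[pk % 4].append(((pk * 7) % 50, pk))
for idx in partitions:
    idx.sort()

result = top_n_desc(partitions, 5)
print(result)
```

Because each per-partition scan is already ordered, no separate sort of the combined rows is needed, which would be consistent with a plan that shows no filesort.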

1 Answer


You have found one of many ways that PARTITIONing (in MySQL) is no better than, and possibly worse than, non-partitioning.

  • Partitioning by the PRIMARY KEY is never(?) better than the equivalent non-partitioned table.

  • Partitioning by the PK, then looking up by the PK (either point query or range query) will do "partition pruning". But such pruning is no faster than using the BTree.

  • Partition by PK, then looking up by a secondary index -- all partitions need to be looked at. Then, depending on other things, it will probably still have to gather and sort the found rows. (I doubt if it is smart enough to merge; I have not heard that it will do the partitions in parallel.)

  • If the WHERE clause needs a 2-dimensional index, partitioning can sort of provide such -- pruning for one of the dimensions, then one or more index lookups for the other dimension.

  • In general, the only way to hope to get any benefit is by having the "partition key" be something other than the PK or any other index.

  • I see no case in which partitioning will speed up your query (ORDER BY a LIMIT 100 with INDEX(a)). An equivalent non-partitioned table will:

    find the first `a` in the index
    reach into the data for the corresponding row
    move on to the next `a` and get its row
    stop at 100.
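
The steps above can be sketched in Python for the descending case from the question. This is a simplified model, assuming the secondary index is an ordered list of (a, pk) pairs and the table is a mapping from pk to row; the function name order_by_a_desc_limit and the sample data are hypothetical.

```python
from itertools import islice

def order_by_a_desc_limit(index_a, rows_by_pk, limit):
    """Walk the index on a from the end and fetch rows until limit is reached."""
    result = []
    # index_a is sorted ascending by a, so reversed() starts at the largest a.
    # islice stops the walk after `limit` index entries.
    for a_value, pk in islice(reversed(index_a), limit):
        # "Reach into the data": the index entry carries the pk,
        # which is used to fetch the full row.
        result.append(rows_by_pk[pk])
    return result

# Hypothetical table of 1000 rows with a = (pk * 37) % 1000.
rows_by_pk = {pk: {"id": pk, "a": (pk * 37) % 1000} for pk in range(1, 1001)}
index_a = sorted((row["a"], pk) for pk, row in rows_by_pk.items())

top = order_by_a_desc_limit(index_a, rows_by_pk, 100)
print(top[0], top[-1])
```

The work done depends on the LIMIT rather than on the table size: only 100 index entries and 100 rows are touched here, however many rows the table holds.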
    

More discussion: Partition


1 Comment

If the table has a primary key, you can't make the partition key be independent of that primary key. The partition key column(s) must be included in all primary or unique keys of the table.
