I have a bundles table, and each bundle can have between 0 and n items from the items table. This is my query to get the median number of items per bundle:

SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY r.count) AS median
FROM (
    SELECT bundle.id, COUNT(*)
    FROM items
    JOIN bundle ON bundle.id=items.bundle_id
    GROUP BY bundle.id
) AS r;

But this query takes too long and times out. What would be the best way to split it into batches using PostgreSQL?

1 Answer

I think the only sensible way of splitting the query is to create a temporary table with the result of select bundle_id, count(*) ct from items group by bundle_id (joining the bundle table in that first query gains you nothing), and then possibly create an index on ct.
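A minimal sketch of that approach, assuming items.bundle_id is a foreign key to bundle.id so the join can be dropped safely. The table name bundle_item_counts is made up for illustration, and the index may or may not help depending on table size:

```sql
-- Step 1: materialize the per-bundle item counts once.
-- bundle_item_counts is a hypothetical name, not from the question.
CREATE TEMPORARY TABLE bundle_item_counts AS
SELECT bundle_id, COUNT(*) AS ct
FROM items
WHERE bundle_id IS NOT NULL   -- mirrors the inner join, which drops NULL keys
GROUP BY bundle_id;

-- Step 2 (optional): index the count column and refresh planner statistics.
CREATE INDEX ON bundle_item_counts (ct);
ANALYZE bundle_item_counts;

-- Step 3: compute the median over the much smaller temporary table.
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY ct) AS median
FROM bundle_item_counts;
```

Like the original query, this only counts bundles that have at least one item; bundles with zero items never appear in items and so do not contribute to the median.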

If you did it in batches, what would you measure, and how would you come up with your result? For example, if you have two groups, 1,2,8,9,10 and 1,2,2,2,3, then the median of the first group is 8 and the median of the second group is 2. The median of the whole group is 2, which you cannot derive from the two batch medians alone.
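The numbers in that example can be checked directly in PostgreSQL with inline VALUES lists. This is only an illustration of why per-batch medians do not combine; the aliases g and v are arbitrary:

```sql
-- Median of the first batch: 8
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY v) AS median
FROM (VALUES (1), (2), (8), (9), (10)) AS g(v);

-- Median of the second batch: 2
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY v) AS median
FROM (VALUES (1), (2), (2), (2), (3)) AS g(v);

-- Median of both batches combined: 2
-- (sorted: 1,1,2,2,2,2,3,8,9,10; the two middle values are both 2)
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY v) AS median
FROM (VALUES (1), (2), (8), (9), (10),
             (1), (2), (2), (2), (3)) AS g(v);
```

Averaging the batch medians would give 5, not 2, so the full sorted set of counts is needed to get the overall median.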

1 Comment

I figured this was the answer, and that is exactly what I need: given groups 1,2,8,9,10 and 1,2,2,2,3, combined and sorted they give 1,1,2,2,2,2,3,8,9,10, and the median would be 2.