I have the following query:

select "houses"."id", 
"houses"."uuid", 
COUNT(1) OVER() as full_count from "houses" 
CROSS JOIN LATERAL jsonb_array_elements(houses.types) house_types 
inner join "hoods" on "hoods"."id" = "houses"."hood_id" and "hoods"."owner_id" = 2 
inner join "groups" on "groups"."hood_id" = "hoods"."id" and "groups"."manager_id" = 54 
where house_types->>'type' = 'big' 
group by "houses"."id", "houses"."uuid" 
order by lower(houses.name) asc 
limit 20

This correctly returns the first 20 houses that have a type 'big', that belong to a hood whose owner_id is 2, and whose hood has an associated group with manager_id 54.

Now, the problem is that sometimes several houses have the same name, and I want to keep just one of them and remove the rest. For example:

If my houses table looks like:

id, types, name
1, [{ type: 'rating' }], 'white house'
2, [{ type: 'rating' }], 'white house'
3, [{ type: 'rating' }], 'red house'

I would get only the rows with id 1 and 3.

What is a good way to do that in PostgreSQL, given that the query can have both an offset and a limit applied and I want the duplicates removed?

1 Answer

Instead of group by, use distinct on:

select distinct on (lower(h.name)) h.id, h.uuid,
       COUNT(*) OVER() as full_count
from houses h cross join lateral
     jsonb_array_elements(h.types) ht inner join
     "hoods" ho
     on ho.id = h.hood_id and
        ho.owner_id = 2 inner join
     "groups" g
     on g.hood_id = ho.id and
        g.manager_id = 54
where ht->>'type' = 'big'
order by lower(h.name) asc
limit 20;
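
To see what distinct on does in isolation, here is a minimal sketch that runs against the three sample rows from the question instead of the real tables. The houses_demo name and the inline values list are assumptions for illustration only. The extra h.id sort key is also an addition: it makes the surviving duplicate predictable (the lowest id per name).

```sql
-- Hypothetical stand-in for the houses table, using the question's sample rows.
with houses_demo (id, name) as (
  values (1, 'white house'),
         (2, 'white house'),
         (3, 'red house')
)
-- distinct on keeps the first row of each lower(name) group,
-- where "first" is decided by the order by clause.
select distinct on (lower(h.name)) h.id, h.name
from houses_demo h
order by lower(h.name), h.id;
-- returns id 3 ('red house'), then id 1 ('white house')
```

The leading order by expressions have to match the distinct on expressions; anything after them only decides which row of each group is kept.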

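Edit: to count only the distinct rows, do the distinct on in a subquery and apply the window count in the outer query: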

select h.id, h.uuid, count(*) over () as full_count
from (select distinct on (lower(h.name)) h.id, h.uuid, h.name
      from houses h cross join lateral
           jsonb_array_elements(h.types) ht inner join
           "hoods" ho
           on ho.id = h.hood_id and
              ho.owner_id = 2 inner join
           "groups" g
           on g.hood_id = ho.id and
              g.manager_id = 54
      where ht->>'type' = 'big'
      order by lower(h.name) asc
     ) h
-- the subquery's ordering is not guaranteed to carry over, so sort again here
order by lower(h.name) asc
limit 20;
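
The count was off in the single-level query because PostgreSQL evaluates window functions before it applies distinct on, so count(*) over () still sees the duplicates. A small sketch of the difference, using a hypothetical houses_demo values list built from the question's sample rows rather than the real tables:

```sql
-- Same level: the window count runs before distinct on removes duplicates.
with houses_demo (id, name) as (
  values (1, 'white house'), (2, 'white house'), (3, 'red house')
)
select distinct on (lower(h.name)) h.id, count(*) over () as full_count
from houses_demo h
order by lower(h.name), h.id;
-- two rows come back (ids 3 and 1), but full_count is 3 on both

-- Subquery: duplicates are removed first, then counted, then paged.
with houses_demo (id, name) as (
  values (1, 'white house'), (2, 'white house'), (3, 'red house')
)
select d.id, d.name, count(*) over () as full_count
from (select distinct on (lower(h.name)) h.id, h.name
      from houses_demo h
      order by lower(h.name), h.id
     ) d
order by lower(d.name)
limit 1 offset 1;
-- returns id 1 ('white house') with full_count = 2
```

Because limit and offset are applied after the window function, full_count stays at the total number of distinct names no matter which page is requested.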

7 Comments

Thanks Gordon. Why do you need both lower(h.name) and h.id and houses.uuid in the DISTINCT ON, if we just want them to be distinct by name?
@HommerSmith . . . I think you are right. That is not needed.
Gordon, there is a problem with this approach. Even though the distinct ON works, the COUNT(*) OVER() is still counting the removed rows. How can I have both distinct ON and count over all the results using the window function so I can properly know all the potential unique rows that exist besides the limit?
@HommerSmith . . . I would recommend a subquery with distinct on to reduce the number of rows. Then use the count() window function.
What do you mean by a subquery with DISTINCT ON? I was thinking of actually having a subquery that would do the COUNT. Do you mind expanding your answer?